How to Cut Your Snowflake Bill by Up to 55% with a single Open Source CLI tool

You already know the loop. The monthly Snowflake bill lands in someone’s inbox. A Slack thread opens with the words warehouse, credits and why in it. A document reaches a title like “Snowflake cost review — final v3.” And six weeks later the bill is the same. Or slightly higher.

Most teams suspect there is a meaningful percentage to recover somewhere in their estate. Very few can put a trustworthy number on it, and almost none can capture it without a person manually chasing queries for a fortnight. The OSO engineers built chukei to close both gaps at once, and in testing against real query histories it has removed as much as 55% of a Snowflake bill — with the recovered figure written into a signed report you can verify yourself.

This article explains where that 55% actually hides, why every tool you have already tried failed to capture it, and how the savings are made safe enough to put directly into your query path. The short version: the savings exist in only a handful of places, the majority of them in one place, and the reason you have never captured them is that almost every Snowflake cost tool reports spend rather than removing it.

The recoverable percentage, and why it is invisible

There are roughly four ways money leaks on Snowflake, and they compound quietly.

Repeated dashboard queries. A BI tool refreshes the same dashboard for forty people every morning at 9am. Snowflake’s native result cache should catch most of this — it is a genuinely good feature, it is free, and it lasts twenty-four hours — but it only fires on byte-identical query text. The moment a BI tool, a transformation tool or an ORM adds a stray whitespace, a generated comment, or renders a parameter binding slightly differently, the cache misses. The query runs again on a warehouse. You pay again.

Warehouses that never suspend. Snowflake’s static auto-suspend timeout is a blunt instrument. Teams crank it up to ten minutes because nobody wants a cold start mid-meeting. Then traffic dies and the warehouse sits idle for nine minutes and fifty-nine seconds, every quiet window, all day long.

SQL shapes that are needlessly expensive. A SELECT * against a wide event table. An exact COUNT(DISTINCT) where an approximate count would have been fine. Sub-queries stamped out by a dashboard tool and re-executed rather than cached. Most teams recognise these patterns; very few have the time to fix them at source.

Fuzzy cost ownership. You know your monthly warehouse spend. You do not know which dashboard, which team, which transformation model or which integration consumed which credits. Without that, every cost conversation ends in “we should tag things better.”

Each of these is individually well understood and individually un-fixed, because fixing them at source is nobody’s full-time job. And they stay invisible for a simple reason: you only ever see a query in a report after it has already executed and billed.

Why you have never been able to capture it

The existing toolbox falls into three buckets, and all three share the same ceiling.

First, the native controls — the static auto-suspend and the byte-identical result cache. Useful, but blunt, and limited to what Snowflake exposes.

Second, the closed-source SaaS cost optimisers. Some of them are genuinely capable products. But they are closed-source, they need your credentials, and you are asked to trust a number that a vendor calculated on your behalf.

Third, the open-source monitoring packages. They produce excellent dashboards. But they are observational by design — they tell you what happened. They do not change what happens.

The common limitation is that every one of these either watches from the side or reacts after the fact. None of them sits at the one point where a query can be changed before it bills you: the query path itself.

That is the whole design premise. The OSO engineers built chukei — Apache 2.0, written in Rust, named after the Japanese 中継, “relay” — as a transparent wire-protocol proxy that sits in the query path. You point your drivers at chukei instead of at Snowflake directly. You change one hostname; nothing else. From the driver’s point of view it is still talking to Snowflake, because the wire protocol matches exactly. But in the middle, chukei is removing work that Snowflake does not need to do twice.

Where the 55% actually comes from

Two interceptions do almost all of the work, and the balance between them tells you most of what you need to know about how chukei behaves.

Idle warehouses: roughly 94% of the recovered spend

This is where the big numbers live. Rather than relying on a single static timeout, chukei observes traffic to each warehouse and builds a Poisson idle model — a statistical model of how long the next query is likely to take to arrive. When the model concludes that the warehouse will sit idle long enough that suspending now and resuming later is cheaper than holding it warm, chukei either suggests a suspend or executes one. In the OSO engineers’ thirty-day simulation, this single capability accounted for roughly 94% of total savings.

Execution is deliberately boring. A service account with a CHUKEI_SUSPENDER role holds OPERATE on the target warehouses, and chukei issues an ALTER WAREHOUSE SUSPEND. You can run it in suggest-only mode for as long as you like — it simply logs the decisions it would have made — and flip to enforce only once you are satisfied the decisions are sane.

Verified result caching: the rest, proven correct

The second lever catches the near-identical queries the native cache misses. The cache key is not the query string. It is a structural fingerprint computed from the canonical parsed abstract syntax tree, plus the bind variables, plus a session scope. Whitespace, comments and parameter rendering all become noise, so two semantically identical queries hash to the same key and collide deterministically.

The keyword is verified. Anyone can build a cache; the hard part is being certain the cached answer is the answer Snowflake would return right now. chukei does this with what the OSO engineers call blame mode: a configurable fraction of cache hits are re-executed against live Snowflake in the background and the results compared. A single mismatch invalidates that fingerprint and increments a counter you can alert on. In a public soak test of 60,000 cache hits, there were zero mismatches. Writes invalidate the cache, and any non-deterministic SQL — anything touching CURRENT_TIMESTAMP, RANDOM, CURRENT_USER and the like — is given a “do not cache” decision at parse time. If chukei cannot prove a query is deterministic, it does not cache it.

The relationship between the two levers is the point. The cache is the part that proves chukei understands your workload. The suspend is the part that pays for itself.

Why the savings are safe to capture

Every data team has the same correct instinct when you propose putting a proxy in the query path: I do not want a thing in my query path. So the design has to answer “what happens when it breaks” before it earns the right to do anything clever. Three invariants make chukei safe to deploy, and the OSO engineers treat them as non-negotiable.

A deterministic hot path. No large language model, no machine-learning model, no plugin sandbox at request time. Every decision a query makes through chukei is the output of code you can read and tests you can run.

Fail open, everywhere. If chukei cannot make a safe decision, it does nothing clever and passes the query through to Snowflake byte-for-byte. Parse error? Pass through. Internal panic? Pass through. Cache miss, or non-deterministic SQL? Pass through. A large chunked result set goes from driver to cloud storage directly without ever touching chukei. Anywhere chukei cannot be certain it is helping, it gets out of the way. The practical consequence is that the worst-case failure mode is a higher bill that day — not a broken query path.

A false-positive-intolerant cache. Blame mode continuously validates the cache against live Snowflake, so correctness is measured rather than assumed.

This is also why the tool is written in Rust rather than something more convenient. A proxy in the query path collapses the moment it adds meaningful latency, so the target is a 2ms p99 overhead, with a 5ms budget that fires an alert if it is crossed. That figure is reachable in Rust without heroics. It is not reachable reliably in a garbage-collected runtime, where a single stop-the-world pause blows the budget.

The OSO engineers back this with a reproducible soak test: thirteen and a half hours of continuous traffic, roughly 120,000 queries, zero cache mismatches, flat memory, kill-mid-traffic drills, and real ALTER WAREHOUSE SUSPEND executions verified against QUERY_HISTORY. The reproduction script ships in the repository so you can run it against your own account.

Putting a signed number on it

A savings figure is only as good as your ability to trust it. This is the part the OSO engineers have not seen elsewhere, and it is what turns a cost-cutting tool into a finance artefact.

Every avoided pound — each cache hit, each suspended idle minute, each equivalence-tested rewrite — is logged into a local ledger together with the methodology used to calculate it. That methodology is deliberately conservative: wall-clock time multiplied by the credit rate multiplied by a 0.7 confidence factor, and the report says exactly that on its front page. You can export the ledger as an evidence bundle signed with Ed25519, and anyone holding the public key can verify the bundle has not been tampered with.

The effect on the cost conversation is the part that matters. It moves from “we should optimise” to “here is a methodology-documented, signed report showing what was recovered.” Your finance partner can audit it. Your internal audit team can audit it. You can hand it to a consultant. The number is not a vendor’s marketing claim — it is a signed artefact with a documented methodology, generated on your own hardware.

Attribution comes for free in the same ledger. Rather than depending on every team to discipline themselves into setting QUERY_TAG, chukei reads what is already on the wire — user, role, application name, transformation model where a comment carries it, BI tool fingerprint where it is recognisable — and writes the attribution into the ledger. So when finance asks which team is responsible for a credit spike, there is finally an answer, and the recovered 55% can be broken down by team rather than guessed at.

Practical takeaways: find your own number

The strongest thing about this approach is that you do not have to take the 55% on faith. There are two low-commitment ways to produce your own figure with chukei.

Start with the replay simulator — about twenty minutes, zero risk. Export the last thirty days of SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY to CSV — one SQL query — and run chukei replay against it. You get back a signed projection: parse coverage of your actual workload, projected cache hit rate, rewrite candidates with savings estimates, and the suspend savings from the Poisson model, all without touching your query path. If the projected percentage is meaningful, you have a business case worth a couple of hours. If it is not, you have spent twenty minutes and you know that. Either outcome is a win.

Then run the live pilot — about two hours, fully reversible. Stand chukei up on a host, point a single client at it by changing one hostname, run a SELECT twice, and watch the second one come back from cache while the mismatch counter stays at zero. Leave it in the path of a few hours of real traffic, then pull a signed realised-savings report. Roll back at any point by pointing the hostname back at Snowflake — chukei holds no client credentials and no state your rollback depends on.

Adopt suspend in stages. Keep the suspend lever in suggest-only first, watch the decisions for a day, then grant the scoped CHUKEI_SUSPENDER service account and flip to enforce. Now the suggestions become real suspends and the savings move from projected to realised in the ledger.

Know when not to bother. chukei is built for analytics workloads. If your latency budget is sub-millisecond, the ~2ms overhead is catastrophic — stay direct. Large chunked extracts pass through untouched and were never going to be cached anyway. And if all you need is a dashboard a human will look at and act on, an observational monitoring package is simpler and probably sufficient. chukei is for when you want the savings to happen automatically, with evidence, and without a human in the loop on every decision.

The point

The recoverable percentage was always sitting in your estate. You could not capture it because almost every tool you tried observed the problem from the side instead of intervening, and you could not trust the figure because it came from a vendor’s dashboard. Intercepting queries in the path changes both halves of that: the savings become reachable because you are acting before a query bills, and the number becomes trustworthy because it is signed, conservative, and generated on your own hardware.

Up to 55% of a Snowflake bill is a large claim. The honest way to make it is to hand you a twenty-minute method for finding out whether your own number is anywhere near it — and a signed report you can take to finance once you do.

Ready to find your own Snowflake savings number?

Open-source tool: chukei.dev
Source code: github.com/osodevops/chukei
Documentation: docs.chukei.dev

This post first appeared on Read More

The recoverable percentage, and why it is invisible

Why you have never been able to capture it

Where the 55% actually comes from

Idle warehouses: roughly 94% of the recovered spend

Verified result caching: the rest, proven correct

Why the savings are safe to capture

Putting a signed number on it

Practical takeaways: find your own number

The point

Related Posts

Meeting Data (and Analytics) Engineers Where They Are: Introducing the dbt Adapter for Confluent Cloud

Build Compliant AI Agents With Stateful Stream Processing

Build vs Buy Streaming for Real-Time RAG: 2026 Guide

Architectural Decision Guide: When to Use Apache Kafka (And When You Shouldn’t)

What is Confluent? How It’s Different from Apache Kafka

What is Confluent? How It’s Different from Apache Kafka