Can you prove the model in production is the one you validated?

Every model-governance regime — financial model risk, medical-device clearance, defense airworthiness — rests on one unstated assumption: the model serving production traffic is the model that passed validation. Standard practice cannot actually prove it. Near-lossless 5-bit compression with SHA-256 reproducible reconstruction can — in one cryptographic check.

Sipsa Inference · 2026-05-22 · Posted by the Sipsa Labs team

SHA-256
Deployed = validated, one check
0.00e+00
Max reconstruction difference (fp32)
5-bit
Near-lossless quality, ~3× smaller
23
architectures · 22 PPL + 1 ViT cosine

The assumption nobody verifies

Pick any framework that governs how AI models are allowed into production. Financial services have the Federal Reserve's SR 11-7 model-risk guidance. Medical AI has device clearance. Defense and safety-critical autonomy have airworthiness and Authority-to-Operate processes. They differ in the details, but they share one load-bearing assumption: the artifact running in production is the artifact that was validated.

Auditors ask to see the validation report. They rarely ask the harder question — prove the bytes serving live traffic match the bytes you tested — because, in most shops, nobody can. The validated model and the deployed model are connected by a deployment pipeline and a chain of trust, not by a proof.

Compression widens the gap

To fit a large model on hardware you can actually afford, you compress or quantize it. The moment you do, the deployed model is numerically different from whatever you validated. If you validated the full-precision model and shipped a quantized copy, the deployed model is not the validated model — by construction.
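A minimal sketch of why that holds, using plain round-to-nearest uniform quantization as a stand-in. It is not AWQ, GPTQ, or the UltraCompress method, all of which work differently. The weight values and the `quantize_dequantize` helper are illustrative assumptions.

```python
def quantize_dequantize(weights, bits=5):
    """Round-to-nearest uniform quantization over the range of `weights`,
    followed by dequantization back to floats. Illustrative only."""
    levels = (1 << bits) - 1          # 31 steps between min and max for 5 bits
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels
    if scale == 0.0:                  # constant tensor: nothing to quantize
        return list(weights)
    codes = [round((w - lo) / scale) for w in weights]
    return [lo + c * scale for c in codes]


# Made-up weights, chosen only to show the effect.
validated = [-0.731, -0.12, 0.0045, 0.268, 0.9]
deployed = quantize_dequantize(validated, bits=5)

max_abs_diff = max(abs(a - b) for a, b in zip(validated, deployed))
print(f"max |validated - deployed| = {max_abs_diff:.6f}")
```

The difference is nonzero for almost any real weight tensor, which is the whole point: a validation report written against `validated` says nothing rigorous about `deployed`.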

That has a concrete operational cost. When a production output looks wrong, you cannot cleanly tell a genuine model fault from a compression artifact. Your monitoring sees one anomaly stream and cannot separate the two causes. Conventional quantization methods — AWQ, GPTQ, and the rest — trade measurable accuracy for footprint, and they give you no built-in way to attest that the copy in production is even the same quantized artifact across machines, reloads, and hardware revisions.

Validate what you deploy — and prove it

UltraCompress takes a different path. You compress a model once into a .uc pack. That pack reconstructs reproducibly: the same weights, byte for byte, on every load and on any hardware. The measured maximum absolute difference across reconstructions is 0.00e+00 in 32-bit float — not "small," zero.
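The shape of that measurement can be sketched as follows. This is not the UltraCompress decoder: `reconstruct` is a hypothetical stand-in, and the codes, scale, and zero point are made-up values. What matters is the check itself: reconstruct twice, compare the fp32 bytes, and report the maximum absolute difference.

```python
import struct


def reconstruct(codes, scale, zero_point):
    """Hypothetical stand-in decoder: maps integer codes back to floats
    and rounds each one to fp32. Not the actual .uc decoding."""
    values = [(c - zero_point) * scale for c in codes]
    # Round-trip through a 4-byte float so each value is genuinely fp32.
    return [struct.unpack("<f", struct.pack("<f", v))[0] for v in values]


def fp32_bytes(values):
    """Serialize to little-endian fp32 so byte-for-byte comparison is defined."""
    return struct.pack(f"<{len(values)}f", *values)


codes = [0, 7, 15, 22, 31]            # made-up 5-bit codes
first = reconstruct(codes, scale=0.05, zero_point=16)
second = reconstruct(codes, scale=0.05, zero_point=16)

max_abs_diff = max(abs(a - b) for a, b in zip(first, second))
print(f"max abs diff: {max_abs_diff:.2e}")
print("byte-identical:", fp32_bytes(first) == fp32_bytes(second))
```

Within one process the comparison is trivially true. A meaningful run would produce `first` and `second` on different machines, or before and after a reload, and compare the serialized bytes.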

Because reconstruction is exact and deterministic, the artifact becomes hashable in a way that means something: the digest recorded at validation is the digest every later copy must reproduce.

"Is the production model the validated model?" stops being a matter of pipeline trust and becomes a one-line cryptographic check that runs in milliseconds.
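As a sketch of what such a check can look like, the snippet below streams a file through SHA-256 with Python's standard `hashlib` and compares the result to a digest recorded at validation time. The `sha256_of_file` and `matches_validated` helpers and the demo file are assumptions for illustration, not part of the UltraCompress tooling.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large packs never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_validated(path, validated_digest):
    """True only if the file's digest equals the one recorded at validation."""
    return hmac.compare_digest(sha256_of_file(path), validated_digest.lower())


# Demo with a throwaway file standing in for a pack.
with tempfile.TemporaryDirectory() as tmp:
    pack = os.path.join(tmp, "demo.bin")
    with open(pack, "wb") as f:
        f.write(b"validated artifact bytes")
    recorded = sha256_of_file(pack)          # stored at validation time

    unchanged_ok = matches_validated(pack, recorded)

    with open(pack, "ab") as f:              # simulate drift: one extra byte
        f.write(b"\x00")
    drifted_ok = matches_validated(pack, recorded)

print(unchanged_ok, drifted_ok)
```

A single appended byte is enough to fail the comparison, so the check leaves no room for "close enough".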

One point of honesty, because it matters: this proves the deployed model equals the validated compressed model. The compressed model has a small, measured accuracy difference from the original full-precision weights, and we publish that number for every pack — for example, Hermes-3-Llama-3.1-405B at a 1.0066× perplexity ratio. The discipline is simple: you qualify the exact artifact you will deploy, and from that point on every deployment is provably that artifact. You are not asked to trust that compression changed nothing. You are given a number for what it changed, and a proof that nothing changed after.
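For readers who want to reproduce that kind of number: a perplexity ratio is conventionally the compressed model's perplexity divided by the baseline's on the same evaluation text. That reading is an assumption about how the published figure is defined. The sketch below computes it from per-token negative log-likelihoods; the values are made up for illustration and are not measurements of any model.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


# Made-up per-token NLLs on the same evaluation text.
baseline_nlls = [2.10, 1.85, 2.40, 1.95]
compressed_nlls = [2.11, 1.86, 2.41, 1.96]

ratio = perplexity(compressed_nlls) / perplexity(baseline_nlls)
print(f"perplexity ratio: {ratio:.4f}x")
```

Under this definition, a ratio of 1.0066 would mean perplexity about 0.66% above the baseline.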

Where deployed-equals-validated is not optional

Financial model risk, medical-device clearance, and defense airworthiness answer to three different regulators with one shared requirement: cryptographic proof that the model in production is bit-for-bit the model that was reviewed. That proof is a byproduct of building compression to be reproducible in the first place.

Try it

Every UltraCompress pack ships with a SHA-256 manifest and a uc verify command. The Sipsa Inference API serves these packs behind an OpenAI-compatible endpoint: keep your SDK, change the base URL, and start with $5 of free credit, no card required.

pip install ultracompress
uc verify <your_pack>.uc   # recompute the digest, confirm bit-identity

Or compress and verify your own model locally with the open-source package. Either way, the provenance check is the same one your auditor can run.
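For teams that want an independent cross-check alongside `uc verify`, a manifest can be verified with a few lines of standard-library Python. The actual UltraCompress manifest format is not described in this post, so the sketch assumes a hypothetical `sha256sum`-style layout (one `<hex digest>  <relative path>` per line). `verify_manifest` and the file names are illustrative.

```python
import hashlib
import os
import tempfile


def verify_manifest(manifest_path):
    """Check every file listed in a sha256sum-style manifest (assumed format).
    Returns the relative paths that are missing or whose digest differs."""
    base = os.path.dirname(os.path.abspath(manifest_path))
    failures = []
    with open(manifest_path, encoding="utf-8") as manifest:
        for line in manifest:
            line = line.strip()
            if not line:
                continue
            expected, rel_path = line.split(maxsplit=1)
            try:
                # Reads the whole file at once; stream in chunks for large packs.
                with open(os.path.join(base, rel_path), "rb") as f:
                    actual = hashlib.sha256(f.read()).hexdigest()
            except FileNotFoundError:
                actual = None
            if actual != expected.lower():
                failures.append(rel_path)
    return failures


# Demo: build a one-file manifest, verify it, then tamper with the file.
with tempfile.TemporaryDirectory() as tmp:
    shard = os.path.join(tmp, "weights.bin")
    with open(shard, "wb") as f:
        f.write(b"pack contents")
    digest = hashlib.sha256(b"pack contents").hexdigest()
    manifest = os.path.join(tmp, "MANIFEST.sha256")
    with open(manifest, "w", encoding="utf-8") as f:
        f.write(f"{digest}  weights.bin\n")

    clean = verify_manifest(manifest)

    with open(shard, "wb") as f:             # change a single character
        f.write(b"pack contentz")
    tampered = verify_manifest(manifest)

print(clean, tampered)
```

An empty list means every listed file matched its recorded digest; any entry in the list names a file an auditor should treat as unverified.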


Sipsa Labs is an experimental deep-tech and software company. UltraCompress is its first publicly shipped product, and Sipsa Inference is its second. More products are in flight.