Can you prove the model in production is the one you validated?

Every model-governance regime — financial model risk, medical-device clearance, defense airworthiness — rests on one unstated assumption: the model serving production traffic is the model that passed validation. Standard practice cannot actually prove it. Near-lossless 5-bit compression with SHA-256 reproducible reconstruction can — in one cryptographic check.

Sipsa Inference · 2026-05-22 · Posted by the Sipsa Labs team

SHA-256: deployed = validated, one check

0.00e+00: max reconstruction difference (fp32)

5-bit: near-lossless quality, ~3× smaller

architectures · 22 PPL + 1 ViT cosine

The assumption nobody verifies

Pick any framework that governs how AI models are allowed into production. Financial services have the Federal Reserve's SR 11-7 model-risk guidance. Medical AI has device clearance. Defense and safety-critical autonomy have airworthiness and Authority-to-Operate processes. They differ in the details, but they share one load-bearing assumption: the artifact running in production is the artifact that was validated.

Auditors ask to see the validation report. They rarely ask the harder question — prove the bytes serving live traffic match the bytes you tested — because, in most shops, nobody can. The validated model and the deployed model are connected by a deployment pipeline and a chain of trust, not by a proof.

Compression widens the gap

To fit a large model on hardware you can actually afford, you compress or quantize it. The moment you do, the deployed model is numerically different from whatever you validated. If you validated the full-precision model and shipped a quantized copy, the deployed model is not the validated model — by construction.

That has a concrete operational cost. When a production output looks wrong, you cannot cleanly tell a genuine model fault from a compression artifact. Your monitoring sees one anomaly stream and cannot separate the two causes. Conventional quantization methods — AWQ, GPTQ, and the rest — trade measurable accuracy for footprint, and they give you no built-in way to attest that the copy in production is even the same quantized artifact across machines, reloads, and hardware revisions.
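The numerical gap is easy to see on a toy example. The sketch below uses plain round-to-nearest uniform quantization at 5 bits as a stand-in; it is not AWQ, GPTQ, or UltraCompress, and the weight values are made up for illustration.

```python
def quantize_dequantize(weights, bits=5):
    """Symmetric round-to-nearest quantization followed by dequantization."""
    levels = 2 ** (bits - 1) - 1                  # 15 positive levels at 5 bits
    scale = max(abs(w) for w in weights) / levels
    codes = [round(w / scale) for w in weights]   # integer codes in [-15, 15]
    return [c * scale for c in codes]

# Illustrative weights only, not taken from any real model.
original = [0.8131, -0.2204, 0.0457, -0.9992, 0.3310]
restored = quantize_dequantize(original)

# The round trip is lossy: the restored weights differ from the originals.
max_abs_diff = max(abs(a - b) for a, b in zip(original, restored))
print(f"max |original - restored| = {max_abs_diff:.2e}")
```

The difference is nonzero, which is the point: a model validated at full precision and shipped quantized is a different numerical object from the one that was tested.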

Validate what you deploy — and prove it

UltraCompress takes a different path. You compress a model once into a .uc pack. That pack reconstructs reproducibly: the same weights, byte for byte, on every load and on any hardware. The measured maximum absolute difference across reconstructions is 0.00e+00 in 32-bit float — not "small," zero.
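As a rough illustration of what that metric measures, the sketch below reconstructs a toy pack repeatedly and reports the largest absolute difference from the first reconstruction. The `reconstruct` function and the pack layout (integer codes plus one scale) are hypothetical stand-ins, not the .uc format; a real measurement would compare reconstructions across separate loads and machines.

```python
import struct

def to_f32(x):
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def reconstruct(pack):
    """Hypothetical loader: decode integer codes with a stored scale, in fp32."""
    return [to_f32(code * pack["scale"]) for code in pack["codes"]]

def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two weight lists."""
    return max(abs(x - y) for x, y in zip(a, b))

# Toy pack, assumed layout for illustration only.
pack = {"scale": 0.0625, "codes": [13, -4, 1, -15, 5]}

reference = reconstruct(pack)
worst = max(max_abs_diff(reference, reconstruct(pack)) for _ in range(100))
print(f"{worst:.2e}")  # prints 0.00e+00
```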

Because reconstruction is exact and deterministic, the artifact becomes hashable in a way that means something:

1. Compress your model into a .uc pack.
2. Validate the model that pack reconstructs, which is the model you will actually deploy.
3. Take a SHA-256 of the reconstructed weights. That is your validation digest.
4. At deploy time, and again at runtime, recompute the digest. A match proves the model in production is, byte for byte, the model you validated.
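The digest steps can be sketched with the standard library alone. This is a minimal sketch under assumptions, not the UltraCompress implementation: the tensor names, the in-memory dict of fp32 lists, and the serialization order (names sorted, little-endian floats) are all choices made here for illustration. Any canonical order works as long as validation and deployment use the same one.

```python
import hashlib
import hmac
import struct

def weights_digest(tensors):
    """SHA-256 over named fp32 tensors, serialized in a fixed canonical order."""
    h = hashlib.sha256()
    for name in sorted(tensors):                   # order must not depend on load order
        values = tensors[name]
        h.update(name.encode("utf-8"))
        h.update(struct.pack("<Q", len(values)))   # length prefix avoids ambiguity
        h.update(struct.pack(f"<{len(values)}f", *values))
    return h.hexdigest()

def is_validated(tensors, validation_digest):
    """True only if the weights hash to the digest recorded at validation time."""
    return hmac.compare_digest(weights_digest(tensors), validation_digest)

# Hypothetical reconstructed weights, recorded once at validation time.
validated = {"layer0.weight": [0.8125, -0.25, 0.0625], "layer1.weight": [-0.9375, 0.3125]}
validation_digest = weights_digest(validated)

# Same weights loaded in a different order, and a copy with one value changed.
deployed = {"layer1.weight": [-0.9375, 0.3125], "layer0.weight": [0.8125, -0.25, 0.0625]}
tampered = {"layer0.weight": [0.8125, -0.25, 0.0626], "layer1.weight": [-0.9375, 0.3125]}

print(is_validated(deployed, validation_digest))   # True
print(is_validated(tampered, validation_digest))   # False
```

A check like this is only meaningful when reconstruction is bit-exact: a single differing bit in any weight changes the digest.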

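"Is the production model the validated model?" stops being a matter of pipeline trust and becomes a one-line cryptographic check: recompute the digest and compare it with the one recorded at validation. The comparison itself takes negligible time; the cost is a single hashing pass over the weights, which scales with model size.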

One point of honesty, because it matters: this proves the deployed model equals the validated compressed model. The compressed model has a small, measured accuracy difference from the original full-precision weights, and we publish that number for every pack — for example, Hermes-3-Llama-3.1-405B at a 1.0066× perplexity ratio. The discipline is simple: you qualify the exact artifact you will deploy, and from that point on every deployment is provably that artifact. You are not asked to trust that compression changed nothing. You are given a number for what it changed, and a proof that nothing changed after.
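For readers who want the metric spelled out, the usual definition of perplexity and of a perplexity ratio is given below. It assumes the published ratio is compressed over original, measured on the same evaluation tokens; the source does not state the direction or the evaluation set explicitly.

```latex
\mathrm{PPL}(\theta) = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N} \log p_\theta(x_i \mid x_{<i})\right),
\qquad
r = \frac{\mathrm{PPL}(\theta_{\text{compressed}})}{\mathrm{PPL}(\theta_{\text{original}})}
```

Under that reading, r = 1.0066 means the compressed model's perplexity is 0.66% higher than the original's.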

Where deployed-equals-validated is not optional

Financial services. SR 11-7 and equivalent model-risk frameworks require the production model to be the validated model. A cryptographic equality check is a clean, auditable control instead of a process attestation.
Healthcare AI. A cleared or validated diagnostic-adjacent model must remain the version that was cleared. Provenance you can recompute on demand beats a change-management log.
Defense and safety-critical autonomy. When a model runs on a size-, weight-, and power-constrained edge platform, the deployed model must be provably the accredited one. An in-flight output anomaly should never be ambiguous between a real fault and a compression artifact.

Three different regulatory regimes, one shared requirement: cryptographic proof that the model in production is bit-for-bit the model that was reviewed. That proof is a byproduct of building compression to be reproducible in the first place.

Try it

Every UltraCompress pack ships with a SHA-256 manifest and a uc verify command. The Sipsa Inference API serves these packs behind an OpenAI-compatible endpoint: keep your SDK and change the base URL. It comes with $5 of free credit, no card required.

pip install ultracompress
uc verify <your_pack>.uc   # verify pack structure + SHA-256 download integrity

Compression is run by Sipsa Labs under engagement; the open-source package verifies the published pack's integrity — it does not compress. Either way, the provenance check is the same one your auditor can run.
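An auditor who wants to check a downloaded file without relying on any vendor tooling can hash it directly. The sketch below streams a file through SHA-256 in chunks so that large packs need not fit in memory. It is independent of uc verify and does not parse the manifest, whose format is not described here; the expected digest is assumed to be available as a hex string, and the temporary file is a stand-in for a real pack.

```python
import hashlib
import os
import tempfile

def file_sha256(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 in 1 MiB chunks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo with a stand-in file; a real check would point at the downloaded pack.
with tempfile.NamedTemporaryFile(delete=False, suffix=".uc") as tmp:
    tmp.write(b"stand-in pack bytes")
    demo_path = tmp.name

# In practice this value would come from the published manifest.
expected = hashlib.sha256(b"stand-in pack bytes").hexdigest()

matches = file_sha256(demo_path) == expected
os.remove(demo_path)
print(matches)  # True
```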

Sipsa Labs is an experimental deep-tech and software company. UltraCompress is its first publicly shipped product; Sipsa Inference is its second. More products are in flight.