Verify our claims yourself.
We publish PPL ratios. We ship the tool to verify them. We document failures alongside wins.
No "trust me" — every published number is reproducible end-to-end on your hardware. Both contracts (shape + quality) ship today in v0.6.9; a productized third-party audit service is on the Q3 2026 roadmap.
The verification stack.
Three independent layers, each answering a different question. The first two ship in the public package; the third is on the roadmap as a paid service. If a claim doesn't survive all three, we don't ship it as a customer-facing number.
uc verify <pack>
Question it answers
"Did I get the same artifact the trainer measured?" Reads the SHA-256 manifest bundled inside the pack and confirms every layer reconstructs to a bit-identical match of the bytes the trainer wrote. If a single byte drifts, it fails loudly.
uc bench-ppl --against-bf16
Question it answers
"Does the published 1.0066× ratio actually reproduce on my hardware?" Pulls the bf16 baseline from HuggingFace, runs the eval on the FineWeb-edu held-out tail (seq_len=1024, seed=42) on your GPU, computes the PPL ratio locally, and prints whether it matches the published value.
Audit & Verification Service
Question it answers"Can a third party reproduce these numbers under a contract?" Productized service tier where Sipsa runs an end-to-end re-eval against a customer-supplied baseline and signs the resulting receipt. Targets regulated buyers who need an external attestation, not just self-serve reproducibility.
uc verify proves the customer's bytes equal the trainer's bytes. uc bench-ppl proves those bytes behave the same way on a held-out evaluation. Either one alone is incomplete. v0.6.7 ships the first contract; v0.6.9 ships the second. We're saying that out loud here because the gap was real and we'd rather close it in public than gloss over it.
Reproduce a published row in 3 commands
Today (shape contract):
# 1. install
pip install ultracompress
# 2. pull a published artifact
hf download SipsaLabs/qwen3-1.7b-base-uc-v3-bpw5 --local-dir ./pack
# 3. verify SHA-256 reconstruction is bit-identical
uc verify ./pack
# → VERIFY: PASS — bit-identical reconstruction guaranteed.
This week (quality contract, v0.6.9):
# 4. reproduce the published PPL ratio on your hardware
uc bench-ppl ./pack --against-bf16
# → published: 1.00401× measured: 1.0041× match: PASS
Every published number in BENCHMARKS_2026_05_10.json traces to a source eval JSON in scripts/overlay/artifacts/ in the public repo. We don't ship round numbers without the underlying receipts.
The honest negative side.
Most projects publish the wins and quietly archive the failures. We catalogue both at the same level of detail, and the ratio is meant to be read as a feature: a research lab that hides nothing should expect a refutation count larger than its publication count.
18+ documented refutations from a single research arc.
From "AWQ-style channel pre-scaling regressed 13× the uniform baseline" to "the universal cure for the dense PPL floor doesn't exist at our current configuration," every closed hypothesis is published with the catalysing experiment, the measured number, and the reason we stopped pursuing it. Researchers comparing 5-bit codecs should treat the document as the audit trail.
Read the full catalogue →The README's "What doesn't work yet" section is the same idea, applied to capabilities a reader might assume work because the rest of the package does. Both surfaces stay fresh.
What customers should ask us.
If you're evaluating a 5-bit lossless codec for production, these are the right questions to bring to the call. Our answers are the same in writing as they are on a sales call.
-
"Can I reproduce your headline number on my own hardware?"
Yes. Both contracts ship today in
v0.6.9. The shape contract reproduces viauc verify; the quality contract (PPL ratio against the bf16 baseline pulled from HuggingFace) reproduces viauc bench-ppl --against-bf16. Same JSON schema as our published numbers. -
"Is there an independent third-party audit?"
Coming as a productized service in Q3 2026 as part of the Compression-as-a-Service catalog. Until then: every published number is fully self-reproducible by the verification stack above, and the source eval JSONs are in the public repo for any external auditor to re-derive.
-
"What about regulatory-grade verification (SR 11-7, FDA, defense)?"
The on-prem MSA tier includes audit-trail logs, reproducibility manifests, and the bit-identical reconstruction contract that maps onto SR 11-7 model validation, FDA-regulated healthcare AI dev/deploy equivalence, and defense deploy-bit-exactness. See /enterprise for the compliance architecture and the three engagement tiers.
-
"What if the architecture I care about isn't in your verified matrix?"
That's the Phase 0 POC. Bring a model, we deliver a UC pack inside 5 business days, and you self-verify it with
uc verify+ benchmark it withuc bench-pplagainst your existing AWQ/GPTQ build on your hardware. If we don't materially help, you keep the diagnostic and we don't push Phase 1.
For VCs specifically.
We know the verifier-quality gap is the #1 trust objection from technical diligence. Here's the timeline to close it — written in the present and near-future tense, not the aspirational quarter-plus-two.
Three contracts, one credibility surface.
The objection is real and we're closing it on a public schedule. Each step below is either shipped or has a hard near-term ship date attached, not a vague "later."
- v0.6.7Shape contract. —
uc verifyships today. SHA-256 manifest match between customer artifact and trainer's measured artifact. - v0.6.9Quality contract. —
uc bench-ppl --against-bf16LIVE today. Reproduces the published PPL ratio on the customer's hardware against a HuggingFace-pulled bf16 baseline. - Q3 2026Productized audit service. — Third-party-runnable verification under a paid contract, with a signed receipt. Roadmap item on the Compression-as-a-Service catalog.
The pattern is the same on every other surface: ship the verifier alongside the claim, document what didn't make it, refuse to round the numbers. The verification stack is the product around the codec, not a footnote underneath it.
Want to verify a specific row?
Single solo founder, US business hours, 4–8h response. Reproduce a published benchmark, request a Phase 0 POC for an architecture not yet in the matrix, or ask anything diligence-grade about the verification stack.