Research.
Near-lossless compression at the limits: methods, results, and a reproduction harness for every open-pack number. Weight reconstruction is reproducible and SHA-256-verifiable on every openly downloadable pack, and PPL ratios are reproducible from the published harness. The harness is available to evaluators on request, as is full per-architecture provenance. We publish only results that reproduce; we don't cherry-pick.
23 verified records: 22 PPL-verified end-to-end + 1 ViT cosine-verified.
22 PPL-verified architectures (17 dense + 4 MoE + 1 SSM) with held-out perplexity ratios reproduced end-to-end against the bf16 baseline, plus 1 ViT verified by CLS-embedding cosine similarity (DINOv2-Large). Headline rows are summarized below; see the full benchmarks page for every row.
| Model | Params | PPL ratio (compressed / bf16) | Verification / notes |
|---|---|---|---|
| Hermes-3-Llama-3.1-405B | 405B | 1.0066× | Single 32 GB consumer GPU |
| Qwen3-1.7B-Base | 1.7B | 1.0040× | Tightest dense-decoder record |
| Mixtral-8x7B | 47B MoE | 1.00368× | End-to-end PPL |
| Mistral-7B-v0.3 | 7B | 1.00548× | Tightest dense 7B-class 5-bit ratio we currently publish |
| Qwen3-0.6B | 0.6B | 1.0069× | Local PPL eval |
| OLMo-2-0425-1B-Base | 1B | 1.0073× | End-to-end PPL |
| SmolLM2-1.7B-Instruct | 1.7B | 1.0075× | End-to-end PPL |
| Llama-3.1-8B | 8B | 1.0125× | Architecture-specific floor |
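Both verification metrics named above can be sketched in a few lines. This is a minimal illustration, not the published harness: it assumes "PPL ratio" means compressed-model perplexity divided by bf16-baseline perplexity on the identical held-out tokens, and "CLS cosine" means cosine similarity between the baseline and compressed CLS embeddings for the same input. The function names are hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    if len(token_nlls) == 0:
        raise ValueError("need at least one scored token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def ppl_ratio(compressed_nlls: Sequence[float], baseline_nlls: Sequence[float]) -> float:
    """Compressed-model PPL divided by bf16-baseline PPL; 1.0 means no measured loss.

    Assumption: both models were scored on exactly the same held-out tokens.
    """
    if len(compressed_nlls) != len(baseline_nlls):
        raise ValueError("both models must be scored on the same held-out tokens")
    return perplexity(compressed_nlls) / perplexity(baseline_nlls)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine between two embedding vectors, e.g. baseline vs compressed CLS tokens."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Under this definition the ratio equals exp(mean compressed NLL minus mean baseline NLL). A row such as 1.0125× therefore means held-out perplexity is 1.25% higher than the bf16 baseline.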
Verify every number yourself.
These numbers only mean something if you can regenerate them. That is why every openly downloadable pack is SHA-256-verifiable, every PPL ratio can be rerun with the harness, and we publish only results that reproduce on demand, never just the runs that looked good.
Reproduce it, don't trust it.
The uc verify command checks pack structure and download integrity against the public SHA-256 manifest; no GPU is required. The PPL reproduction harness reruns the baseline-vs-compressed perplexity comparison on the same held-out FineWeb-edu tail that produced every record on this page, on a single consumer GPU. Seed 42, deterministic, fixed n per row: the same harness for every architecture, no hand-tuned hero runs.
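The integrity half of that check follows a standard pattern: hash every file in the pack and compare it with the digest listed in the manifest. The sketch below shows that pattern only; it is not the uc verify implementation, and the manifest layout it reads (a JSON object mapping relative file paths to hex SHA-256 digests) is a hypothetical format chosen for illustration.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so multi-GB shards never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_manifest(manifest_path: Path) -> dict:
    """HYPOTHETICAL format: JSON object of {relative_path: hex_sha256_digest}."""
    with manifest_path.open("r", encoding="utf-8") as handle:
        return json.load(handle)


def verify_pack(pack_dir: Path, manifest: dict) -> list:
    """Return manifest entries that are missing on disk or whose digest does not match."""
    failures = []
    for rel_path, expected in sorted(manifest.items()):
        candidate = pack_dir / rel_path
        if not candidate.is_file() or sha256_file(candidate) != expected.lower():
            failures.append(rel_path)
    return failures
```

An empty list means every listed file is present and byte-identical to what the manifest describes. Hashing in chunks keeps memory flat regardless of shard size, which is why this step needs no GPU.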
If your reproduction disagrees with the published matrix by more than the eval tolerance, that's a bug report we want — email founder@sipsalabs.com for the harness. The point is that you never have to take our word for it.
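Comparing a reproduction against the published matrix is a per-row tolerance check. The minimal sketch below assumes the tolerance is an absolute difference in PPL ratio. This page does not state the tolerance value or whether it is absolute or relative, so the caller must supply it. The function name is hypothetical; the constant only copies headline ratios from the table on this page.

```python
from typing import Dict, Mapping

# Headline ratios copied from the table on this page.
PUBLISHED_HEADLINE = {
    "Qwen3-1.7B-Base": 1.0040,
    "Mistral-7B-v0.3": 1.00548,
    "Llama-3.1-8B": 1.0125,
}


def out_of_tolerance(
    published: Mapping[str, float],
    reproduced: Mapping[str, float],
    tolerance: float,
) -> Dict[str, float]:
    """Return {model: reproduced - published} for every row outside the tolerance band.

    ASSUMPTION: tolerance is an absolute difference in PPL ratio.
    """
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    unknown = set(reproduced) - set(published)
    if unknown:
        raise KeyError(f"no published row for: {sorted(unknown)}")
    return {
        model: ratio - published[model]
        for model, ratio in reproduced.items()
        if abs(ratio - published[model]) > tolerance
    }
```

Any row this returns is the kind of disagreement worth reporting. Because every ratio here sits close to 1.0, an absolute and a relative tolerance give nearly the same verdict.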
Not all models compress the same way.
Some architectures hit a floor that no in-substrate knob-tuning can break. We publish the floor as the floor — marked as such — rather than chasing a smaller number with a perturbation we can't justify.
The floor is the result.
For Llama-3.1-8B the floor sits at 1.0125×; for Mistral-7B-v0.3 the production substrate ships at 1.00548×. Both numbers are what the production runtime returns today: honest reports of where the current substrate stops, not where we wish it stopped. Both packs are SHA-256-verifiable on your own machine.
Marking a floor-bound model as the floor, instead of tuning toward a smaller number we can't justify, is the same discipline behind every other figure on this page.
If our results were useful to your work.
Academic and industry citations are welcome. Use the BibTeX entry below, and reach founder@sipsalabs.com for collaboration, data access, or to coordinate on a preprint.
@software{sipsalabs_ultracompress_2026, author = {{Sipsa Labs}}, title = {UltraCompress: Near-Lossless 5-bit Transformer Compression with SHA-256 Verifiable, Reproducible Reconstruction}, year = {2026}, url = {https://sipsalabs.com/ultracompress}, organization = {Sipsa Labs}, note = {23 verified architectures (22 PPL-verified + 1 ViT cosine-verified); public verifier via {\tt pip install ultracompress}} }
Headline records and the reproduction harness are at /inference. Source code, SHA-256 manifests, and the verification CLI are at github.com/sipsalabs/ultracompress. All public artifacts are mirrored at huggingface.co/SipsaLabs.