Sipsa Labs · Research surface · 14 verified records · 18+ refutations published
Research.
Lossless compression at the limits — methods, results, and what we tried that didn't work. Every number on this page traces to a JSON file in the public lab notebook. Every negative result is published with the same level of detail as the positives.
/ Verified PPL records
14 records, PPL-verified end-to-end.
22 architectures shipped, 14 with held-out perplexity ratios reproduced end-to-end against the bf16 baseline. A summary of headline rows is below. See the full benchmarks page →
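For readers who want the mechanics behind a held-out perplexity ratio, a minimal sketch follows. It assumes a Hugging Face-style causal LM interface and a pre-batched held-out token list; it illustrates the metric only and is not the public verifier.

import math
import torch

@torch.no_grad()
def heldout_perplexity(model, token_batches):
    # token_batches: list of LongTensor [batch, seq_len] drawn from the held-out set
    total_nll, total_tokens = 0.0, 0
    for tokens in token_batches:
        logits = model(tokens[:, :-1]).logits          # next-token logits for each prefix
        nll = torch.nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            tokens[:, 1:].reshape(-1),
            reduction="sum",
        )
        total_nll += nll.item()
        total_tokens += tokens[:, 1:].numel()
    return math.exp(total_nll / total_tokens)

def ppl_ratio(compressed_model, bf16_model, token_batches):
    # Values above 1.0 mean the compressed model is worse than the bf16 baseline.
    return heldout_perplexity(compressed_model, token_batches) / heldout_perplexity(bf16_model, token_batches)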
We catalogue refutations for the same reason we publish positives: the positives only mean what they mean if the failures are visible too. Four representative entries below; the full list lives in the public lab notebook.
Refuted · 2/3 architectures
Instruct fine-tuning hurts compression
We pre-registered the hypothesis that base models would always compress tighter than their instruct variants, holding the recipe constant. Tested on three controlled base/instruct pairs. Result: one pair supported the hypothesis, two refuted it; in one case the compressed instruct variant landed below its bf16 baseline perplexity, the quantization acting as a faint regularizer.
Takeaway: Quantization-friendliness is architecture- and training-recipe-dependent, not universal. Hypothesis dropped.
Refuted · 4 perturbations
Llama-3.1-8B floor breakable by knob-tuning
We attempted four independent perturbations of the production substrate to push Llama-3.1-8B below its 1.0125× floor: capacity raise, schedule extension, single-objective hidden-MSE, and a hybrid objective. All four refuted. The fourth attempt landed at 1.0698×, roughly 5.6× worse than the production result (gap arithmetic below).
Takeaway: The Llama floor is architecture-specific and empirically bounded against knob-tuning. Substrate-level changes are required to break it.
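The gap arithmetic behind the 5.6× figure above, reading the comparison as the ratio of each result's gap over the lossless 1.0 mark (our reading of the headline number, not a statement from the notebook):

\[
  \frac{1.0698 - 1.0}{1.0125 - 1.0} \;=\; \frac{0.0698}{0.0125} \;\approx\; 5.6
\]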
Refuted · hard-killed at layer 24
Mistral-7B cure via better training control
Two sequential cure runners with pre-registered kill conditions (the pattern is sketched below the takeaway), designed to push Mistral-7B below the production 1.0055× result via probe-driven step rescue and teacher-state refresh. Both hard-killed at layer 24 of 32 with bit-identical trip signatures; the teacher-drift hypothesis was explicitly refuted by the second runner. The substrate loses output-distribution control at depth, independent of training dynamics.
Takeaway: In-layer training control is not the bottleneck on Mistral. Substrate-capacity step-up at depth is the open direction.
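A minimal sketch of the pre-registered kill-condition pattern referenced above. Every name here is hypothetical: probe_output_kl, the threshold value, and the signature payload are illustrative stand-ins, not the cure-runner internals, which live in the lab notebook.

import hashlib
import json

KILL_THRESHOLD = 0.25          # hypothetical value, fixed (pre-registered) before the run starts

def trip_signature(layer_idx, probe_value):
    # Deterministic digest of the tripping state: identical trip states across
    # runs produce bit-identical signatures.
    payload = json.dumps({"layer": layer_idx, "probe": round(probe_value, 8)}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def run_layers(layers, probe_output_kl):
    # Process layers in order; hard-kill the run the first time the probe
    # metric crosses the pre-registered threshold.
    for idx, layer in enumerate(layers):
        value = probe_output_kl(layer)        # e.g. output-distribution divergence vs. the bf16 teacher
        if value > KILL_THRESHOLD:
            raise RuntimeError(
                f"hard kill at layer {idx}: probe={value:.4f}, signature={trip_signature(idx, value)}"
            )
    return "completed"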
Refuted · SSMs need different math
SVD warm-start on a non-transformer
We attempted to extend a correction-overlay technique that works on transformer architectures to a state-space model by applying truncated SVD to the quantization residual (sketched below the takeaway). Result: 0.07 pp worse than the codec-only baseline. The directional noise injected by a truncated SVD of a high-rank residual was not aligned with the architecture's actual activation distribution.
Takeaway: The value of the overlay comes from training against real activations, not from the SVD initialization itself. A useful initializer, not a corrector.
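A minimal sketch of the warm-start idea, assuming the overlay is an additive low-rank correction applied on top of the dequantized weight; the rank and the surrounding codec are placeholders, not the production recipe.

import numpy as np

def svd_warm_start(weight, dequantized, rank=32):
    # Factor the quantization residual and keep the top singular directions
    # as the initial low-rank correction overlay (a @ b ~ residual).
    residual = weight - dequantized                  # what the codec loses
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    a = u[:, :rank] * s[:rank]                       # [d_out, rank]
    b = vt[:rank, :]                                 # [rank, d_in]
    return a, b

# The overlay then reconstructs w_hat = dequantized + a @ b. Per the refutation
# above, these factors only earn their keep after training against real
# activations; the SVD alone injects directional noise on a high-rank residual.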
Some architectures hit a floor that no in-substrate knob-tuning can break. We publish the floor as the floor — marked as such — rather than chasing a smaller number with a perturbation we can't justify.
The floor is the result.
For Llama-3.1-8B the floor sits at 1.0125×, empirically bounded against four independent perturbation directions. For Mistral-7B the production substrate ships at 1.0055×, with six cure attempts refuted. Both numbers are what the production runtime returns today; both are honest reports of where the current substrate stops, not where we wish it stopped.
Substrate-level changes (mixed-precision allocation, trellis-codec experimentation, depth-aware overlay capacity) sit at the next research layer. Parameter-tuning the existing substrate is closed for these two architectures.
Academic and industry citation is welcome. BibTeX block below; reach out to founder@sipsalabs.com for collaboration, data access, or to coordinate a preprint.
@software{sipsalabs_ultracompress_2026,
  author       = {{Sipsa Labs}},
  title        = {UltraCompress: Lossless 5-bit Transformer Compression with SHA-256 Bit-Identical Reconstruction},
  year         = {2026},
  url          = {https://sipsalabs.com/ultracompress},
  organization = {Sipsa Labs},
  note         = {22 architectures shipped, 14 PPL-verified end-to-end; public verifier via \texttt{pip install ultracompress}}
}