Verified inference benchmarks.
Every openly downloadable pack ships with a SHA-256 manifest you can re-verify on your own hardware with uc verify. Gated rows (marked gated in the Status column, including the 405B flagship, Mixtral, Qwen3-14B, and Yi-1.5-9B) are access-on-request; their per-architecture eval provenance is in an internal ledger we share with evaluators. No "trust me bro" on the open packs: run uc verify and confirm the contract holds.
The verified matrix
22 PPL-verified architectures (17 dense + 4 MoE + 1 SSM) plus 1 cosine-verified ViT, spanning 4 architecture classes, all at 5 bits per weight. Every PPL ratio is measured under a streaming per-layer reconstruction comparator with seed=42 on the FineWeb-edu held-out tail (seq_len=1024 and n=30–50 unless noted).
| Model | Params | PPL ratio | Drift | Notes (baseline → compressed PPL where given) | HF artifact | Status |
|---|---|---|---|---|---|---|
| Phi-3.5-MoE-instruct ⚡ NEW (MoE) | 42B MoE | 1.00129× | +0.129% | tightest MoE record | phi-3.5-moe-instruct-uc-v3-bpw5 | gated |
| Phi-3-mini-4k-instruct | 3.8B | 1.00262× | +0.262% | seq_len=128 caveat | phi-3-mini-4k-instruct-uc-v3-bpw5 | live |
| Mixtral-8x7B-v0.1 (MoE) | 47B (13B active) | 1.00368× | +0.368% | — | mixtral-8x7b-v0.1-uc-v3-bpw5 | gated |
| Qwen3-235B-A22B ⚡ NEW (MoE) | 235B (22B active) | 1.00377× | +0.377% | tightest large-MoE record | qwen3-235b-a22b-uc-v3-bpw5 | gated |
| Qwen3-1.7B-Base | 1.7B | 1.00401× | +0.401% | tightest small-decoder | qwen3-1.7b-base-uc-v3-bpw5 | live |
| Qwen3-14B | 14.0B | 1.00403× | +0.403% | scale-invariant codec | qwen3-14b-uc-v3-bpw5 | gated |
| Yi-1.5-9B | 9.0B | 1.00414× | +0.414% | tightest 8-9B dense | yi-1.5-9b-uc-v3-bpw5 | gated |
| Qwen3-8B | 8.0B | 1.00440× | +0.440% | 8B class record | qwen3-8b-uc-v3-bpw5 | live |
| Phi-4 ⚡ NEW | 14.7B | 1.00506× | +0.506% | sub-0.51% drift (Microsoft flagship 14B dense, phi3 arch) | phi-4-uc-v3-bpw5 | gated |
| Mistral-7B-v0.3 ⚡ NEW | 7.0B | 1.00548× | +0.548% | 5th iteration cracked it (4 prior refuted; full ledger in HONEST_NEGATIVE_RESULTS) | mistral-7b-v0.3-uc-v3-bpw5 | live |
| Mixtral-8x22B-v0.1 ⚡ NEW (MoE) | 141B (39B active) | 1.00611× | +0.611% | — | mixtral-8x22b-v0.1-uc-v3-bpw5 | gated |
| Hermes-3-Llama-3.1-405B 🔥 HEADLINE | 405B | 1.0066× | +0.66% | 5.0358 → 5.0692, single 32 GB GPU | hermes-3-llama-3.1-405b-uc-v3-bpw5 | gated |
| Qwen3-0.6B | 0.6B | 1.0069× | +0.69% | — | qwen3-0.6b-uc-v3-bpw5 | live |
| OLMo-2-0425-1B | 1.0B | 1.0073× | +0.73% | — | olmo-2-0425-1b-uc-v3-bpw5 | live |
| OLMo-2-0425-1B-Instruct | 1.0B | 0.9998× | −0.02% | regularization observed | olmo-2-0425-1b-instruct-uc-v3-bpw5 | live |
| SmolLM2-1.7B-Instruct | 1.7B | 1.0075× | +0.75% | — | smollm2-1.7b-instruct-uc-v3-bpw5 | live |
| Qwen3-1.7B (instruct) ⚡ NEW | 1.7B | 1.00782× | +0.782% | instruct variant of Qwen3-1.7B-Base | qwen3-1.7b-uc-v3-bpw5 | live |
| SmolLM2-1.7B | 1.7B | 1.0085× | +0.85% | — | smollm2-1.7b-uc-v3-bpw5 | live |
| Llama-3.1-8B | 8.0B | 1.0125× | +1.25% | baseline; in-band | llama-3.1-8b-uc-v3-bpw5 | live |
Mean across 22 verified records: 1.00537×. Median: 1.00527×.
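The PPL ratio and Drift columns are two views of the same measurement: the ratio is compressed perplexity divided by baseline perplexity, and drift is that ratio expressed as a percentage change. A minimal Python sketch of the arithmetic, using the 405B row (5.0358 → 5.0692); the function names are illustrative and are not part of the ultracompress package.

```python
def ppl_ratio(baseline_ppl: float, compressed_ppl: float) -> float:
    """Compressed perplexity divided by baseline perplexity."""
    return compressed_ppl / baseline_ppl


def drift_percent(ratio: float) -> float:
    """Relative perplexity change in percent; negative means the compressed model scored lower."""
    return (ratio - 1.0) * 100.0


# Values taken from the Hermes-3-Llama-3.1-405B row of the table above.
ratio = ppl_ratio(5.0358, 5.0692)
print(f"{ratio:.4f}x  {drift_percent(ratio):+.2f}%")  # prints: 1.0066x  +0.66%
```

A ratio below 1, as in the OLMo-2-0425-1B-Instruct row (0.9998×), gives a negative drift by the same formula.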
Reproduce any row in 3 commands
Pick a row above. Copy the HF artifact name. Run:
pip install ultracompress
hf download SipsaLabs/qwen3-1.7b-base-uc-v3-bpw5 --local-dir ./pack
uc verify ./pack
For gated artifacts (marked gated in the Status column; typically 10B+), click "Request access" on the HF page. Approval is manual, usually within 24h. Access is free for companies under $1M ARR, individuals, and research use.
uc verify reads the SHA-256 manifest and confirms the downloaded bytes match the validated artifact recorded at compress time. If a single byte drifts, it fails loudly. This is the difference between "reproducible" as marketing and "reproducible" as a contract you can audit.
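To make the check concrete, here is a rough sketch of what a SHA-256 manifest check does in principle. It is not the uc verify implementation: the manifest file name (`manifest.json`) and its layout (a JSON object mapping relative file paths to hex digests) are assumptions made for illustration, and the real pack format may differ.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight shards never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pack(pack_dir: str, manifest_name: str = "manifest.json") -> list[str]:
    """Return the relative paths that are missing or whose bytes do not match.

    Assumed (hypothetical) manifest layout: {"relative/path": "<hex sha256>", ...}
    """
    root = Path(pack_dir)
    manifest = json.loads((root / manifest_name).read_text())
    mismatches = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            mismatches.append(rel_path)
    return mismatches
```

Under these assumptions, `verify_pack("./pack")` returns an empty list when every file matches its recorded digest, and a single changed byte in any file puts that file's path in the returned list.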
Eval methodology
- Comparator: per-layer streaming reconstruction. Both the bf16 baseline and the 5-bit compressed model run through the same procedure on the same hardware (single 32 GB GPU), so the ratio isolates the codec contribution rather than confounding it with serving-stack differences.
- Dataset: FineWeb-edu held-out tail (no overlap with calibration). Seq length 1024 unless noted.
- n: 30–50 documents per row. Seed 42, deterministic.
- Hardware: dual RTX 5090 (32 GB each). 405B-class models fit within a 32 GB peak via streaming compression; running on a single consumer GPU is part of the value proposition.
- JSON receipts: full per-architecture PPL provenance is available to evaluators on request. We don't ship round numbers without source data behind them.
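The methodology above boils down to scoring the same seeded set of held-out documents with both models and dividing the two perplexities. A sketch of that arithmetic in Python, assuming token-level averaging (perplexity as the exponential of the mean per-token negative log-likelihood). The function names, the per-token NLL inputs, and the way the seed drives document selection are all assumptions for illustration; the actual harness is not reproduced here.

```python
import math
import random


def perplexity(token_nlls: list[list[float]]) -> float:
    """exp(mean negative log-likelihood) over all tokens of all documents."""
    total = sum(sum(doc) for doc in token_nlls)
    count = sum(len(doc) for doc in token_nlls)
    return math.exp(total / count)


def pick_documents(num_available: int, n: int, seed: int = 42) -> list[int]:
    """Deterministically choose n document indices from the held-out tail."""
    return sorted(random.Random(seed).sample(range(num_available), n))


def ratio_from_nlls(baseline_nlls: list[list[float]],
                    compressed_nlls: list[list[float]]) -> float:
    """PPL ratio; both inputs must cover the same documents and token positions."""
    return perplexity(compressed_nlls) / perplexity(baseline_nlls)
```

Fixing the seed means `pick_documents` returns the same indices on every run, which is what lets a row be regenerated on demand instead of depending on a lucky sample.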
Reproducible, not cherry-picked
Weight reconstruction is independently SHA-256-verifiable on every openly downloadable pack: uc verify confirms it on your own machine, and the PPL reproduction harness reruns the perplexity comparison against the same held-out FineWeb-edu tail. (The 405B flagship pack is gated; access is on request.)
- We publish only results that reproduce. If a number can’t be regenerated on demand from its pack, it doesn’t go on this page.
- No hand-tuned hero runs. Seed 42, deterministic, fixed n per row — the same harness for every architecture.
- State-space models past scalar-only: the canonical PPL ratio for Mamba-2.8B is 1.00593×, measured with an SSM-compatible comparator because the canonical transformer streaming reconstruction is architecture-incompatible with SSMs. The comparator caveat is published with the record.
- Deferred is labelled deferred. Where a pack verifies clean but an eval is blocked, we say so explicitly instead of substituting an estimate.
The public streaming-compression eval JSONs ship in the repo; full per-architecture eval provenance is available to evaluators on request, so the table can be re-derived from source rather than taken on faith.
Want to use these via API?
Same model menu, OpenAI-compatible.
from openai import OpenAI

client = OpenAI(base_url="https://api.sipsalabs.com/v1", api_key="sk-...")
resp = client.chat.completions.create(
    model="sipsa-qwen3-8b",
    messages=[{"role": "user", "content": "test"}],
)
print(resp.choices[0].message.content)
$5 free credits on signup, no card. See /pricing for the full per-model token rates.
API endpoint: https://api.sipsalabs.com/v1. The live catalog covers 22 PPL-verified architectures (17 dense + 4 MoE + 1 SSM) plus 1 cosine-verified ViT (the 405B flagship is gated; access is on request). Grab a free $5 key at /get-access, no card required. Want to evaluate UltraCompress on your model first? Start a $5K Phase 0 POC. Questions: founder@sipsalabs.com.
Questions about a specific row?
Direct line to the founder. Solo founder; you'll hear back within 4–8 hours during US business hours.
- Reproduce verification: founder@sipsalabs.com
- Architecture not in the list: Compression-as-a-Service, see /pricing
- Press / investor: press@sipsalabs.com