Verified inference benchmarks.

Every openly downloadable pack ships with an SHA-256 manifest that you can re-verify on your own hardware with uc verify. Gated rows (marked gated in the Status column, including the 405B flagship, Mixtral, Qwen3-14B, and Yi-1.5-9B) are access-on-request; their per-architecture eval provenance is in an internal ledger we share with evaluators. No "trust me bro" on the open packs: run uc verify and confirm the contract holds.

Want to try it instead of just reading numbers? Type a prompt, watch a 5-bit-compressed model stream tokens back. Real inference on a single RTX 5090.

Try it live →

22 PPL + 1 ViT cosine architectures · 4 architecture classes · 1.0013× tightest record · 405B largest verified

The verified matrix

22 PPL-verified architectures (17 dense + 4 MoE + 1 SSM) plus 1 ViT cosine-verified architecture, across 4 architecture classes, at 5 bits per weight. Every PPL ratio is measured under the per-layer streaming reconstruction comparator at seq_len=1024, seed=42, on the FineWeb-edu held-out tail (n=30–50 unless noted).

Model	Params	PPL ratio	Drift	Baseline → Compressed / notes	HF artifact	Status
Phi-3.5-MoE-instruct ⚡ NEW (MoE)	42B MoE	1.00129×	+0.129%	tightest MoE record	phi-3.5-moe-instruct-uc-v3-bpw5	gated
Phi-3-mini-4k-instruct	3.8B	1.00262×	+0.262%	seq_len=128 caveat	phi-3-mini-4k-instruct-uc-v3-bpw5	live
Mixtral-8x7B-v0.1 (MoE)	47B (13B active)	1.00368×	+0.368%	—	mixtral-8x7b-v0.1-uc-v3-bpw5	gated
Qwen3-235B-A22B ⚡ NEW (MoE)	235B (22B active)	1.00377×	+0.377%	tightest large-MoE record	qwen3-235b-a22b-uc-v3-bpw5	gated
Qwen3-1.7B-Base	1.7B	1.00401×	+0.401%	tightest small-decoder	qwen3-1.7b-base-uc-v3-bpw5	live
Qwen3-14B	14.0B	1.00403×	+0.403%	scale-invariant codec	qwen3-14b-uc-v3-bpw5	gated
Yi-1.5-9B	9.0B	1.00414×	+0.414%	tightest 8-9B dense	yi-1.5-9b-uc-v3-bpw5	gated
Qwen3-8B	8.0B	1.00440×	+0.440%	8B class record	qwen3-8b-uc-v3-bpw5	live
Phi-4 ⚡ NEW	14.7B	1.00506×	+0.506%	sub-0.51% drift (Microsoft flagship 14B dense, phi3 arch)	phi-4-uc-v3-bpw5	gated
Mistral-7B-v0.3 ⚡ NEW	7.0B	1.00548×	+0.548%	5th iteration cracked it (4 prior refuted; full ledger in HONEST_NEGATIVE_RESULTS)	mistral-7b-v0.3-uc-v3-bpw5	live
Mixtral-8x22B-v0.1 ⚡ NEW (MoE)	141B (39B active)	1.00611×	+0.611%	—	mixtral-8x22b-v0.1-uc-v3-bpw5	gated
Hermes-3-Llama-3.1-405B 🔥 HEADLINE	405B	1.0066×	+0.66%	5.0358 → 5.0692, single 32 GB GPU	hermes-3-llama-3.1-405b-uc-v3-bpw5	gated
Qwen3-0.6B	0.6B	1.0069×	+0.69%	—	qwen3-0.6b-uc-v3-bpw5	live
OLMo-2-0425-1B	1.0B	1.0073×	+0.73%	—	olmo-2-0425-1b-uc-v3-bpw5	live
OLMo-2-0425-1B-Instruct	1.0B	0.9998×	−0.02%	regularization observed	olmo-2-0425-1b-instruct-uc-v3-bpw5	live
SmolLM2-1.7B-Instruct	1.7B	1.0075×	+0.75%	—	smollm2-1.7b-instruct-uc-v3-bpw5	live
Qwen3-1.7B (instruct) ⚡ NEW	1.7B	1.00782×	+0.782%	instruct variant of Qwen3-1.7B-Base	qwen3-1.7b-uc-v3-bpw5	live
SmolLM2-1.7B	1.7B	1.0085×	+0.85%	—	smollm2-1.7b-uc-v3-bpw5	live
Llama-3.1-8B	8.0B	1.0125×	+1.25%	baseline; in-band	llama-3.1-8b-uc-v3-bpw5	live

Mean across the 20 records published on this page (the 19 rows above plus Mamba-2.8B at 1.00593×): 1.00537×. Median: 1.00527×.
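
The summary statistics can be recomputed from the ratios published on this page. The sketch below copies the 19 table rows by hand and adds the Mamba-2.8B SSM record (1.00593×) that is quoted in the text rather than in the table; the helper names are illustrative and not part of the ultracompress package.

```python
from statistics import mean, median

# PPL ratios copied by hand from the 19 table rows, in table order.
TABLE_RATIOS = [
    1.00129, 1.00262, 1.00368, 1.00377, 1.00401, 1.00403, 1.00414,
    1.00440, 1.00506, 1.00548, 1.00611, 1.0066, 1.0069, 1.0073,
    0.9998, 1.0075, 1.00782, 1.0085, 1.0125,
]

# Mamba-2.8B (SSM) is reported in the text of this page, not in the table.
MAMBA_RATIO = 1.00593


def drift_percent(ratio):
    """Convert a PPL ratio into the signed percentage drift column."""
    return (ratio - 1.0) * 100.0


def summarize(ratios):
    """Return (mean, median) of the ratios, rounded to 5 decimals."""
    return round(mean(ratios), 5), round(median(ratios), 5)


mean_ratio, median_ratio = summarize(TABLE_RATIOS + [MAMBA_RATIO])
print(f"mean {mean_ratio}x, median {median_ratio}x")
```

With these 20 values the script prints the mean and median quoted above, which makes it easy to re-check the summary line whenever a row is added or revised.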

Reproduce any row in 3 commands

Pick a row above. Copy the HF artifact name. Run:

pip install ultracompress
hf download SipsaLabs/qwen3-1.7b-base-uc-v3-bpw5 --local-dir ./pack
uc verify ./pack

For gated artifacts (Status: gated in the table above), click "Request access" on the HF page. Approval is manual, usually within 24h. Access is free for companies under $1M ARR, individuals, and research use.

The verifier is the contract. uc verify reads the SHA-256 manifest and confirms the downloaded bytes match the validated artifact recorded at compress time. If a single byte drifts, it fails loudly. This is the difference between "reproducible" as marketing and "reproducible" as a contract you can audit.
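
To make the idea concrete, here is a minimal sketch of manifest-based verification in plain Python. It is not the uc verify implementation: the manifest layout (a hypothetical manifest.json mapping each relative file path to its hex SHA-256 digest) and the function names are assumptions chosen for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight shards never sit fully in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pack(pack_dir, manifest_name="manifest.json"):
    """Return a list of (relative_path, reason) failures; empty means pass.

    Assumed (hypothetical) manifest format: {"relative/path": "<hex sha256>"}.
    """
    pack = Path(pack_dir)
    manifest = json.loads((pack / manifest_name).read_text())
    failures = []
    for rel_path, expected in manifest.items():
        target = pack / rel_path
        if not target.is_file():
            failures.append((rel_path, "missing"))
        elif sha256_of(target) != expected.lower():
            failures.append((rel_path, "digest mismatch"))
    return failures


def demo():
    """Build a tiny fake pack, verify it, then change one byte and re-verify."""
    with tempfile.TemporaryDirectory() as tmp:
        pack = Path(tmp)
        weights = pack / "weights.bin"
        weights.write_bytes(b"\x00\x01\x02\x03")
        manifest = {"weights.bin": sha256_of(weights)}
        (pack / "manifest.json").write_text(json.dumps(manifest))
        clean = verify_pack(pack)
        weights.write_bytes(b"\x00\x01\x02\x04")  # last byte changed
        tampered = verify_pack(pack)
    return clean, tampered
```

Calling demo() returns an empty failure list for the untouched pack and a digest mismatch once a single byte changes, which is the "fails loudly" behavior described above.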

Eval methodology

Comparator: per-layer streaming reconstruction. Both bf16 baseline and 5-bit compressed model use the same procedure on the same hardware (single 32 GB GPU), so the ratio isolates the codec contribution rather than confounding it with serving-stack differences.
Dataset: FineWeb-edu held-out tail (no overlap with calibration). Seq length 1024 unless noted.
n: 30–50 documents per row. Seed 42, deterministic.
Hardware: a dual RTX 5090 rig (32 GB each); each comparison runs on a single 32 GB GPU. 405B-class models stay within a 32 GB peak via streaming compression, so a single consumer GPU is part of the value proposition.
JSON receipts: full per-architecture PPL provenance is available to evaluators on request. We don't ship round numbers without source data behind them.
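
The ratio and drift columns follow from two perplexity values. The sketch below assumes perplexity is the exponential of the mean per-token negative log-likelihood in nats (the standard definition; how the harness aggregates across documents is not stated here), and the function names are illustrative.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token NLL")
    return math.exp(sum(token_nlls) / len(token_nlls))


def ppl_ratio(baseline_ppl, compressed_ppl):
    """Compressed perplexity divided by baseline perplexity."""
    return compressed_ppl / baseline_ppl


def drift_percent(ratio):
    """Signed percentage change relative to the baseline."""
    return (ratio - 1.0) * 100.0


# Worked example using the 405B row: 5.0358 (bf16) -> 5.0692 (5-bit).
ratio = ppl_ratio(5.0358, 5.0692)
print(f"ratio {ratio:.4f}x, drift {drift_percent(ratio):+.2f}%")
```

Because both perplexities come from the same comparator on the same data, the ratio cancels out everything except the effect of compression, which is the point of the methodology above.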

Reproducible, not cherry-picked

Weight reconstruction is independently verifiable via SHA-256 on every openly downloadable pack: uc verify confirms it on your own machine, and the PPL reproduction harness reruns the perplexity comparison against the same held-out FineWeb-edu tail. (The 405B flagship pack is gated; access is on request.)

We publish only results that reproduce. If a number can’t be regenerated on demand from its pack, it doesn’t go on this page.
No hand-tuned hero runs. Seed 42, deterministic, fixed n per row — the same harness for every architecture.
State-space models past scalar-only: the canonical Mamba-2.8B PPL ratio is 1.00593×, measured with an SSM-compatible comparator because the canonical transformer streaming reconstruction is architecture-incompatible with SSMs. The comparator caveat is published with the record.
Deferred is labelled deferred. Where a pack verifies clean but an eval is blocked, we say so explicitly instead of substituting an estimate.

The public streaming-compression eval JSONs ship in the repo; full per-architecture eval provenance is available to evaluators on request, so the table can be re-derived from source rather than taken on faith.

Want to use these via API?

Same model menu, OpenAI-compatible.

from openai import OpenAI
client = OpenAI(base_url="https://api.sipsalabs.com/v1", api_key="sk-...")
resp = client.chat.completions.create(
    model="sipsa-qwen3-8b",
    messages=[{"role": "user", "content": "test"}],
)
print(resp.choices[0].message.content)
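
To stream tokens as they are generated, the same client can be used with stream=True. This is a sketch that assumes the endpoint supports the standard streaming option of the OpenAI chat completions API, which this page does not confirm; stream_reply and demo are illustrative names.

```python
def stream_reply(client, model, prompt):
    """Yield text fragments from a streaming chat completion as they arrive."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # assumption: the endpoint honors OpenAI-style streaming
    )
    for chunk in stream:
        if not chunk.choices:
            continue  # some chunks carry no choices (e.g. usage-only chunks)
        piece = chunk.choices[0].delta.content
        if piece:
            yield piece


def demo():
    """Print a streamed reply; needs the openai package and a real API key."""
    from openai import OpenAI

    client = OpenAI(base_url="https://api.sipsalabs.com/v1", api_key="sk-...")
    for piece in stream_reply(client, "sipsa-qwen3-8b", "test"):
        print(piece, end="", flush=True)
    print()
```

Keeping the generator separate from the client setup means it can be exercised with a stub client in tests, without network access.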

$5 free credits on signup, no card. See /pricing for the full per-model token rates.

API status: live. The OpenAI-compatible endpoint serves at https://api.sipsalabs.com/v1; 22 PPL-verified architectures (17 dense + 4 MoE + 1 SSM) plus 1 ViT cosine-verified are in the live catalog (the 405B flagship is gated — access on request). Grab a free $5 key at /get-access — no card. Want to evaluate UltraCompress on your model first? Start a $5K Phase 0 POC. Questions: founder@sipsalabs.com.

Questions about a specific row?

Direct line to the founder. Sipsa Labs is a solo-founder company; you'll hear back within 4–8h during US business hours.

Reproduce verification: founder@sipsalabs.com
Architecture not in the list: Compression-as-a-Service, see /pricing
Press / investor: press@sipsalabs.com