Verified inference benchmarks.

Every openly downloadable pack ships with a SHA-256 manifest you can re-verify on your own hardware with uc verify. Gated rows (including the 405B flagship, Mixtral, Qwen3-14B, and Yi-1.5-9B) are access-on-request; their per-architecture eval provenance is in an internal ledger we share with evaluators. No "trust me bro" on the open packs: run uc verify and confirm the contract holds.

Want to try it instead of just reading numbers? Type a prompt, watch a 5-bit-compressed model stream tokens back. Real inference on a single RTX 5090.
Try it live →
23 architectures (22 PPL + 1 ViT cosine)
4 architecture classes
1.0013× tightest record
405B largest verified

The verified matrix

22 PPL-verified architectures (17 dense + 4 MoE + 1 SSM) plus 1 ViT cosine-verified, across 4 architecture classes, at 5 bits per weight. Every PPL ratio is measured with the streaming-per-layer reconstruction comparator at seq_len=1024, seed=42, on the FineWeb-edu held-out tail (n=30–50), unless a row notes otherwise.

| Model | Params | PPL ratio | Drift | Baseline → Compressed / notes | HF artifact | Status |
|---|---|---|---|---|---|---|
| Phi-3.5-MoE-instruct ⚡ NEW (MoE) | 42B MoE | 1.00129× | +0.129% | tightest MoE record | phi-3.5-moe-instruct-uc-v3-bpw5 | gated |
| Phi-3-mini-4k-instruct | 3.8B | 1.00262× | +0.262% | seq_len=128 caveat | phi-3-mini-4k-instruct-uc-v3-bpw5 | live |
| Mixtral-8x7B-v0.1 (MoE) | 47B (13B active) | 1.00368× | +0.368% | | mixtral-8x7b-v0.1-uc-v3-bpw5 | gated |
| Qwen3-235B-A22B ⚡ NEW (MoE) | 235B (22B active) | 1.00377× | +0.377% | tightest large-MoE record | qwen3-235b-a22b-uc-v3-bpw5 | gated |
| Qwen3-1.7B-Base | 1.7B | 1.00401× | +0.401% | tightest small-decoder | qwen3-1.7b-base-uc-v3-bpw5 | live |
| Qwen3-14B | 14.0B | 1.00403× | +0.403% | scale-invariant codec | qwen3-14b-uc-v3-bpw5 | gated |
| Yi-1.5-9B | 9.0B | 1.00414× | +0.414% | tightest 8-9B dense | yi-1.5-9b-uc-v3-bpw5 | gated |
| Qwen3-8B | 8.0B | 1.00440× | +0.440% | 8B class record | qwen3-8b-uc-v3-bpw5 | live |
| Phi-4 ⚡ NEW | 14.7B | 1.00506× | +0.506% | sub-0.51% drift (Microsoft flagship 14B dense, phi3 arch) | phi-4-uc-v3-bpw5 | gated |
| Mistral-7B-v0.3 ⚡ NEW | 7.0B | 1.00548× | +0.548% | 5th iteration cracked it (4 prior refuted; full ledger in HONEST_NEGATIVE_RESULTS) | mistral-7b-v0.3-uc-v3-bpw5 | live |
| Mixtral-8x22B-v0.1 ⚡ NEW (MoE) | 141B (39B active) | 1.00611× | +0.611% | | mixtral-8x22b-v0.1-uc-v3-bpw5 | gated |
| Hermes-3-Llama-3.1-405B 🔥 HEADLINE | 405B | 1.0066× | +0.66% | 5.0358 → 5.0692, single 32 GB GPU | hermes-3-llama-3.1-405b-uc-v3-bpw5 | gated |
| Qwen3-0.6B | 0.6B | 1.0069× | +0.69% | | qwen3-0.6b-uc-v3-bpw5 | live |
| OLMo-2-0425-1B | 1.0B | 1.0073× | +0.73% | | olmo-2-0425-1b-uc-v3-bpw5 | live |
| OLMo-2-0425-1B-Instruct | 1.0B | 0.9998× | −0.02% | regularization observed | olmo-2-0425-1b-instruct-uc-v3-bpw5 | live |
| SmolLM2-1.7B-Instruct | 1.7B | 1.0075× | +0.75% | | smollm2-1.7b-instruct-uc-v3-bpw5 | live |
| Qwen3-1.7B (instruct) ⚡ NEW | 1.7B | 1.00782× | +0.782% | instruct variant of Qwen3-1.7B-Base | qwen3-1.7b-uc-v3-bpw5 | live |
| SmolLM2-1.7B | 1.7B | 1.0085× | +0.85% | | smollm2-1.7b-uc-v3-bpw5 | live |
| Llama-3.1-8B | 8.0B | 1.0125× | +1.25% | baseline; in-band | llama-3.1-8b-uc-v3-bpw5 | live |

Mean across 22 verified records: 1.00537×. Median: 1.00527×.
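
As a worked check of how the PPL ratio and Drift columns relate, the sketch below recomputes the 405B row from its published baseline and compressed perplexities. It assumes the ratio is compressed PPL divided by baseline PPL and that drift is the ratio minus one, expressed as a percentage. The function names are illustrative and are not part of the ultracompress package.

```python
def ppl_ratio(baseline_ppl: float, compressed_ppl: float) -> float:
    """Compressed perplexity divided by baseline perplexity."""
    if baseline_ppl <= 0:
        raise ValueError("baseline perplexity must be positive")
    return compressed_ppl / baseline_ppl


def drift_percent(ratio: float) -> float:
    """Relative perplexity change in percent (assumed: (ratio - 1) * 100)."""
    return (ratio - 1.0) * 100.0


# Hermes-3-Llama-3.1-405B row: 5.0358 -> 5.0692
ratio = ppl_ratio(5.0358, 5.0692)
print(f"{ratio:.4f}x  {drift_percent(ratio):+.2f}%")  # 1.0066x  +0.66%
```

A ratio below 1.0, as in the OLMo-2-0425-1B-Instruct row, yields a negative drift under the same formula.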

Reproduce any row in 3 commands

Pick a row above. Copy the HF artifact name. Run:

pip install ultracompress
hf download SipsaLabs/qwen3-1.7b-base-uc-v3-bpw5 --local-dir ./pack
uc verify ./pack

For gated artifacts (marked gated in the Status column; mostly 10B+), click "Request access" on the HF page. Approval is manual, usually within 24h. Free for companies under $1M ARR, individuals, and research.

The verifier is the contract. uc verify reads the SHA-256 manifest and confirms the downloaded bytes match the validated artifact recorded at compress time. If a single byte drifts, it fails loudly. This is the difference between "reproducible" as marketing and "reproducible" as a contract you can audit.
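
To make the idea of a manifest check concrete, here is a minimal sketch of the general technique: hash each file and compare it with a recorded digest. This is not the uc verify implementation. The manifest file name (manifest.json) and its layout (a JSON object mapping relative paths to hex SHA-256 digests) are assumptions made for illustration, and the real pack format may differ.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large shards need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_pack(pack_dir: Path, manifest_name: str = "manifest.json") -> list[str]:
    """Return relative paths that are missing or whose bytes do not match.

    ASSUMPTION: the manifest is a JSON object {relative_path: hex_sha256}.
    """
    manifest = json.loads((pack_dir / manifest_name).read_text())
    bad = []
    for rel_path, expected in manifest.items():
        target = pack_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            bad.append(rel_path)
    return bad
```

An empty list from check_pack(Path("./pack")) would mean every listed file matched; any single changed byte puts that file in the returned list.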

Eval methodology

Reproducible, not cherry-picked

Weight reconstruction is independently verifiable via SHA-256 on every openly downloadable pack: uc verify confirms it on your own machine, and the PPL reproduction harness reruns the perplexity comparison against the same held-out FineWeb-edu tail. (The 405B flagship pack is gated; access on request.)

The public streaming-compression eval JSONs ship in the repo; full per-architecture eval provenance is available to evaluators on request, so the table can be re-derived from source rather than taken on faith.
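
For readers who want to sanity-check a perplexity comparison by hand, the sketch below uses the standard definition: perplexity is the exponential of the mean per-token negative log-likelihood (natural log). It is a generic illustration, not the PPL reproduction harness; the function names and the idea of passing in precomputed per-token NLL lists are assumptions.

```python
import math
from typing import Sequence, Tuple


def perplexity(token_nlls: Sequence[float]) -> float:
    """exp(mean NLL) over all evaluated tokens; NLLs are in nats."""
    if not token_nlls:
        raise ValueError("need at least one token NLL")
    return math.exp(sum(token_nlls) / len(token_nlls))


def compare(
    baseline_nlls: Sequence[float], compressed_nlls: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (baseline PPL, compressed PPL, compressed / baseline).

    Both lists should come from the same tokens (same text, seq_len and
    seed) so the ratio reflects the model change rather than the data.
    """
    base = perplexity(baseline_nlls)
    comp = perplexity(compressed_nlls)
    return base, comp, comp / base
```

Feeding both models the identical evaluation windows is what makes the ratio meaningful; comparing perplexities taken on different text would not isolate the effect of compression.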

Want to use these via API?

Same model menu, OpenAI-compatible.

from openai import OpenAI
client = OpenAI(base_url="https://api.sipsalabs.com/v1", api_key="sk-...")
resp = client.chat.completions.create(
    model="sipsa-qwen3-8b",
    messages=[{"role": "user", "content": "test"}],
)
print(resp.choices[0].message.content)
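
The call above returns the whole reply at once. If you would rather print tokens as they arrive, the OpenAI Python client accepts stream=True and yields chunks whose text lives in choices[0].delta.content. The sketch below wraps that pattern; whether this endpoint supports streaming is an assumption here, and the helper names iter_text and stream_reply are illustrative.

```python
from typing import Iterable, Iterator


def iter_text(chunks: Iterable) -> Iterator[str]:
    """Yield text pieces from OpenAI-style chat completion stream chunks."""
    for chunk in chunks:
        if not chunk.choices:  # some chunks carry no choices
            continue
        piece = chunk.choices[0].delta.content
        if piece:  # skip role-only or empty deltas
            yield piece


def stream_reply(client, model: str, prompt: str) -> str:
    """Print a streamed reply as it arrives and return the full text.

    ASSUMPTION: the endpoint honors stream=True like the OpenAI API does.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for piece in iter_text(stream):
        print(piece, end="", flush=True)
        parts.append(piece)
    print()
    return "".join(parts)
```

With the client from the snippet above, usage would be stream_reply(client, "sipsa-qwen3-8b", "test").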

$5 free credits on signup, no card. See /pricing for the full per-model token rates.

API status: live. The OpenAI-compatible endpoint serves at https://api.sipsalabs.com/v1; 22 PPL-verified architectures (17 dense + 4 MoE + 1 SSM) plus 1 ViT cosine-verified are in the live catalog (the 405B flagship is gated — access on request). Grab a free $5 key at /get-access — no card. Want to evaluate UltraCompress on your model first? Start a $5K Phase 0 POC. Questions: founder@sipsalabs.com.

Questions about a specific row?

Direct line to the founder. Solo founder; you'll hear back within 4–8 hours during US business hours.