Verified inference benchmarks.

Every row below has a public HuggingFace artifact, a SHA-256 manifest you can re-verify on your own hardware, and a JSON evaluation receipt. No "trust me bro": run uc verify and confirm the contract holds.

14 verified records · 22 architectures · 1.0040× tightest record · 405B largest verified

The verified matrix

Fourteen models verified end-to-end at 5 bits per weight. Every PPL ratio is measured under the streaming per-layer reconstruction comparator at seq_len=1024, seed=42, on a held-out FineWeb-edu tail (n=30–50 unless noted).
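For concreteness: "PPL ratio" here is compressed-model perplexity divided by baseline perplexity over the same held-out tokens, and "drift" is that ratio expressed as a percentage. A toy sketch (the per-token NLL values below are invented for illustration, not from any receipt):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Invented per-token NLLs on the same held-out tokens, baseline vs compressed.
baseline_nlls   = [2.10, 1.85, 2.40, 1.95]
compressed_nlls = [2.11, 1.86, 2.41, 1.96]

ratio = perplexity(compressed_nlls) / perplexity(baseline_nlls)
drift = (ratio - 1) * 100
print(f"PPL ratio {ratio:.5f}x, drift {drift:+.3f}%")  # PPL ratio 1.01005x, drift +1.005%
```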

| Model | Params | PPL ratio | Drift | Notes | HF artifact | Status |
|---|---|---|---|---|---|---|
| Phi-3-mini-4k-instruct | 3.8B | 1.00262× | +0.262% | seq_len=128 caveat | phi-3-mini-4k-instruct-uc-v3-bpw5 | live |
| Mixtral-8x7B-v0.1 (MoE) | 47B (13B active) | 1.00368× | +0.368% | | mixtral-8x7b-v0.1-uc-v3-bpw5 | gated |
| Qwen3-1.7B-Base | 1.7B | 1.00401× | +0.401% | tightest small-decoder | qwen3-1.7b-base-uc-v3-bpw5 | live |
| Qwen3-14B | 14.0B | 1.00403× | +0.403% | scale-invariant codec | qwen3-14b-uc-v3-bpw5 | gated |
| Yi-1.5-9B | 9.0B | 1.00414× | +0.414% | tightest 8–9B dense | yi-1.5-9b-uc-v3-bpw5 | gated |
| Qwen3-8B | 8.0B | 1.00440× | +0.440% | 8B-class record | qwen3-8b-uc-v3-bpw5 | live |
| Mistral-7B-v0.3 ⚡ NEW | 7.0B | 1.00548× | +0.548% | 5th cure attempt cracked it (4 prior refuted) | mistral-7b-v0.3-uc-v3-bpw5 | live |
| Hermes-3-Llama-3.1-405B 🔥 HEADLINE | 405B | 1.0066× | +0.66% | 5.0358 → 5.0692; single 32 GB GPU | hermes-3-llama-3.1-405b-uc-v3-bpw5 | gated |
| Qwen3-0.6B | 0.6B | 1.0069× | +0.69% | | qwen3-0.6b-uc-v3-bpw5 | live |
| OLMo-2-0425-1B | 1.0B | 1.0073× | +0.73% | | olmo-2-0425-1b-uc-v3-bpw5 | live |
| OLMo-2-0425-1B-Instruct | 1.0B | 0.9998× | −0.02% | regularization observed | olmo-2-0425-1b-instruct-uc-v3-bpw5 | live |
| SmolLM2-1.7B-Instruct | 1.7B | 1.0075× | +0.75% | | smollm2-1.7b-instruct-uc-v3-bpw5 | live |
| SmolLM2-1.7B | 1.7B | 1.0085× | +0.85% | | smollm2-1.7b-uc-v3-bpw5 | live |
| Llama-3.1-8B | 8.0B | 1.0125× | +1.25% | baseline; in-band | llama-3.1-8b-uc-v3-bpw5 | live |

Mean across 14 verified records: 1.00554×. Median: 1.00494×.
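Drift and the summary statistics are plain arithmetic over the ratio column. A quick sketch with the ratios transcribed from the table (note that the rounded digits shown here give a mean of 1.00553×, one rounding step off the quoted 1.00554×, which is presumably computed from the unrounded receipts):

```python
from statistics import mean, median

# PPL ratio = compressed PPL / baseline PPL; drift = (ratio - 1) * 100%.
# The 405B headline row: 5.0358 -> 5.0692.
ratio_405b = 5.0692 / 5.0358
print(f"{ratio_405b:.4f}x ({(ratio_405b - 1) * 100:+.2f}%)")  # 1.0066x (+0.66%)

# All 14 PPL ratios, transcribed from the table above.
ratios = [1.00262, 1.00368, 1.00401, 1.00403, 1.00414, 1.00440, 1.00548,
          1.0066, 1.0069, 1.0073, 0.9998, 1.0075, 1.0085, 1.0125]
print(f"mean {mean(ratios):.5f}x, median {median(ratios):.5f}x")  # mean 1.00553x, median 1.00494x
```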

Reproduce any row in 3 commands

Pick a row above. Copy the HF artifact name. Run:

pip install ultracompress
hf download SipsaLabs/qwen3-1.7b-base-uc-v3-bpw5 --local-dir ./pack
uc verify ./pack

For gated artifacts (10B+), click "Request access" on the HF page. Approval is manual, usually within 24h. Free for companies under $1M ARR, individuals, and researchers.

The verifier is the contract. uc verify reads the SHA-256 manifest and confirms every layer reconstructs to the bytes the trainer wrote. If a single byte drifts, it fails loudly. This is the difference between "lossless" as marketing and "lossless" as a contract you can audit.
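For intuition, that contract can be sketched in a few lines. This is an illustrative stand-in, not the real uc implementation; it assumes a hypothetical manifest.json mapping relative paths to SHA-256 hex digests:

```python
import hashlib
import json
from pathlib import Path

def verify_pack(pack_dir: str) -> bool:
    """Check every file listed in the pack manifest against its SHA-256.

    Assumes a hypothetical manifest.json of {relative_path: hex_digest};
    the real uc pack layout may differ. This only illustrates the contract.
    """
    pack = Path(pack_dir)
    manifest = json.loads((pack / "manifest.json").read_text())
    for rel_path, expected in manifest.items():
        digest = hashlib.sha256((pack / rel_path).read_bytes()).hexdigest()
        if digest != expected:
            # A single drifted byte changes the digest: fail loudly.
            raise ValueError(f"hash mismatch: {rel_path}")
    return True
```

The real verifier also reconstructs each layer from the compressed pack before hashing, but the failure mode is the same: byte-for-byte equality or a loud error.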


Honest negative results

We catalogue what doesn't work at the same level of detail as what does.

Full catalog (15+ entries): github.com/sipsalabs/ultracompress/blob/main/docs/HONEST_NEGATIVE_RESULTS_2026_05_08.md.

Want to use these via API?

Same model menu, OpenAI-compatible.

from openai import OpenAI
client = OpenAI(base_url="https://api.sipsalabs.com/v1", api_key="sk-...")
resp = client.chat.completions.create(
    model="hermes-3-405b",
    messages=[{"role": "user", "content": "test"}],
)
print(resp.choices[0].message.content)

$5 in free credits on signup, no card required. See /pricing for the full per-model token rates.

API status: the Self-Serve API is in private beta. The endpoint goes public alongside the Monday launch. For early access: founder@sipsalabs.com.

Questions about a specific row?

Direct line to the founder. Solo operation; you'll hear back within 4–8h during US business hours.