Blog
UltraCompress engineering notes, lab discoveries, and shipping milestones. We report the numbers we have actually measured, saying when and under what conditions; we do not report the rest.
Sipsa Inference is live.
OpenAI-compatible inference API at api.sipsalabs.com/v1, serving 22 transformer architectures in lossless 5-bit form, including Hermes-3-Llama-3.1-405B at a 1.0066x perplexity ratio. Drop-in replacement for the official openai SDK. First $5 of usage on us.
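Because the endpoint is OpenAI-compatible, a plain stdlib sketch of the request shape is enough to show what "drop-in" means. The model id below is a placeholder, not a confirmed name; check the /v1/models listing for what is actually served.

```python
import json
import urllib.request

# Sketch of an OpenAI-compatible chat completion request against the
# Sipsa endpoint. Model id and API key are placeholders.
BASE_URL = "https://api.sipsalabs.com/v1"

payload = {
    "model": "Hermes-3-Llama-3.1-405B",  # assumed id; verify via /v1/models
    "messages": [{"role": "user", "content": "Hello from UltraCompress."}],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer $SIPSA_API_KEY",  # substitute your key
        "Content-Type": "application/json",
    },
)
# resp = urllib.request.urlopen(req)  # uncomment once you have a real key
# print(json.load(resp)["choices"][0]["message"]["content"])
```

With the official openai Python SDK, the only change from stock usage is the constructor: `OpenAI(base_url="https://api.sipsalabs.com/v1", api_key=...)`.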
We searched for our competition. The 5-bit band was empty.
A live HuggingFace Hub query for lossless 5-bit transformer compression artifacts returned zero competitors. Twenty-two architectures, three new sub-1.005x perplexity-ratio records this week, and SHA-256-verifiable bit-identical reconstruction.
Hermes-3-405B compressed lossless at 5 bits.
The largest dense transformer artifact published to HuggingFace at 5-bit lossless. Compressed end-to-end on dual RTX 5090s in 13 hours of wall-clock time, producing a 251 GB pack on disk. Verified perplexity ratio: 1.0066x.
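The pack size is easy to sanity-check from first principles. Using the nominal 405B parameter count (the exact count and container overhead shift the result by a couple of GB):

```python
# Back-of-envelope: why a 5-bit pack of a nominal 405B-parameter model
# lands around 250 GB. Uses the headline parameter count only; the true
# on-disk size depends on the exact count plus pack-format overhead.
params = 405e9          # nominal parameter count
bits_per_param = 5      # UltraCompress lossless 5-bit packing
pack_bytes = params * bits_per_param / 8
print(f"{pack_bytes / 1e9:.0f} GB")   # -> 253 GB, in line with the 251 GB measured
```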
Eighteen architectures, one pack format.
UltraCompress 5-bit lossless transformer compression validated across 18 architectures from 0.6B to 405B parameters — dense, mixture-of-experts, and state-space (Mamba). One pack format. Bit-identical reconstruction.