The first near-lossless 5-bit state-space model — Mamba-2.8B at 1.00593× canonical PPL

Mamba-2.8B-hf joined the public registry this week as the 22nd PPL-verified pack and the first state-space model we have packed end-to-end with a canonical perplexity measurement. The headline number is 1.00593× against an architecture-compatible bf16 reference. It comes with a comparator caveat that we are documenting in plain sight and covering in a separate methodology paper.

Sipsa Inference · 2026-05-29 · Posted by the Sipsa Labs team

22 PPL-verified registry entries
1 State-space model
1.00593× Mamba-2.8B canonical PPL ratio
29 Published negative results

Why this is meaningful, and why the meaning is qualified

Near-lossless 5-bit compression has, until this week, been an exclusively transformer story for us: 21 architectures (17 dense + 4 MoE), all on the same evaluation harness and canonical comparator, with every reconstruction verified byte-for-byte against a per-tensor SHA-256 manifest. State-space models have been a separate research project. The codec itself is architecture-agnostic: it operates on Linear weight tensors, and a state-space block has them. The Mamba-2.8B-hf pack reconstructs reproducibly against the published manifest, and the verifier returns the same STRUCTURE OK it returns for every transformer pack. What is different is the comparator.
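A minimal sketch of what a per-tensor SHA-256 round-trip check can look like. The manifest layout (tensor name mapped to a hex digest), the tensor names, and the function names are assumptions made for illustration; this is not the actual uc verify implementation or pack format.

```python
import hashlib


def tensor_digest(raw: bytes) -> str:
    """SHA-256 hex digest of one tensor's serialized bytes."""
    return hashlib.sha256(raw).hexdigest()


def verify_against_manifest(tensors: dict[str, bytes],
                            manifest: dict[str, str]) -> list[str]:
    """Return the names of tensors that fail the check.

    A tensor fails if it is listed in the manifest but missing, if its
    digest differs from the manifest entry, or if it is present but not
    listed in the manifest at all. An empty list means every tensor
    matched byte-for-byte.
    """
    failures = []
    for name, expected in manifest.items():
        raw = tensors.get(name)
        if raw is None or tensor_digest(raw) != expected:
            failures.append(name)
    # Tensors the manifest does not know about are also failures.
    failures.extend(sorted(set(tensors) - set(manifest)))
    return failures


# Toy in-memory demo with hypothetical tensor names and contents.
reference = {
    "in_proj.weight": bytes(range(16)),
    "out_proj.weight": bytes(range(16, 32)),
}
manifest = {name: tensor_digest(raw) for name, raw in reference.items()}
print(verify_against_manifest(reference, manifest))
```

Because the check is per tensor, a single flipped byte shows up as one named failure instead of an opaque whole-file mismatch.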

The comparator caveat, in plain sight

Every PPL measurement has two parts: a number and a comparator. The comparator is the bf16 reference pipeline against which the ratio is computed, with the same prompts, sequence length, evaluation harness, masking, and positional-encoding semantics. For our 21 transformer rows (17 dense + 4 MoE) we use a canonical comparator with attention masks and rotary positional encodings handled the way the architecture expects. That comparator is incompatible with state-space models: Mamba blocks have neither attention masks nor rotary positional encodings. The teacher pipeline that handles a transformer DecoderLayer correctly cannot be pointed at a Mamba block and produce a fair number.
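To make "a number and a comparator" concrete, here is a hedged sketch of how a PPL ratio is conventionally formed from per-token negative log-likelihoods. It assumes both pipelines score exactly the same tokens and use natural-log NLLs; the function names are hypothetical and this is not the registry's evaluation harness.

```python
import math


def perplexity(token_nlls: list[float]) -> float:
    """Perplexity = exp(mean per-token negative log-likelihood, natural log)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def ppl_ratio(compressed_nlls: list[float],
              reference_nlls: list[float]) -> float:
    """Ratio of compressed-model PPL to reference (comparator) PPL.

    The ratio is only meaningful when both lists come from the same
    prompts and tokenization, so mismatched lengths are rejected.
    """
    if len(compressed_nlls) != len(reference_nlls):
        raise ValueError("both pipelines must score the same tokens")
    return perplexity(compressed_nlls) / perplexity(reference_nlls)
```

The ratio depends on the reference pipeline as much as on the compressed model, which is why two ratios computed against different comparators are not directly comparable.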

The Mamba-2.8B row is therefore measured against an architecture-compatible comparator. The number is 1.00593×, the reconstruction is reproducible, the verifier passes — but the number is not directly comparable to the 21 transformer rows because the comparator is not the same. The registry documents this inline; we documented it in the FDA comment letter this week; we are documenting it here.

A clean way to read this: the SSM row is the 22nd PPL-verified entry by count, but the apples-to-apples transformer summary statistics (median 1.00548, mean 1.00566, max 1.01250) exclude it. The SSM row is for the architecture-generalization story, not the transformer-comparison story. (The 23rd architecture in the catalog is DINOv2-Large, a Vision Transformer measured under a cosine-similarity comparator rather than PPL.)
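The exclusion rule can be sketched as a filter on a comparator label before any statistics are computed. The comparator labels and the `Row` type below are hypothetical, and the sample contains only the three ratios quoted in this post, so its output will not match the published 21-row summary.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass(frozen=True)
class Row:
    model: str
    ratio: float
    comparator: str  # hypothetical label, not a registry field name


def summarize(rows: list[Row], comparator: str) -> dict:
    """Summary statistics over rows that share one comparator."""
    ratios = [r.ratio for r in rows if r.comparator == comparator]
    if not ratios:
        raise ValueError(f"no rows measured against {comparator!r}")
    return {
        "count": len(ratios),
        "median": median(ratios),
        "mean": mean(ratios),
        "max": max(ratios),
    }


# Only the three ratios quoted in this post; the other rows are omitted.
ROWS = [
    Row("Phi-3.5-MoE", 1.00129, "transformer-canonical"),
    Row("Llama-3.1-8B", 1.01250, "transformer-canonical"),
    Row("Mamba-2.8B", 1.00593, "ssm-compatible"),
]

print(summarize(ROWS, "transformer-canonical"))
```

Grouping by comparator first keeps the Mamba row out of the transformer summary without dropping it from the registry count.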

Why we are publishing anyway

SSM compression has not been characterized at this rigor. We are not aware of a published near-lossless 5-bit canonical-PPL result on an SSM that is reproducibly reconstructible against a per-tensor SHA-256 manifest, on a public catalog with a customer-runnable verifier. If such a result exists, we would like to read it; if it does not, the Mamba-2.8B row is the first one we know of, and naming it explicitly is the right move.

The architecture-generalization signal is itself a contribution. The codec was designed against transformer Linear shapes; the prediction going in was that an SSM block would behave like a transformer Linear at the codec level. The prediction held. SSM near-lossless 5-bit nonetheless needs its own methodology paper, with an architecture-compatible comparator stated rigorously and an evaluation harness that handles the recurrence correctly. We are writing it for the EMNLP 2026 Industry Track.

Where the row sits in the registry

The 21 transformer rows (17 dense + 4 MoE) span 0.6B (Qwen3-0.6B) to 405B (Hermes-3-Llama-3.1-405B), ratios from 1.00129× (Phi-3.5-MoE) to 1.01250× (Llama-3.1-8B). Transformer summary: median 1.00548, mean 1.00566, max 1.01250. The full table is at github.com/sipsalabs. The Mamba-2.8B row sits at 1.00593× but is excluded from that summary because its comparator is different. Every row, including this one, carries the same uc verify round-trip: reconstructed weights byte-for-byte identical to the per-tensor SHA-256 manifest the pack ships with.

Three non-claims

We do not beat 4-bit on bits per weight (bpw). AWQ-int4, GPTQ-int4, EXL3, and QTIP all sit at lower bpw than 5, and several beat us on raw PPL on individual models; AWQ-int4 wins on Llama-3.1-8B, for example. The differentiator we work on is the verifiability axis, meaning the reproducible reconstruction contract that no published 4-bit kernel currently provides, not the bpw axis.
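For a sense of scale on the bpw axis, a back-of-the-envelope sketch: the weight payload is roughly parameters × bpw / 8 bytes. The parameter count below is a nominal figure read off the "2.8B" model name, and the estimate ignores scales, metadata, and any tensors stored at other precisions, so it is not the actual pack size.

```python
def payload_bytes(num_params: int, bits_per_weight: float) -> float:
    """Idealized weight payload in bytes: params * bpw / 8.

    Ignores quantization scales, manifests, and unquantized tensors.
    """
    if num_params < 0 or bits_per_weight <= 0:
        raise ValueError("num_params must be >= 0 and bits_per_weight > 0")
    return num_params * bits_per_weight / 8


# Assumption: nominal parameter count taken from the "2.8B" model name.
NOMINAL_PARAMS = 2_800_000_000

for label, bpw in [("bf16", 16), ("5-bit", 5), ("4-bit", 4)]:
    gb = payload_bytes(NOMINAL_PARAMS, bpw) / 1e9
    print(f"{label}: {gb:.2f} GB")
```

Under these assumptions a 4-bit payload is 80% of the 5-bit one, which is the gap the first non-claim concedes.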

One SSM is not the SSM family. Mamba-2.8B is one architecture; RWKV is a different family; Jamba is a hybrid. The signal here is that the codec's Linear-level operations are sound on a non-transformer architecture, not that every post-transformer architecture compresses the same way. The methodology paper will say so.

Vision encoders, multimodal projection heads, and speech models are not next, at least not yet. The codec operates on Linear weight tensors, and several of those modalities use weight shapes we have not characterized. That work is on the roadmap but is not yet shipping.


Sipsa Labs is an experimental and deep tech-and-software company. UltraCompress is its first publicly shipped product; Sipsa Inference is the second. More products are in flight. Near-lossless 5-bit packs are released under BUSL-1.1 + Additional Use Grant, free for individuals, research, and commercial use under $1M ARR.