UC vs AWQ vs GPTQ vs EXL3. Side by side.
Every UltraCompress row is SHA-256-verified and independently reproducible via `pip install ultracompress && uc verify <hf_pack>`. Competitor numbers are pulled from published HuggingFace model cards and peer-reviewed papers. Rows are sorted by PPL ratio in ascending order: a lower ratio is more faithful to the original model.
| # | Model | Params | Method | BPW | PPL Ratio | Drift | Verified | Source |
|---|---|---|---|---|---|---|---|---|
Methodology & sources.
UltraCompress numbers are pulled directly from the public auditor registry (`docs/benchmarks.json`). Every row is independently verifiable: `pip install ultracompress && uc verify <hf_pack>`.
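To make the idea of a SHA-256 manifest check concrete, here is a minimal sketch of what such a verification step could look like. It is not the implementation behind `uc verify`: the manifest file name (`manifest.json`), its layout (`{"files": {path: digest}}`), and the function names are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return its hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so multi-gigabyte weight files never sit fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(pack_dir, manifest_name="manifest.json"):
    """Return the relative paths whose digest is missing or does not match.

    Hypothetical manifest layout (an assumption, not the real UC format):
        {"files": {"relative/path": "<hex sha256>", ...}}
    """
    pack_dir = Path(pack_dir)
    manifest = json.loads((pack_dir / manifest_name).read_text())
    mismatched = []
    for rel_path, expected in manifest["files"].items():
        target = pack_dir / rel_path
        if not target.is_file() or sha256_of_file(target) != expected.lower():
            mismatched.append(rel_path)
    return mismatched
```

An empty return value means every listed file matched its recorded digest; any other result names the files that were altered or are missing.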
UC eval protocol:
- n=30 prompts (50 for some large models), seq_len=1024, seed=42
- FineWeb-edu held-out tail corpus
- Single 32 GB consumer GPU (RTX 5090)
- Per-layer streaming reconstruction (transformer); architecture-matched comparator (SSM)
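The page does not spell out how the PPL ratio is computed, so the following sketch uses the conventional definition as an assumption: perplexity is the exponential of the mean per-token negative log-likelihood (in nats), and the ratio divides the compressed model's perplexity by the original model's perplexity on the same tokens. The function names are hypothetical.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token), NLL in nats."""
    if not token_nlls:
        raise ValueError("need at least one token NLL")
    return math.exp(sum(token_nlls) / len(token_nlls))


def ppl_ratio(compressed_nlls, baseline_nlls):
    """Ratio of compressed-model PPL to baseline PPL on the same eval tokens.

    1.0 means the compressed model is indistinguishable from the original
    under this metric; values above 1.0 indicate degradation.
    """
    return perplexity(compressed_nlls) / perplexity(baseline_nlls)
```

Under this definition, a compressed model whose mean NLL is 0.01 nats higher than the baseline has a ratio of exp(0.01), roughly 1.01, regardless of the absolute perplexity.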
Competitor sources:
AWQ, GPTQ, and EXL3 numbers are taken from published HuggingFace model cards (TheBloke, turboderp, Qwen team) and the original method papers. Where multiple published results exist for the same architecture at similar bit-widths, we use the best published number. All sources are linked per row.
Source links per method:
| Method | Primary source | Bit-width | Verification primitive |
|---|---|---|---|
| UC | docs/benchmarks.json | 5 bpw | SHA-256 manifest + uc verify |
| AWQ | Lin et al. 2023 (arxiv 2306.00978) | 4 bpw (g128) | None (no manifest, no verification primitive) |
| GPTQ | Frantar et al. 2022 (arxiv 2210.17323) | 4 bpw (g128) | None (no manifest, no verification primitive) |
| EXL3 | turboderp-org/exllamav3 | ~4-5 bpw (trellis) | None (no manifest, no verification primitive) |
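When comparing bit-widths across methods, note that a nominal "4 bpw (g128)" figure usually excludes per-group metadata. The sketch below estimates the effective storage cost under one common set of assumptions: one 16-bit scale and one 4-bit zero point per group of 128 weights. These metadata sizes are assumptions for illustration and vary between implementations; the function name is hypothetical.

```python
def effective_bpw(weight_bits, group_size, scale_bits=16, zero_point_bits=4):
    """Bits per weight once per-group scale and zero-point overhead is included.

    The default metadata sizes are illustrative assumptions, not values
    taken from any specific AWQ or GPTQ checkpoint.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return weight_bits + (scale_bits + zero_point_bits) / group_size


# 4-bit weights, group size 128: 4 + 20/128 = 4.15625 bits per weight
print(effective_bpw(4, 128))
```

Under these assumptions, group-wise 4-bit formats cost slightly more than 4 bits per weight, which is worth keeping in mind when reading the bit-width column.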
Run your own benchmark.
Every UC row is independently verifiable. Install, verify, benchmark. If your numbers disagree, that's a bug report we want.