UltraCompress
5-bit transformer compression at near-lossless quality + OpenAI-compatible inference
~3× smaller weights (16-bit → ~5 bits/weight) at the same task quality, with reproducible reconstruction you can verify on your own hardware. Headline record: Hermes-3-Llama-3.1-405B at 1.0066× perplexity ratio on a single 32 GB consumer GPU. Same OpenAI SDK, just change the base URL. Need zero loss? A separate genuinely lossless archival tier reconstructs the original weights bit-for-bit.