ARCHIVED RESEARCH PRODUCT: UltraCompress is no longer Sipsa Labs' active commercial focus. We discontinued it openly once its moat became a commodity, and we keep this page public as a research record and proof of execution: 23 verified architectures, reproducible SHA-256 reconstruction, published methodology. Sipsa Labs now builds Sentio, embedded intelligence for machines.

Archived · final release v0.6.27 on PyPI · uc verify (structure + SHA-256 integrity)

SIPSA LABS / ULTRACOMPRESS

Run a 405B model on a single 32 GB GPU.

UltraCompress let you run frontier models on hardware you already own, with reproducible, cryptographically verifiable reconstruction: the model your audit reviewed is the model you ship. The mechanism is near-lossless 5-bit transformer compression (perplexity within about 1% of the bf16 baseline): ~3× smaller weights (16-bit → ~5 bits/weight) at near-identical task quality. The research, benchmarks, and published packs remain available below as a record.

View the benchmark record · Playground (archive)

pip install ultracompress (final release) · Research record

/ Archived — the open substrate still works for researchers:

# Public substrate (no API key required):
pip install ultracompress
hf download SipsaLabs/qwen3-8b-uc-v3-bpw5 --local-dir ./qwen3-8b

/ Who this is for

Three ways in. As it shipped.

Same 5-bit substrate, three delivery modes during the commercial run — from a free pip install on your laptop to a managed OpenAI-compatible endpoint to a deployment inside your security boundary. Today only the free, public path remains active.

For developers

I want to run big models on my laptop.

"I have a 5090 / 4090 / Mac with 32 GB. Give me the weights."

Free public substrate. pip install ultracompress, hf download the artifact, run uc verify. No signup. No API key. The same SHA-256 manifest your security team would audit, on your hardware, today.

Install from PyPI →

For companies

I need OpenAI-compatible inference at lower cost.

"Swap the base URL. Cut the bill. Don't rewrite the app."

The managed API operated on this stack during the commercial run; it is no longer sold. The open substrate (PyPI + published packs) remains available for researchers.

Read the research record →

For enterprise

I need on-prem, air-gapped, or FedRAMP-ready.

"It has to run inside our boundary, under our audit log."

Reproducible reconstruction inside the customer's VPC, bare-metal cluster, or air-gapped enclave. SOC 2 / SR-11-7 / FDA / DoD-ready architecture. This path closed when the product was discontinued.

See the verified record →

/ Verified records

The numbers, with receipts.

Every record below has a public Hugging Face artifact and an SHA-256 manifest you can re-verify on your hardware. Perplexity ratio is measured against the bf16 baseline at seq_len=1024, FineWeb-edu held-out tail. Run uc verify and confirm the contract holds — no "trust me."
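
To make the headline metric concrete: perplexity is the exponential of the mean per-token negative log-likelihood, and each record reports the compressed model's perplexity divided by the bf16 baseline's. The sketch below is illustrative only. The function names and the toy loss values are assumptions made for this example; they are not part of the ultracompress package or its published eval.

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

def ppl_ratio(compressed_nlls, baseline_nlls):
    """Ratio > 1.0 means the compressed model is slightly worse than bf16."""
    return perplexity(compressed_nlls) / perplexity(baseline_nlls)

def ratio_to_percent(ratio):
    """Convert a ratio such as 1.0066 into a percent increase such as 0.66."""
    return (ratio - 1.0) * 100.0

# Toy per-token losses (hypothetical values, NOT measured data).
baseline = [2.10, 1.95, 2.30, 2.05]
compressed = [2.11, 1.96, 2.30, 2.06]

ratio = ppl_ratio(compressed, baseline)
print(f"{ratio:.4f}x  (+{ratio_to_percent(ratio):.2f}%)")
```

Read this way, a record of 1.0066× is the same statement as "+0.66% perplexity vs bf16 baseline".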

Hermes-3-Llama-3.1-405B

1.0066×

405B params · runs on a single 32 GB GPU via streaming — reconstructed per layer from disk, not VRAM-resident · +0.66% perplexity vs bf16 baseline

Mixtral-8x7B-v0.1 (MoE)

1.00368×

47B params (13B active) · +0.368% perplexity · mixture-of-experts

Qwen3-14B

1.00403×

14.0B params · +0.403% perplexity · scale-invariant codec

Qwen3-8B

1.00440×

8.0B params · +0.440% perplexity · 8B class record

Qwen3-1.7B-Base

1.0040×

1.7B params · +0.401% perplexity · tightest small-decoder record

Mistral-7B-v0.3

1.00548×

7.0B params · +0.548% perplexity · hardest architecture cracked to date

22 PPL-verified architectures (17 dense + 4 MoE + 1 SSM) + 1 ViT cosine-verified · 23 total across 4 architecture classes

See the full verified matrix →

/ Why customers care

What you actually get.

Four things made this different from every other "model compression" pitch. Each one remains verifiable on your hardware from the published artifacts — nothing to sign, nothing to pay.

Same OpenAI SDK. No rewrite.

During the commercial run, setting OPENAI_BASE_URL=https://api.sipsalabs.com/v1 kept your existing inference code working. Chat, completions, embeddings: same surface area, same response shape. It was a drop-in replacement for the OpenAI client in Python, Node, Go, and Rust.

SHA-256 reproducibility.

Every artifact ships with a per-file SHA-256 manifest. uc verify confirms the downloaded bytes match that manifest (pack structure + download integrity; it does not run the model or reproduce perplexity). The compressed artifact your audit reviewed in March is the artifact your endpoint serves in October. SR-11-7 and FDA SaMD reviews carry through — no "compressed-variant" governance lane.
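
For readers who want to see what a per-file integrity check involves, here is a minimal sketch using only the Python standard library. It is not the implementation of uc verify. The manifest file name (manifest.json) and its layout (a JSON object mapping relative paths to hex digests) are assumptions made for illustration, and the real pack format may differ.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large shards never sit fully in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_manifest(pack_dir, manifest_name="manifest.json"):
    """Return the relative paths that are missing or whose bytes do not match.

    ASSUMPTION: the manifest is a JSON object {relative_path: hex_digest}.
    """
    pack_dir = Path(pack_dir)
    manifest = json.loads((pack_dir / manifest_name).read_text())
    bad = []
    for rel_path, expected in manifest.items():
        target = pack_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel_path)
    return bad

# Self-contained demo on a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "layer_000.bin").write_bytes(b"example weights")
    manifest = {"layer_000.bin": sha256_of(tmp / "layer_000.bin")}
    (tmp / "manifest.json").write_text(json.dumps(manifest))

    clean = check_manifest(tmp)       # nothing flagged
    (tmp / "layer_000.bin").write_bytes(b"tampered weights")
    tampered = check_manifest(tmp)    # the modified file is flagged

print(clean, tampered)
```

A check like this proves only that the bytes on disk match the manifest; as noted above, it says nothing about model quality.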

Near-lossless quality, measured.

Task quality preserved, measured, published. Perplexity ratios from ~1.0013× to ~1.0125× against the bf16 baseline (1.0066× on the 405B flagship) — not "looks fine to me," not eyeball-tested on three prompts. Reproduce the eval on your hardware with one command. And when you need exactness over near-lossless, a separate genuinely lossless archival tier reconstructs the original weights bit-for-bit.

~3× lower memory footprint.

Fits on consumer GPUs you already own — or consolidates onto far fewer of the GPUs you already rent (one box where the bf16 weights needed several). Hermes-3-405B on a single RTX 5090. Mixtral-8x7B on a 4090. The cost-per-token math changes when the weights stop spilling into a second box.
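
The ~3× figure follows from simple arithmetic on bits per weight. The sketch below is a back-of-envelope estimate under stated assumptions: it counts raw weight storage only, treats "~5 bits/weight" as exactly 5, and ignores any scale or codebook metadata, activations, and KV cache. The function name is hypothetical.

```python
def weight_footprint_gb(params, bits_per_weight):
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

PARAMS_405B = 405e9  # parameter count of the 405B flagship

bf16_gb = weight_footprint_gb(PARAMS_405B, 16)     # 810.0
five_bit_gb = weight_footprint_gb(PARAMS_405B, 5)  # 253.125
reduction = bf16_gb / five_bit_gb                  # 3.2

print(f"bf16: {bf16_gb:.1f} GB, 5-bit: {five_bit_gb:.1f} GB, {reduction:.1f}x smaller")
```

Even at 5 bits the 405B weights come to roughly 253 GB, far more than 32 GB of VRAM, which is why that record streams layers from disk rather than keeping the weights resident on the GPU.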

/ Quick start

Three paths. One still runs.

The free path still runs end-to-end today — no signup, the substrate is public. The managed and enterprise paths closed when the product was discontinued; they stay below as a record of how it shipped.

01 Free — run it on your hardware No signup

Install the CLI, pull a published artifact, run the verifier. Three commands. The full substrate, free to run, under BUSL-1.1 with the Additional Use Grant.

# 1. Install
pip install ultracompress

# 2. Pull an artifact (Hugging Face)
hf download SipsaLabs/qwen3-8b-uc-v3-bpw5 --local-dir ./qwen3-8b

# 3. Verify pack structure + SHA-256 integrity on your hardware (does not run the model)
uc verify ./qwen3-8b

PyPI package →

02 Managed — OpenAI-compatible endpoint Archive

During the commercial run, the API served an OpenAI-compatible endpoint. The snippet below is kept as a record of what shipped.

# 1. (shell) Keys are no longer issued; purchases closed June 2026
export SIPSA_API_KEY=sk-...

# 2. (Python) Use the standard OpenAI SDK, just change the base URL
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.sipsalabs.com/v1",
    api_key=os.environ["SIPSA_API_KEY"],
)

resp = client.chat.completions.create(
    model="sipsa-qwen3-8b",
    messages=[{"role": "user", "content": "Hello"}],
)

View the benchmark record →

03 Enterprise — on-prem, air-gapped, custom Archive

During the commercial run, enterprise deployments ran inside the customer's security boundary — on-prem, VPC, or air-gapped, with SOC 2 / SR-11-7 / FDA / DoD-ready architecture. This path is closed; the intake template is kept as a record.

# Archived — the enterprise intake asked for:
#   Use case
#   Scale (GPU count, expected token throughput, models)
#   Security boundary (on-prem, VPC, air-gapped)
#   Timeline

/ Pricing — archived record

What was sold. Purchases closed.

UltraCompress sold as a self-serve subscription during its commercial run. The product was discontinued in June 2026 and purchases are closed — the tiers below are kept, with all purchase links removed, as a record of what shipped. The free substrate (PyPI + published packs) remains public.

Free

$0/mo

For evaluation and small workloads. Real inference, real models, real SDK.

Archived — packs remain public
OpenAI-compatible API access
All public 5-bit artifacts
Public verifier & benchmark CLI

Purchases closed

Pro

$20/mo

For individual developers running real workloads.

Generous monthly quota
Full catalog (except 405B-flagship)
Priority queue over Free traffic
Email support, 1-business-day reply

Purchases closed

Max

$100/mo

For production apps with real customer traffic. 5× or 20× the Pro quota.

5× Pro quota ($100) or 20× ($200)
405B-flagship access on Max 20×
Highest-priority queue + 99.5% SLA
Audit logs on Max 20×

Purchases closed

Team

$25/seat/mo

For engineering teams with central billing and admin. 5 seats minimum.

Max-5× quota per seat
Full catalog including 405B-flagship
SSO + admin controls + audit logs
5–150 seats, central billing

Purchases closed

Commercial availability ended June 2026. The substrate, packs, and verified benchmarks remain public — see the verified record →