Lossless inference at up to 44% under incumbent pricing.

SHA-256-verifiable, bit-identical reconstruction. 22 architectures from 1.7B to 405B, all served from a single 32 GB consumer-class GPU. Pay per token via an OpenAI-compatible API.

Get a key ($5 free credits) | Talk to sales

Five tiers

From individual researchers to compliance-bound enterprises. Pick the lane that fits.

Tier	Who it's for	What you get	Price
Verifier	Researchers, students, individuals, sub-$1M ARR companies	The codec under BUSL-1.1 Additional Use Grant. Full verifier scripts. Public HF artifacts (sub-10B free; 10B+ via "Request access" gate).	$0
Self-Serve API	Indie devs, small teams, prototyping	OpenAI-compatible endpoint. 5+ models. Pay-as-you-go per-token. Real-time usage dashboard.	From $0.10 / M tok in
Pro Inference	Indie founders and small-team CTOs who have outgrown the $5 credits and want predictable bills	Reserved GPU slice. No rate limits. Priority queue. Predictable monthly bill. SLA on uptime.	From $499 / mo
Enterprise SaaS	Mid-market ($1M–$50M ARR) running production AI	Custom model menu. Dedicated capacity. Named SLA. Security review docs. Founder access.	From $50,000 / yr
On-Prem Deploy	Compliance-bound enterprise (gov / finance / healthcare / defense)	Site license for the codec + serving runtime in your VPC. Install support, security review, SLA, named engineer. ITAR-friendly.	From $250,000 / yr

Self-Serve API — per-model

OpenAI-compatible. Swap your base_url and your code keeps working.

Model	Input $/M tok	Output $/M tok	vs incumbent
Phi-3.5-MoE	$0.10	$0.30	parity
Qwen3-8B	$0.15	$0.60	−17%
Qwen3-14B	$0.20	$0.80	parity
Mixtral-8x7B	$0.22	$0.70	−8%
Hermes-3-405B	$2.50	$2.50	−44% vs Together

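from openai import OpenAI

# Point the standard OpenAI client at the Sipsa endpoint; only base_url and api_key change.
client = OpenAI(base_url="https://api.sipsalabs.com/v1", api_key="sk-...")

# Send a single-turn chat request to the 405B model.
resp = client.chat.completions.create(
    model="hermes-3-405b",
    messages=[{"role": "user", "content": "test"}],
)

# Print the text of the first (and here only) completion choice.
print(resp.choices[0].message.content)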

$5 in credits on signup, no card required
Token billing on every request, real-time dashboard
Same model menu as huggingface.co/SipsaLabs, served bit-identical to the original HF weights
BUSL-1.1 Additional Use Grant covers self-host for sub-$1M ARR companies
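
As a rough illustration of the per-token pricing in the table above, the sketch below estimates what a single request would cost. The prices are copied from the table; `estimate_cost` is a hypothetical helper, and every model ID other than `hermes-3-405b` is an assumed spelling chosen for this example.

```python
# Prices in USD per million tokens (input, output), copied from the per-model table.
# Only "hermes-3-405b" appears as a model ID in the sample request; the other
# IDs are assumed spellings for illustration.
PRICES = {
    "phi-3.5-moe": (0.10, 0.30),
    "qwen3-8b": (0.15, 0.60),
    "qwen3-14b": (0.20, 0.80),
    "mixtral-8x7b": (0.22, 0.70),
    "hermes-3-405b": (2.50, 2.50),
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    price_in, price_out = PRICES[model]
    return (prompt_tokens * price_in + completion_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    # 1,200 prompt tokens and 800 completion tokens on the 405B model.
    print(f"${estimate_cost('hermes-3-405b', 1200, 800):.4f}")
```

With the OpenAI Python client, the two token counts can be read from `resp.usage.prompt_tokens` and `resp.usage.completion_tokens`. At the listed rates, the $5 signup credit covers about 2 million Hermes-3-405B tokens.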

Two specialized service offerings

Compression-as-a-Service

We losslessly compress your fine-tuned model. You keep the artifact and deploy it anywhere.

Model size	One-time	Re-compress (% of one-time price)
Sub-7B	$5,000	50%
7–30B	$15,000	50%
30–100B	$50,000	50%
100B+	from $150,000	50%

2-week turnaround
Includes verification manifest + SHA-256 audit
For customers who don't want to set up the codec themselves

Custom Architecture Support

We integrate your model architecture into the lossless codec pipeline.

Arch type	Price
Mainstream new arch (e.g., Llama-5 at release)	$25,000
Research / academic arch	$50,000
Proprietary internal arch (under MNDA)	$150,000

2-week guarantee for mainstream architectures from public release
Includes the lossless artifact + ability to compress future fine-tunes
The only vendor matching "new arch within 24 hr of HF release"

Coming Q3 2026 — Audit & Verification Service. Third-party SHA-256-attested certificates for any codec artifact (ours or another vendor's). For regulatory, compliance, customer-trust use cases. Per-artifact $99 (sub-14B) to $2,500 (frontier 100B+). Annual unlimited from $5,000.

Why this works

Three things make lossless 5-bit pricing meaningfully different:

SHA-256 verifiable. Every artifact ships with a manifest. Run uc verify ./pack on your hardware — if a single byte drifts, it fails loudly.
22 architectures verified end-to-end. Mistral-7B-v0.3 is the most recent addition, at a 1.0055× PPL ratio. Hermes-3-405B comes in at 1.0066× and the Mamba-2.8B SSM at 1.0119×. Full table at huggingface.co/SipsaLabs.
Single 32 GB GPU. Frontier models (405B) compressed and served on consumer hardware. Compute cost passed through as price discipline, not margin grab.
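
To make the verification idea concrete, here is a minimal sketch of checking files against a SHA-256 manifest. It is not the `uc verify` implementation, and the real manifest format is not described on this page: the JSON layout (`{"relative/path": "hex digest"}`), the file name `manifest.json`, and both function names are assumptions for illustration.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large shards never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pack(pack_dir: Path, manifest_name: str = "manifest.json") -> list[str]:
    """Return the relative paths whose file is missing or whose digest mismatches.

    Assumed (hypothetical) manifest layout: {"relative/path": "<sha256 hex>"}.
    An empty list means every listed file matched its recorded digest.
    """
    manifest = json.loads((pack_dir / manifest_name).read_text())
    failures = []
    for rel_path, expected in manifest.items():
        target = pack_dir / rel_path
        if not target.is_file() or sha256_file(target) != expected.lower():
            failures.append(rel_path)
    return failures
```

Because SHA-256 changes completely when any input byte changes, a single drifted byte in a shard is enough to put that shard on the failure list.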

FAQ

What is BUSL-1.1 and how does the Additional Use Grant work?

BUSL-1.1 (Business Source License) lets us release source code openly while preventing competitors from reselling it commercially. Our Additional Use Grant explicitly permits free use for sub-$1M ARR companies, individuals, researchers, and educational institutions. Companies above $1M ARR shipping in production should see the Enterprise SaaS or On-Prem tier. Every release auto-converts to Apache 2.0 four years after publication. Same pattern as Sentry, HashiCorp, MariaDB.

What does "lossless" actually mean here?

Bit-identical reconstruction of the dequantized weight tensor at fp32 storage, verified by a SHA-256 manifest. The compressed pack reproduces, byte for byte, the same numerical values the trainer used during distillation. End-to-end inference behavior matches the bf16 baseline up to fp16 reduction-order differences in the matmul itself. PPL ratios across 22 architectures fall in the 1.0026× to 1.0200× band.
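
For readers unfamiliar with the metric, the sketch below shows how a perplexity (PPL) ratio is commonly computed. It assumes the ratio is the compressed model's perplexity divided by the baseline's perplexity on the same evaluation tokens; the evaluation set and exact procedure behind the figures above are not specified here, and the function names are hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def ppl_ratio(compressed_nlls: Sequence[float], baseline_nlls: Sequence[float]) -> float:
    """Ratio above 1.0 means the compressed model scores slightly worse than baseline."""
    return perplexity(compressed_nlls) / perplexity(baseline_nlls)
```

Under this definition, a ratio of 1.0055× corresponds to a mean per-token loss increase of about ln(1.0055) ≈ 0.0055 nats.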

Is the codec algorithm public?

Codec internals and the training procedure are covered by pending patent applications (USPTO 64/049,511 and 64/049,517, filed 2026-04-25). The verifier flow, binary format, and reconstruction contract are public. The recipe stays private; the results stay verifiable. This is intentional: customers run the verifier on their own hardware and confirm the contract holds; they don't need the recipe to validate the result.

How does this compare to Together / Lambda / CoreWeave?

Hermes-3-Llama-3.1-405B costs $4.50 per million tokens on Together. We charge $2.50, which is 44% under. Same model, same OpenAI-compatible schema, and bit-identical reconstruction backed by a published SHA-256 manifest. For smaller models we range from parity to 17% under. We're not a Together replacement for single-stream sub-50ms TTFT; we're a batch-throughput discipline play with a verifier.
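
The 44% figure follows directly from the two list prices. A quick check, using a hypothetical helper named `percent_under`:

```python
def percent_under(ours: float, incumbent: float) -> float:
    """Percentage by which `ours` undercuts `incumbent`."""
    return (incumbent - ours) / incumbent * 100


# $2.50 versus $4.50 per million tokens.
print(round(percent_under(2.50, 4.50), 1))  # 44.4
```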

Can I use the API in production today?

The API is in private beta. Sub-10B models are public on HuggingFace for free Verifier-tier use. 10B+ models are gated: request access via the HF model card; manual approval usually comes within 24 hr. The self-serve API endpoint goes public following the Monday Hacker News launch.

What's the uptime guarantee?

Current production runs from a single home-lab dual-RTX-5090 box behind Cloudflare Tunnel. Uptime is "measured in nines, not yet five of them." The Pro Inference and Enterprise SaaS tiers include a named SLA. Status page at sipsalabs.com/status.

How do I pay?

Self-Serve API: Stripe usage-based billing, credit card. Pro: monthly invoice (Stripe or wire). Enterprise: annual contract, wire transfer or PO. On-Prem: site license, wire on signature. All tiers include W-9 / W-8BEN as needed. founder@sipsalabs.com handles paperwork.

What about ITAR / export controls / data residency?

On-Prem tier is designed for these cases. We deliver the codec + serving runtime to run inside your VPC; no inference data leaves your environment. Security review documentation is part of the engagement. Defense / aerospace customers welcome — happy to walk you through. founder@sipsalabs.com.

Get started

Pick the path that fits.

Verifier (free)	`pip install ultracompress && uc verify SipsaLabs/qwen3-8b-uc-v3-bpw5`
Self-Serve API	founder@sipsalabs.com — $5 free credits, no card
Pro Inference	founder@sipsalabs.com — 15-min onboarding call
Enterprise SaaS / On-Prem	founder@sipsalabs.com — direct line to founder
Compression / Custom Arch	founder@sipsalabs.com — scope + quote in 24 hr