Lossless inference at half the incumbent price.

SHA-256-verifiable, bit-identical reconstruction. 22 architectures from 1.7B to 405B parameters, all served from a single consumer-class 32 GB GPU. Pay per token via an OpenAI-compatible API.

Five tiers

From individual researchers to compliance-bound enterprise. Pick the lane that fits.

| Tier | Who it's for | What you get | Price |
|---|---|---|---|
| Verifier | Researchers, students, individuals, sub-$1M ARR companies | The codec under BUSL-1.1 Additional Use Grant. Full verifier scripts. Public HF artifacts (sub-10B free; 10B+ via "Request access" gate). | $0 |
| Self-Serve API | Indie devs, small teams, prototyping | OpenAI-compatible endpoint. 5+ models. Pay-as-you-go per token. Real-time usage dashboard. | From $0.10 / M tok (input) |
| Pro Inference | Indie founders and small CTOs who have outgrown $5 credits and want predictable bills | Reserved GPU slice. No rate limits. Priority queue. Predictable monthly billing. Uptime SLA. | From $499 / mo |
| Enterprise SaaS | Mid-market ($1M–$50M ARR) running production AI | Custom model menu. Dedicated capacity. Named SLA. Security review docs. Founder access. | From $50,000 / yr |
| On-Prem Deploy | Compliance-bound enterprises (gov / finance / healthcare / defense) | Site license for the codec + serving runtime in your VPC. Install support, security review, SLA, named engineer. ITAR-friendly. | From $250,000 / yr |

Self-Serve API — per-model

OpenAI-compatible. Swap your base_url, your code keeps working.

| Model | Input $/M tok | Output $/M tok | vs incumbent |
|---|---|---|---|
| Phi-3.5-MoE | $0.10 | $0.30 | parity |
| Qwen3-8B | $0.15 | $0.60 | −17% |
| Qwen3-14B | $0.20 | $0.80 | parity |
| Mixtral-8x7B | $0.22 | $0.70 | −8% |
| Hermes-3-405B | $2.50 | $2.50 | −44% vs Together |
from openai import OpenAI

# Point the standard OpenAI client at our endpoint; nothing else changes.
client = OpenAI(base_url="https://api.sipsalabs.com/v1", api_key="sk-...")
resp = client.chat.completions.create(
    model="hermes-3-405b",
    messages=[{"role": "user", "content": "test"}],
)
print(resp.choices[0].message.content)
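Per-request cost from the table above is simple to estimate: input tokens times the input rate plus output tokens times the output rate, each per million. A minimal sketch using the published rates (the `RATES` dict and `request_cost` helper are illustrative, not part of the SDK):

```python
# Per-million-token rates from the table above: (input $/M tok, output $/M tok).
RATES = {
    "phi-3.5-moe": (0.10, 0.30),
    "qwen3-8b": (0.15, 0.60),
    "qwen3-14b": (0.20, 0.80),
    "mixtral-8x7b": (0.22, 0.70),
    "hermes-3-405b": (2.50, 2.50),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: tokens * rate / 1e6 for each direction."""
    in_rate, out_rate = RATES[model]
    return input_tokens * in_rate / 1e6 + output_tokens * out_rate / 1e6

# 10k prompt tokens + 2k completion tokens on Hermes-3-405B:
print(f"${request_cost('hermes-3-405b', 10_000, 2_000):.4f}")  # $0.0300
```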

Two specialized service offerings

Compression-as-a-Service

We losslessly compress YOUR fine-tuned model. You keep the artifact and deploy it anywhere.

| Model size | One-time | Re-compress (of one-time) |
|---|---|---|
| Sub-7B | $5,000 | 50% |
| 7–30B | $15,000 | 50% |
| 30–100B | $50,000 | 50% |
| 100B+ | from $150,000 | 50% |
  • 2-week turnaround
  • Includes verification manifest + SHA-256 audit
  • For customers who don't want to set up the codec themselves

Custom Architecture Support

We integrate your model architecture into the lossless codec pipeline.

| Architecture type | Price |
|---|---|
| Mainstream new arch (e.g., Llama-5 at release) | $25,000 |
| Research / academic arch | $50,000 |
| Proprietary internal arch (under MNDA) | $150,000 |
  • 2-week guarantee for mainstream architectures from public release
  • Includes the lossless artifact + ability to compress future fine-tunes
  • The only vendor offering "new arch within 24 hr of HF release"
Coming Q3 2026 — Audit & Verification Service. Third-party SHA-256-attested certificates for any codec artifact (ours or another vendor's). For regulatory, compliance, customer-trust use cases. Per-artifact $99 (sub-14B) to $2,500 (frontier 100B+). Annual unlimited from $5,000.

Why this works

Three things make lossless 5-bit pricing meaningfully different:

FAQ

What is BUSL-1.1 and how does the Additional Use Grant work?

BUSL-1.1 (Business Source License) lets us release source code openly while preventing competitor commercial resale. Our Additional Use Grant explicitly permits free use for sub-$1M ARR companies, individuals, researchers, and educational institutions. Above $1M ARR shipping in production: see Enterprise SaaS or On-Prem tier. Every release auto-converts to Apache 2.0 four years after publication. Same pattern as Sentry, HashiCorp, MariaDB.

What does "lossless" actually mean here?

Bit-identical reconstruction of the dequantized weight tensor at fp32 storage, verified by SHA-256 manifest. The compressed pack reproduces, byte-for-byte, the same numerical values the trainer used during distillation. End-to-end inference behavior matches the bf16 baseline up to fp16 reduction-order on the matmul itself. PPL ratios across 22 architectures fall in the 1.0026× – 1.0200× band.
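The verification contract described above can be sketched in a few lines: hash the reconstructed tensor bytes and compare against the published manifest. This is a minimal illustration only; the manifest layout, tensor naming, and `verify_manifest` helper are assumptions, not the actual `uc verify` implementation.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

def verify_manifest(manifest: dict, tensors: dict) -> bool:
    """Check each reconstructed tensor's SHA-256 against the manifest.

    `manifest` maps tensor name -> expected hex digest; `tensors` maps
    tensor name -> the raw fp32 bytes reconstructed by the codec.
    """
    for name, expected in manifest.items():
        actual = sha256_hex(tensors[name])
        if actual != expected:
            print(f"MISMATCH {name}: {actual} != {expected}")
            return False
    return True

# Toy example: a "tensor" of four fp32 zeros round-trips bit-identically.
blob = bytes(16)
manifest = {"layer0.weight": sha256_hex(blob)}
print(verify_manifest(manifest, {"layer0.weight": blob}))  # True
```

A single flipped bit anywhere in the reconstructed bytes changes the digest, which is why "lossless" here is a checkable claim rather than a marketing adjective.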

Is the codec algorithm public?

Codec internals + training procedure are patent-protected (USPTO 64/049,511 + 64/049,517 filed 2026-04-25). The verifier flow, binary format, and reconstruction contract are public. The recipe stays private; the results stay verifiable. This is intentional: customers run the verifier on their hardware and confirm the contract holds; they don't need the recipe to validate the result.

How does this compare to Together / Lambda / CoreWeave?

For Hermes-3-Llama-3.1-405B on Together: $4.50 per million tokens. We charge $2.50 — 44% under. Same model, OpenAI-compatible schema, bit-identical reconstruction we publish a SHA-256 manifest for. For smaller models we're roughly at parity to slightly under. We're not a Together replacement for single-stream sub-50ms TTFT; we're a batch-throughput discipline play with a verifier.
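The headline discount is straightforward arithmetic, shown here with the Hermes-3-405B numbers from the paragraph above (the `percent_under` helper is illustrative):

```python
def percent_under(ours: float, incumbent: float) -> float:
    """How far our price sits below the incumbent's, as a percentage."""
    return (incumbent - ours) / incumbent * 100

# Hermes-3-405B: $2.50 vs Together's $4.50 per million tokens.
print(f"{percent_under(2.50, 4.50):.0f}% under")  # 44% under
```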

Can I use the API in production today?

API is in private beta. Sub-10B models are public on HuggingFace for verifier-tier free use. 10B+ models are gated — request access via the HF model card, manual approval usually within 24 hr. Self-serve API endpoint goes public following the Mon HN launch.

What's the uptime guarantee?

Current production runs from a single home-lab dual-RTX-5090 box on Cloudflare Tunnel. Uptime is "measured in nines, not yet five of them." Pro Inference and Enterprise SaaS tiers include named SLA. Status page at sipsalabs.com/status.

How do I pay?

Self-Serve API: Stripe usage-based billing, credit card. Pro: monthly invoice (Stripe or wire). Enterprise: annual contract, wire transfer or PO. On-Prem: site license, wire on signature. All tiers include W-9 / W-8BEN as needed. founder@sipsalabs.com handles paperwork.

What about ITAR / export controls / data residency?

On-Prem tier is designed for these cases. We deliver the codec + serving runtime to run inside your VPC; no inference data leaves your environment. Security review documentation is part of the engagement. Defense / aerospace customers welcome — happy to walk you through. founder@sipsalabs.com.

Get started

Pick the path that fits.

  • Verifier (free): pip install ultracompress && uc verify SipsaLabs/qwen3-8b-uc-v3-bpw5
  • Self-Serve API: founder@sipsalabs.com — $5 free credits, no card
  • Pro Inference: founder@sipsalabs.com — 15-min onboarding call
  • Enterprise SaaS / On-Prem: founder@sipsalabs.com — direct line to the founder
  • Compression / Custom Arch: founder@sipsalabs.com — scope + quote in 24 hr