Lossless inference at half the incumbent price.
SHA-256-verifiable bit-identical reconstruction. 22 architectures from 1.7B to 405B, all served from a single consumer-class 32 GB GPU. Pay-per-token via an OpenAI-compatible API.
Five tiers
From individual researchers to compliance-bound enterprise. Pick the lane that fits.
| Tier | Who it's for | What you get | Price |
|---|---|---|---|
| Verifier | Researchers, students, individuals, sub-$1M ARR companies | The codec under BUSL-1.1 Additional Use Grant. Full verifier scripts. Public HF artifacts (sub-10B free; 10B+ via "Request access" gate). | $0 |
| Self-Serve API | Indie devs, small teams, prototyping | OpenAI-compatible endpoint. 5+ models. Pay-as-you-go per-token. Real-time usage dashboard. | From $0.10 / M tok in |
| Pro Inference | Indie founders and small CTOs who have outgrown the $5 credits and want predictable bills | Reserved GPU slice. No rate limits. Priority queue. Predictable monthly bill. SLA on uptime. | From $499 / mo |
| Enterprise SaaS | Mid-market ($1M–$50M ARR) running production AI | Custom model menu. Dedicated capacity. Named SLA. Security review docs. Founder access. | From $50,000 / yr |
| On-Prem Deploy | Compliance-bound enterprise (gov / finance / healthcare / defense) | Site license for the codec + serving runtime in your VPC. Install support, security review, SLA, named engineer. ITAR-friendly. | From $250,000 / yr |
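Deciding between Self-Serve and Pro usually comes down to monthly volume. A rough breakeven sketch, using the Qwen3-8B self-serve rates from the table below and assuming an equal input/output token split (that split is an assumption, not a quoted figure):

```python
# Rough breakeven between Self-Serve pay-as-you-go and the $499/mo Pro tier.
# Qwen3-8B self-serve rates; the 1:1 input/output split is an assumption.
PRO_MONTHLY = 499.00
IN_PER_M, OUT_PER_M = 0.15, 0.60   # $/M tokens, input and output

def self_serve_cost(m_tokens_in: float, m_tokens_out: float) -> float:
    """Pay-as-you-go bill for a month, in dollars."""
    return m_tokens_in * IN_PER_M + m_tokens_out * OUT_PER_M

# With a 1:1 split, blended cost is $0.375 per M total tokens.
blended = (IN_PER_M + OUT_PER_M) / 2
breakeven_m_tokens = PRO_MONTHLY / blended
print(round(breakeven_m_tokens))  # 1331
```

So on these assumptions, Pro starts paying for itself somewhere north of a billion total tokens per month; heavier output-skewed traffic breaks even sooner.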
Self-Serve API — per-model
OpenAI-compatible. Swap your `base_url` and your existing code keeps working.
| Model | Input $/M tok | Output $/M tok | vs incumbent |
|---|---|---|---|
| Phi-3.5-MoE | $0.10 | $0.30 | parity |
| Qwen3-8B | $0.15 | $0.60 | −17% |
| Qwen3-14B | $0.20 | $0.80 | parity |
| Mixtral-8x7B | $0.22 | $0.70 | −8% |
| Hermes-3-405B | $2.50 | $2.50 | −44% vs Together |
```python
from openai import OpenAI

# Point the standard OpenAI client at the SipsaLabs endpoint.
client = OpenAI(base_url="https://api.sipsalabs.com/v1", api_key="sk-...")
resp = client.chat.completions.create(
    model="hermes-3-405b",
    messages=[{"role": "user", "content": "test"}],
)
print(resp.choices[0].message.content)
```
- $5 in credits on signup, no card required
- Token billing on every request, real-time dashboard
- Same model menu as huggingface.co/SipsaLabs, served bit-identical to the original HF weights
- BUSL-1.1 Additional Use Grant covers self-host for sub-$1M ARR companies
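Estimating a bill from the per-model table above is straightforward arithmetic. A minimal sketch (the model keys here are illustrative, not the API's exact identifiers):

```python
# Tiny cost estimator for the per-model table above.
# Prices are $ per million tokens; keys are illustrative, not exact API IDs.
PRICES = {  # (input $/M, output $/M)
    "phi-3.5-moe":   (0.10, 0.30),
    "qwen3-8b":      (0.15, 0.60),
    "qwen3-14b":     (0.20, 0.80),
    "mixtral-8x7b":  (0.22, 0.70),
    "hermes-3-405b": (2.50, 2.50),
}

def request_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one request at the listed per-token rates."""
    p_in, p_out = PRICES[model]
    return (tokens_in * p_in + tokens_out * p_out) / 1_000_000

# A 2,000-token prompt with an 800-token completion on Hermes-3-405B:
print(f"${request_cost('hermes-3-405b', 2000, 800):.4f}")  # $0.0070
```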
Two specialized service offerings
Compression-as-a-Service
We losslessly compress your fine-tuned model. You keep the artifact and deploy it anywhere.
| Model size | One-time fee | Re-compress (of one-time fee) |
|---|---|---|
| Sub-7B | $5,000 | 50% |
| 7–30B | $15,000 | 50% |
| 30–100B | $50,000 | 50% |
| 100B+ | from $150,000 | 50% |
- 2-week turnaround
- Includes verification manifest + SHA-256 audit
- For customers who don't want to set up the codec themselves
Custom Architecture Support
We integrate your model architecture into the lossless codec pipeline.
| Arch type | Price |
|---|---|
| Mainstream new arch (e.g., Llama-5 at release) | $25,000 |
| Research / academic arch | $50,000 |
| Proprietary internal arch (under MNDA) | $150,000 |
- 2-week guarantee for mainstream architectures from public release
- Includes the lossless artifact + ability to compress future fine-tunes
- To our knowledge, the only vendor targeting "new arch within 24 hr of HF release"
Why this works
Three things make lossless 5-bit pricing meaningfully different:
- SHA-256 verifiable. Every artifact ships with a manifest. Run `uc verify ./pack` on your hardware — if a single byte drifts, it fails loudly.
- 22 architectures verified end-to-end. Mistral-7B-v0.3 just landed at a 1.0055× PPL ratio, Hermes-3-405B at 1.0066×, Mamba-2.8B SSM at 1.0119×. Full table at huggingface.co/SipsaLabs.
- Single 32 GB GPU. Frontier models (405B) compressed and served on consumer hardware. Compute cost passed through as price discipline, not margin grab.
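A manifest check of the kind described above boils down to hashing every file and comparing digests. A minimal sketch, assuming a JSON manifest mapping relative file paths to hex SHA-256 digests (a common convention; the actual `uc verify` format may differ):

```python
# Minimal sketch of a SHA-256 manifest check. Assumes a JSON manifest of
# {relative_path: hex_digest}; the real `uc verify` format may differ.
import hashlib
import json
import pathlib

def verify_pack(pack_dir: str, manifest_path: str) -> bool:
    """Return True iff every file in the pack matches its manifest digest."""
    manifest = json.loads(pathlib.Path(manifest_path).read_text())
    for rel_path, expected in manifest.items():
        data = (pathlib.Path(pack_dir) / rel_path).read_bytes()
        if hashlib.sha256(data).hexdigest() != expected:
            print(f"FAIL {rel_path}")  # a single drifted byte lands here
            return False
    return True
```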
FAQ
What is BUSL-1.1 and how does the Additional Use Grant work?
BUSL-1.1 (Business Source License) lets us release source code openly while preventing competitor commercial resale. Our Additional Use Grant explicitly permits free use for sub-$1M ARR companies, individuals, researchers, and educational institutions. Above $1M ARR shipping in production: see Enterprise SaaS or On-Prem tier. Every release auto-converts to Apache 2.0 four years after publication. Same pattern as Sentry, HashiCorp, MariaDB.
What does "lossless" actually mean here?
Bit-identical reconstruction of the dequantized weight tensor at fp32 storage, verified by SHA-256 manifest. The compressed pack reproduces, byte-for-byte, the same numerical values the trainer used during distillation. End-to-end inference behavior matches the bf16 baseline up to fp16 reduction-order on the matmul itself. PPL ratios across 22 architectures fall in the 1.0026× – 1.0200× band.
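Bit-identical is a stricter bar than "numerically close": hashing the raw fp32 bytes catches any drift, however small. A stdlib-only sketch of the idea (real packs hash tensor files, not Python lists):

```python
# Bit-identical vs numerically close: hash the raw fp32 bytes.
# Stdlib-only sketch; real packs hash tensor files, not Python lists.
import hashlib
import struct

def fp32_digest(values) -> str:
    """SHA-256 of the values packed as little-endian float32."""
    return hashlib.sha256(struct.pack(f"<{len(values)}f", *values)).hexdigest()

a = [0.1, 0.2, 0.3]
print(fp32_digest(a) == fp32_digest(list(a)))            # True: same bits
print(fp32_digest(a) == fp32_digest([0.1, 0.2, 0.30001]))  # False: one value drifted
```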
Is the codec algorithm public?
Codec internals + training procedure are patent-protected (USPTO 64/049,511 + 64/049,517 filed 2026-04-25). The verifier flow, binary format, and reconstruction contract are public. The recipe stays private; the results stay verifiable. This is intentional: customers run the verifier on their hardware and confirm the contract holds; they don't need the recipe to validate the result.
How does this compare to Together / Lambda / CoreWeave?
For Hermes-3-Llama-3.1-405B on Together: $4.50 per million tokens. We charge $2.50 — 44% under. Same model, OpenAI-compatible schema, bit-identical reconstruction we publish a SHA-256 manifest for. For smaller models we're roughly at parity to slightly under. We're not a Together replacement for single-stream sub-50ms TTFT; we're a batch-throughput discipline play with a verifier.
Can I run the API key in production today?
API is in private beta. Sub-10B models are public on HuggingFace for verifier-tier free use. 10B+ models are gated — request access via the HF model card, manual approval usually within 24 hr. Self-serve API endpoint goes public following the Monday Hacker News launch.
What's the uptime guarantee?
Current production runs from a single home-lab dual-RTX-5090 box on Cloudflare Tunnel. Uptime is "measured in nines, not yet five of them." Pro Inference and Enterprise SaaS tiers include named SLA. Status page at sipsalabs.com/status.
How do I pay?
Self-Serve API: Stripe usage-based billing, credit card. Pro: monthly invoice (Stripe or wire). Enterprise: annual contract, wire transfer or PO. On-Prem: site license, wire on signature. All tiers include W-9 / W-8BEN as needed. founder@sipsalabs.com handles paperwork.
What about ITAR / export controls / data residency?
On-Prem tier is designed for these cases. We deliver the codec + serving runtime to run inside your VPC; no inference data leaves your environment. Security review documentation is part of the engagement. Defense / aerospace customers welcome — happy to walk you through. founder@sipsalabs.com.
Get started
Pick the path that fits.
| Path | How |
|---|---|
| Verifier (free) | `pip install ultracompress && uc verify SipsaLabs/qwen3-8b-uc-v3-bpw5` |
| Self-Serve API | founder@sipsalabs.com — $5 free credits, no card |
| Pro Inference | founder@sipsalabs.com — 15-min onboarding call |
| Enterprise SaaS / On-Prem | founder@sipsalabs.com — direct line to founder |
| Compression / Custom Arch | founder@sipsalabs.com — scope + quote in 24 hr |