Request Sipsa Inference API access
An OpenAI-compatible inference API serving 22 lossless 5-bit transformer architectures (Hermes-3-Llama-3.1-405B, Mixtral-8x7B, Qwen3-14B, Mistral-7B, and 18 more) with SHA-256-verifiable, bit-identical reconstruction. We're onboarding customers in batches to keep latency tight while we scale capacity.
Two paths, both available today
Path 1 — Self-hosted substrate (no API key required)
The compressed model substrate is fully production-ready. Pull any of the 40 customer-side reproducible artifacts from our HuggingFace org and run locally:
pip install ultracompress
hf download SipsaLabs/qwen3-8b-uc-v3-bpw5 --local-dir ./qwen3-8b
uc verify ./qwen3-8b   # confirms bit-identical reconstruction
uc bench ./qwen3-8b    # measures TTFT / tokens/sec / VRAM
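The verify step confirms that decompression reproduces the original weights bit for bit. As an illustration of the underlying idea only (not the actual ultracompress implementation; the JSON manifest format here is hypothetical), a minimal sketch of SHA-256 verification over a model directory:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks to keep memory flat."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(model_dir: Path, manifest: dict) -> bool:
    """True iff every file listed in the manifest matches its recorded hash."""
    return all(sha256_of(model_dir / name) == digest
               for name, digest in manifest.items())

# Demo with a throwaway file standing in for a reconstructed model shard.
model_dir = Path("./demo-model")
model_dir.mkdir(exist_ok=True)
(model_dir / "shard-0.bin").write_bytes(b"not real weights")
manifest = {"shard-0.bin": sha256_of(model_dir / "shard-0.bin")}
print(verify(model_dir, manifest))  # True: reconstruction is bit-identical
```

Any flipped bit in any shard changes its digest, so a single boolean covers the whole artifact.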
Free for sub-$1M ARR companies, research, and individuals (BUSL-1.1 + Additional Use Grant). Auto-converts to Apache 2.0 four years after each release.
Path 2 — Managed API (private beta)
If you'd rather skip self-hosting, the managed inference API at api.sipsalabs.com/v1 is a drop-in OpenAI-SDK replacement:
import os
from openai import OpenAI
client = OpenAI(
base_url="https://api.sipsalabs.com/v1",
api_key=os.environ["SIPSA_API_KEY"],
)
response = client.chat.completions.create(
model="hermes-3-405b",
messages=[{"role": "user", "content": "hello, lossless world"}],
)
print(response.choices[0].message.content)
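Alternatively, the OpenAI Python SDK also reads OPENAI_BASE_URL and OPENAI_API_KEY from the environment, so the switch can be pure configuration with no code change (the key value below is a placeholder):

```shell
export OPENAI_BASE_URL="https://api.sipsalabs.com/v1"
export OPENAI_API_KEY="sk-..."   # placeholder: your provisioned Sipsa key
# Existing code that constructs OpenAI() with no arguments now targets Sipsa.
```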
Get a private-beta API key
Email founder@sipsalabs.com with a one-line description of your use case and we'll provision a key within 24 hours. First $5 of usage on every approved account is on us.
Click the button below — it pre-fills the email with a structured intake template. Edit and send.
What you'll get
- API key with $5 free credit on first usage
- Drop-in openai SDK compatibility, no code rewrite
- Access to 22 architectures (Hermes-3-405B, Mixtral-8x7B, Qwen3-14B, Mistral-7B, Phi-3-mini, and the rest of the matrix as we tier-promote them)
- Verifiable bit-identical model output: SHA-256 manifest available per model on demand
- Direct line to founder for technical questions: founder@sipsalabs.com
- Optional Phase 0 POC engagement ($5K-$25K, 1 week, deliverable + invoice — your money back if we miss spec)
Onboarding timeline
Set OPENAI_BASE_URL=https://api.sipsalabs.com/v1 and start sending requests.

Why a private beta?
Two reasons:
1. Quality of service. We're a solo-founder operation running real GPU infrastructure. Onboarding in batches lets us monitor every customer's request pattern and tune capacity without dropping packets. As capacity scales, the beta opens.
2. Customer-conversation density. Every early-beta conversation tells us which architectures customers want compressed next, which pricing tier matters, which compliance feature is the deal-closer. We're using these conversations to inform the roadmap. Private beta = high-density customer learning.
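While capacity scales in batches, individual requests could occasionally hit a transient error; a generic client-side backoff helper keeps callers resilient. This is a sketch under the assumption of standard transient-failure semantics, not documented Sipsa throttling behavior:

```python
import random
import time

def with_retries(call, max_attempts=4, base_delay=0.5):
    """Retry a zero-arg callable on exceptions with exponential backoff + jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))

# With the client shown earlier, this would wrap the call site, e.g.:
#   response = with_retries(lambda: client.chat.completions.create(...))

# Demo: a stand-in call that fails twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # ok, after two retried failures
```

The jitter term spreads retries out so a batch of clients doesn't hammer the API in lockstep after an outage.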
The substrate (pip install ultracompress) is fully production-ready today and unblocks any self-host customer with no waiting list. The API is the convenience surface — beta only because we're scaling capacity carefully.
Compliance & enterprise notes
For SOC 2, SR 11-7, FDA, DoD, or HIPAA-bound deployments, the on-prem MSA path is available today with no waitlist; see the pricing page or email founder@sipsalabs.com for the contract template. SHA-256-verifiable, bit-identical reconstruction is the regulatory-equivalence floor that makes us the only 5-bit-class substrate viable for those workloads.