Private Beta · 24h Turnaround

Request Sipsa Inference API access

An OpenAI-compatible inference API serving 22 losslessly compressed 5-bit transformer models (Hermes-3-Llama-3.1-405B, Mixtral-8x7B, Qwen3-14B, Mistral-7B, and 18 more), each with SHA-256-verifiable bit-identical reconstruction. We're onboarding customers in batches to keep latency tight while we scale capacity.

Two paths, both available today

Path 1 — Self-hosted substrate (no API key required)

The compressed model substrate is fully production-ready. Pull any of 40 reproducible, customer-verifiable artifacts from our HuggingFace org and run locally:

pip install ultracompress
hf download SipsaLabs/qwen3-8b-uc-v3-bpw5 --local-dir ./qwen3-8b
uc verify ./qwen3-8b   # confirms bit-identical reconstruction
uc bench ./qwen3-8b    # measures TTFT / tokens/sec / VRAM
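The verification step comes down to hashing the reconstructed files and comparing against published digests. A minimal stdlib sketch of that idea — the file names and manifest here are hypothetical, not the actual `uc verify` implementation:

```python
import hashlib
from pathlib import Path

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_artifact(directory: str, manifest: dict[str, str]) -> bool:
    """True iff every file's digest matches the published manifest."""
    return all(
        sha256_file(str(Path(directory) / name)) == digest
        for name, digest in manifest.items()
    )
```

Bit-identical reconstruction means this check either passes exactly or fails loudly — there is no "close enough" for a lossless substrate.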

Free for companies under $1M ARR, researchers, and individuals (BUSL-1.1 + Additional Use Grant). Each release auto-converts to Apache 2.0 four years after publication.

Path 2 — Managed API (private beta)

If you'd rather skip self-hosting, the managed inference API at api.sipsalabs.com/v1 is a drop-in OpenAI-SDK replacement:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sipsalabs.com/v1",
    api_key=os.environ["SIPSA_API_KEY"],
)

response = client.chat.completions.create(
    model="hermes-3-405b",
    messages=[{"role": "user", "content": "hello, lossless world"}],
)
print(response.choices[0].message.content)
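Because beta capacity is tuned batch by batch, transient 429/503 responses are possible. A small generic retry-with-backoff wrapper — a pattern sketch that works with any OpenAI-compatible client, not a Sipsa-specific API:

```python
import random
import time

def with_retries(call, max_attempts: int = 4, base_delay: float = 0.5):
    """Invoke call() and retry on failure with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # 0.5s, 1s, 2s, ... plus jitter so retries don't synchronize
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# usage (hypothetical):
# response = with_retries(lambda: client.chat.completions.create(
#     model="hermes-3-405b",
#     messages=[{"role": "user", "content": "hello, lossless world"}],
# ))
```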

Get a private-beta API key

Email founder@sipsalabs.com with a one-line description of your use case and we'll provision a key within 24 hours. First $5 of usage on every approved account is on us.

Click the button below — it pre-fills the email with a structured intake template. Edit and send.

Email founder@sipsalabs.com

What you'll get

Onboarding timeline

T+0    You send the intake email
T+24h  We review, provision your API key, and reply with the key plus an onboarding doc
T+25h  You set OPENAI_BASE_URL=https://api.sipsalabs.com/v1 and start sending requests
T+1wk  Optional 30-min onboarding call to walk through your specific use case
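The T+25h step works because the OpenAI Python SDK reads its base URL and key from standard environment variables, so existing code can switch endpoints with no source changes. A sketch (assumes the SDK's OPENAI_BASE_URL / OPENAI_API_KEY variables):

```python
import os

# Point any OpenAI-SDK-based code at the Sipsa endpoint via environment alone.
os.environ["OPENAI_BASE_URL"] = "https://api.sipsalabs.com/v1"
os.environ["OPENAI_API_KEY"] = os.environ.get("SIPSA_API_KEY", "")

# Existing code then constructs the client with no arguments:
# from openai import OpenAI
# client = OpenAI()  # picks up OPENAI_BASE_URL and OPENAI_API_KEY
```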

Why a private beta?

Two reasons:

1. Quality of service. We're a solo-founder operation running real GPU infrastructure. Onboarding in batches lets us monitor every customer's request pattern and tune capacity without dropping packets. As capacity scales, the beta opens.

2. Customer-conversation density. Every early-beta conversation tells us which architectures customers want compressed next, which pricing tier matters, which compliance feature is the deal-closer. We're using these conversations to inform the roadmap. Private beta = high-density customer learning.

The substrate (pip install ultracompress) is fully production-ready today and unblocks any self-hosted customer with no waiting list. The API is the convenience surface — beta only because we're scaling capacity carefully.

Compliance & enterprise notes

For SOC 2 / SR 11-7 / FDA / DoD / HIPAA-bound deploys, the on-prem MSA path is available today (no waitlist) — see the pricing page or email founder@sipsalabs.com for the contract template. SHA-256-verifiable bit-identical reconstruction is the regulatory-equivalence floor that makes us the only 5-bit-class substrate viable for those workloads.

Read more