Private Beta · 24h Turnaround

Request Sipsa Inference API access

An OpenAI-compatible inference API serving 22 losslessly compressed 5-bit transformer models (Hermes-3-Llama-3.1-405B, Mixtral-8x7B, Qwen3-14B, Mistral-7B, and 18 more), each with SHA-256-verifiable bit-identical reconstruction. We're onboarding customers in batches to keep latency tight while we scale capacity.

Two paths, both available today

Path 1 — Self-hosted substrate (no API key required)

The compressed model substrate is fully production-ready. Pull any of 40 reproducible, customer-verifiable artifacts from our HuggingFace org and run locally:

pip install ultracompress
hf download SipsaLabs/qwen3-8b-uc-v3-bpw5 --local-dir ./qwen3-8b
uc verify ./qwen3-8b   # confirms bit-identical reconstruction
uc bench ./qwen3-8b    # measures TTFT / tokens/sec / VRAM
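The verification step comes down to hashing the reconstructed files and comparing against published digests. A minimal stdlib sketch of that idea — the file names and manifest here are hypothetical, not the actual `uc verify` implementation:

```python
import hashlib
from pathlib import Path

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_artifact(directory: str, manifest: dict[str, str]) -> bool:
    """True iff every file's digest matches the published manifest."""
    return all(
        sha256_file(str(Path(directory) / name)) == digest
        for name, digest in manifest.items()
    )
```

Bit-identical reconstruction means this check either passes exactly or fails loudly — there is no "close enough" for a lossless substrate.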

Free for companies under $1M ARR, researchers, and individuals (BUSL-1.1 + Additional Use Grant). Each release auto-converts to Apache 2.0 four years after publication.

Path 2 — Managed API (private beta)

If you'd rather skip self-hosting, the managed inference API at api.sipsalabs.com/v1 is a drop-in OpenAI-SDK replacement:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sipsalabs.com/v1",
    api_key=os.environ["SIPSA_API_KEY"],
)

response = client.chat.completions.create(
    model="hermes-3-405b",
    messages=[{"role": "user", "content": "hello, lossless world"}],
)
print(response.choices[0].message.content)
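Because beta capacity is tuned batch by batch, transient 429/503 responses are possible. A small generic retry-with-backoff wrapper — a pattern sketch that works with any OpenAI-compatible client, not a Sipsa-specific API:

```python
import random
import time

def with_retries(call, max_attempts: int = 4, base_delay: float = 0.5):
    """Invoke call() and retry on failure with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # 0.5s, 1s, 2s, ... plus jitter so retries don't synchronize
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# usage (hypothetical):
# response = with_retries(lambda: client.chat.completions.create(
#     model="hermes-3-405b",
#     messages=[{"role": "user", "content": "hello, lossless world"}],
# ))
```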

Get a private-beta API key

Email founder@sipsalabs.com with a one-line description of your use case and we'll provision a key within 24 hours. First $5 of usage on every approved account is on us.

Click the button below — it pre-fills the email with a structured intake template. Edit and send.

Email founder@sipsalabs.com

What you'll get

Onboarding timeline

T+0    You send the intake email
T+24h  We review, provision your API key, and reply with the key plus an onboarding doc
T+25h  You set OPENAI_BASE_URL=https://api.sipsalabs.com/v1 and start sending requests
T+1wk  Optional 30-min onboarding call to walk through your specific use case
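The T+25h step works because the OpenAI Python SDK reads its base URL and key from standard environment variables, so existing code can switch endpoints with no source changes. A sketch (assumes the SDK's OPENAI_BASE_URL / OPENAI_API_KEY variables):

```python
import os

# Point any OpenAI-SDK-based code at the Sipsa endpoint via environment alone.
os.environ["OPENAI_BASE_URL"] = "https://api.sipsalabs.com/v1"
os.environ["OPENAI_API_KEY"] = os.environ.get("SIPSA_API_KEY", "")

# Existing code then constructs the client with no arguments:
# from openai import OpenAI
# client = OpenAI()  # picks up OPENAI_BASE_URL and OPENAI_API_KEY
```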

Why a private beta?

Two reasons:

1. Quality of service. We're a solo-founder operation running real GPU infrastructure. Onboarding in batches lets us monitor every customer's request pattern and tune capacity without dropping packets. As capacity scales, the beta opens.

2. Customer-conversation density. Every early-beta conversation tells us which architectures customers want compressed next, which pricing tier matters, which compliance feature is the deal-closer. We're using these conversations to inform the roadmap. Private beta = high-density customer learning.

The substrate (pip install ultracompress) is fully production-ready today and unblocks any self-hosted customer with no waiting list. The API is the convenience surface — beta only because we're scaling capacity carefully.

Compliance & enterprise notes

For SOC 2 / SR 11-7 / FDA / DoD / HIPAA-bound deploys, the on-prem MSA path is available today (no waitlist) — see the pricing page or email founder@sipsalabs.com for the contract template. SHA-256-verifiable bit-identical reconstruction is the regulatory-equivalence floor that makes us the only 5-bit-class substrate viable for those workloads.

Read more