Why we filed comments on FDA-2026-N-4390

Sipsa Labs is a small, pre-revenue, deep-tech company. We are also the kind of company that writes formal public comments to the U.S. Food and Drug Administration. This post explains why those two facts are not in tension — and what we argued the early-phase clinical trial AI pilot docket should do.

Sipsa Labs · 2026-05-27 · Posted by the Sipsa Labs team

Docket we filed on: FDA-2026-N-4390

Comment period closes: May 29

PPL-verified architectures cited as existence proof: 22

Why a deep-tech company writes to FDA

The simple reason: FDA reads its docket. Public comments under the Administrative Procedure Act are how the agency formally weighs outside views before it builds a Pilot Program. The harder reason: the AI inference layer in regulated industries is at a one-time inflection point. A deployment-integrity primitive that is designed in once at the pilot stage is inherited downstream by every Predetermined Change Control Plan and every Total Product Lifecycle framework. If it is omitted now, it has to be retrofitted later at meaningfully higher cost. The early-phase trial pilot is the design-in moment.

The integrity gap the pilot can close

The regulatory framework FDA has built around AI/ML-enabled Software as a Medical Device — Good Machine Learning Practice, the December 2024 PCCP final guidance, the January 2025 TPLC draft — spends substantial and appropriate attention on model validity. We make no novel argument about that property. The property we argue has received much less attention is deployment integrity: is the model actually executing during the trial — on the sponsor's infrastructure, in the CRO's analytics environment, in a cloud inference service — the same artifact, in the precise numerical sense, that the validation evidence describes?

For decades the question was operationally trivial: the validated model was a set of regression coefficients in a file, and the deployed model was that same file. Large neural-net deployment broke the simplicity. Production inference today involves model conversion, quantization for accelerator economics, kernel-specific weight packing, and just-in-time recomputation on the target accelerator. In the dominant practice, none of those steps are bit-preserving: the weights the GPU consumes are not, byte-for-byte, the weights stored in the validated artifact. The mechanism is well-documented — floating-point non-associativity, kernel-version differences, atomic reductions — with Chen et al. (arXiv:2408.05148, 2024) the canonical reference.
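The reduction-order effect named above can be seen in a few lines. This is a minimal sketch, not a model of any real GPU kernel: it emulates float32 accumulation with the standard library, and the input values and the helper names `f32` and `sum_f32` are hypothetical, chosen only to make the rounding visible.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sum_f32(values, order):
    """Accumulate in float32, rounding after every addition, visiting indices in `order`."""
    acc = 0.0
    for i in order:
        acc = f32(acc + f32(values[i]))
    return acc


# Hypothetical partial sums feeding one reduction.
values = [1e8, 1.0, -1e8, 1.0]

left_to_right = sum_f32(values, [0, 1, 2, 3])  # (((1e8 + 1) - 1e8) + 1)
reordered = sum_f32(values, [0, 2, 1, 3])      # (((1e8 - 1e8) + 1) + 1)

print(left_to_right, reordered)  # 1.0 2.0
```

The same four numbers produce two different float32 results because the first `1.0` is lost when it is added to `1e8`. A parallel reduction that does not fix its summation order can land on either answer from run to run.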

The operational consequence: when an adverse event occurs and root-cause work must distinguish a model-design failure from a deployment-pipeline failure from a runtime numerical artifact, the absence of a deployment-integrity attestation makes that investigation materially harder. Our comment argues the Pilot Program should treat deployment integrity as a first-class design requirement, on equal footing with model validity.

What we recommended — a category, not a vendor

The technical category we recommended is audit-grade cryptographic reconstruction. The defining property is a contract: validated artifact in, deterministic reconstruction of that same validated artifact out, with a cryptographic fingerprint (SHA-256, FIPS PUB 180-4) that survives reload, hardware change, and kernel-version change, and that an auditor can verify in one cryptographic check.

The category is encoder-agnostic. Several technical approaches can satisfy it: full lossless coding of full-precision weights; quantized representations with published deterministic reconstruction proofs and per-tensor hash manifests; sealed-enclave attestations to loaded weight tensors. The right regulatory move is to specify the property and let multiple implementations compete to satisfy it, not to prescribe a specific mechanism.
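One way to picture the "single cryptographic check" is a per-tensor SHA-256 manifest whose canonical serialization is itself hashed into one fingerprint. The sketch below is an illustration under stated assumptions, not the format used in the comment letter or in any Sipsa Labs product: the function names, the header layout, and the tensor names are all hypothetical.

```python
import hashlib
import json
import struct


def tensor_digest(name, shape, dtype, raw_bytes):
    """SHA-256 over a small metadata header plus the tensor's raw bytes."""
    header = json.dumps({"name": name, "shape": list(shape), "dtype": dtype}, sort_keys=True)
    h = hashlib.sha256()
    h.update(header.encode("utf-8"))
    h.update(b"\x00")  # separator between header and payload
    h.update(raw_bytes)
    return h.hexdigest()


def build_manifest(tensors):
    """tensors maps name -> (shape, dtype, raw_bytes); returns name -> hex digest."""
    return {name: tensor_digest(name, *tensors[name]) for name in sorted(tensors)}


def artifact_fingerprint(manifest):
    """One SHA-256 over the canonical JSON form of the whole manifest."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical two-tensor artifact stored as little-endian float32.
tensors = {
    "layer0.weight": ((2, 2), "float32", struct.pack("<4f", 0.5, -1.25, 2.0, 0.0)),
    "layer0.bias": ((2,), "float32", struct.pack("<2f", 0.125, -0.75)),
}
manifest = build_manifest(tensors)
fingerprint = artifact_fingerprint(manifest)

# Rebuilding from the same bytes reproduces the fingerprint.
reloaded = artifact_fingerprint(build_manifest(dict(tensors)))

# Flipping a single bit in one tensor changes it.
shape, dtype, raw = tensors["layer0.bias"]
tampered = dict(tensors)
tampered["layer0.bias"] = (shape, dtype, bytes([raw[0] ^ 1]) + raw[1:])
tampered_fingerprint = artifact_fingerprint(build_manifest(tampered))

print(fingerprint == reloaded, fingerprint == tampered_fingerprint)  # True False
```

Hashing metadata together with the payload is a design choice in this sketch: it keeps a reshaped or renamed tensor with identical bytes from passing as the original.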

Sipsa Labs is an early implementer of one mechanism within this category. We disclose this in the letter and on this page because it would be dishonest not to. Our public catalog at huggingface.co/SipsaLabs contains 22 PPL-verified architectures (17 dense transformers, 4 mixture-of-experts transformers, and 1 state-space model, from 0.6B to 405B parameters) plus 1 cosine-verified ViT (DINOv2-Large): 23 architectures across 4 classes. Each ships with per-tensor SHA-256 manifests, and its reconstructed weights show a maximum absolute difference of zero in 32-bit float against a stored reference. The SSM record carries an explicit comparator-note caveat, documented inline in the public registry: the canonical transformer comparator is architecture-incompatible with state-space models. The catalog is an existence proof that the category is feasible at production scale, not evidence that our implementation is the only viable one.
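For readers who want to see what a "maximum absolute difference zero" check looks like, here is a minimal standard-library sketch. It is not the catalog's verification code: the helper names and the three-element tensors are hypothetical, and the buffers are assumed to be little-endian float32.

```python
import struct


def unpack_f32(raw: bytes):
    """Interpret a byte buffer as little-endian float32 values."""
    if len(raw) % 4 != 0:
        raise ValueError("buffer length is not a multiple of 4 bytes")
    return struct.unpack(f"<{len(raw) // 4}f", raw)


def max_abs_diff(reference: bytes, reconstructed: bytes) -> float:
    """Largest element-wise |a - b| between two float32 buffers of equal length."""
    if len(reference) != len(reconstructed):
        raise ValueError("buffers differ in length")
    pairs = zip(unpack_f32(reference), unpack_f32(reconstructed))
    return max((abs(a - b) for a, b in pairs), default=0.0)


# Hypothetical stored reference and two candidate reconstructions.
reference = struct.pack("<3f", 0.5, -1.25, 0.0)
exact_copy = struct.pack("<3f", 0.5, -1.25, 0.0)
signed_zero = struct.pack("<3f", 0.5, -1.25, -0.0)

print(max_abs_diff(reference, exact_copy), reference == exact_copy)    # 0.0 True
print(max_abs_diff(reference, signed_zero), reference == signed_zero)  # 0.0 False
```

The second comparison shows why a numeric check and a hash check complement each other: `0.0` and `-0.0` differ by zero numerically but not byte-for-byte, so a per-tensor SHA-256 comparison is the stricter of the two tests.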

Two honesty constraints in the letter

The reconstruction guarantee is against a stored reference, not against full precision. The stored reference itself carries a small, published perplexity ratio against a full-precision baseline — 1.0066× for the 405B Hermes-3 pack, 1.00506× for Phi-4. A pilot participant validates the qualified artifact end-to-end as the regulated artifact, and from that point on the cryptographic fingerprint is the bridge to production. You qualify the exact artifact you deploy; you do not ask the validator to qualify a different artifact than what serves traffic. The letter does not let that distinction blur.
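To put the two published ratios in more familiar units, the sketch below assumes the standard definition of perplexity as the exponential of the mean per-token negative log-likelihood. The post does not state the evaluation dataset or tokenization, so the helper names are hypothetical and only the two ratios come from the text above.

```python
import math


def perplexity(token_nlls):
    """exp(mean negative log-likelihood), with NLLs in nats per token."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def perplexity_ratio(candidate_nlls, baseline_nlls):
    """Perplexity of the candidate artifact divided by that of the baseline."""
    return perplexity(candidate_nlls) / perplexity(baseline_nlls)


def ratio_to_percent(ratio):
    """Relative perplexity increase, in percent."""
    return (ratio - 1.0) * 100.0


def ratio_to_nll_gap(ratio):
    """Equivalent increase in mean NLL, in nats per token."""
    return math.log(ratio)


for label, ratio in [("405B Hermes-3 pack", 1.0066), ("Phi-4", 1.00506)]:
    print(f"{label}: +{ratio_to_percent(ratio):.3f}% perplexity, "
          f"+{ratio_to_nll_gap(ratio):.5f} nats/token")
```

Read this way, a ratio of 1.0066× is a 0.66% increase in perplexity over the full-precision baseline, and 1.00506× is a 0.506% increase.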

Bit-identical weights are necessary but not sufficient for fully reproducible inference behavior. Token-level outputs can vary with kernel version, reduction order, sampling parameters, and accelerator generation, even when weight tensors are bit-identical. The Pilot Program should treat weight identity as the cryptographic floor and behavioral identity as a configuration-management problem layered on top: pin the kernel, the driver, the accelerator generation, the sampling configuration. The shape is identical to how adjacent regulatory frameworks have handled cryptographic-versus-procedural controls for two decades.
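The configuration-management layer can be sketched as a pinned record that is compared against what the runtime reports at serving time. This is an illustration only: the `RuntimeConfig` fields, the `config_drift` helper, and every value below are hypothetical placeholders, not a proposed schema.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class RuntimeConfig:
    # All field names and values here are hypothetical placeholders.
    kernel_library: str
    driver_version: str
    accelerator: str
    temperature: float
    top_p: float
    seed: int


def config_drift(pinned: RuntimeConfig, observed: RuntimeConfig) -> dict:
    """Return {field: (pinned_value, observed_value)} for every field that differs."""
    p, o = asdict(pinned), asdict(observed)
    return {name: (p[name], o[name]) for name in p if p[name] != o[name]}


# Configuration recorded when the artifact was qualified.
pinned = RuntimeConfig("kernels-1.0", "driver-A", "accel-gen1", 0.0, 1.0, 1234)

# Configuration reported by the serving environment after a driver update.
observed = replace(pinned, driver_version="driver-B")

drift = config_drift(pinned, observed)
print(drift)  # {'driver_version': ('driver-A', 'driver-B')}
```

An empty result means the procedural controls still match the qualified configuration. A non-empty result says nothing about the weights, which is exactly why it sits on top of the cryptographic check rather than replacing it.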

What we asked the Pilot Program to do

Five recommendations, each implementable within the Pilot's likely scope.

1. Require a deployment-integrity attestation for every participating AI component: at minimum, a per-tensor SHA-256 manifest of the weights as consumed by the inference runtime, independently reproducible.
2. Define the attestation in terms of properties, not mechanisms. Specify the property; remain agnostic to the compression, encoding, or sealing mechanism that achieves it.
3. Require continuous verification during the trial, not only at submission. A manifest mismatch should be a defined adverse event with a predefined response.
4. Make the attestation a citable component of the PCCP Modification Protocol so future updates are validated using the same property-based specification.
5. Publish Pilot Program findings on the feasibility of audit-grade cryptographic reconstruction at clinical-trial scale, including observed barriers, costs, and limitations.

None of those recommendations name Sipsa Labs. None prescribe our codec. We argued the category. The rest is the agency's decision.
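The continuous-verification recommendation comes down to comparing the manifest filed at qualification against the manifest recomputed from the weights the runtime is actually serving. A minimal sketch, assuming both manifests are plain mappings from tensor name to SHA-256 hex digest; the function names, tensor names, and byte strings are hypothetical.

```python
import hashlib


def diff_manifests(expected: dict, observed: dict) -> dict:
    """Classify every discrepancy between two name -> digest manifests."""
    shared = expected.keys() & observed.keys()
    return {
        "missing": sorted(expected.keys() - observed.keys()),
        "unexpected": sorted(observed.keys() - expected.keys()),
        "changed": sorted(name for name in shared if expected[name] != observed[name]),
    }


def is_intact(report: dict) -> bool:
    """True only when no tensor is missing, unexpected, or changed."""
    return not any(report.values())


# Manifest filed with the validation evidence (hypothetical contents).
expected = {
    "layer0.weight": hashlib.sha256(b"weight-bytes").hexdigest(),
    "layer0.bias": hashlib.sha256(b"bias-bytes").hexdigest(),
}

# Manifests recomputed during the trial: one clean, one with an altered tensor.
observed_ok = dict(expected)
observed_bad = dict(expected)
observed_bad["layer0.bias"] = hashlib.sha256(b"bias-bytez").hexdigest()

report_ok = diff_manifests(expected, observed_ok)
report_bad = diff_manifests(expected, observed_bad)
print(is_intact(report_ok), is_intact(report_bad), report_bad["changed"])
# True False ['layer0.bias']
```

Returning the names of the affected tensors, rather than a single pass/fail flag, gives a predefined mismatch response something concrete to act on.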

If you work in regulated AI deployment

The comment period closes May 29, 2026. If you work in clinical trial sponsorship, contract research, AI/ML-enabled SaMD, or regulated cloud inference and have a perspective on deployment integrity the agency should hear, the Federal eRulemaking Portal at regulations.gov is the venue — search FDA-2026-N-4390. Our filed comment is on the docket under our name.

Sipsa Labs is an experimental deep-tech and software company. UltraCompress is our first publicly shipped product and Sipsa Inference is the second; more products are in flight. Near-lossless 5-bit packs are released under BUSL-1.1 plus an Additional Use Grant, free for individuals, research, and commercial use under $1M ARR.