DoD ATO and the deployed-equals-accredited check: what compression breaks and how to fix it

Defense Authority-to-Operate processes assume the AI model on the edge platform is the model the accreditation board reviewed. Compression breaks that link. Cryptographic reproducible reconstruction restores it.

Sipsa Inference · 2026-05-27 · Posted by the Sipsa Labs team

SI-7 · DoD RMF control framework

ATO · Authority to Operate

1-line · deployed-equals-accredited check

architectures · 22 PPL + 1 ViT cosine

When a defense AI system goes through the Authority-to-Operate process under the DoD Risk Management Framework, the accreditation board reviews a specific, frozen-in-time artifact. The neural-network weights — every byte of them — get fingerprinted into the security control evidence package. The accreditation decision is, in effect, a decision about THAT artifact. Not the model conceptually. Not the model "as trained." The specific bytes.

Then those bytes get deployed to an edge platform. A drone, a soldier's helmet, a forward-deployed ground vehicle, a perimeter sensor. The platform has size, weight, and power (SWaP) constraints. The accredited 70-billion-parameter model needs roughly 140 GB of VRAM for its weights alone at bf16 (2 bytes per parameter), and the edge GPU has 32 GB. Something has to give.
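The 140 GB figure is simple arithmetic, sketched below. The function name is illustrative, the calculation counts weights only (no activations or KV cache), and it assumes decimal gigabytes (1 GB = 1e9 bytes).

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Weight storage only, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# bf16 stores 16 bits (2 bytes) per weight.
accredited_gb = weight_memory_gb(70e9, 16)
edge_vram_gb = 32  # the edge GPU from the example above

print(f"{accredited_gb:.0f} GB needed, {edge_vram_gb} GB available")
print("fits" if accredited_gb <= edge_vram_gb else "does not fit")
```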

What gives, today, is the assumption that the deployed model is the accredited model. Standard 4-bit quantization (AWQ, GPTQ, EXL3) produces an artifact that is numerically close to the accredited model but not byte-identical to it. Even the same quantization recipe can produce slightly different reconstructed weights under different CUDA versions. The fingerprint in the ATO evidence package no longer matches the bytes serving on the platform.
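A toy illustration of why "numerically close" is not enough for a fingerprint check: two weight vectors that differ by about one part in a million hash to unrelated SHA-256 values. The weight values and the float32 serialization are made up for the example and are not taken from any real quantizer.

```python
import hashlib
import struct


def fingerprint(weights):
    """SHA-256 over the weights serialized as little-endian float32."""
    raw = struct.pack(f"<{len(weights)}f", *weights)
    return hashlib.sha256(raw).hexdigest()


accredited = [0.1, -0.25, 0.5, 0.75]
# Hypothetical drift: one weight differs by about one part in a million.
deployed = [0.1, -0.25, 0.5, 0.750001]

max_abs_diff = max(abs(a - b) for a, b in zip(accredited, deployed))
same_hash = fingerprint(accredited) == fingerprint(deployed)

print("numerically close:", max_abs_diff < 1e-5)
print("same fingerprint:", same_hash)
```

A hash has no notion of "close": any changed bit yields a different digest, which is the property SI-7 style checks rely on.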

For most software in the DoD, this would be a routine reaccreditation event. For AI weights, it's worse than that — because the accreditation board doesn't have a workflow for "is this new artifact, which is numerically different from the accredited one, behaviorally equivalent enough to inherit the accreditation?" That's a research question, not an accreditation control.

The operational consequence is that program managers either deploy at the accredited precision (eating the SWaP cost — and often that's not even possible on the target hardware), or they deploy quantized and the ATO becomes a documentary fiction that everyone hopes the inspector general doesn't notice.

A third option exists.

What ATO actually requires

The relevant text in DoDI 8510.01 (Risk Management Framework for DoD Systems) and the NIST SP 800-53 controls inherited by RMF require:

Configuration management (CM-2): baseline configurations of information systems including software components
System and information integrity (SI-7): software, firmware, and information integrity verification using cryptographic mechanisms
Information system component inventory (CM-8): documented inventory of system components

For traditional software, these controls are satisfied by package signing, hash-checked deployments, and configuration databases. The deployed binary matches the accredited binary, byte for byte, verifiable by SHA-256 or stronger.
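For reference, the traditional-software version of this check fits in a few lines. This is a minimal sketch using only the Python standard library; the function names are illustrative, and the accredited digest is assumed to come from the evidence package.

```python
import hashlib
import hmac


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large binaries need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_baseline(path, accredited_hex):
    """True only if the deployed file hashes to the accredited digest."""
    return hmac.compare_digest(sha256_file(path), accredited_hex.lower())
```

Calling `matches_baseline(deployed_path, accredited_hex)` returns a plain boolean that can be logged as control evidence.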

For AI weights, the same controls technically apply — the AI model is a "software component" in the framework's text. But in practice, compression breaks the SI-7 link. The deployed weights are not the accredited weights at the byte level.

This isn't a hypothetical compliance gap. It's an actual gap in every quantized-AI-on-edge deployment in DoD today.

The accredited-equals-deployed primitive

Near-lossless 5-bit compression with SHA-256 reproducible reconstruction turns the gap into a control.

The compression pipeline produces a .uc pack — a binary artifact that, when decoded by the public decoder, reconstructs the compressed weights reproducibly — byte-for-byte, every load — on any CUDA / GPU / kernel combination. The pack ships with a per-tensor SHA-256 manifest. The accreditation board's evidence package includes the manifest. The deployment artifact on the edge platform IS the .uc pack. Every reconstruction matches the manifest, every load.
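To make the idea of a per-tensor manifest concrete, here is a minimal sketch. The manifest layout (a JSON object mapping tensor name to hex digest), the tensor names, and the stand-in byte strings are all assumptions for illustration; the actual .uc pack and manifest formats are not described here.

```python
import hashlib
import json


def build_manifest(tensors):
    """Map each tensor name to the SHA-256 of its reconstructed bytes."""
    return {
        name: hashlib.sha256(data).hexdigest()
        for name, data in sorted(tensors.items())
    }


# Stand-in byte strings; a real decoder would supply reconstructed tensor bytes.
tensors = {
    "layers.0.attn.q_proj": bytes(range(16)),
    "layers.0.attn.k_proj": bytes(range(16, 32)),
}

manifest = build_manifest(tensors)
manifest_json = json.dumps(manifest, sort_keys=True, indent=2)
print(manifest_json)
```

Sorting keys keeps the serialized manifest stable, so the manifest itself can be hashed or signed as a single evidence item.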

The SI-7 control becomes:

uc verify <deployed_pack>

The output is a SHA-256 fingerprint. If it matches the manifest entry in the ATO evidence package, the deployed weights are byte-identical to the accredited weights. The control is satisfied cryptographically, not through process attestation.
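Conceptually, the comparison behind such a check could look like the sketch below. It is not the implementation of `uc verify`; the function name, the in-memory tensor dictionary, and the manifest layout are hypothetical.

```python
import hashlib
import hmac


def verify_against_manifest(tensors, manifest):
    """Return names that are missing, unexpected, or whose digest differs."""
    failures = []
    for name in sorted(set(tensors) | set(manifest)):
        if name not in tensors or name not in manifest:
            failures.append(name)
            continue
        actual = hashlib.sha256(tensors[name]).hexdigest()
        if not hmac.compare_digest(actual, manifest[name]):
            failures.append(name)
    return failures


# Toy data standing in for reconstructed tensors and the accredited manifest.
accredited = {"w0": b"\x01\x02\x03\x04", "w1": b"\x05\x06\x07\x08"}
manifest = {k: hashlib.sha256(v).hexdigest() for k, v in accredited.items()}
tampered = dict(accredited, w1=b"\x05\x06\x07\x09")  # one byte changed

print(verify_against_manifest(accredited, manifest))  # []
print(verify_against_manifest(tampered, manifest))    # ['w1']
```

An empty list means every tensor matched. Treating missing and unexpected tensors as failures keeps the check strict in both directions.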

For the program manager, this changes three things:

1. The compression decision moves out of the accreditation conversation. Currently, "we want to deploy this in 4-bit instead of 16-bit" is a re-accreditation trigger because the deployed artifact differs from the accredited artifact. With cryptographic reconstruction, the 5-bit pack IS the accredited artifact. The accreditation board reviews the pack (and the 5-bit reference it reconstructs to), and the program manager deploys exactly that pack everywhere.

2. The continuous monitoring story becomes a one-liner. RMF requires ongoing assurance that the system remains in its authorized state. For AI weights, "the system" = "the deployed model." A daily uc verify against the field artifact, with the result logged to the monitoring stack, is a fully automated SI-7 control.

3. The supply chain integrity story extends to subcontractor handoffs. If a defense prime contracts a startup for AI weights and the startup ships a near-lossless 5-bit pack, the prime can verify the pack's manifest against what the startup committed to. The chain of custody from training to deployment is cryptographic, not process-based.
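For the continuous-monitoring item, the daily result could be emitted as a structured log line. The sketch below assumes the observed fingerprint has already been obtained from the verification step; the record fields, status strings, and platform identifier are hypothetical, not a format the monitoring stack or the tool defines.

```python
import hmac
import json
from datetime import datetime, timezone


def monitoring_record(platform_id, observed_hex, accredited_hex, now=None):
    """Build one JSON log line recording whether the field artifact matches."""
    now = now or datetime.now(timezone.utc)
    ok = hmac.compare_digest(observed_hex.lower(), accredited_hex.lower())
    return json.dumps(
        {
            "control": "SI-7",
            "platform": platform_id,
            "checked_at": now.isoformat(),
            "observed_sha256": observed_hex.lower(),
            "status": "authorized" if ok else "DRIFT",
        },
        sort_keys=True,
    )


# Hypothetical digests for illustration only.
accredited_hex = "ab" * 32
print(monitoring_record("edge-node-01", "ab" * 32, accredited_hex))
print(monitoring_record("edge-node-02", "cd" * 32, accredited_hex))
```

The same comparison serves the subcontractor handoff: the prime substitutes the digest the supplier committed to for `accredited_hex`.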

The hardware caveat that matters

There's one operationally honest detail.

The reproducible reconstruction guarantee is about the weights: what gets loaded into VRAM. It is NOT a guarantee that two different inference kernels running the same weights will produce identical token outputs. Stochastic decoding, floating-point reduction order in parallel attention kernels, and CUDA driver versions can all introduce small non-determinism downstream of the weight load.

For a deployed AI system, that means:

Reproducible weights = the SI-7 control we just described. Always available.
Bit-identical inference behavior = a stricter property that requires also pinning the inference kernel, the CUDA driver version, the GPU SKU, the sampling parameters, and the input pipeline. Achievable but requires more discipline.

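The first is a prerequisite for the second: without identical weights, no amount of kernel pinning yields identical behavior, and identical weights alone do not guarantee it. Most ATO processes require the first and treat the second as a kernel-side configuration management problem. That's the right shape: weight identity is a primitive, behavior identity is built on top.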
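One way to treat the stricter property as configuration management is to fingerprint the pinned runtime alongside the weights. The sketch below hashes a canonical JSON description of the environment; every field name and value is a placeholder, and a real program would decide which fields to pin.

```python
import hashlib
import json


def runtime_fingerprint(config):
    """SHA-256 of a canonical (sorted, compact) JSON encoding of the config."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder values; none of these identifiers refer to real versions.
pinned = {
    "weights_sha256": "placeholder-digest-from-manifest",
    "inference_kernel": "example-kernel 1.0",
    "cuda_driver": "example-driver",
    "gpu_sku": "example-gpu",
    "sampling": {"temperature": 0.0, "top_k": 1, "seed": 0},
}

baseline = runtime_fingerprint(pinned)
drifted = runtime_fingerprint({**pinned, "cuda_driver": "other-driver"})
print(baseline == drifted)
```

Any change to a pinned field changes the fingerprint, so drift in the kernel, driver, or sampling settings surfaces the same way weight drift does.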

Where this fits for defense deployments

The primitive that matters for defense AI is the same one across the board: cryptographic reproducible reconstruction as a runtime assurance control. Confirm that the model running at the edge is bit-for-bit the model that was accredited — a one-line check rather than a re-accreditation cycle.

If you're a program manager evaluating compression for an edge deployment, or a Chief Scientist at a defense AI integrator, the audit-grade reconstruction primitive is something you can pilot today. The ATO writes itself when the deployed-equals-accredited check is a one-line cryptographic primitive. The goal is to make sure that primitive exists by the time the framework is updated to require it.

Sipsa Labs builds near-lossless 5-bit transformer compression with SHA-256 reproducible reconstruction. Patents pending. Engineering due-diligence under NDA: founder@sipsalabs.com.