The audit that stopped a rollout
A European insurer spent four months generating 220 internal screens with a general-purpose coding assistant. The rollout paused two weeks before go-live when internal audit asked a single question: which prompt produced the approval-routing logic on screen 147, and can we reproduce it? Nobody could answer. The rebuild took another quarter.
The incident isn’t rare. It’s the default outcome when AI output is treated as source code instead of a derived artifact.
What auditors actually object to
Auditors don’t have a philosophical problem with LLMs. We’ve sat in enough review meetings to know the objections are concrete and repeatable. They want to know what produced a given control (provenance), whether the same input produces the same output (reproducibility), and whether a human with the right role approved the change (accountability). SOX, HIPAA, GDPR, and the EU AI Act all converge on the same three questions.
Free-form generation struggles with all three. The prompt history is rarely preserved. The model is non-deterministic. The reviewer is usually a developer cleaning up syntax, not a control owner signing off on intent.
Determinism as a control
Regulated industries treat reproducibility as a first-class control. A payroll calculation that gives different answers on different days is a finding, regardless of how close the answers are. The same standard applies to generated code. If the same specification can produce two different implementations, auditors treat both as unverified.
Structured generation against a JSON Schema narrows the output space enough that reproducibility becomes tractable. The resulting descriptor, a JSON document that conforms to the schema, is the specification. Two runs that produce the same descriptor produce the same running application, bit for bit, because the runtime is fixed.
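A minimal sketch of what "reproducibility becomes tractable" means in practice: hash a canonical serialization of the descriptor, so two runs can be compared byte for byte. The descriptor fields below are hypothetical, not from any real product schema.

```python
import hashlib
import json

# Hypothetical descriptor for an approval-routing screen;
# field names are illustrative only.
descriptor = {
    "screen": "vendor-setup",
    "version": 3,
    "controls": [
        {"field": "invoice_amount", "rule": "requires_approval_above", "threshold": 10000},
    ],
}

def descriptor_fingerprint(desc: dict) -> str:
    """Hash a canonical JSON serialization: two runs that emit the
    same descriptor get the same fingerprint, so reproducibility
    becomes a checkable property rather than a claim."""
    canonical = json.dumps(desc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

fp = descriptor_fingerprint(descriptor)

# A round-trip through JSON (which may reorder keys) does not change
# the fingerprint: same specification, same hash.
round_tripped = json.loads(json.dumps(descriptor))
assert descriptor_fingerprint(round_tripped) == fp
```

The fixed runtime does the rest: since the descriptor is the only variable input, a matching fingerprint implies a matching running screen.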
Traceability through the descriptor
The useful artifact in an audit isn’t the React component. It’s the descriptor that generated it. A descriptor is short, human-readable, and reviewable by a control owner who has never written TypeScript. When a SOX auditor asks how approval thresholds are enforced on the vendor-setup screen, the answer is a 40-line JSON block, not a 600-line component.
We’ve seen this collapse evidence-gathering from weeks to hours. The descriptor ties to a commit, the commit ties to an approver, and the approver ties to a role in the RBAC system. The chain closes without a spreadsheet.
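The chain described above can be sketched as a single evidence record. All identifiers here are invented placeholders; the point is only that each hop in the chain is a stored field, not a spreadsheet lookup.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AuditLink:
    """One closed evidence chain: artifact -> change -> person -> role.
    All values below are hypothetical placeholders."""
    descriptor_hash: str   # fingerprint of the descriptor JSON
    commit: str            # VCS commit that introduced the descriptor
    approver: str          # identity from the approval record
    role: str              # RBAC role held at approval time

def evidence_chain(link: AuditLink) -> str:
    """Render the chain an auditor asks for, in one line."""
    return " -> ".join(
        [link.descriptor_hash[:12], link.commit[:8], link.approver, link.role]
    )

link = AuditLink(
    descriptor_hash="9f2a4b7c1e5d0a33",  # placeholder fingerprint
    commit="4c1d9e2a",                    # placeholder commit id
    approver="j.doe",
    role="control-owner",
)
print(evidence_chain(link))
```

Because each field references a system of record (the repo, the approval tool, the RBAC directory), the chain closes without manual reconciliation.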
Where AI belongs in the workflow
The question isn’t whether AI writes the code. It’s what the AI is allowed to commit. In the pattern that passes audit, the LLM proposes a descriptor change. A human with the right role reviews and approves it. The runtime compiles it into a running screen. The audit log captures every step.
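The gate in that workflow can be sketched in a few lines: the model may propose a descriptor change, but only a human holding an approver role may commit it, and every decision lands in the log. The role names and log shape are assumptions for illustration.

```python
import datetime

AUDIT_LOG = []  # in production: append-only, tamper-evident storage

# Assumed RBAC role names; real deployments would read these from
# the identity system.
APPROVER_ROLES = {"control-owner", "compliance-officer"}

def apply_descriptor_change(proposed: dict, approver: str, role: str) -> bool:
    """The LLM proposes; a human with an approver role decides.
    Both approvals and rejections are logged."""
    approved = role in APPROVER_ROLES
    AUDIT_LOG.append({
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "screen": proposed.get("screen"),
        "approver": approver,
        "role": role,
        "decision": "approved" if approved else "rejected",
    })
    return approved

# A proposed change is only applied after human sign-off:
ok = apply_descriptor_change({"screen": "vendor-setup"}, "j.doe", "control-owner")
assert ok
# The same proposal from a non-approver role is rejected, but still logged:
denied = apply_descriptor_change({"screen": "vendor-setup"}, "a.dev", "developer")
assert not denied
```

The design choice that matters is that rejections are logged too: the audit trail shows every attempted change, not just the ones that shipped.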
This is the inverse of the “AI autocomplete” pattern that dominates developer tools. Cursor and its peers optimize for velocity inside the editor. That’s a fine pattern for internal tooling. It’s the wrong pattern for a system that has to answer to an external auditor.
The EU AI Act raises the stakes
The EU AI Act classifies many enterprise decision-support systems as high-risk, which brings logging, human oversight, and technical documentation obligations. Generated code that can’t explain itself is going to struggle under Article 13. Generated descriptors, reviewed and signed, are already most of the way there.
The takeaway
AI-generated code can pass a compliance audit. It just can’t pass one as raw output. The artifact that survives review is the structured specification, reviewed by the right human, compiled by a fixed runtime, and logged end to end. Every regulator we’ve talked to treats that pattern as reasonable. None of them treat “the model wrote it” as an answer.