Validation libraries — pydantic in Python, zod in TypeScript, similar tools elsewhere — are the enforcement layer that catches LLM-output variance before it reaches downstream code that doesn't tolerate variance. Most teams know they should use them. The teams that ship reliably actually do.
Schema layering
The validation layer sits between the LLM and the consumer:
- LLM produces output.
- Validation parses against schema.
- Valid: pass to consumer.
- Invalid: log, retry, fallback, or escalate.
The consumer sees only valid outputs. The variance is contained.
Error surface
Validation errors are observability events:
- What field failed?
- What was expected vs. received?
- What was the original LLM output?
These let the team diagnose issues. Common patterns:
- A specific field is consistently wrong → prompt issue.
- Outputs occasionally have extra fields → prompt or model issue.
- Confidence scores out of range → prompt issue.
Each pattern informs a fix.
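All three questions are answerable from pydantic's `ValidationError.errors()`; a minimal sketch of turning one into an observability event (the schema and event shape are assumptions):

```python
from pydantic import BaseModel, ValidationError


class Extraction(BaseModel):
    title: str
    confidence: float


raw = '{"title": "Q3 report", "confidence": "high"}'  # wrong type for confidence
try:
    Extraction.model_validate_json(raw)
except ValidationError as exc:
    for err in exc.errors():
        event = {
            "field": ".".join(str(p) for p in err["loc"]),  # what field failed
            "expected": err["type"],   # machine-readable error type
            "received": err["input"],  # the offending value
            "raw_output": raw,         # original LLM output, for diagnosis
        }
        print(event)
```

Logging the structured event rather than the stringified exception is what makes the pattern-spotting below possible.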
Reviewer ritual
Validation failures are reviewed:
- Daily summary of validation rates.
- Drilled into for patterns.
- Fixes shipped.
Without review, validation becomes silent fallback that hides real issues. With review, validation becomes a quality signal.
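The daily summary needs nothing fancy; aggregating logged failure events with the standard library is enough (the log shape and call count here are invented for illustration):

```python
from collections import Counter

# Hypothetical day of logged validation-failure events.
failures = [
    {"field": "confidence", "error": "float_parsing"},
    {"field": "confidence", "error": "float_parsing"},
    {"field": "tags", "error": "extra_forbidden"},
]
total_calls = 1_000

failure_rate = len(failures) / total_calls
by_field = Counter(f["field"] for f in failures)

print(f"validation failure rate: {failure_rate:.2%}")
for field, count in by_field.most_common():
    print(f"  {field}: {count}")  # a consistently failing field points at the prompt
```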
Performance cost
Validation is fast (typically sub-millisecond) but not free. For high-volume features, the cumulative cost can add up. The right approach is to validate at the boundary, not at every internal step.
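A quick way to sanity-check the cost on your own schemas (the schema is illustrative and the numbers vary by machine):

```python
import time

from pydantic import BaseModel


class Extraction(BaseModel):
    title: str
    confidence: float


raw = '{"title": "Q3 report", "confidence": 0.9}'
n = 10_000
start = time.perf_counter()
for _ in range(n):
    Extraction.model_validate_json(raw)
per_call_ms = (time.perf_counter() - start) / n * 1000
print(f"{per_call_ms:.4f} ms per validation")  # typically well under a millisecond
```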
A real stack
A team's setup:
- LLM call with structured output mode.
- Pydantic model defines the shape.
- `model.model_validate_json(response.text)` at the boundary.
- ValidationError → log + retry once + fallback to default.
- Fallback rate tracked in observability.
Simple. Reliable. Works.
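Under those assumptions, the whole stack fits in a few lines; the model, the default, and the stubbed `call_llm` are illustrative stand-ins, not the team's actual code:

```python
from pydantic import BaseModel, ValidationError


class Summary(BaseModel):
    headline: str
    confidence: float


DEFAULT = Summary(headline="(unavailable)", confidence=0.0)
fallback_count = 0  # exported to observability as a fallback rate


def call_llm() -> str:
    # Stand-in for an LLM call in structured output mode.
    return '{"headline": "Revenue up 8%", "confidence": 0.8}'


def summarize() -> Summary:
    global fallback_count
    for attempt in range(2):  # original call plus one retry
        try:
            return Summary.model_validate_json(call_llm())
        except ValidationError as exc:
            print(f"attempt {attempt + 1} failed: {exc.error_count()} errors")
    fallback_count += 1
    return DEFAULT


result = summarize()
```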
What we won't ship
LLM outputs crossing system boundaries without validation.
Validation that swallows errors silently.
Schemas that are too permissive (if anything passes, the validation isn't doing any work).
Schemas that are too restrictive (rejecting valid outputs).
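The two schema failure modes are easy to see side by side in pydantic (the field choices are illustrative):

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class Strict(BaseModel):
    # extra="forbid" rejects unexpected fields; Field(ge=..., le=...) bounds values.
    model_config = ConfigDict(extra="forbid")
    confidence: float = Field(ge=0.0, le=1.0)


class Permissive(BaseModel):
    # extra="allow" plus an unconstrained float: almost anything passes,
    # so the validation isn't doing work.
    model_config = ConfigDict(extra="allow")
    confidence: float


bad = '{"confidence": 1.7, "surprise": true}'
Permissive.model_validate_json(bad)  # accepted: too permissive
try:
    Strict.model_validate_json(bad)
except ValidationError:
    rejected = True  # out-of-range value and extra field both flagged
```

The opposite risk is just as real: a schema that forbids fields the model legitimately produces will reject valid outputs, which shows up as an inflated fallback rate.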
Close
Validation libraries are the practical layer that turns LLM outputs into trustworthy data. Schema. Validate. Handle failures. Review patterns. Ship. The library is mature; the discipline is what makes it work.
Related reading
- Structured output — preceding layer.
- Probabilistic with deterministic contracts — surrounding pattern.
- Constrained decoding — alternative approach.
We build AI-enabled software and help businesses put AI to work. If you're tightening output validation, we'd love to hear about it. Get in touch.