A team's MCP server occasionally returned wrong results. Without observability, debugging meant trying to reproduce the issue locally. Reproduction was hard; debugging was slow.
MCP servers are production services. Observability isn't optional.
The three pillars
Logs. Each tool call is logged with its input, output, duration, and any error.
Traces. Distributed tracing across the server's downstream calls.
Metrics. Latency, throughput, error rate per tool.
Together, the three pillars make production issues debuggable.
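As a minimal sketch of the first pillar, here's a stdlib-only wrapper that emits one structured log line per tool call. The function name, handler signature, and field names are illustrative assumptions, not part of any MCP SDK; adapt them to your server.

```python
import json
import logging
import time

logger = logging.getLogger("mcp.tools")

def log_tool_call(tool_name, handler, arguments):
    """Invoke a tool handler and emit one JSON log line per call.

    Hypothetical wrapper: wire this around however your server
    dispatches tool calls.
    """
    start = time.monotonic()
    record = {"tool": tool_name, "input": arguments, "error": None}
    try:
        result = handler(**arguments)
        record["output"] = result
        return result
    except Exception as exc:
        record["error"] = f"{type(exc).__name__}: {exc}"
        raise
    finally:
        record["duration_ms"] = round((time.monotonic() - start) * 1000, 2)
        logger.info(json.dumps(record, default=str))
```

The `finally` block matters: the duration and the log line are emitted whether the call succeeds or raises, so failures leave the same evidence as successes.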
Reviewer ritual
In PR review, check for:
- Observability for new tools.
- PII redaction in logs.
- Metrics that flow to the team's monitoring stack.
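On the PII point: redaction should happen before a value reaches a log line, not in post-processing. A sketch, with illustrative patterns only; a real redaction pass needs a reviewed, team-specific list (emails, phone numbers, tokens, account IDs, and whatever else your tools handle).

```python
import re

# Illustrative patterns; extend with your own reviewed list.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w-]+"), "<email>"),
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "<phone>"),
]

def redact(text: str) -> str:
    """Replace PII-looking substrings before the value is logged."""
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text
```

Regex-based redaction is a floor, not a guarantee; it catches known shapes and misses everything else, which is why the reviewer checklist exists.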
A real implementation
A team's MCP server:
- Structured logs (JSON) per tool call.
- OpenTelemetry traces for downstream calls.
- Prometheus metrics: per-tool count, p50/p95/p99 latency, error rate.
- Grafana dashboards.
When an issue surfaces, the team has the evidence to debug.
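The team's stack uses the Prometheus client and the OpenTelemetry SDK; as a dependency-free illustration of what "per-tool count, p50/p95/p99 latency, error rate" actually means, here's a stdlib-only in-process collector. Class and method names are hypothetical; in production you'd use Prometheus counters and histograms instead.

```python
import math
from collections import defaultdict

class ToolMetrics:
    """In-process stand-in for per-tool Prometheus counters/histograms."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.errors = defaultdict(int)
        self.durations_ms = defaultdict(list)

    def observe(self, tool, duration_ms, error=False):
        """Record one tool call's outcome and latency."""
        self.calls[tool] += 1
        if error:
            self.errors[tool] += 1
        self.durations_ms[tool].append(duration_ms)

    def percentile(self, tool, p):
        """Nearest-rank percentile over recorded durations, or None."""
        samples = sorted(self.durations_ms[tool])
        if not samples:
            return None
        rank = min(len(samples) - 1, max(0, math.ceil(p / 100 * len(samples)) - 1))
        return samples[rank]

    def error_rate(self, tool):
        calls = self.calls[tool]
        return self.errors[tool] / calls if calls else 0.0
```

A real Prometheus histogram pre-buckets latencies and lets the server compute percentiles across instances; the nearest-rank version above is only meant to show what the dashboard numbers represent.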
Trade-offs
Observability adds:
- Slight per-call latency.
- Storage cost.
- Engineering setup.
The trade-off is worth it for any production server.
Limits
Observability captures what happened. It doesn't:
- Predict failures.
- Fix bugs automatically.
It's input to the team's debugging, not a substitute for it.
What we won't ship
MCP servers in production without the three pillars.
Logs with unredacted PII.
Metrics that don't flow to dashboards.
Observability that nobody reads.
Close
MCP server observability is the production discipline. Logs, traces, metrics. Each captures a different signal. Together, they make debugging tractable.
Related reading
- Agent observability: traces that tell you what happened — same discipline.
- SRE: postmortem first drafts — what observability supports.
- MCP server hosting — surrounding context.
We build AI-enabled software and help businesses put AI to work. If you're improving MCP observability, we'd love to hear about it. Get in touch.