
MCP server rate limits: the polite-rejection pattern

Rate limits protect the server. The 429 response tells the AI to back off.

Yash Shah · April 1, 2026 · 2 min read

A team's MCP server got hammered by an AI agent stuck in a tight loop. Server resources were exhausted, and every user was affected. Rate limits would have prevented it.

Rate limits protect the server. The polite-rejection pattern (HTTP 429 with retry guidance) tells the AI to back off.

The 429 contract

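When a request is rate-limited:

Return a 429 (Too Many Requests) status.
Include a Retry-After header saying how long to wait.
Include an error message explaining which limit was hit.

A well-behaved AI assistant waits out the limit instead of retrying immediately. The server stays healthy.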
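A minimal sketch of the contract in Python, assuming the server is reached over an HTTP transport (an MCP server on stdio has no status codes, so it would have to carry the same information in its error result). The function name, the JSON field names, and the limit label are hypothetical; only the 429 status and the Retry-After header come from the contract. Per the HTTP spec, Retry-After carries either a delay in seconds or an HTTP date; this sketch uses seconds.

```python
import json
import math
from typing import Dict, Tuple


def rate_limited_response(limit_name: str, retry_after_s: float) -> Tuple[int, Dict[str, str], str]:
    """Build a polite rejection: 429 status, Retry-After header, explanatory JSON body.

    Hypothetical helper: the body fields are illustrative, not a standard format.
    """
    # Retry-After takes whole seconds. Round up, never down: rounding down would
    # invite a retry that still lands inside the limit. Always ask for at least 1 s.
    wait = max(1, math.ceil(retry_after_s))
    headers = {
        "Retry-After": str(wait),
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "error": "rate_limited",
        "message": f"Rate limit '{limit_name}' exceeded. Retry after {wait} seconds.",
        "retry_after_seconds": wait,
    })
    return 429, headers, body
```

For example, `rate_limited_response("per-user-minute", 12.3)` returns status 429 with `Retry-After: 13`. Putting the wait in both the header and the message helps clients that only surface the error text to the model.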

Reviewer ritual

When reviewing an MCP server, check that its rate-limit configuration covers:

Per-user limits.
Per-tool limits.
Burst allowance.
Long-window limits (for example, per hour).
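The checklist can also live in code, so a review (or a CI job) flags a missing category instead of relying on someone noticing. A sketch assuming a hypothetical `RateLimitConfig` shape; a real server would load these values from its own config format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RateLimitConfig:
    """Hypothetical config shape: one field per item on the review checklist."""
    per_user_per_minute: Optional[int] = None
    per_tool_per_minute: Dict[str, int] = field(default_factory=dict)
    burst_requests: Optional[int] = None
    burst_window_seconds: Optional[float] = None
    long_window_per_hour: Optional[int] = None


def review(config: RateLimitConfig) -> List[str]:
    """Return one finding for each checklist item the config leaves unset."""
    findings: List[str] = []
    if config.per_user_per_minute is None:
        findings.append("missing per-user limit")
    if not config.per_tool_per_minute:
        findings.append("missing per-tool limits")
    if config.burst_requests is None or config.burst_window_seconds is None:
        findings.append("missing burst allowance")
    if config.long_window_per_hour is None:
        findings.append("missing long-window limit")
    return findings
```

An empty list means every category is at least configured; whether the numbers are right is still a judgement call for the reviewer.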

A real implementation

One team's MCP server is configured with:

60 requests per minute per user.
A burst of up to 10 requests in 5 seconds.
1000 requests per hour per organisation.
A 429 response with a Retry-After header on any breach.
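The post doesn't say which algorithm the team used. A sliding-window log is one straightforward way to enforce all three limits at once. The sketch assumes in-process memory, a single server instance, and hypothetical names (`SlidingWindowLimiter`, `check`, `LIMITS`); the wait time it returns is what would go into the 429's Retry-After header.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Optional, Tuple

# (max_requests, window_seconds, scope): the three limits from the list above.
LIMITS: List[Tuple[int, float, str]] = [
    (10, 5.0, "user"),      # burst: up to 10 requests in 5 seconds per user
    (60, 60.0, "user"),     # 60 requests per minute per user
    (1000, 3600.0, "org"),  # 1000 requests per hour per organisation
]


class SlidingWindowLimiter:
    """Sliding-window log: remembers recent request timestamps for each key."""

    def __init__(self, limits: List[Tuple[int, float, str]] = LIMITS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limits = limits
        self.clock = clock  # injectable so tests can control time
        self.log: Dict[Tuple[str, str, float], Deque[float]] = defaultdict(deque)

    def _key(self, scope: str, window: float, user_id: str, org_id: str) -> Tuple[str, str, float]:
        return (scope, user_id if scope == "user" else org_id, window)

    def check(self, user_id: str, org_id: str) -> Optional[float]:
        """Return None and record the request if allowed, else seconds to wait."""
        now = self.clock()
        wait = 0.0
        for max_requests, window, scope in self.limits:
            q = self.log[self._key(scope, window, user_id, org_id)]
            while q and q[0] <= now - window:  # forget requests outside the window
                q.popleft()
            if len(q) >= max_requests:
                # A slot frees up when the oldest request in the window expires.
                wait = max(wait, q[0] + window - now)
        if wait > 0:
            return wait  # rejected requests are not recorded
        for max_requests, window, scope in self.limits:
            self.log[self._key(scope, window, user_id, org_id)].append(now)
        return None
```

Rejected requests are deliberately not recorded, so an agent retrying in a tight loop doesn't push its own reset time further out. A deployment with several server instances would need the timestamps in a shared store, and a token bucket keeps constant state per key instead of one timestamp per request.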

The AI assistant backs off when it hits a limit, so a tight-loop bug in an agent no longer takes the server down.
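Backing off is client-side behaviour, and the server can only ask for it. A sketch of what honouring the 429 could look like in an agent harness, assuming a hypothetical `call` function that returns `(status, headers, body)`: retry only on 429, wait as long as Retry-After asks (seconds or an HTTP date, parsed with the standard library's `email.utils.parsedate_to_datetime`), and stop after a fixed number of attempts so a persistent limit surfaces as an error rather than a new loop. The helper names are hypothetical.

```python
import email.utils
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], str]


def retry_after_seconds(headers: Mapping[str, str], now: Optional[float] = None) -> Optional[float]:
    """Parse Retry-After as delay-seconds or an HTTP date; None if absent or invalid.

    Assumes the key is spelled "Retry-After"; real HTTP clients expose
    case-insensitive header mappings.
    """
    value = headers.get("Retry-After")
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError on bad dates
        return None
    now = time.time() if now is None else now
    return max(0.0, when.timestamp() - now)


def call_with_retry(call: Callable[[], Response], max_attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep,
                    default_wait: float = 1.0) -> Response:
    """Retry only on 429, sleeping as instructed, for at most max_attempts calls."""
    status, headers, body = call()
    attempts = 1
    while status == 429 and attempts < max_attempts:
        wait = retry_after_seconds(headers)
        sleep(default_wait if wait is None else wait)
        status, headers, body = call()
        attempts += 1
    return status, headers, body
```

Passing `sleep` in makes the policy testable without real delays; the attempt cap is what keeps a misconfigured limit from turning into the tight loop the limits exist to stop.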

Trade-offs

Strict limits: protect the server; some legitimate work is delayed.
Loose limits: more user-friendly; server can be overwhelmed.

The right level depends on the server's capacity.
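A back-of-envelope check makes "depends on capacity" concrete. Assume, hypothetically, a server that sustains $C$ requests per minute with $N$ users active at once ($C$ and $N$ are placeholders, not figures from the team); the limits are the team's from the implementation above.

```latex
\begin{aligned}
\text{load in any one minute} &\le N \times 60~\text{req/min},
  \quad\text{so the per-user limit protects the server only while } N \times 60 \le C \\
\text{time for one user to exhaust the org cap} &= \frac{1000~\text{req}}{60~\text{req/min}} \approx 16.7~\text{min}
\end{aligned}
```

On its own, the per-minute limit would let one runaway agent send 60 requests a minute indefinitely. The hourly organisation cap stops it after about 17 minutes, and at that point it blocks everyone in the organisation: the strict-versus-loose trade-off in concrete form.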

Edge cases

Some operations are inherently bursty (initial load).
Some users are heavy by role (admins).
Some tools are inherently slow (allow fewer concurrent calls).

Configure limits per tool and per user where it matters.
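For slow tools, a concurrency cap can sit alongside the request-rate limits. A sketch using non-blocking semaphores, assuming a threaded server; the tool name `generate_report`, the caps, and `ToolBusy` are hypothetical. Rejecting immediately rather than queueing keeps this in line with the polite-rejection pattern: the caller turns `ToolBusy` into a 429.

```python
import threading
from contextlib import contextmanager
from typing import Dict, Iterator

# Hypothetical caps: slow tools get fewer concurrent slots than the default.
DEFAULT_MAX_CONCURRENT = 8
TOOL_MAX_CONCURRENT: Dict[str, int] = {"generate_report": 2}


class ToolBusy(Exception):
    """Raised when a tool has no free slot; the caller maps it to a 429."""


class ToolConcurrencyGuard:
    def __init__(self, caps: Dict[str, int] = TOOL_MAX_CONCURRENT,
                 default: int = DEFAULT_MAX_CONCURRENT) -> None:
        self._lock = threading.Lock()
        self._caps = caps
        self._default = default
        self._slots: Dict[str, threading.BoundedSemaphore] = {}

    def _slot(self, tool: str) -> threading.BoundedSemaphore:
        with self._lock:  # create each tool's semaphore exactly once
            if tool not in self._slots:
                cap = self._caps.get(tool, self._default)
                self._slots[tool] = threading.BoundedSemaphore(cap)
            return self._slots[tool]

    @contextmanager
    def acquire(self, tool: str) -> Iterator[None]:
        sem = self._slot(tool)
        # Non-blocking: reject at once instead of queueing behind slow calls.
        if not sem.acquire(blocking=False):
            raise ToolBusy(tool)
        try:
            yield
        finally:
            sem.release()  # free the slot even if the tool call raised
```

Heavy-by-role users could be handled the same way, with a per-role lookup before falling back to the default.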

What we won't ship

MCP servers without rate limits.

Limits without 429 responses.

Limits without retry-after guidance.

Limits that aren't tested.

Close

MCP server rate limits are the politeness pattern. 429 responses with retry-after. The AI backs off. The server stays healthy. Skip the limits and the next runaway agent takes the server down.

