The RAG-or-fine-tune debate is one of the most wasted conversations in AI engineering. Most cases have an obvious answer; the remaining few benefit from doing both.
Here's the decision tree we use with teams. It saves weeks of analysis paralysis.
Start: what are you trying to achieve?
"I want the model to know our internal docs." → RAG. Always. Don't fine-tune documents.
"I want the model to write in our company voice." → Few-shot prompting first. Fine-tune if few-shot isn't enough after real testing.
"I want the model to output a specific JSON shape." → Structured-output features in the model + prompt engineering. Fine-tune only if accuracy on schema is consistently below your floor.
"I want the model to handle a domain-specific language (legal, medical, financial)." → RAG first with domain context. Fine-tune if your evals show consistent gaps after RAG is well-tuned.
"I want the model to be cheaper or faster." → Fine-tune a small model. This is the most legitimate case for fine-tuning today.
"I want the model to be more accurate on my niche task." → Try RAG + better prompting + better retrieval first. If you've genuinely exhausted those, then fine-tune.
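The branches above can be sketched as a plain lookup. This is a toy encoding of this article's taxonomy, not a library API; the goal labels are ours.

```python
def recommend(goal: str) -> str:
    """Map a goal (this article's taxonomy) to the first approach to try.

    A sketch of the decision tree above, not a real API.
    """
    tree = {
        "internal knowledge": "RAG",
        "company voice": "few-shot prompting, then fine-tune if evals demand it",
        "strict JSON shape": "structured outputs + prompt engineering",
        "domain language": "RAG with domain context first",
        "cheaper or faster": "fine-tune a small model",
        "niche-task accuracy": "RAG + better prompting + better retrieval first",
    }
    # Anything not covered starts at the top of the hierarchy.
    return tree.get(goal, "start with prompt engineering")

print(recommend("internal knowledge"))  # → RAG
```

The default branch is the point: when the goal doesn't obviously match a case, you start at the cheapest rung, not the most sophisticated one.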
The hierarchy
For any problem, try in order:
1. Prompt engineering. A better prompt frequently solves the problem. Free, fast, debuggable.
2. Few-shot examples. Add 3-7 examples to the prompt. Often solves voice/format problems.
3. RAG. Retrieves the relevant facts. Solves knowledge problems.
4. Fine-tuning. Reshapes the model's behavior. Solves consistency-of-output problems.
Most teams jump to step 4 because it sounds like the most sophisticated answer. Sophistication isn't the goal. Shipping is.
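Step 2 in practice: a minimal few-shot prompt for a voice problem. The messages shape follows the common chat-completions convention; the brand-voice examples themselves are invented for illustration.

```python
# Few-shot voice examples: (user request, on-voice answer) pairs.
# Invented for illustration; a real set would come from your best past copy.
EXAMPLES = [
    ("Announce the outage.",
     "We hit a snag at 09:12 UTC. Here's exactly what broke and when it'll be fixed."),
    ("Announce the new export feature.",
     "Exports just got 4x faster. Nothing to configure; it's already live."),
]

def build_messages(task: str) -> list[dict]:
    """Assemble a chat request: system instruction, few-shot pairs, then the task."""
    messages = [{"role": "system",
                 "content": "Write in our voice: direct, concrete, no hype."}]
    for user_text, assistant_text in EXAMPLES:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": task})
    return messages
```

If outputs drift off-voice with 3-7 pairs after real testing, that's the signal step 4 exists for.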
Why RAG beats fine-tuning for knowledge
Documents change. Your help docs get updated weekly. A fine-tune is frozen at training time. Re-training every week is expensive and slow. Retrieval reads the current docs.
Provenance. RAG lets you cite. The user sees which doc the answer came from. Fine-tuning collapses sources into model weights.
Updates are cheap. Adding a new policy doc to RAG is a re-index. Adding it to a fine-tune is a re-train.
Audit. When the answer is wrong, you can trace it. With fine-tuning, the bug is buried in weights.
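The provenance and audit points, in code: a toy retriever that returns the source document alongside the passage, so the answer can always cite and be traced. Real systems score with embeddings; keyword overlap keeps the sketch runnable, and the doc names are invented.

```python
import re

# Toy corpus; a real one would be your indexed help docs.
DOCS = {
    "refund-policy.md": "Refunds are issued within 14 days of purchase.",
    "shipping.md": "Orders ship within 2 business days.",
}

def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str) -> tuple[str, str]:
    """Return (source_filename, passage) for the best keyword match.

    The filename is the point: every answer carries its provenance.
    """
    q = tokens(question)
    name, passage = max(DOCS.items(), key=lambda kv: len(q & tokens(kv[1])))
    return name, passage

source, passage = retrieve("how many days for refunds?")
# The model answers from `passage`; the UI cites `source`.
# When the answer is wrong, you inspect one file, not a weight matrix.
```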
When fine-tuning genuinely helps
- Format consistency. You need every output to match a strict shape. Fine-tuning beats prompting at high volume.
- Style/voice. Brand voice that's distinctive and hard to convey in a prompt. Fine-tune on examples of the voice.
- Cost. Fine-tuning a 7B model to match frontier-model outputs on a specific task. Inference is 10-30x cheaper.
- Latency. Smaller fine-tuned models run faster.
- Tool use. Fine-tuning on tool-call sequences for very high-volume agents.
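For the format-consistency case, fine-tuning data is typically a JSONL file of input/output pairs where every target output already has the exact shape you want. The chat-messages layout below follows a common fine-tuning convention; the ticket-extraction example is invented.

```python
import json

# One training example per JSONL line. The assistant turn is always
# the strict target shape; hundreds of these teach the shape directly.
# Contents invented for illustration.
example = {
    "messages": [
        {"role": "system", "content": "Extract the ticket as JSON."},
        {"role": "user",
         "content": "Login fails on iOS since v2.3, blocking 40 users."},
        {"role": "assistant", "content": json.dumps({
            "component": "auth",
            "platform": "ios",
            "severity": "high",
            "affected_users": 40,
        })},
    ]
}
line = json.dumps(example)  # append lines like this to train.jsonl
```

The leverage is in the curation: if any training line deviates from the schema, the model learns that deviation too.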
The "do both" pattern
The teams shipping the most impressive AI features in 2026 typically combine:
- RAG for knowledge.
- Fine-tuning for format/voice consistency.
- Prompt engineering for the system instructions on top.
The combination outperforms either alone for complex use cases. The order of construction is: prompt → RAG → fine-tune. Each layer addresses a different problem.
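The three layers compose into a single request. A minimal sketch, assuming retrieved passages arrive from a separate RAG step; the model name and prompt text are placeholders.

```python
# Layer 1: prompt engineering — the system instruction on top.
SYSTEM = "You are our support assistant. Answer only from the provided context."

def build_request(question: str, passages: list[str]) -> dict:
    """Compose prompt + RAG + fine-tuned model into one chat request."""
    context = "\n\n".join(passages)  # Layer 2: RAG supplies the knowledge.
    return {
        "model": "our-finetuned-small-model",  # Layer 3: fine-tune supplies format/voice.
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    }
```

Each layer is swappable independently, which is exactly why the construction order prompt → RAG → fine-tune works: the expensive layer comes last and changes least.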
The cost reality
| Approach | Engineering days | Recurring cost | Update cost |
|---|---|---|---|
| Prompt engineering | 1-2 | $ | minutes |
| Few-shot | 1-3 | $$ | minutes |
| RAG | 5-15 | $$ | hours |
| Fine-tuning | 15-60 | $$$ | days to weeks |
The further down the list, the more expensive to build and to maintain. Justify the trip with measured eval improvements you couldn't get higher up.
The trap
The trap that consumes months of teams' time: fine-tuning before exhausting RAG. We've seen teams spend a quarter fine-tuning a model to do retrieval, when a 3-day RAG implementation would have outperformed it.
Fine-tuning is a hammer. Not every problem is a nail.
Close
Most cases have an obvious answer once you ask the right question. RAG for knowledge, prompt engineering for behavior, fine-tuning when measurement says the others failed. Walk down the hierarchy in order; don't skip rungs.
Related reading
- RAG is a public library — the conceptual frame for RAG.
- Embedding model selection — picking the retrieval stack.
- Small models are underrated — when fine-tuning makes sense.
We help teams pick the right AI architecture without months of fine-tuning experiments. Get in touch.