The Three Tools

You want an LLM to do something specific. There are three levers you can pull, in increasing order of cost and complexity:

Prompt engineering: change what you write in the prompt. Fastest, cheapest.
RAG (retrieval-augmented generation): add external knowledge to the prompt at query time. Medium effort.
Fine-tuning: retrain the model on your data. Expensive, slow, but powerful for some cases.

Most people reach for fine-tuning when they should be using one of the other two. Let's untangle when each makes sense.

Prompt Engineering

Just write better prompts. Tactics that consistently help:

Instructions before context: tell the model what to do, then provide context.
Few-shot examples: show 2-5 examples of the input/output pattern you want.
Chain-of-thought: ask the model to reason step by step before answering.
Role assignment: "You are an expert software architect..."
Output format: specify exact format (JSON schema, markdown, etc.).
Constraints: "Answer in 2 sentences. Never speculate. If unsure, say so."

Use it when: the model already has the knowledge or capability you need; you just need to elicit it correctly.
Skip it when: the model fundamentally lacks the information (your private data) or the skill (a niche domain you can't describe).
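The tactics above compose: a single prompt can carry a role, instructions, few-shot examples, and output constraints at once. A minimal sketch, assuming a hypothetical ticket-triage task (the examples and labels are made up for illustration):

```python
# Hypothetical few-shot examples for a ticket-triage task.
FEW_SHOT_EXAMPLES = [
    ("refund for damaged item", "billing"),
    ("app crashes on login", "technical"),
]

def build_prompt(ticket: str) -> str:
    """Assemble role, instructions, constraints, and few-shot examples."""
    lines = [
        "You are an expert support triager.",        # role assignment
        "Classify the ticket into one category.",    # instructions before context
        "Answer with a single lowercase word. If unsure, say 'unknown'.",  # constraints + format
        "",
    ]
    for text, label in FEW_SHOT_EXAMPLES:            # few-shot pattern
        lines.append(f"Ticket: {text}\nCategory: {label}\n")
    lines.append(f"Ticket: {ticket}\nCategory:")     # the actual query
    return "\n".join(lines)

prompt = build_prompt("cannot update my credit card")
```

Ending the prompt mid-pattern ("Category:") nudges the model to complete it in the format the examples establish.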

RAG

The model doesn't know your data. RAG fixes that by retrieving relevant pieces at query time and adding them to the prompt. Covered in detail in the RAG article.

Use it when:

Your data changes over time (RAG updates by re-indexing).
You need source attribution (cite which document the answer came from).
Hallucination must be minimized (model is grounded in retrieved text).
Your knowledge base is large (a hundred documents, a million, doesn't matter).

Skip it when:

You don't have specific knowledge to retrieve; you want to change the model's style or behavior.
Latency is critical; even a lean retrieval step adds on the order of 100 ms per query.
The knowledge fits in the prompt directly without retrieval.
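The core RAG loop is small: retrieve the most relevant documents, then ground the prompt in them with source names attached. A toy sketch follows; real systems rank by vector embeddings, but plain word overlap stands in here so the example stays dependency-free, and the document contents are invented:

```python
# Hypothetical document store (real systems index far more than two files).
DOCS = {
    "policy.md": "Refunds are issued within 14 days of purchase.",
    "shipping.md": "Standard shipping takes 3 to 5 business days.",
}

def retrieve(query: str, k: int = 1) -> list[tuple[str, str]]:
    """Rank documents by word overlap with the query (embedding stand-in)."""
    q = set(query.lower().split())
    scored = sorted(
        DOCS.items(),
        key=lambda kv: len(q & set(kv[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_grounded_prompt(query: str) -> str:
    """Prepend retrieved text so the answer can cite its source."""
    context = "\n".join(f"[{name}] {text}" for name, text in retrieve(query))
    return (
        "Answer using only the sources below and cite the source name.\n"
        f"{context}\n"
        f"Question: {query}"
    )

prompt = build_grounded_prompt("how long do refunds take")
```

Because each chunk carries its source name, the model can cite it, which is where RAG's attribution advantage comes from.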

Fine-Tuning

Retrain the base model (or a small adapter) on your specific data. The model's weights change. New behaviors get baked in.

Approaches:

Full fine-tuning: update all model weights. Expensive, lots of GPU. Rare outside specialized labs.
LoRA / QLoRA: add small trainable adapters. Cheap, fast, effective. The standard approach today.
Reinforcement learning from human feedback (RLHF): for alignment to preferences. Used by base model providers, not typically by application teams.
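The reason LoRA is cheap is arithmetic: instead of updating a full d x d weight matrix, it trains two thin matrices B (d x r) and A (r x d) and serves W + BA. A back-of-the-envelope comparison, using illustrative sizes (real layers vary by model):

```python
# Illustrative sizes: hidden width d and a common LoRA rank r.
d, r = 4096, 8

full_params = d * d          # weights touched by full fine-tuning of one layer
lora_params = d * r + r * d  # trainable weights in the LoRA adapter (B and A)

ratio = full_params / lora_params  # how many times fewer params LoRA trains
```

At these sizes the adapter trains 256x fewer parameters per layer, which is why it fits on a single GPU where full fine-tuning does not. QLoRA pushes further by quantizing the frozen base weights.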

Use it when:

You need a specific style or format consistently (e.g., always producing valid JSON in a custom schema).
The model needs to learn domain language (medical jargon, legal patterns, your company's argot).
You have many high-quality examples (1000+).
Latency is critical and you can't afford retrieval overhead.
You want a smaller model to mimic a bigger one (distillation).

Skip it when:

The "knowledge" you want to add changes over time (fine-tuning bakes in a snapshot; RAG updates by re-indexing).
You have fewer than 100 high-quality training examples.
Prompt engineering or RAG would solve it for less effort.

Side-by-Side

|                        | Prompt Engineering | RAG                        | Fine-tuning    |
|------------------------|--------------------|----------------------------|----------------|
| Cost to set up         | $0                 | Low to medium              | Medium to high |
| Cost per query         | Standard           | Slightly higher (retrieval)| Standard       |
| Add new knowledge      | Edit prompt        | Re-index                   | Retrain        |
| Citations / source     | None               | Yes                        | None           |
| Behavior customization | Limited            | Limited                    | Strong         |
| Hallucination risk     | High               | Low                        | Medium         |
| Latency impact         | None               | +50-300 ms                 | None           |

How to Decide

A practical decision flow:

1. Try a prompt first. Always. Sometimes you don't need anything else.
2. If the issue is "the model doesn't know X," try RAG.
3. If the issue is "the model's outputs are wrong style/format consistently," try better prompts with examples.
4. If RAG and prompts both fall short on style/format, try fine-tuning.
5. Combine multiple approaches. Fine-tuning + RAG is common for high-stakes products.

Common Combinations

RAG + good prompts: the most common production setup.
Fine-tuned model + RAG: the model learns the style/format, RAG provides the facts.
Prompt + structured output: use function calling or JSON mode to enforce format without fine-tuning.
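"Prompt + structured output" in practice means asking for JSON and validating the reply before trusting it. A minimal sketch, assuming a hypothetical classification schema and a made-up model reply:

```python
import json

# Keys our hypothetical application expects in every model reply.
REQUIRED_KEYS = {"category", "confidence"}

def parse_reply(reply: str) -> dict:
    """Parse a model reply as JSON and check the expected keys exist."""
    data = json.loads(reply)  # raises ValueError on malformed output
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data

result = parse_reply('{"category": "billing", "confidence": 0.9}')
```

Provider features like function calling or JSON mode make malformed replies rarer, but a validation step like this is still the cheap insurance that makes fine-tuning for format unnecessary.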

The One Thing to Remember

Fine-tuning is overrated. Most "we need a custom model" requirements are better served by RAG plus good prompts. Fine-tune when style, format, or domain language genuinely needs to be baked in. Use RAG when the question is "how do I add my data?" Use prompt engineering first, always. The right tool depends on whether the gap is knowledge, behavior, or both.