I've watched teams spend three months and $40,000 fine-tuning a model to do something a well-crafted prompt would have handled in two days. I've also watched teams spend six months trying to coax GPT-4 with prompts to do something that fundamentally required fine-tuning from the start.
The largest frontier model delivers roughly 90% of the value of a well-chosen smaller model, at something like ten times the cost. For one-off strategy work, that premium is irrelevant. For workflows that run fifty thousand times a day, it's the difference between a viable product and a line item your CFO circles in red.
Neither is a failure of intelligence — it's a failure of having a clear decision framework. Here's the one I use.
Start with Prompting. Always.
If you haven't exhausted prompt engineering, you're not ready to fine-tune. Prompt engineering is fast (hours to days), cheap (API costs), and reversible. Fine-tuning is slow (weeks), expensive (GPU compute + data labeling), and locks you into a specific model version.
Before considering fine-tuning, try: few-shot examples, chain-of-thought prompting, structured output constraints, system prompt refinement, and prompt chaining. If these don't get you to 80% of your target quality, then start the fine-tuning conversation.
When Small Models Win
When Fine-Tuning Actually Wins
The Data Problem
The most common reason fine-tuning projects fail isn't the training process — it's the data. You need high-quality input-output pairs that accurately represent what you want the model to do. 500 mediocre examples will produce a mediocre model.
If you can't define the output quality criteria clearly enough to label 500 examples consistently, you're not ready to fine-tune. Get clearer on what 'good' looks like first — then collect data.
The Maturity Move
The frontier model is your senior consultant. Brilliant, expensive, and absolutely not who you call to file routine paperwork. You bring in the consultant for the hard, ambiguous, consequential problems. You don't put them on data entry.
The maturity move in enterprise AI isn't picking the biggest model and feeling reassured. It's building the discipline to ask, task by task, what's the smallest model that does this job well — and reserving your frontier budget for work that genuinely earns it.
