Saad Ullah Bilal — AI Systems Architect

If you're a CTO watching your AI spend climb steadily while the demonstrated value stays frustratingly murky, this one is written for you.

The instinct to standardize the whole organization on a single frontier model is completely understandable. One vendor relationship, one integration to maintain, one mental model for the team, one number to negotiate. Simplicity has real value. But it's also exactly how you end up paying premium, top-of-menu rates to do enormous volumes of work that a small fraction of that model could handle flawlessly.

Standardization on the most expensive option isn't simplicity — it's a recurring overpayment you've decided not to look at. The more sophisticated framing is a portfolio, and knowing precisely where the boundary falls is now a core engineering competency.

Where Small Language Models Excel

Some work genuinely belongs to small language models. Let's be concrete about what that looks like in practice.

Internal Copilots

The overwhelming majority of internal questions are about your world — your processes, your data, your acronyms. A smaller model tuned on internal context routinely outperforms a generic frontier giant on these questions, at a fraction of the per-query cost. Your employees don't need a model that knows world history; they need one that knows how you file an expense report.

Classification

Sorting support tickets, tagging documents, categorizing incoming requests, labeling sentiment. These tasks are narrow, extremely high-volume, and well-defined — the textbook sweet spot for a small model. There is no reasoning premium to pay here; there's just a category to assign, millions of times.

Extraction

Pulling structured fields out of invoices, forms, contracts, and applications. The task has clear boundaries and clear right answers, which means it's eminently testable. You don't need an open-ended reasoning engine; you need a fast, reliable extractor that gets the total and the date right every time.

Ticket Routing

Deciding where each incoming request should go is a fast, repetitive decision made millions of times a day. Every millisecond of latency and every cent of cost compounds across that volume. Small models win decisively on both axes, and the routing decision rarely needs frontier-level reasoning to be correct.

Understanding the intent behind a query and ranking results doesn't require a frontier model. It requires a fast, accurate one that returns good results before the user's patience runs out.

Where Frontier Models Remain Necessary

Complex Reasoning

Multi-constraint problems riddled with real ambiguity, where the answer isn't pattern-matchable from prior examples and the model has to actually think through the tradeoffs — this is what frontier models are built for, and it's where their premium is fully justified.

Strategy

Synthesizing across many domains at once, weighing competing considerations, and generating genuinely novel framing or insight. You would not hand this to a model optimized for ticket routing, any more than you'd ask your data-entry team to set corporate direction.

Research

Open-ended exploration where the path to the answer isn't known in advance, and the value comes from breadth, depth, and the ability to connect distant ideas. Frontier models' wide capability is precisely the asset here.

Multi-Step Planning

Decomposing a fuzzy, high-level goal into a coherent, ordered sequence of actions — and adapting that plan as reality pushes back — benefits directly from the strongest reasoning available.

The Portfolio Decision

Small Language Models

High-volume, well-defined tasks

Tuned on your internal domain

Millisecond latency at low cost

Classification, extraction, routing, search

Where most of your transaction count lives

Frontier Models

Low-volume, high-complexity problems

Genuine ambiguity and open-ended reasoning

Strategy, synthesis, novel insight

Multi-step planning with adaption

Where your genuinely hard problems live

The Maturity Move

The underlying pattern, once you see it, is hard to unsee: small language models handle the high-volume, well-defined work that makes up the vast majority of your transaction count. Frontier models handle the low-volume, high-complexity work that makes up the vast majority of your genuinely hard problems. The volume and the difficulty live in different places.

Spend your frontier-model budget exactly where it changes outcomes — the hard, ambiguous, high-stakes thinking. And run absolutely everything else on the smallest model that does the job well. That's not a compromise or a cost-cutting concession. It's simply good engineering applied to a new and expensive resource.