Right-size the model to the task

Optimize AI Spend, Chapter 2 of 8

The single largest structural saving in AI spend is also the easiest to explain: stop running every job on the most powerful, most expensive model. This chapter shows you how to match each workflow to the cheapest model that still clears its quality bar, and how to spend frontier-model money only on the small minority of cases that actually need it. Get this right and you can cut a model bill by more than half before touching anything else.

Every major provider sells models in roughly three tiers. The names change; the shape does not.

| Tier | Good at | Typical fund use |
| --- | --- | --- |
| Small | Fast, cheap, high-volume tasks with a clear answer | Tagging, extraction, first-pass deck screening |
| Mid | A balance of judgment and cost | Meeting-note summaries, drafting, routine research |
| Frontier | The hardest reasoning, where a mistake is expensive | Investment-committee-grade analysis, deep due diligence |

As Chapter 1 showed, the price gap across those tiers is about five times within a single provider, and wider across the market. That gap is the opportunity. Most of what a fund asks AI to do is low-judgment, high-volume work: read this, tag that, pull these fields, sort this inbound. That work belongs on the small or mid tier. The frontier tier is for the few decisions where the extra reasoning changes the answer and a wrong answer is costly.

The common mistake is to pick the best model once, because it is the best, and run everything through it. That is like sending every letter by overnight courier. The courier is excellent. You still do not need it for the electricity bill.
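One way to make the tier assignment explicit is a small lookup table that every workflow consults before it calls a model. The sketch below is illustrative only: the task labels are hypothetical names derived from the examples in the tier table, and the choice to send unknown tasks to the mid tier is an assumption, not a recommendation from this chapter.

```python
from enum import Enum


class Tier(Enum):
    SMALL = "small"
    MID = "mid"
    FRONTIER = "frontier"


# Hypothetical task labels, based on the examples in the tier table.
ROUTING = {
    "tagging": Tier.SMALL,
    "extraction": Tier.SMALL,
    "deck_screening_first_pass": Tier.SMALL,
    "meeting_note_summary": Tier.MID,
    "drafting": Tier.MID,
    "routine_research": Tier.MID,
    "ic_analysis": Tier.FRONTIER,
    "deep_due_diligence": Tier.FRONTIER,
}


def tier_for(task_type: str) -> Tier:
    """Return the tier for a task type.

    Unknown task types fall back to MID. That default is an assumption;
    pick whatever fallback suits your own risk tolerance.
    """
    return ROUTING.get(task_type, Tier.MID)


print(tier_for("extraction").value)
print(tier_for("ic_analysis").value)
```

Keeping the mapping in one place means the default is no longer "the best model for everything": a task only reaches the frontier tier if someone has deliberately listed it there.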

The cascade: cheap first, escalate the hard few

You do not have to guess in advance which items are hard. A cascade decides for you. Run every item through the cheap model first. If the cheap model is confident, you keep its answer. If it is not, you escalate that one item to a bigger model for a second look. Frontier-model money is then spent only on the hard minority.

[Diagram: New deck to screen → Small model (first pass) → if confident: Verdict filed; if low confidence: Frontier model (second look)]
Figure: A model cascade. The cheap model clears the confident majority; only low-confidence items escalate to a costlier model.
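The cascade logic itself is only a few lines. The sketch below is a minimal illustration, not a production router: the `Verdict` type, the `threshold` default of 0.8, and the two stand-in models are all hypothetical, and how you obtain a confidence score from a real model (self-reported, a separate check, agreement across runs) is left open.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    label: str         # e.g. "pass" or "pursue"
    confidence: float  # 0.0 to 1.0, estimated however the caller chooses
    model: str         # which tier produced this verdict


# A "model" here is any callable that takes item text and returns a Verdict.
Model = Callable[[str], Verdict]


def cascade(item: str, small: Model, frontier: Model, threshold: float = 0.8) -> Verdict:
    """Run the small model first; escalate only if its confidence is below threshold."""
    first = small(item)
    if first.confidence >= threshold:
        return first
    return frontier(item)


# Stand-in models for illustration only; real ones would call a provider.
def fake_small(text: str) -> Verdict:
    confidence = 0.95 if "clear" in text else 0.40
    return Verdict("pass", confidence, "small")


def fake_frontier(text: str) -> Verdict:
    return Verdict("pursue", 0.90, "frontier")


print(cascade("clear pass: no revenue, no team", fake_small, fake_frontier).model)
print(cascade("ambiguous deck, unusual market", fake_small, fake_frontier).model)
```

With these stand-ins, the first item is answered by the small model and the second is escalated. Note that an escalated item pays for both calls, which is why the escalation rate matters so much to the economics.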

The economics work because, in most fund workflows, the easy cases vastly outnumber the hard ones. If nine out of ten decks are clear passes or clear pursues, nine out of ten are handled at the cheap rate, and you only pay the frontier rate on the tenth. Teams that publish their results commonly report cutting cost by half or more this way while keeping the great majority of the quality. Treat those percentages as directional, not as a promise; the exact numbers depend on your own mix of easy and hard cases.
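A quick back-of-envelope calculation shows where the saving comes from. The figures below are illustrative assumptions, not measurements: the frontier model is taken to cost five times the small model per item (the rough within-provider gap cited earlier), both models are assumed to process the same amount of text, and one item in ten escalates.

```python
def cascade_cost_per_item(small_cost: float, frontier_cost: float, escalation_rate: float) -> float:
    """Expected cost per item in a cascade.

    Every item pays for the small model; escalated items pay for the
    frontier model on top of that.
    """
    return small_cost + escalation_rate * frontier_cost


small_cost = 1.0       # one cost unit per item (illustrative)
frontier_cost = 5.0    # assumed to be five times the small model
escalation_rate = 0.1  # one deck in ten gets a second look

cascade_cost = cascade_cost_per_item(small_cost, frontier_cost, escalation_rate)
saving = 1 - cascade_cost / frontier_cost
print(f"cascade: {cascade_cost:.2f} units/item, saving vs frontier-only: {saving:.0%}")
# prints: cascade: 1.50 units/item, saving vs frontier-only: 70%
```

Under these assumed figures the cascade costs 1.5 units per item against 5 for frontier-only. The same formula also gives the break-even point: once more than 80% of items escalate, the cascade costs more than sending everything straight to the frontier model.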

The effort dial: cheaper without switching models

There is a second, lower-friction lever that does not require running two models at all. Most current models expose a reasoning-effort setting: a dial that tells the model how hard to think before it answers. Anthropic’s version, called effort, runs from low to maximum. A lower setting means the model spends fewer tokens thinking, which means it costs less and answers faster.

The point is that thinking effort should match task difficulty. Extracting a founding date from a deck does not need deep deliberation; a low effort setting answers it correctly for a fraction of the tokens. A contested diligence question might warrant a high setting. Because this is a single setting rather than a second system, it is the easiest cost lever to reach for, and often the first one to try.
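Matching effort to difficulty can be as simple as a lookup applied before each request. The sketch below is deliberately provider-neutral: the difficulty labels, the intermediate level names, and the `"effort"` key are placeholders rather than any provider's actual request format, so check your provider's documentation for the real parameter name and allowed values.

```python
# Hypothetical difficulty labels mapped to effort levels. The level names
# and the "effort" key are placeholders, not a real provider's API.
EFFORT_BY_DIFFICULTY = {
    "lookup": "low",      # e.g. extracting a founding date from a deck
    "routine": "medium",
    "contested": "high",  # e.g. a contested diligence question
}


def request_settings(difficulty: str) -> dict:
    """Build per-request settings; unknown difficulty falls back to 'medium' (an assumption)."""
    return {"effort": EFFORT_BY_DIFFICULTY.get(difficulty, "medium")}


print(request_settings("lookup"))
print(request_settings("contested"))
```

Because the model stays the same and only one setting changes, this is easy to trial on a single workflow and compare against the default.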

This is discipline, not a one-time setting

Right-sizing is not something you configure once and forget, for two reasons.

First, under-sizing has a real and asymmetric cost. If you route a genuinely hard screening call to a model that is too small, it can miss a good deal or wave through a bad one, and that mistake can dwarf the few cents you saved. So the discipline is not “always use the cheapest model.” It is “use the cheapest model that still clears the bar for this specific job.”

Second, a cascade that is tuned badly saves nothing. If it escalates everything, you get no savings and pay twice, once for the small model and once for the big one. If it escalates too little, quality drops in exactly the cases that mattered. The escalation threshold has to be set against evidence, not intuition.
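Setting the escalation threshold against evidence usually means replaying a labeled sample of past items and seeing what each candidate threshold would have done. The sketch below assumes you already have, for each item, the small model's confidence and whether each model got it right. The sample data is invented purely to show the mechanics.

```python
from typing import List, Tuple

# Each sample: (small-model confidence, small model correct?, frontier model correct?)
Sample = Tuple[float, bool, bool]


def evaluate_threshold(samples: List[Sample], threshold: float) -> Tuple[float, float]:
    """Return (escalation_rate, accuracy) of the cascade at a given threshold."""
    if not samples:
        raise ValueError("need at least one labeled sample")
    escalated = 0
    correct = 0
    for confidence, small_ok, frontier_ok in samples:
        if confidence >= threshold:
            correct += small_ok      # small model's answer is kept
        else:
            escalated += 1
            correct += frontier_ok   # frontier model's answer is kept
    n = len(samples)
    return escalated / n, correct / n


# Invented data for illustration only.
samples = [
    (0.95, True, True), (0.90, True, True), (0.85, True, True),
    (0.70, False, True), (0.60, True, True), (0.40, False, True),
    (0.30, False, False), (0.92, True, True), (0.88, True, True),
    (0.97, True, True),
]

for threshold in (0.0, 0.5, 0.8, 1.01):
    rate, accuracy = evaluate_threshold(samples, threshold)
    print(f"threshold={threshold:.2f} escalation={rate:.0%} accuracy={accuracy:.0%}")
```

On this invented sample, a threshold of 0.8 escalates 40% of items and matches the accuracy of escalating everything, while never escalating is cheaper but less accurate. Your own data will draw the curve differently, which is the reason to measure it.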

On top of both, the ground keeps moving. New models ship constantly, and each release reshuffles the tiers. A model that was frontier-only last year may be a mid-tier commodity this year. Re-check your routing when a provider ships something new; the cheapest model that clears your bar changes over time.
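Re-checking the routing can be reduced to one repeatable question: of the models that clear the quality bar on your own evaluation set, which is cheapest? The sketch below assumes you have already measured cost and quality per candidate. The model names and numbers are invented placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str             # hypothetical model label
    cost_per_item: float  # measured on your own workload
    quality: float        # score on your own evaluation set, 0.0 to 1.0


def cheapest_that_clears(candidates: List[Candidate], bar: float) -> Optional[Candidate]:
    """Return the lowest-cost candidate whose quality meets the bar, or None if none do."""
    passing = [c for c in candidates if c.quality >= bar]
    return min(passing, key=lambda c: c.cost_per_item) if passing else None


# Invented figures for illustration only.
candidates = [
    Candidate("small-a", 1.0, 0.86),
    Candidate("mid-b", 2.5, 0.93),
    Candidate("frontier-c", 5.0, 0.96),
]

choice = cheapest_that_clears(candidates, bar=0.90)
print(choice.name if choice else "no candidate clears the bar")
```

Rerunning this whenever a provider ships a new model, with the new model added to the candidate list, keeps the routing decision tied to current evidence rather than last year's tiers.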

Knowledge check

In a cascade, why does most of the volume cost the cheap rate even though a frontier model is involved?