Skip to content

Batch the work nobody is waiting for

Optimize AI SpendChapter 4 of 8

A surprising amount of a fund’s AI work does not need an answer this second. Re-scoring the whole pipeline overnight, enriching a fresh sourcing list, summarizing a stack of board decks, drafting the first pass of an LP report: nobody is sitting at a screen waiting for any of these. For that kind of work, providers offer a simple deal. Send it in a queue instead of demanding an instant reply, and pay half price. This chapter is about taking that deal wherever it fits.

Normally, when you call a model, you wait for the answer right then. That is a synchronous call, and it is what you want when a partner is asking a question live. A batch call is the opposite: you hand the provider a pile of work, walk away, and collect the results later. Because the provider can fit that work into spare capacity instead of answering you instantly, it charges less.

The discount is large and consistent. Across Anthropic, OpenAI, and Google, the batch mode runs at 50% off the standard token price. The work is asynchronous, meaning you do not wait on it; most batches finish within an hour, and the providers allow up to 24 hours. You submit, you wait, you get half off. There is no quality difference, because it is the same model doing the same work.

Batching costs you nothing except immediacy. That single trade decides where it belongs.

Good fit (nobody is waiting)Wrong fit (someone is waiting)
Overnight re-scoring of the full pipelineA partner asking a question in the moment
Enriching a new sourcing listLive meeting prep, minutes before the call
Bulk-summarizing board decksAn agent answering a user in real time
Drafting first-pass LP reportsAnything with a person watching the screen
Re-reading a document set to refresh search

The test is one question: is a human waiting for this result right now? If the honest answer is no, the work is a candidate for the batch queue and the 50% discount. Most of a fund’s highest-volume AI work, the scheduled sweeps and the overnight processing, passes that test easily.

Unlike right-sizing, which is a judgment you re-check as models change, batching is close to a one-time configuration. You decide, per pipeline, whether the work is latency-tolerant. If it is, you route it through the batch queue and leave it there. The discount then applies to every run, forever, with no further attention.

And it stacks. As the previous chapter noted, a cached block stays discounted inside a batch job, so the repeated portion of a large overnight run can be cheap on both counts at once: half off for being batched, and roughly a tenth of the price on the cached part. For a nightly job that reads the same rubric across a thousand items, those two discounts compound into a bill that is a small fraction of the naive one.

Knowledge check

Which job is the best candidate for a batch queue?