
Binocs - 99% LLM cost cut on deck pipeline

Cut LLM cost by 99 percent on the Binocs slide deck pipeline by moving structural layout work out of the model and into deterministic code.

I cut LLM cost by 99 percent on the Binocs slide deck pipeline. The trick fits in one sentence - the LLM does the writing, deterministic code does the layout. Most teams have it backwards.

The pipeline turns a PDF CIM (Confidential Information Memorandum, the deck a banker sends to PE buyers) into an editable, structured deck for the associate. The naive version, which I inherited, sent each page to GPT-4 with the prompt "extract the layout, the text, the tables, the charts, and write the slide". A CIM runs 80 to 200 pages, and GPT-4 was reading every page in full. The output was inconsistent, and the cost per deck was around 12 USD. At our volume that was burning thousands of USD a month.

I broke the pipeline into two passes. First pass - deterministic. I used pdfminer.six and a custom table extractor to pull text blocks with their coordinates, classified each block (heading, body, table, caption) with layout rules based on font size, position, and spacing, extracted the images, and re-rendered charts from the text underneath them. The output is a clean, structured JSON per page: this is the page, this is what is on it, and these are the relationships. Zero LLM calls.
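A minimal sketch of what such a first pass could look like, assuming pdfminer.six's `extract_pages` API for text blocks and their bounding boxes. The `Block` type, the `classify_block` thresholds (1.3x and 0.9x the median font size, bottom 15 percent of the page), and the "Source:"/"Note:" rule are hypothetical illustrations, not the rules the Binocs pipeline uses. Table detection and chart re-rendering are left out, since the post handles those with its own extractor.

```python
from dataclasses import asdict, dataclass
from statistics import median


@dataclass
class Block:
    """One text block on a PDF page; coordinates are PDF points, origin bottom-left."""
    page: int
    text: str
    x0: float
    y0: float
    x1: float
    y1: float
    font_size: float  # median character size within the block
    kind: str = "body"


def classify_block(block: Block, page_height: float, body_size: float) -> str:
    """Hypothetical layout rules based on relative font size and vertical position."""
    if block.font_size >= body_size * 1.3:
        return "heading"  # clearly larger than body text
    if block.font_size < body_size * 0.9 and block.y1 < page_height * 0.15:
        return "caption"  # small text in the bottom 15% of the page
    if block.text.strip().lower().startswith(("source:", "note:")):
        return "caption"  # attribution lines under charts and tables
    return "body"


def page_to_json(page_no: int, page_height: float, blocks: list[Block]) -> dict:
    """Classify every block against the page's typical (median) font size."""
    body_size = median(b.font_size for b in blocks) if blocks else 0.0
    for b in blocks:
        b.kind = classify_block(b, page_height, body_size)
    return {"page": page_no, "blocks": [asdict(b) for b in blocks]}


def extract_document(pdf_path: str) -> list[dict]:
    """First pass: pdfminer.six layout -> classified blocks, one dict per page."""
    # Imported lazily so the rules above can be used without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    pages = []
    for page_no, layout in enumerate(extract_pages(pdf_path), start=1):
        blocks = []
        for element in layout:
            if not isinstance(element, LTTextContainer):
                continue  # images, rects and figures are left to other extractors
            sizes = [
                ch.size
                for line in element
                if isinstance(line, LTTextLine)
                for ch in line
                if isinstance(ch, LTChar)
            ]
            if not sizes:
                continue
            x0, y0, x1, y1 = element.bbox
            blocks.append(
                Block(page_no, element.get_text(), x0, y0, x1, y1, median(sizes))
            )
        pages.append(page_to_json(page_no, layout.height, blocks))
    return pages
```

Calling `json.dumps(extract_document("cim.pdf"), indent=2)` (the file name is a placeholder) would give the per-page JSON. Using the median font size as the "body" reference keeps the rules relative, so they hold across CIMs with different typography.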

Second pass - the LLM writes. The model gets the cleaned JSON and a tight prompt - "given this slide content, write a 2-bullet summary in the Binocs voice". No layout, no parsing, no judgment about what is a heading. Just writing. Tokens per call dropped from thousands to hundreds.
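A minimal sketch of building that tight prompt, assuming each page arrives as a dict with a list of classified `blocks` that carry at least `kind` and `text`. The instruction string is taken from the post; `FIELDS`, `slim_slide`, and `build_prompt` are hypothetical names. The actual model call is left out, because the post does not say which API the second pass uses.

```python
import json

# Instruction taken from the post; the slide shape below is illustrative.
INSTRUCTION = "given this slide content, write a 2-bullet summary in the Binocs voice"

# Map first-pass block kinds onto the fields the writer sees.
FIELDS = {"heading": "headings", "body": "body", "table": "tables", "caption": "captions"}


def slim_slide(page: dict) -> dict:
    """Keep only the text the writer needs; drop coordinates and font sizes."""
    slide: dict[str, list[str]] = {field: [] for field in FIELDS.values()}
    for block in page["blocks"]:
        field = FIELDS.get(block["kind"])
        if field is not None:
            slide[field].append(block["text"].strip())
    # Omit empty fields so they cost no tokens.
    return {k: v for k, v in slide.items() if v}


def build_prompt(page: dict) -> str:
    """Instruction plus minified slide JSON; no layout or parsing asked of the model."""
    content = json.dumps(slim_slide(page), separators=(",", ":"), ensure_ascii=False)
    return f"{INSTRUCTION}\n\nSlide content:\n{content}"
```

Stripping coordinates and minifying the JSON is one plausible way to keep each call in the hundreds of tokens; the model never sees anything it would need to interpret as layout.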

Figure: the two-pass pipeline. The LLM only runs in the second pass.

Result - cost dropped from about 12 USD per deck to about 0.10 USD per deck, a 99 percent cut. Latency dropped too, because deterministic code runs faster than a model round trip, and the LLM calls that remained were small.
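A quick check of the headline figure, using the two per-deck costs:

$$
\frac{12 - 0.10}{12} = \frac{11.90}{12} \approx 0.992
$$

That is a saving of about 99.2 percent per deck, which rounds to the 99 percent in the title.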

The secondary win was consistency. Deterministic code returns the same output for the same input. Headings stopped wandering between runs. Tables stopped being mis-extracted into bullets. Quality went up and cost went down because both problems had the same root cause: asking the model to do layout work that code does better.

The lesson I keep repeating to other engineers - if you can write a regex for it, you should. If you can write a parser for it, you should. The LLM is the most expensive option, so save it for the parts that actually need judgment.
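One way to apply that rule is a "rules first, model as fallback" router: deterministic checks settle what they can, and only the leftovers reach the expensive call. A minimal sketch; the regex patterns, the labels, and the `needs_judgment` callable are hypothetical stand-ins for whatever rules and model client a real pipeline uses.

```python
import re
from typing import Callable, Optional

# Hypothetical rules for lines a regex can settle without any judgment.
PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d{1,3}(\s+of\s+\d{1,3})?\s*$", re.IGNORECASE)
CONFIDENTIAL = re.compile(r"^\s*(strictly\s+|private\s+and\s+)?confidential\s*$", re.IGNORECASE)
SOURCE_NOTE = re.compile(r"^\s*(sources?|notes?)\s*:", re.IGNORECASE)


def rule_label(line: str) -> Optional[str]:
    """Return a label when a deterministic rule matches, else None."""
    if PAGE_NUMBER.match(line) or CONFIDENTIAL.match(line):
        return "boilerplate"
    if SOURCE_NOTE.match(line):
        return "caption"
    return None


def label_lines(
    lines: list[str], needs_judgment: Callable[[list[str]], list[str]]
) -> list[Optional[str]]:
    """Label every line with rules first; only unmatched lines reach the model."""
    labels = [rule_label(line) for line in lines]
    pending = [i for i, label in enumerate(labels) if label is None]
    if pending:
        answers = needs_judgment([lines[i] for i in pending])  # one batched call
        if len(answers) != len(pending):
            raise ValueError("model returned the wrong number of labels")
        for i, answer in zip(pending, answers):
            labels[i] = answer
    return labels
```

When every line matches a rule, the model is never called, which is where the cost saving comes from.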

Learn more

Anthropic prompt engineering docs (Anthropic)
OpenAI cookbook - structured outputs (OpenAI)
pdfminer.six docs (pdfminer.six)