In revision.
Crisp5 min readGo deeper →

LangChain and Claude/Gemini APIs

When LangChain helps, when it gets in the way, and how I called Claude and Gemini directly for production agents.

LLM apps boil down to: prompt + context + tool calls + response. LangChain is a Python/JS framework that abstracts these into chains, agents, retrievers, vector stores. It is useful when you genuinely need composition (RAG pipelines, multi-step agents, swapping models). For single-turn calls, the provider SDK is simpler and faster.

What I built

At Binocs and on personal projects, several LLM features:

  • A compliance document analyzer using Claude Sonnet with structured outputs.
  • A meeting-notes summarizer using Gemini Flash for speed.
  • A RAG over our internal docs using LangChain's retriever + Anthropic.
  • A multi-tool agent that could query Postgres, search S3, and post to Slack.

Claude API essentials

  • messages endpoint, role-based. System prompt is a separate top-level field.
  • Tool use: declare tools as JSON schema, model returns tool_use content blocks, you execute, send back tool_result.
  • Streaming via SSE. Use it for any user-facing latency above 2 seconds.
  • Prompt caching: cache long system prompts and reuse for 5 minutes, cuts cost 90 percent on cached tokens.

Gemini API essentials

  • Multimodal first-class: images, PDFs, video as part of the prompt.
  • Long context (up to 2M tokens on 1.5 Pro). You can stuff a whole codebase.
  • Function calling with similar schema as OpenAI.
  • Free tier is generous, great for prototyping.

When LangChain pulls its weight

  • RAG with multiple retrieval strategies (BM25 + vector + reranker).
  • Agents with many tools where you want the framework to handle loops, retries, and observability.
  • Switching models for A/B tests (write once, swap providers).
  • LangSmith tracing is genuinely good for debugging multi-step flows.

When it does not

  • One-shot prompt + response: just call the API.
  • Custom flows where you need fine control over context window management.
  • Performance-critical paths where every ms of abstraction matters.

Learn more