April 4, 2026
What I Actually Think About AI Agents
Everyone is shipping agents. Most of them are just GPT with a for-loop. Here's what I think is actually interesting.
There’s a pattern I keep seeing: someone announces an “AI agent” and it’s a while-loop that calls an LLM, checks if it’s done, and loops again. That’s not wrong, but it’s also not what makes agents interesting.
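To be concrete, here's roughly what that pattern looks like. This is a minimal sketch, not anyone's real implementation: `call_llm` is a stub standing in for an actual model API, and the "DONE" convention is made up for illustration.

```python
# Minimal sketch of the "agent as a loop" pattern. `call_llm` is a stub
# standing in for a real model call; here it just "finishes" after
# accumulating three steps of history.
def call_llm(prompt: str) -> str:
    return "DONE" if prompt.count("step") >= 3 else "step"

def run_agent(task: str, max_steps: int = 10) -> list[str]:
    history = [task]
    for _ in range(max_steps):
        reply = call_llm("\n".join(history))
        if reply == "DONE":
            break
        history.append(reply)
    return history
```

That's the whole trick: call, check, loop. Which is why the loop itself isn't where the interesting engineering lives.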
What actually matters
The interesting part isn’t the loop. It’s the context window management — what you put in, what you leave out, and when you hand off to a different model or tool. An agent that can’t manage what it pays attention to degrades fast.
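A toy version of "what you put in, what you leave out": keep the task plus the most recent turns that fit a budget, and drop the middle. Real systems would count tokens and summarize rather than truncate, and every name here is hypothetical, but the shape is the same.

```python
# Illustrative context assembly: always keep the task, then pack in the
# newest turns that fit a rough character budget. Older middle turns are
# dropped entirely. A real agent would use token counts and summaries.
def build_context(task: str, turns: list[str], budget: int = 200) -> str:
    kept: list[str] = []
    used = len(task)
    for turn in reversed(turns):  # walk newest-first
        if used + len(turn) > budget:
            break
        kept.append(turn)
        used += len(turn)
    return "\n".join([task] + list(reversed(kept)))
```

Even this crude policy makes the tradeoff visible: raise the budget and you keep more history but pay more per call; lower it and the agent forgets faster.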
The second thing is failure modes. A regular function either works or throws. An agent can confidently stride in the wrong direction for twenty steps before anyone notices. Observability and checkpointing aren’t nice-to-haves, they’re the product.
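The cheapest version of that checkpointing idea is just recording every step's input and output so a run that went sideways twenty steps ago can be inspected after the fact. A hypothetical sketch, assuming an in-memory trace serialized to JSON:

```python
import json

# Hypothetical step-trace for an agent run: record each step's prompt and
# reply so a bad trajectory can be replayed and inspected later. A real
# system would persist this and checkpoint enough state to resume from.
class CheckpointLog:
    def __init__(self) -> None:
        self.steps: list[dict] = []

    def record(self, step: int, prompt: str, reply: str) -> None:
        self.steps.append({"step": step, "prompt": prompt, "reply": reply})

    def dump(self) -> str:
        # Serialize the trace so it can be stored or diffed across runs.
        return json.dumps(self.steps, indent=2)
```

None of this is clever. It's the same discipline as logging in any long-running system, applied to a component that fails silently instead of throwing.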
What I’m watching
- Multi-agent systems where specialized models hand off to each other rather than one generalist doing everything
- Long-context models (Gemini 1.5, Claude 3.x) changing the tradeoffs around chunking and retrieval
- Whether tool use ever gets standardized enough that agents become composable across providers
The uncomfortable truth
Most “agent” demos only work as demos. The hard part is building one that degrades gracefully when the LLM is wrong, the API is flaky, or the user’s intent was ambiguous from the start. That’s boring infrastructure work, and it’s where most of the value is.
More soon.