April 4, 2026
What I Actually Think About AI Agents
Everyone is shipping agents. Most of them are just GPT with a for-loop. Here's what I think is actually interesting.
There’s a pattern I keep seeing: someone announces an “AI agent” and it’s a while-loop that calls an LLM, checks if it’s done, and loops again. That’s not wrong, but it’s also not what makes agents interesting.
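To be concrete, here's roughly what that pattern looks like. This is a minimal sketch, not anyone's real implementation: `call_llm` is a stub standing in for an actual model API, and the "DONE" convention is made up for illustration.

```python
# Minimal sketch of the "agent as a loop" pattern. `call_llm` is a stub
# standing in for a real model call; here it just "finishes" after
# accumulating three steps of history.
def call_llm(prompt: str) -> str:
    return "DONE" if prompt.count("step") >= 3 else "step"

def run_agent(task: str, max_steps: int = 10) -> list[str]:
    history = [task]
    for _ in range(max_steps):
        reply = call_llm("\n".join(history))
        if reply == "DONE":
            break
        history.append(reply)
    return history
```

That's the whole trick: call, check, loop. Which is why the loop itself isn't where the interesting engineering lives.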
What actually matters
The interesting part isn’t the loop. It’s the context window management — what you put in, what you leave out, and when you hand off to a different model or tool. An agent that can’t manage what it pays attention to degrades fast.
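A toy version of "what you put in, what you leave out": keep the task plus the most recent turns that fit a budget, and drop the middle. Real systems would count tokens and summarize rather than truncate, and every name here is hypothetical, but the shape is the same.

```python
# Illustrative context assembly: always keep the task, then pack in the
# newest turns that fit a rough character budget. Older middle turns are
# dropped entirely. A real agent would use token counts and summaries.
def build_context(task: str, turns: list[str], budget: int = 200) -> str:
    kept: list[str] = []
    used = len(task)
    for turn in reversed(turns):  # walk newest-first
        if used + len(turn) > budget:
            break
        kept.append(turn)
        used += len(turn)
    return "\n".join([task] + list(reversed(kept)))
```

Even this crude policy makes the tradeoff visible: raise the budget and you keep more history but pay more per call; lower it and the agent forgets faster.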
The second thing is failure modes. A regular function either works or throws. An agent can confidently stride in the wrong direction for twenty steps before anyone notices. Observability and checkpointing aren’t nice-to-haves, they’re the product.
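The cheapest version of that checkpointing idea is just recording every step's input and output so a run that went sideways twenty steps ago can be inspected after the fact. A hypothetical sketch, assuming an in-memory trace serialized to JSON:

```python
import json

# Hypothetical step-trace for an agent run: record each step's prompt and
# reply so a bad trajectory can be replayed and inspected later. A real
# system would persist this and checkpoint enough state to resume from.
class CheckpointLog:
    def __init__(self) -> None:
        self.steps: list[dict] = []

    def record(self, step: int, prompt: str, reply: str) -> None:
        self.steps.append({"step": step, "prompt": prompt, "reply": reply})

    def dump(self) -> str:
        # Serialize the trace so it can be stored or diffed across runs.
        return json.dumps(self.steps, indent=2)
```

None of this is clever. It's the same discipline as logging in any long-running system, applied to a component that fails silently instead of throwing.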
What I’m watching
- Multi-agent systems where specialized models hand off to each other rather than one generalist doing everything
- Long-context models (Gemini 1.5, Claude 3.x) changing the tradeoffs around chunking and retrieval
- Whether tool use ever gets standardized enough that agents become composable across providers
The uncomfortable truth
Most “agent” demos only work as demos. The hard part is building one that degrades gracefully when the LLM is wrong, the API is flaky, or the user’s intent was ambiguous from the start. That’s boring infrastructure work, and it’s where most of the value is.
More soon.