BIPI

Agentic AI Isn't a Chatbot: What We Built When a Client Asked for 'AI Automation'

The client came in asking for a chatbot. They meant well. What they actually needed was a loop that could execute five actions across four systems. Here's what the gap looked like, and what we shipped.

By Arjun Raghavan, Security & Systems Lead, BIPI · April 10, 2026 · 8 min read

#ai-agents #llm #automation #engineering

The client came in with a clear ask. 'We want an AI chatbot that can do things.' They meant well. They had seen GPT-powered support demos, internal Slack copilots, and the usual conference theatre, and wanted the same inside their operations team. What they actually needed was not a chatbot.

This post is about that gap, and what we ended up shipping.

The gap between chat and agency

A chatbot answers questions. An agent takes action. That distinction sounds semantic until you architect the two systems side by side.

A chatbot is a conversational frontend over a model. It takes input, generates a response, maybe does one retrieval hop, and outputs text. It is stateless in the way that matters. The user decides what to do next.

An agent is a loop. It receives a goal, inspects the state of the world (often through tools), decides what action will move it closer to the goal, executes that action, and repeats until it hits a terminal condition or a guardrail. The model is the decider. The tools are the hands. The loop is what makes the whole thing feel alive.
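
The loop described above can be sketched in a few lines. This is a minimal illustration, not the code we shipped: the names `Action`, `run_agent`, and `scripted_decider` are hypothetical, and the decider here is a plain function standing in for a model call.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Action:
    tool: str                                   # tool name, or "done" to stop
    args: dict = field(default_factory=dict)    # keyword arguments for the tool

def run_agent(goal: str,
              decide: Callable[[str, list], Action],
              tools: dict,
              max_steps: int = 10) -> list:
    """Decide, act, observe, repeat until 'done' or the step budget runs out."""
    history: list = []
    for _ in range(max_steps):              # guardrail: hard cap on iterations
        action = decide(goal, history)      # the model would sit here
        if action.tool == "done":           # terminal condition
            break
        if action.tool not in tools:        # guardrail: registered tools only
            history.append((action, "error: unknown tool"))
            continue
        result = tools[action.tool](**action.args)
        history.append((action, result))    # the observation feeds the next decision
    return history

# Usage with a scripted decider (assumption: a real decider would call a model).
def scripted_decider(goal: str, history: list) -> Action:
    plan = [Action("lookup_gstin", {"gstin": "TEST"}), Action("done")]
    return plan[len(history)]

trace = run_agent(
    "onboard supplier",
    scripted_decider,
    {"lookup_gstin": lambda gstin: {"gstin": gstin, "status": "active"}},
)
```

The step cap and the registry check are the two guardrails in this sketch. Everything the agent can do is whatever sits in the `tools` dictionary, which is also what makes the scope explainable.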

The client's operations problem was a loop problem. 'When a new supplier onboarding request comes in, pull their GSTIN, check MCA filings for red flags, verify bank details, send a Slack message to the finance reviewer, and open a ticket in our ERP with the pre-filled fields.' That is five actions across four systems. A chatbot could talk about doing that. An agent could actually do it.

What we built

The shipped system had three components.

The decider. A Claude Sonnet-backed loop with a tool registry. We scoped it to a specific workflow (supplier onboarding) instead of letting it be a generalist. Scope is how you keep hallucinations from turning into postings.

The tool layer. Six tools, each a tight wrapper around a system API: lookup_gstin, check_mca_filings, verify_ifsc, post_slack, open_erp_ticket, and request_human_review. Each tool has a strict schema, retries, and an idempotency key. If the agent decides to do something that requires human sign-off, request_human_review pauses the loop and posts to a channel.
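
To make "strict schemas, retries, and idempotency keys" concrete, here is a rough sketch of what one wrapper might look like. It rests on several assumptions: `call_tool`, `idempotency_key`, and the in-memory `_results` store are hypothetical names, the schema check is reduced to an exact match on argument names, and only `ConnectionError` is treated as retryable. A production version would keep the key store somewhere durable and would usually pass the key to the downstream API as well.

```python
import hashlib
import json
import time
from typing import Any, Callable

_results: dict = {}   # in-memory stand-in for a durable idempotency store

def idempotency_key(tool: str, args: dict) -> str:
    """Same tool and same arguments give the same key, whatever the dict order."""
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def call_tool(tool: str, fn: Callable[..., Any], args: dict, required: set,
              retries: int = 3, backoff_s: float = 0.0) -> Any:
    # Schema check (simplified): argument names must match exactly.
    if set(args) != required:
        raise ValueError(f"{tool}: expected {sorted(required)}, got {sorted(args)}")

    key = idempotency_key(tool, args)
    if key in _results:                      # already done once: do not repeat it
        return _results[key]

    last_error = None
    for attempt in range(retries):
        try:
            result = fn(**args)
        except ConnectionError as exc:       # assumed to be the transient failure
            last_error = exc
            time.sleep(backoff_s * (2 ** attempt))
        else:
            _results[key] = result
            return result
    raise RuntimeError(f"{tool} failed after {retries} attempts") from last_error

# Usage: a fake ERP call that fails once, then succeeds.
calls = {"n": 0}

def flaky_open_ticket(supplier_id: str) -> dict:
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("transient")
    return {"ticket": f"T-{supplier_id}"}

first = call_tool("open_erp_ticket", flaky_open_ticket,
                  {"supplier_id": "S1"}, {"supplier_id"})
second = call_tool("open_erp_ticket", flaky_open_ticket,
                   {"supplier_id": "S1"}, {"supplier_id"})
```

The second call returns the stored result without touching the ERP stub again, which is the property that matters when a model decides to repeat itself.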

The observability layer. Every tool call logs to our audit store with the model's reasoning, the tool arguments, and the result. That is non-negotiable. An agent without a trace is a liability, not an asset.
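
A trace record of this kind can be sketched as follows. The `traced_call` helper, the `audit_log` list, and the field names are illustrative assumptions, not our audit store's actual schema, and the IFSC value is a made-up placeholder.

```python
import json
import time
from typing import Any, Callable

audit_log: list = []   # stand-in for an append-only audit store

def traced_call(tool: str, fn: Callable[..., Any], args: dict, reasoning: str) -> Any:
    """Run one tool call and record reasoning, arguments, and result."""
    record = {"ts": time.time(), "tool": tool, "reasoning": reasoning, "args": args}
    try:
        record["result"] = fn(**args)
        record["ok"] = True
        return record["result"]
    except Exception as exc:
        record["result"] = repr(exc)     # failures are logged too, then re-raised
        record["ok"] = False
        raise
    finally:
        # Runs on both success and failure, so no call escapes the log.
        audit_log.append(json.dumps(record, default=str))

# Usage with a stubbed bank-detail check.
result = traced_call(
    "verify_ifsc",
    lambda ifsc: {"valid": True},
    {"ifsc": "TEST0000001"},
    reasoning="Bank details must be verified before the ERP ticket is opened.",
)
```

Writing the record in a `finally` block is the design choice worth copying: a tool that throws still leaves a trace.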

6 tools the agent can call
78% of agent outputs approved unchanged
100% of tool calls logged and traced

What surprised us

Two things.

First, the model's error modes were not the ones the team feared. Nobody used SQL injection to hijack it. The actual failure was polite over-confidence. The agent would proceed when the data was ambiguous rather than ask for clarification. The fix was a cheap one. We changed the system prompt to require a confidence field on every proposed action, and added a rule that below a threshold the loop must call request_human_review.
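
The rule itself fits in a few lines. In this sketch the threshold value, the `ProposedAction` type, and the `gate` function are assumptions for illustration; the confidence is whatever the model reports about itself, so it is a routing signal rather than a calibrated probability.

```python
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.8   # hypothetical value, tuned per workflow

@dataclass
class ProposedAction:
    tool: str
    args: dict = field(default_factory=dict)
    confidence: float = 0.0      # self-reported by the model, 0.0 to 1.0

def gate(action: ProposedAction) -> ProposedAction:
    """Swap a low-confidence action for a human-review request."""
    if action.tool == "request_human_review":
        return action                        # never block the escape hatch
    if action.confidence < CONFIDENCE_THRESHOLD:
        return ProposedAction(
            tool="request_human_review",
            args={"blocked_tool": action.tool,
                  "blocked_args": action.args,
                  "confidence": action.confidence},
            confidence=1.0,
        )
    return action

# Usage: an ambiguous ERP posting is diverted, a confident lookup goes through.
diverted = gate(ProposedAction("open_erp_ticket", {"supplier_id": "S1"}, 0.55))
allowed = gate(ProposedAction("lookup_gstin", {"gstin": "TEST"}, 0.95))
```

The gate sits in the loop, outside the model, so the rule holds even when the model is confidently wrong about everything except its own uncertainty.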

Second, the adoption curve. We expected the finance reviewer to use the agent as a starting point and hand-edit the drafts. In practice, within six weeks, the reviewer was approving roughly 78 percent of agent outputs unchanged. The agent was more consistent than the humans, not because it was smarter, but because it never got tired.

When you should not use an agent

An agent is the wrong tool if any of these are true.

The workflow is simple enough to be a three-step automation. Use a script.
The cost of a wrong action is high and cannot be rolled back (wire transfers, production database writes).
You cannot explain, in one paragraph, what the agent is allowed to do. If you cannot scope it, you cannot ship it.
You cannot log everything it does.

The first item is the most common mismatch. Companies reach for agents when a Zapier flow would suffice. Agents earn their keep at the fifth or sixth action of a workflow that requires judgment.

Closing

The word 'agent' is doing a lot of work right now. Most of what is sold under that name is a thin LLM wrapper plus a couple of tools. What makes a system actually agentic is the loop, the tool discipline, and the observability. Get those three right and you have infrastructure. Get them wrong and you have a very expensive chatbot.
