
Agent Cost Tracking: Per-User, Per-Feature, Per-Tool

Most cost dashboards stop at total tokens per day. That tells you nothing useful. We share patterns for per-user and per-feature attribution, propagating context through async tool calls, and surfacing spike patterns early.

By Arjun Raghavan, Security & Systems Lead, BIPI · May 17, 2024 · 7 min read

#ai-agents · #cost · #operations

A B2B platform crossed $600k a month in LLM spend before anyone could answer the simple question of which feature was costing what. The CFO asked. The engineering team had three theories. We ran the numbers and found that 71 percent of cost came from a single feature used by 4 percent of users. None of the three theories had been right.

Cost attribution is one of the easiest wins in agent ops. It does not need a new vendor. It needs discipline about tagging at the call site, propagating context through async work, and a few well-chosen dashboards. Teams that put this in early avoid the awkward CFO conversation that otherwise arrives 12 to 18 months in.

Tag at the call site, always

Every model invocation in the codebase should carry: user_id, feature, tool (if applicable), prompt_version, and trace_id. Tagging at the call site means the metadata enters the telemetry pipeline already attached, not reconstructed later. Reconstruction is where data quality dies.

We typically wrap the SDK client in one place in the codebase. The wrapper requires the metadata to be present, fails closed if it is not, and propagates the tags through to the provider's metadata field where supported. From there, your telemetry can join cost back to product analytics on user_id and feature.
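A minimal sketch of such a wrapper, assuming the OpenAI Python SDK and its per-request metadata parameter; the CallMetadata fields mirror the list above, and emit_cost_event stands in for whatever telemetry hook you already have.

```python
from dataclasses import dataclass, asdict
from typing import Optional
from openai import OpenAI

@dataclass(frozen=True)
class CallMetadata:
    user_id: str
    feature: str
    prompt_version: str
    trace_id: str
    tool: Optional[str] = None  # only set for tool-triggered calls

class TaggedLLMClient:
    """The single place in the codebase that talks to the model provider."""

    def __init__(self, client: OpenAI):
        self._client = client

    def chat(self, messages: list[dict], model: str, meta: CallMetadata, **kwargs):
        # Fail closed: refuse to spend money on a call we cannot attribute.
        if not (meta.user_id and meta.feature and meta.trace_id):
            raise ValueError("LLM call rejected: attribution metadata missing")

        tags = {k: v for k, v in asdict(meta).items() if v is not None}
        response = self._client.chat.completions.create(
            model=model,
            messages=messages,
            metadata=tags,  # forwarded to the provider's metadata field where supported
            **kwargs,
        )
        emit_cost_event(tags, response.usage)  # hypothetical telemetry hook
        return response
```

Because the same tags land in both the cost event and the provider call, the join back to product analytics is a plain merge on user_id and feature.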

The async propagation problem

Async tool calls are where cost attribution falls apart in real systems. The agent triggers a background job. The job calls the model. By the time it returns, the original request context is gone. Without explicit propagation, that spend lands in the unknown bucket. The sketch after the list below shows one way to carry the tags through a job payload.

  • Pass the metadata through the job payload, not via thread-local context.
  • On retries, preserve the original user_id and trace_id rather than overwriting with the retry job's identity.
  • For fan-out tool calls, tag each child invocation with both the parent and the leaf identifiers.
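A sketch of that payload-based propagation, assuming a Celery-style queue and reusing CallMetadata and the tagged client from the wrapper above; load_chunks, the task name, and the model are illustrative.

```python
from dataclasses import asdict
from celery import Celery

app = Celery("agent_jobs", broker="redis://localhost:6379/0")

def enqueue_summarise(doc_id: str, meta: CallMetadata) -> None:
    # Serialise the tags into the job payload itself; thread-local context
    # does not survive the hop to the worker process.
    summarise_document.delay(doc_id=doc_id, meta=asdict(meta))

@app.task(bind=True, max_retries=3)
def summarise_document(self, doc_id: str, meta: dict) -> None:
    # Retries re-run with the same payload, so the original user_id and
    # trace_id are preserved rather than overwritten by the retry's identity.
    parent = CallMetadata(**meta)
    for i, chunk in enumerate(load_chunks(doc_id)):  # illustrative loader
        # Fan-out: each child call keeps the parent identifiers and adds a leaf id.
        child = CallMetadata(
            user_id=parent.user_id,
            feature=parent.feature,
            prompt_version=parent.prompt_version,
            trace_id=f"{parent.trace_id}/chunk-{i}",
            tool="summarise_chunk",
        )
        tagged_client.chat(
            messages=[{"role": "user", "content": f"Summarise:\n{chunk}"}],
            model="gpt-4o-mini",  # illustrative
            meta=child,
        )
```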

On the B2B platform, fixing async propagation alone moved 18 percent of cost out of the unknown bucket and attributed it to a specific feature. That feature turned out to be the expensive one nobody had been looking at.

Dashboards that catch spikes early

A daily total cost chart will tell you about a spike two days late. By then your bill is already real. The dashboards that work in practice catch shifts within hours; a sketch of the first check follows the list.

  1. Cost per active user, hourly, with a 7-day baseline. Alert on 2x deviation.
  2. Cost per feature, hourly, normalised by feature usage. Alert on rate change, not absolute.
  3. Top 10 users by spend in the last 24 hours, surfaced as a daily digest, not an alert.
  4. Tool cost as a percentage of total cost, weekly. Sudden shifts often indicate a tool description change or model regression.
  5. Cost per task type, with a moving average. Catches when a feature gets quietly more expensive due to longer prompts or more steps.
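A sketch of the first check, assuming cost events already carry the tags from the wrapper; the column names and the 2x threshold are illustrative.

```python
import pandas as pd

def hourly_cost_per_user_alerts(events: pd.DataFrame, threshold: float = 2.0) -> pd.DataFrame:
    """Flag hours where cost per active user runs above the 7-day baseline.

    `events` is assumed to have one row per model call, with `timestamp`,
    `user_id` and `cost_usd` columns (the same keys the tagging wrapper emits).
    """
    hourly = (
        events.set_index("timestamp")
        .resample("1h")
        .agg({"cost_usd": "sum", "user_id": "nunique"})
        .rename(columns={"cost_usd": "cost", "user_id": "active_users"})
    )
    hourly["cost_per_user"] = hourly["cost"] / hourly["active_users"].clip(lower=1)

    # 7-day rolling baseline, shifted one hour so the current hour cannot
    # drag its own baseline up during a spike.
    baseline = hourly["cost_per_user"].shift(1).rolling("7d", min_periods=24).mean()
    hourly["spike"] = hourly["cost_per_user"] > threshold * baseline
    return hourly[hourly["spike"]]
```

The other four dashboards are the same pattern with a different group-by: feature, user, tool, or task type instead of the global hourly bucket.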

Showback before chargeback

Once attribution is reliable, the next question is internal accounting. We push every team we work with toward showback, where each product team sees their cost without it hitting their budget, before chargeback, where it does. Showback for one quarter exposes the cost-quality conversations the team needs to have. Chargeback before that creates fights, not behaviour change.

Within a quarter of showback dashboards going live, every team we have worked with has cut their cost per task by at least 20 percent. None of those cuts came from a dedicated optimisation project. They came from product owners seeing the numbers and asking which prompts were doing the heavy lifting.

What good looks like

When we audit an agent stack, we ask one question: pick a feature, any feature. Can you tell us, within five minutes, the cost per task for that feature, the change over the last 30 days, and the top three drivers of that cost? If the answer is yes, the team is running a real operation. If it is no, the spend is going to grow faster than the value, and the CFO conversation is coming.
