№ 011 · ops · cost & observability · filed may '26

Cost dashboards that don't lie.

Per-feature, per-user, per-prompt — the three columns nobody shows.

This is the setup for tracking LLM spend by feature, user, and prompt instead of trusting the provider's aggregate bill. Half-day build, monthly review, anomaly alerts to Slack.

What you'll have when you finish: every LLM call tagged with user_id / feature / prompt_id, a Postgres llm_calls table partitioned by month, one Metabase dashboard with three sections (by feature / by user / by prompt), and a daily Slack alert that fires when any feature's week-over-week spend grows more than 30%.

Accounts you'll need: langfuse.com · supabase.com (Postgres) · plausible.io · metabase.com · a Slack incoming webhook. All free or low tier at small-team scale. The real cost is half a day of engineering time.

01

The stack.

  • 01 Langfuse: per-call tracing (daily)
  • 02 Postgres: your own cost table, not the provider's (daily)
  • 03 Plausible: feature usage events (daily)
  • 04 Metabase: the three dashboard views (weekly)
  • 05 Slack alerts: anomaly thresholds (daily)

02

How to apply it.

  1. 01 · 30 min

    Tag every LLM call with three fields.

    user_id, feature, prompt_id. Add them to the metadata of every Anthropic, OpenAI, whatever-API call. Without these, the dashboards lie.

    The bug to avoid: "this is internal so we'll add tags later." Three months later you have three months of untagged calls and a useless dataset.
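
    One way to make the tags impossible to skip is to route every provider call through a single wrapper that refuses to run without them. A minimal sketch, assuming a hypothetical `tagged_call` helper and `CallTags` type (neither comes from any SDK). `send` stands in for whatever function actually hits the provider, and how each provider or Langfuse accepts metadata is something to check against its own docs:

```python
from dataclasses import dataclass, asdict
from typing import Any, Callable

REQUIRED_TAGS = ("user_id", "feature", "prompt_id")


@dataclass(frozen=True)
class CallTags:
    """The three fields every LLM call must carry (hypothetical type)."""
    user_id: str
    feature: str
    prompt_id: str

    def __post_init__(self) -> None:
        # Reject empty or whitespace-only tags so "add it later" cannot happen.
        for name in REQUIRED_TAGS:
            value = getattr(self, name)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"missing required tag: {name}")


def tagged_call(send: Callable[..., Any], tags: CallTags, **request: Any) -> Any:
    """Forward a request to `send` with the tags merged into its metadata.

    `send` is a stand-in for the function that really calls the provider.
    Tags win over any same-named keys already present in the metadata.
    """
    metadata = {**request.pop("metadata", {}), **asdict(tags)}
    return send(metadata=metadata, **request)
```

    Because `CallTags` validates on construction, an untagged call fails at the call site instead of showing up months later as a hole in the dataset.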

  2. 02 · 45 min

    Store in your own DB, not the provider's.

    Langfuse traces feed a Postgres table you control: timestamp, user_id, feature, prompt_id, model, input_tokens, output_tokens, cost_usd. The provider's billing UI is a fallback, not the source of truth.

    Provider dashboards rotate data. Your table doesn't.
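
    A sketch of what that table could look like, generated from Python so the monthly partitions can be created on a schedule. Column names follow the list above, except `called_at`, which stands in for timestamp. The column types, the `llm_calls_YYYY_MM` naming, and the `month_partition_ddl` helper are assumptions, not part of the original setup:

```python
from datetime import date

# Parent table, range-partitioned on the call time (types are assumptions).
PARENT_DDL = """
CREATE TABLE IF NOT EXISTS llm_calls (
    called_at     timestamptz   NOT NULL,
    user_id       text          NOT NULL,
    feature       text          NOT NULL,
    prompt_id     text          NOT NULL,
    model         text          NOT NULL,
    input_tokens  integer       NOT NULL,
    output_tokens integer       NOT NULL,
    cost_usd      numeric(12,6) NOT NULL
) PARTITION BY RANGE (called_at);
""".strip()


def month_partition_ddl(month: date) -> str:
    """Return the DDL for the partition holding one calendar month."""
    start = month.replace(day=1)
    # First day of the following month; December rolls over to January.
    if start.month == 12:
        end = date(start.year + 1, 1, 1)
    else:
        end = date(start.year, start.month + 1, 1)
    name = f"llm_calls_{start:%Y_%m}"
    return (
        f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF llm_calls "
        f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
    )
```

    Run `PARENT_DDL` once, then `month_partition_ddl(date.today())` ahead of each new month. The upper bound is exclusive, so adjacent months do not overlap.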

  3. 03 · 60 min

    Build the three views.

    One Metabase dashboard, three sections. Time-series for each. 30-day and 7-day windows side by side.

    • by feature: what is eating spend. Tag every LLM call. Anomaly alerts fire per feature, week-over-week.
    • by user: who is profitable. Join cost to retention. The expensive users who churn are the throttle list.
    • by prompt: which prompt is bleeding. Hash the prompt as the version. Cost shifts get attributed exactly.

    Sort each table descending by cost. The top 3 rows are 80% of the story.
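
    The same breakdowns, sketched in plain Python so the logic is visible outside Metabase. `prompt_version` and `spend_by` are hypothetical helpers. The SHA-256 choice and the 12-character truncation are assumptions, and rows are assumed to look like records from llm_calls:

```python
import hashlib
from collections import defaultdict
from typing import Iterable, Mapping


def prompt_version(template: str) -> str:
    """Derive a short, stable prompt_id from the prompt template text.

    Any edit to the template yields a different id, so a cost shift can be
    pinned to the exact prompt revision that caused it.
    """
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


def spend_by(rows: Iterable[Mapping], column: str) -> list[tuple[str, float]]:
    """Total cost_usd per value of `column`, most expensive first."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row[column]] += float(row["cost_usd"])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

    Call `spend_by(rows, "feature")`, `spend_by(rows, "user_id")`, and `spend_by(rows, "prompt_id")` over the same rows to get the three tables, already sorted the way the dashboard should be.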

  4. 04 · 30 min

    Anomaly alerts to Slack.

    One alert per feature: if weekly spend grows more than 30% over the previous week, post to the team channel with the feature name and the delta. Not total spend. Per-feature spend.

    Total-spend alerts are useless — by the time the total moves, the per-feature signal was visible for days.
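
    A minimal sketch of the per-feature check, assuming weekly totals per feature have already been pulled from llm_calls into two dicts. The function names are hypothetical. The Slack call relies only on the incoming-webhook convention of POSTing a JSON body with a `text` field. Features with no spend in the previous week are skipped here, which is an assumption you may want to reverse:

```python
import json
import urllib.request


def weekly_anomalies(
    this_week: dict[str, float],
    last_week: dict[str, float],
    threshold: float = 0.30,
) -> list[tuple[str, float, float, float]]:
    """Features whose spend grew more than `threshold` week over week.

    Returns (feature, last_week_usd, this_week_usd, growth) tuples,
    largest growth first.
    """
    flagged = []
    for feature, current in this_week.items():
        previous = last_week.get(feature, 0.0)
        if previous <= 0:
            # No baseline to compare against, so growth is undefined.
            continue
        growth = (current - previous) / previous
        if growth > threshold:
            flagged.append((feature, previous, current, growth))
    return sorted(flagged, key=lambda item: item[3], reverse=True)


def format_alert(feature: str, previous: float, current: float, growth: float) -> str:
    """One line per feature: the name and the delta."""
    return f"{feature}: weekly LLM spend up {growth:.0%} (${previous:.2f} -> ${current:.2f})"


def post_to_slack(webhook_url: str, text: str) -> None:
    """Send one message to a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(request, timeout=10).close()
```

    A daily job would loop over `weekly_anomalies(...)` and pass each `format_alert(...)` line to `post_to_slack`. The comparison is strict, so a feature at exactly 30% growth stays quiet.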

  5. 05 · monthly

    Monthly kill review.

    Last day of the month: pull the bottom quartile of features by usage and look at what each one costs. Low usage with real spend means the feature is eating cost without earning it. Kill, throttle, or merge into a stronger feature.

    This is the move every team skips. It's also the one that keeps total spend in check year over year.
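
    The quartile pull can be sketched as below, assuming per-feature usage counts (for example from Plausible events) and monthly cost totals are available as dicts. `bottom_quartile_by_usage` is a hypothetical helper, and rounding the quartile down with a minimum of one feature is an assumption:

```python
def bottom_quartile_by_usage(
    usage: dict[str, int],
    cost: dict[str, float],
) -> list[tuple[str, int, float]]:
    """Least-used quarter of features, with what each one cost.

    Returns (feature, usage_count, cost_usd) tuples, least used first.
    Features missing from `cost` are reported with 0.0 spend.
    """
    ranked = sorted(usage, key=lambda feature: usage[feature])
    # Quarter of the feature count, rounded down, but never zero if any exist.
    count = max(1, len(ranked) // 4) if ranked else 0
    return [(feature, usage[feature], cost.get(feature, 0.0)) for feature in ranked[:count]]
```

    The output is the review agenda: each row is a feature to kill, throttle, or merge, with its cost next to its usage.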

03

What we stopped doing.

  • × Trusting the provider's billing UI. It aggregates. It rounds. It's late.
  • × Calling cost "infra cost." It's a feature cost. Tag it accordingly.
  • × Alerting on total spend. Total-spend alerts fire only after the damage is done.
  • × Letting prompts go un-versioned. If you can't tell which prompt cost what, you can't optimize.
  • × Treating cost as engineering's problem. Product owns feature cost.
  • × Storing only aggregates. Store every call. Storage is cheap. Re-analysis is free.

04

The take.

If you can't see cost per feature, per user, and per prompt, you can't run the product. Three columns. One Metabase board. The dashboard pays for itself the first month it exists.

Steal one thing: the per-feature anomaly alert. The other two columns are nice. This one is what catches the bill before it lands.

Need this done for you? The author works on this exact thing with audit clients at austinaiguy.com.