№ 011ops · cost & observability filed may '26

Cost dashboards that don't lie.

Per-feature, per-user, per-prompt — the three columns nobody shows.

Austin AI Guy·9 min read

This is the setup for tracking LLM spend by feature, user, and prompt instead of trusting the provider's aggregate bill. Half-day build, monthly review, anomaly alerts to Slack.

What you'll have when you finish: every LLM call tagged with user_id / feature / prompt_id, a Postgres llm_calls table partitioned by month, three Metabase dashboards (by feature / by user / by prompt), and a daily Slack alert that fires when any feature's week-over-week spend grows more than 30%.

Accounts you'll need: langfuse.com · supabase.com (Postgres) · metabase.com · a Slack incoming webhook. All free or low tier at small-team scale. Real cost is half a day of engineering time.

The stack.

01Langfuse — per-call tracingdaily
02Postgres — your own cost table, not provider'sdaily
03Plausible — feature usage eventsdaily
04Metabase — the three dashboardsweekly
05Slack alerts — anomaly thresholdsdaily

How to apply it.

0130 min

Tag every LLM call with three fields.

user_id, feature, prompt_id. Add them to the metadata of every Anthropic, OpenAI, whatever-API call. Without these, the dashboards lie.

The bug to avoid: "this is internal so we'll add tags later." Three months later you have six months of untagged calls and a useless dataset.
0245 min

Store in your own DB, not the provider's.

Langfuse traces feed a Postgres table you control: timestamp, user_id, feature, prompt_id, model, input_tokens, output_tokens, cost_usd. Anthropic's UI is a fallback, not the source of truth.

Provider dashboards rotate data. Your table doesn't.
0360 min

Build the three views.

One Metabase dashboard, three sections. Time-series for each. 30-day and 7-day windows side by side.

by feature
what is eating spend

Tag every LLM call. Anomaly alerts fire per feature, week-over-week.

by user
who is profitable

Join cost to retention. The expensive users who churn are the throttle list.

by prompt
which prompt is bleeding

Hash the prompt as the version. Cost shifts get attributed exactly.

Sort each table descending by cost. The top 3 rows are 80% of the story.
0430 min

Anomaly alerts to Slack.

One alert per feature: if weekly spend grows more than 30% over the previous week, post to the team channel with the feature name and the delta. Not total spend. Per-feature spend.

Total-spend alerts are useless — by the time the total moves, the per-feature signal was visible for days.
05monthly

Monthly kill review.

Last day of the month: pull the bottom quartile of features by usage. Those are the features eating cost without earning it. Kill, throttle, or merge into a stronger feature.

This is the move every team skips. It's also the one that keeps total spend in check year over year.

What we stopped doing.

×Trusting the provider's billing UI. It aggregates. It rounds. It's late.
×Calling cost "infra cost." It's a feature cost. Tag it accordingly.
×Alerting on total spend. Total alerts only after damage is done.
×Letting prompts go un-versioned. If you can't tell which prompt cost what, you can't optimize.
×Treating cost as engineering's problem. Product owns feature cost.
×Storing only aggregates. Store every call. Storage is cheap. Re-analysis is free.

The take.

If you can't see cost per feature, per user, and per prompt, you can't run the product. Three columns. One Metabase board. The dashboard pays for itself the first month it exists.

Steal one thing: the per-feature anomaly alert. The other two columns are nice. This one is what catches the bill before it lands.

Decision rules three signals worth acting on

If you see… Do this Don't do this

Dashboard total drifts > 5% from Anthropic billing A model or prompt isn't being tagged. Find it before next month closes. Trust the dashboard anyway.

One feature is 30%+ of total cost Audit that feature's prompt for cache-hit opportunities. The fix is usually structural. Set a budget alert and move on.

Per-user cost concentrated in < 5% of users Look at what they're using. They're either power users (good) or stuck in a loop (broken). Throttle them.

Three drop-ins. The tagging schema, the three Metabase queries, the alert rule.

The tagging schema.

Every LLM call goes through this wrapper. Without it, the dashboards are fiction.

// llm-call.ts — wrap every provider call

type LLMCall = {
  user_id:     string;
  feature:     string;   // e.g. "summary.generate", "support.reply"
  prompt_id:   string;   // hash of the prompt text — auto-versioned
  model:       string;
  input_tokens:  number;
  output_tokens: number;
  cost_usd:    number;
  cache_hit:   boolean;
  latency_ms:  number;
  ts:          Date;
};

export async function tracedCall(args, params: LLMCall) {
  const res = await anthropic.messages.create(args);

  await db.llm_calls.insert({
    ...params,
    input_tokens:  res.usage.input_tokens,
    output_tokens: res.usage.output_tokens,
    cost_usd:      computeCost(res.usage, args.model),
    cache_hit:     res.usage.cache_read_input_tokens > 0,
    latency_ms:    Date.now() - args._startedAt,
  });

  return res;
}

// PARTITION the table monthly. INDEX on (feature, ts).

The three Metabase queries.

One dashboard. Three rows. Sorted desc on cost.

-- BY FEATURE — 30d
SELECT
  feature,
  SUM(cost_usd)                AS total_cost,
  COUNT(*)                     AS calls,
  ROUND(SUM(cost_usd)::numeric / NULLIF(COUNT(*),0), 4) AS cost_per_call
FROM llm_calls
WHERE ts > now() - interval '30 days'
GROUP BY feature
ORDER BY total_cost DESC;

-- BY USER — 30d (top 50)
SELECT
  user_id,
  SUM(cost_usd)                AS total_cost,
  COUNT(DISTINCT feature)      AS features_touched,
  COUNT(*)                     AS calls
FROM llm_calls
WHERE ts > now() - interval '30 days'
GROUP BY user_id
ORDER BY total_cost DESC
LIMIT 50;

-- BY PROMPT — 30d
SELECT
  feature,
  prompt_id,
  SUM(cost_usd)                AS total_cost,
  COUNT(*)                     AS calls,
  AVG(input_tokens + output_tokens) AS avg_tokens
FROM llm_calls
WHERE ts > now() - interval '30 days'
GROUP BY feature, prompt_id
ORDER BY total_cost DESC;

The anomaly alert rule.

Per-feature, week-over-week, 30% threshold. Fires once daily, max.

-- alert.sql — run daily at 8am

WITH this_week AS (
  SELECT feature, SUM(cost_usd) AS cost
  FROM llm_calls
  WHERE ts > now() - interval '7 days'
  GROUP BY feature
),
last_week AS (
  SELECT feature, SUM(cost_usd) AS cost
  FROM llm_calls
  WHERE ts BETWEEN now() - interval '14 days'
              AND now() - interval '7 days'
  GROUP BY feature
)
SELECT
  this_week.feature,
  last_week.cost            AS prev,
  this_week.cost            AS current,
  ROUND(((this_week.cost - last_week.cost) / NULLIF(last_week.cost,0))::numeric, 2) AS pct_change
FROM this_week
JOIN last_week USING (feature)
WHERE this_week.cost > 5     -- ignore noise
  AND (this_week.cost - last_week.cost) / NULLIF(last_week.cost,0) > 0.30
ORDER BY pct_change DESC;

-- Wire each row to a Slack message:
--   ":fire: feature `{feature}` spend up {pct_change}% WoW
--    (${prev} -> ${current})"

Related stackThe agent stack →

Next in the libraryBrowse all 12 guides →

Need this done for you? The author works on this exact thing with audit clients at austinaiguy.com.

Cost dashboards that don't lie.

The stack.

How to apply it.

Tag every LLM call with three fields.

Store in your own DB, not the provider's.

Build the three views.

what is eating spend

who is profitable

which prompt is bleeding

Anomaly alerts to Slack.

Monthly kill review.

What we stopped doing.

The take.

Per-user ceilings with auto-throttle.

Prompt versioning in the table.

Cohort retention vs. cost.

Auto-PR on feature share.

The tagging schema.

The three Metabase queries.

The anomaly alert rule.