Cost dashboards that don't lie.
Per-feature, per-user, per-prompt — the three columns nobody shows.
This is the setup for tracking LLM spend by feature, user, and prompt instead of trusting the provider's aggregate bill. Half-day build, monthly review, anomaly alerts to Slack.
What you'll have when you finish: every LLM call tagged with user_id / feature / prompt_id, a Postgres llm_calls table partitioned by month, three Metabase dashboards (by feature / by user / by prompt), and a daily Slack alert that fires when any feature's week-over-week spend grows more than 30%.
Accounts you'll need: langfuse.com · supabase.com (Postgres) · metabase.com · a Slack incoming webhook. All free or low tier at small-team scale. Real cost is half a day of engineering time.
The stack.
- 01Langfuse — per-call tracingdaily
- 02Postgres — your own cost table, not provider'sdaily
- 03Plausible — feature usage eventsdaily
- 04Metabase — the three dashboardsweekly
- 05Slack alerts — anomaly thresholdsdaily
How to apply it.
- 0130 min
Tag every LLM call with three fields.
user_id,feature,prompt_id. Add them to the metadata of every Anthropic, OpenAI, whatever-API call. Without these, the dashboards lie.The bug to avoid: "this is internal so we'll add tags later." Three months later you have six months of untagged calls and a useless dataset.
- 0245 min
Store in your own DB, not the provider's.
Langfuse traces feed a Postgres table you control: timestamp, user_id, feature, prompt_id, model, input_tokens, output_tokens, cost_usd. Anthropic's UI is a fallback, not the source of truth.
Provider dashboards rotate data. Your table doesn't.
- 0360 min
Build the three views.
One Metabase dashboard, three sections. Time-series for each. 30-day and 7-day windows side by side.
by featurewhat is eating spend
Tag every LLM call. Anomaly alerts fire per feature, week-over-week.
by userwho is profitable
Join cost to retention. The expensive users who churn are the throttle list.
by promptwhich prompt is bleeding
Hash the prompt as the version. Cost shifts get attributed exactly.
Sort each table descending by cost. The top 3 rows are 80% of the story.
- 0430 min
Anomaly alerts to Slack.
One alert per feature: if weekly spend grows more than 30% over the previous week, post to the team channel with the feature name and the delta. Not total spend. Per-feature spend.
Total-spend alerts are useless — by the time the total moves, the per-feature signal was visible for days.
- 05monthly
Monthly kill review.
Last day of the month: pull the bottom quartile of features by usage. Those are the features eating cost without earning it. Kill, throttle, or merge into a stronger feature.
This is the move every team skips. It's also the one that keeps total spend in check year over year.
What we stopped doing.
- ×Trusting the provider's billing UI. It aggregates. It rounds. It's late.
- ×Calling cost "infra cost." It's a feature cost. Tag it accordingly.
- ×Alerting on total spend. Total alerts only after damage is done.
- ×Letting prompts go un-versioned. If you can't tell which prompt cost what, you can't optimize.
- ×Treating cost as engineering's problem. Product owns feature cost.
- ×Storing only aggregates. Store every call. Storage is cheap. Re-analysis is free.
The take.
If you can't see cost per feature, per user, and per prompt, you can't run the product. Three columns. One Metabase board. The dashboard pays for itself the first month it exists.
Steal one thing: the per-feature anomaly alert. The other two columns are nice. This one is what catches the bill before it lands.
After three months of clean data, these compound.
Per-user ceilings with auto-throttle.
Set a hard $/user/month cap by tier. Free users at $0.50, pro at $5, enterprise unlimited. When a user crosses, the API returns a graceful "approaching limit" instead of silent failure. Free-tier abuse handled at the platform level, not the support queue.
Prompt versioning in the table.
Every prompt change increments a version stored on the call. When cost shifts on a feature, the dashboard shows the version transition. Attribution gets exact instead of forensic.
Cohort retention vs. cost.
Join your cost table to your activation table by user_id. Find the user cohorts that cost the most AND retain the least. They're the ones to throttle or move to a cheaper model. Cohorts that cost a lot AND retain are your VIPs — invest in them.
Auto-PR on feature share.
When a single feature crosses 20% of total LLM spend, open a PR against the prompt and model config for review. Forces a human conversation about whether the spend is earning its place.
Five symptoms with the fix.
№ 01Spend doubled overnight.+
№ 02Wrong attribution.+
prompt_id changed mid-month without versioning.prompt_id to a hash of the prompt text. Changes get a new version automatically.№ 03Dashboard slow.+
№ 04Alerts spamming Slack.+
№ 05Numbers don't match billing.+
Three drop-ins. The tagging schema, the three Metabase queries, the alert rule.
The tagging schema.
Every LLM call goes through this wrapper. Without it, the dashboards are fiction.
// llm-call.ts — wrap every provider call
type LLMCall = {
user_id: string;
feature: string; // e.g. "summary.generate", "support.reply"
prompt_id: string; // hash of the prompt text — auto-versioned
model: string;
input_tokens: number;
output_tokens: number;
cost_usd: number;
cache_hit: boolean;
latency_ms: number;
ts: Date;
};
export async function tracedCall(args, params: LLMCall) {
const res = await anthropic.messages.create(args);
await db.llm_calls.insert({
...params,
input_tokens: res.usage.input_tokens,
output_tokens: res.usage.output_tokens,
cost_usd: computeCost(res.usage, args.model),
cache_hit: res.usage.cache_read_input_tokens > 0,
latency_ms: Date.now() - args._startedAt,
});
return res;
}
// PARTITION the table monthly. INDEX on (feature, ts).The three Metabase queries.
One dashboard. Three rows. Sorted desc on cost.
-- BY FEATURE — 30d SELECT feature, SUM(cost_usd) AS total_cost, COUNT(*) AS calls, ROUND(SUM(cost_usd)::numeric / NULLIF(COUNT(*),0), 4) AS cost_per_call FROM llm_calls WHERE ts > now() - interval '30 days' GROUP BY feature ORDER BY total_cost DESC; -- BY USER — 30d (top 50) SELECT user_id, SUM(cost_usd) AS total_cost, COUNT(DISTINCT feature) AS features_touched, COUNT(*) AS calls FROM llm_calls WHERE ts > now() - interval '30 days' GROUP BY user_id ORDER BY total_cost DESC LIMIT 50; -- BY PROMPT — 30d SELECT feature, prompt_id, SUM(cost_usd) AS total_cost, COUNT(*) AS calls, AVG(input_tokens + output_tokens) AS avg_tokens FROM llm_calls WHERE ts > now() - interval '30 days' GROUP BY feature, prompt_id ORDER BY total_cost DESC;
The anomaly alert rule.
Per-feature, week-over-week, 30% threshold. Fires once daily, max.
-- alert.sql — run daily at 8am
WITH this_week AS (
SELECT feature, SUM(cost_usd) AS cost
FROM llm_calls
WHERE ts > now() - interval '7 days'
GROUP BY feature
),
last_week AS (
SELECT feature, SUM(cost_usd) AS cost
FROM llm_calls
WHERE ts BETWEEN now() - interval '14 days'
AND now() - interval '7 days'
GROUP BY feature
)
SELECT
this_week.feature,
last_week.cost AS prev,
this_week.cost AS current,
ROUND(((this_week.cost - last_week.cost) / NULLIF(last_week.cost,0))::numeric, 2) AS pct_change
FROM this_week
JOIN last_week USING (feature)
WHERE this_week.cost > 5 -- ignore noise
AND (this_week.cost - last_week.cost) / NULLIF(last_week.cost,0) > 0.30
ORDER BY pct_change DESC;
-- Wire each row to a Slack message:
-- ":fire: feature `{feature}` spend up {pct_change}% WoW
-- (${prev} -> ${current})"Need this done for you? The author works on this exact thing with audit clients at austinaiguy.com.