Deflection without destroying NPS.
The three signals that tell you the bot is making things worse — and what to do before they show up in the next survey.
Most deflection bots lose NPS by deflecting tickets they can't actually solve. The fix is structural: put a classifier in front of the bot that routes the hard tickets to humans before the bot ever sees them.
What you'll have when you finish: a deflection layer that handles ~38% of incoming tickets without dropping NPS, gated by a 30-line classifier prompt, three instant handoff signals, and a weekly 20-ticket audit.
Accounts you'll need: intercom.com/fin · plain.com · console.anthropic.com. Pylon and Stonly are optional and only needed once the basics are running.
The stack — five tools, ranked.
- 01 · Intercom Fin — deflection layer · daily
- 02 · Plain — B2B routing & ownership · daily
- 03 · Anthropic API — escalation classifier · daily
- 04 · Pylon — Slack-connected accounts · weekly
- 05 · Stonly — guided flows for repeat issues · monthly
Fin does the deflection. Plain holds the ticket and routes the rest. The classifier decides which is which. Pylon and Stonly are how you handle the long tail.
How to apply it.
- 01 · bucket
Define the deflection bucket.
Export the last 60 days of tickets to CSV. Add a column called resolution_type. Tag every row with one of four values:
— docs — answered by linking to existing documentation
— faq — answered with a canned/templated response
— human — needed an agent to diagnose
— escalated — needed engineering or leadership
The deflection bucket is docs + faq. That's the bot's territory. Everything else routes to a person.
For most B2B products, the bucket lands at 30–45% of volume. If yours is higher than 50%, the labels are too loose — re-audit.
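Once the export is tagged, the bucket size is a one-line ratio. The sketch below assumes the CSV rows are already parsed into objects with a resolution_type field; bucketShare and needsReaudit are hypothetical helper names, not part of any tool in the stack.

```typescript
// Hypothetical helpers: measure the deflection bucket (docs + faq) in a tagged export.
type ResolutionType = "docs" | "faq" | "human" | "escalated";

interface TaggedTicket {
  id: string;
  resolution_type: ResolutionType;
}

// Fraction of tickets the bot is allowed to handle.
function bucketShare(tickets: TaggedTicket[]): number {
  if (tickets.length === 0) return 0;
  const inBucket = tickets.filter(
    (t) => t.resolution_type === "docs" || t.resolution_type === "faq"
  ).length;
  return inBucket / tickets.length;
}

// Above 50% suggests the labels are too loose (threshold taken from the step above).
function needsReaudit(share: number): boolean {
  return share > 0.5;
}

const sample: TaggedTicket[] = [
  { id: "t1", resolution_type: "docs" },
  { id: "t2", resolution_type: "faq" },
  { id: "t3", resolution_type: "human" },
  { id: "t4", resolution_type: "escalated" },
  { id: "t5", resolution_type: "human" },
];

const share = bucketShare(sample); // 2 of 5 tickets are docs or faq
console.log(share, needsReaudit(share));
```

Run it against the full 60-day export rather than a sample; a small slice can swing the share by several points.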
- 02 · train
Train Fin on resolved tickets, not the FAQ.
In Intercom: Fin → Knowledge → Add content. Do NOT import the public FAQ. The FAQ is what marketing thinks customers ask. Use 200 actual resolved tickets from your deflection bucket.
For each ticket, paste two things:
— The customer question, verbatim
— The diagnostic move the agent used — the actual steps that led to resolution, not the polished closing paragraph
The bot needs to learn how the agent diagnosed the problem, not the canned reply. Fin will compose its own close.
- 03 · classifier
Build the escalation classifier.
This sits in front of Fin. Every new ticket runs through the classifier first; only the routine ones pass to the bot.
Three setup steps:
— Pull 50 tickets from your last month and label each by hand: routine / urgent / on_fire. Save to labels.csv.
— Build a 30-line Anthropic prompt with the three buckets defined, the 50 labels as examples, and a hard JSON schema: { severity, route, confidence, reasoning }. Full template is in guide № 009.
— Wire it as a webhook handler that fires on every new Intercom ticket. If route ≠ "bot", route to a human queue and skip Fin entirely.
The classifier is the gate. The bot is the helper. This single architectural choice is what keeps NPS flat.
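The gate only holds if a malformed classifier reply can never fall through to the bot. The sketch below covers just that piece: it validates the JSON shape from the schema above and falls back to a human route on anything unexpected. It deliberately leaves out the Intercom webhook and the Anthropic call; parseTriage, shouldSkipFin, and the FAIL_SAFE values are hypothetical names and defaults chosen for illustration.

```typescript
type Severity = "routine" | "urgent" | "on_fire";
type Route = "bot" | "human" | "engineering";

interface Triage {
  severity: Severity;
  route: Route;
  confidence: number;
  reasoning: string;
}

const SEVERITIES: string[] = ["routine", "urgent", "on_fire"];
const ROUTES: string[] = ["bot", "human", "engineering"];

// Assumed fallback: whenever the output cannot be trusted, send the ticket to a person.
const FAIL_SAFE: Triage = {
  severity: "urgent",
  route: "human",
  confidence: 0,
  reasoning: "classifier output invalid; defaulting to human",
};

// Parse the raw model reply; any parse or shape error yields FAIL_SAFE.
function parseTriage(raw: string): Triage {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return FAIL_SAFE;
  }
  if (typeof data !== "object" || data === null) return FAIL_SAFE;
  const { severity, route, confidence, reasoning } = data as Record<string, unknown>;
  if (typeof severity !== "string" || SEVERITIES.indexOf(severity) === -1) return FAIL_SAFE;
  if (typeof route !== "string" || ROUTES.indexOf(route) === -1) return FAIL_SAFE;
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) return FAIL_SAFE;
  if (typeof reasoning !== "string") return FAIL_SAFE;
  return { severity: severity as Severity, route: route as Route, confidence, reasoning };
}

// Fin only sees tickets explicitly routed to "bot".
function shouldSkipFin(t: Triage): boolean {
  return t.route !== "bot";
}

const good = parseTriage(
  '{"severity":"routine","route":"bot","confidence":0.9,"reasoning":"password reset"}'
);
const bad = parseTriage("Sure! Here is the JSON you asked for...");
console.log(shouldSkipFin(good), shouldSkipFin(bad));
```

Defaulting to the human queue mirrors the prompt rule "If unsure, route to human": a parsing failure is treated as one more form of being unsure.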
- 04 · signals
Three handoff signals.
Hand the ticket to a human the instant any one fires. Each signal is independent and instant — no confidence threshold magic. The customer never has to ask twice.
01 · Explicit ask: "human" or "agent"
Verbatim or close. Match also: "real person," "someone," "call us."
02 · Sentiment drift: two notches in three messages
Negative slope. Doesn't matter if the customer is being polite about it.
03 · Repeat question: asked twice
Semantic match, not literal. If they're re-asking, the bot's answer didn't land.
- 05 · audit
Weekly audit — twenty tickets.
Every week, pull 20 random "solved by bot" tickets. Read them. The question: did the customer get what they needed? Not "did the conversation close."
The audit catches drift before NPS does. Most teams skip it. That's the entire story of why CS bots end up with bad reputations.
What we stopped doing.
- × Letting the bot handle everything by default. The classifier is the gate, not a suggestion.
- × Measuring deflection without NPS in parallel. One number alone is a lie.
- × Onboarding on the FAQ. Onboard on resolved tickets.
- × Hiding the handoff. Show the customer when they're being routed and why. Trust climbs.
- × Re-training weekly. The classifier is stable. The deflection bucket isn't. Re-tag tickets, don't retrain.
- × Closing "solved" tickets without a follow-up question. One short check-in 24 hours later finds the silent failures.
The take.
Good deflection is a hand-off discipline, not a bot capability. The classifier decides. The bot helps. The human takes everything else, fast. NPS doesn't move because customers never feel pushed.
If you steal one thing — make it the weekly audit of 20 tickets. The number that matters is whether the resolution actually landed, not whether the ticket closed.
Don't touch these until you have 30 days of clean deflection and NPS data. They earn their place once you trust the baseline.
Sentiment drift as a second handoff signal.
Run a sentiment pass on every customer message in the conversation. Track the slope over the last three turns. A negative slope combined with any escalation keyword fires an immediate human handoff, before the customer asks.
This catches the quiet frustration that turns into a one-star review three weeks later.
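One way to turn "slope over the last three turns" into a number is a least-squares fit over per-message sentiment scores. The sketch below assumes each customer message already has a score in [-1, 1] from whatever sentiment model you run; slopeOfScores, driftHandoff, and the keyword list are hypothetical and only illustrate the rule described above.

```typescript
// Least-squares slope of scores against turn index (0, 1, 2, ...).
function slopeOfScores(scores: number[]): number {
  const n = scores.length;
  if (n < 2) return 0;
  const xMean = (n - 1) / 2;
  const yMean = scores.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  scores.forEach((y, i) => {
    num += (i - xMean) * (y - yMean);
    den += (i - xMean) * (i - xMean);
  });
  return num / den;
}

// Hypothetical keyword list; replace with the phrases your customers actually use.
const ESCALATION_KEYWORDS = ["cancel", "refund", "unacceptable"];

function hasEscalationKeyword(text: string): boolean {
  const lower = text.toLowerCase();
  return ESCALATION_KEYWORDS.some((k) => lower.indexOf(k) !== -1);
}

// Fires only when both conditions hold: negative slope over three turns AND a keyword.
function driftHandoff(scores: number[], lastText: string): boolean {
  const last3 = scores.slice(-3);
  return last3.length === 3 && slopeOfScores(last3) < 0 && hasEscalationKeyword(lastText);
}

console.log(driftHandoff([0.4, 0.0, -0.4], "I want a refund"));
console.log(driftHandoff([0.4, 0.0, -0.4], "ok, thanks anyway"));
```

Requiring three scored turns avoids firing on the very first message, where a slope has no meaning.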
Customer-tier weighting in the classifier.
Pass account ARR and contract tier into the classifier prompt. Enterprise accounts get a lower bar for "urgent." This is not unfair — it's contractual. Stay honest by logging the weight in the trace.
Topic clustering on incoming tickets.
Weekly: cluster the last 7 days of new tickets by topic. Any cluster that's more than 10% of volume and isn't in the deflection bucket is your next docs project — not the bot's. Fix the product or the docs, then expand the bucket.
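The 10% rule is easy to check once the clusters exist. This sketch assumes the clustering itself has already been done elsewhere and each cluster arrives with a topic label, a ticket count, and a flag for whether it is already in the deflection bucket; the type and function names are hypothetical.

```typescript
interface TopicCluster {
  topic: string;
  count: number;
  inDeflectionBucket: boolean;
}

// Topics above the volume threshold that the bot does not cover yet.
function nextDocsProjects(clusters: TopicCluster[], threshold: number = 0.1): string[] {
  const total = clusters.reduce((sum, c) => sum + c.count, 0);
  if (total === 0) return [];
  return clusters
    .filter((c) => !c.inDeflectionBucket && c.count / total > threshold)
    .map((c) => c.topic);
}

const lastSevenDays: TopicCluster[] = [
  { topic: "password reset", count: 60, inDeflectionBucket: true },
  { topic: "invoice export", count: 25, inDeflectionBucket: false },
  { topic: "sso setup", count: 8, inDeflectionBucket: false },
  { topic: "webhook retries", count: 7, inDeflectionBucket: false },
];

console.log(nextDocsProjects(lastSevenDays));
```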
Shadow-mode the bot on new categories.
Before expanding the deflection bucket, run the bot in shadow mode on the candidate category for two weeks — it answers, but the answer goes to a human reviewer, not the customer. Promote to production only after 90% agreement.
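The promotion decision comes down to one ratio over the reviewer verdicts. A minimal sketch, assuming each shadow-mode answer is recorded with a boolean for whether the human reviewer agreed with it; the record shape and function names are hypothetical.

```typescript
interface ShadowReview {
  ticketId: string;
  reviewerAgrees: boolean;
}

// Share of shadow answers the reviewer would have sent as-is.
function agreementRate(reviews: ShadowReview[]): number {
  if (reviews.length === 0) return 0;
  const agreed = reviews.filter((r) => r.reviewerAgrees).length;
  return agreed / reviews.length;
}

// 90% bar taken from the paragraph above.
function readyToPromote(reviews: ShadowReview[], bar: number = 0.9): boolean {
  return reviews.length > 0 && agreementRate(reviews) >= bar;
}

const twoWeeks: ShadowReview[] = [];
for (let i = 0; i < 10; i++) {
  // Nine agreements and one disagreement.
  twoWeeks.push({ ticketId: "s" + i, reviewerAgrees: i !== 3 });
}

console.log(agreementRate(twoWeeks), readyToPromote(twoWeeks));
```

Ten reviews is only for illustration; over two weeks of real traffic the sample should be large enough that one disagreement does not decide the outcome.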
Bot-bias audit, quarterly.
Compare deflection rate by customer tier, region, and product line. If the bot is deflecting more aggressively on free-tier users than enterprise, that's a problem the metrics won't surface — but the audit will.
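The comparison is a grouped ratio: bot-resolved tickets over all tickets, per segment. The sketch below assumes each ticket record carries tier, region, and product line plus who resolved it; the field names and deflectionRateBy are hypothetical.

```typescript
interface ResolvedTicket {
  tier: string;
  region: string;
  productLine: string;
  resolvedBy: "bot" | "human";
}

// Deflection rate per segment, where keyOf picks the segment (tier, region, ...).
function deflectionRateBy(
  tickets: ResolvedTicket[],
  keyOf: (t: ResolvedTicket) => string
): Record<string, number> {
  const totals: Record<string, number> = {};
  const botSolved: Record<string, number> = {};
  tickets.forEach((t) => {
    const key = keyOf(t);
    totals[key] = (totals[key] || 0) + 1;
    if (t.resolvedBy === "bot") botSolved[key] = (botSolved[key] || 0) + 1;
  });
  const rates: Record<string, number> = {};
  Object.keys(totals).forEach((key) => {
    rates[key] = (botSolved[key] || 0) / totals[key];
  });
  return rates;
}

function mk(tier: string, resolvedBy: "bot" | "human"): ResolvedTicket {
  return { tier, region: "emea", productLine: "core", resolvedBy };
}

const quarter: ResolvedTicket[] = [
  mk("free", "bot"), mk("free", "bot"), mk("free", "bot"), mk("free", "human"),
  mk("enterprise", "bot"), mk("enterprise", "human"),
  mk("enterprise", "human"), mk("enterprise", "human"),
];

const byTier = deflectionRateBy(quarter, (t) => t.tier);
console.log(byTier);
```

Call it again with (t) => t.region or (t) => t.productLine to cover the other two cuts. A gap between segments is a prompt to read transcripts, not proof of bias on its own.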
Deflection failures hide. NPS catches them six months late. Five symptoms that catch them faster, with the cause underneath and the fix.
№ 01 · NPS dropped five points or more.
№ 02 · Same ticket re-opens repeatedly.
№ 03 · Deflection rate plateaued.
№ 04 · Frustration spikes in single threads.
№ 05 · Pylon Slack queue grew silently.
Three drop-ins. The classifier prompt, the handoff signal definitions, the weekly audit query. Paste them and you're shipping inside a week.
The escalation classifier prompt.
30-line prompt, JSON output. Wire it as the first call on every ticket.
You are a customer-service triage classifier. Given a ticket
and the customer's account tier, output JSON in this exact
shape — nothing else:
{
  "severity": "routine" | "urgent" | "on_fire",
  "route": "bot" | "human" | "engineering",
  "confidence": 0.0 - 1.0,
  "reasoning": "<one sentence, max 20 words>"
}
RULES
─────
- "on_fire" if: production down, billing dispute over $1k,
  legal language, security report, or explicit "need human now."
- "urgent" if: enterprise tier (ARR > $50k/yr), churn risk
  signal, or repeat ticket within 24 hours.
- "routine" otherwise.
- "route" follows from severity:
    on_fire -> engineering
    urgent -> human
    routine -> bot
- If unsure, route to human. Never to bot when unsure.
TICKET
──────
{{ticket_body}}
ACCOUNT TIER
────────────
{{tier}} · ARR {{arr_usd}} · age {{account_age_days}}d
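The double-brace placeholders in the prompt need to be filled before each call. A minimal sketch of that substitution, assuming the prompt is stored as a plain string; renderPrompt and the TicketContext shape are hypothetical, and the field names simply mirror the placeholders above.

```typescript
interface TicketContext {
  ticket_body: string;
  tier: string;
  arr_usd: number;
  account_age_days: number;
}

// Replace every {{name}} placeholder; fail loudly on one we do not know.
function renderPrompt(template: string, ctx: TicketContext): string {
  const values: Record<string, string> = {
    ticket_body: ctx.ticket_body,
    tier: ctx.tier,
    arr_usd: String(ctx.arr_usd),
    account_age_days: String(ctx.account_age_days),
  };
  return template.replace(/\{\{(\w+)\}\}/g, (_match: string, key: string) => {
    if (!Object.prototype.hasOwnProperty.call(values, key)) {
      throw new Error("Unknown placeholder: " + key);
    }
    return values[key];
  });
}

// Shortened stand-in for the tail of the full prompt.
const tail =
  "TICKET\n{{ticket_body}}\nACCOUNT TIER\n{{tier}} · ARR {{arr_usd}} · age {{account_age_days}}d";

const rendered = renderPrompt(tail, {
  ticket_body: "Charged $1k twice this month",
  tier: "enterprise",
  arr_usd: 80000,
  account_age_days: 412,
});
console.log(rendered);
```

A replacer function is used instead of a replacement string so that dollar signs in the ticket body, such as "$1k", are inserted literally rather than read as substitution patterns.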
The three handoff signals.
Run on every incoming customer message in a bot conversation. If any one fires, the ticket goes to a human.
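// signals.ts — run on every customer message
// sentimentSlope and semanticMatch are not defined in this guide; the
// declarations below only describe the shape this code expects.
type Message = { from: string; text: string };
declare function sentimentSlope(msgs: Message[]): number; // -1.0 to 1.0
declare function semanticMatch(text: string, candidates: string[], threshold: number): boolean;

export const handoffSignals = (history: Message[]) => {
  const customerMsgs = history.filter(m => m.from === "customer");
  const reasons: string[] = [];
  const lastMsg = customerMsgs.at(-1);
  if (!lastMsg) return { handoff: false, reasons }; // no customer message yet
  const last = lastMsg.text.toLowerCase();

  // 01 — explicit ask ("human", "agent", "real person", "someone", "call us")
  if (/\b(human|agent|person|someone|call us)\b/.test(last)) {
    reasons.push("explicit_ask");
  }

  // 02 — sentiment drift (negative slope over last 3 customer turns)
  const last3 = customerMsgs.slice(-3);
  if (last3.length === 3 && sentimentSlope(last3) <= -0.3) {
    reasons.push("sentiment_drift");
  }

  // 03 — repeat question (semantic similarity, not literal)
  const prevCustomerMsgs = customerMsgs.slice(0, -1).map(m => m.text);
  if (semanticMatch(last, prevCustomerMsgs, 0.85)) {
    reasons.push("repeat_question");
  }

  return { handoff: reasons.length > 0, reasons };
};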
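The snippet above calls semanticMatch without defining it. One common way to implement a semantic comparison is cosine similarity over embedding vectors; the sketch below assumes you already have an embedding for each customer message from a model of your choice, and it takes vectors rather than text. cosine and isRepeat are hypothetical names, not the guide's own helper.

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must share a non-zero length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i];
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// True when the latest message is close enough to any earlier customer message.
function isRepeat(latest: number[], previous: number[][], threshold: number): boolean {
  return previous.some((p) => cosine(latest, p) >= threshold);
}

// Toy 2-dimensional vectors standing in for real embeddings.
const latestVec = [1, 0];
const earlierVecs = [[0, 1], [0.9, 0.1]];
console.log(isRepeat(latestVec, earlierVecs, 0.85));
```

The 0.85 threshold comes from the snippet above; the right value depends on the embedding model, so check it against a handful of known repeat questions before relying on it.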
The weekly audit query.
Pull 20 random "solved by bot" tickets each week. SQL against your warehouse — Postgres flavor shown.
-- weekly_audit.sql — run on the weekly audit
SELECT
  t.id,
  t.subject,
  t.customer_email,
  t.tier,
  t.resolved_at,
  t.resolution_text,
  c.transcript_url,
  fb.score AS csat_after_resolve,
  fb.comment AS csat_comment
FROM tickets t
JOIN conversations c ON c.ticket_id = t.id
LEFT JOIN feedback fb ON fb.ticket_id = t.id
WHERE t.resolved_by = 'bot'
  AND t.resolved_at >= now() - interval '7 days'
ORDER BY random()
LIMIT 20;
-- For each row: read the transcript. Did the customer get
-- what they needed? Not did the conversation close.
-- Tag the row in the audit_log table as: landed | drifted | fake.
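After the 20 rows are tagged, a small tally makes the week comparable to the last one. This sketch assumes the tags have been read back from the audit_log table as plain strings; tallyAudit and landedRate are hypothetical helpers.

```typescript
type AuditTag = "landed" | "drifted" | "fake";

// Count how many audited tickets received each tag.
function tallyAudit(tags: AuditTag[]): Record<AuditTag, number> {
  const counts: Record<AuditTag, number> = { landed: 0, drifted: 0, fake: 0 };
  tags.forEach((tag) => {
    counts[tag] += 1;
  });
  return counts;
}

// Share of audited tickets where the resolution actually landed.
function landedRate(tags: AuditTag[]): number {
  return tags.length === 0 ? 0 : tallyAudit(tags).landed / tags.length;
}

function repeatTag(tag: AuditTag, n: number): AuditTag[] {
  const out: AuditTag[] = [];
  for (let i = 0; i < n; i++) out.push(tag);
  return out;
}

// Illustrative week: 16 landed, 3 drifted, 1 fake.
const thisWeek: AuditTag[] = repeatTag("landed", 16)
  .concat(repeatTag("drifted", 3))
  .concat(repeatTag("fake", 1));

console.log(tallyAudit(thisWeek), landedRate(thisWeek));
```

Track the landed rate week over week; the trend matters more than any single week's figure from a 20-ticket sample.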
Need this done for you? The author works on this exact thing with audit clients at austinaiguy.com.