№ 003 · customer service · deflection · filed May '26

Deflection without destroying NPS.

The three signals that tell you the bot is making things worse — and what to do before they show up in the next survey.

Mei L. [Club Member] · 8 min read · ~38% deflection, NPS flat

Most deflection bots lose NPS by deflecting tickets they can't actually solve. The fix is structural: put a classifier in front of the bot that routes the hard tickets to humans before the bot ever sees them.

What you'll have when you finish: a deflection layer that handles ~38% of incoming tickets without dropping NPS, gated by a 30-line classifier prompt, three instant handoff signals, and a weekly 20-ticket audit.

Accounts you'll need: intercom.com/fin · plain.com · console.anthropic.com. Pylon and Stonly are optional and only needed once the basics are running.

The stack — five tools, ranked.

01 · Intercom Fin: deflection layer (daily)
02 · Plain: B2B routing & ownership (daily)
03 · Anthropic API: escalation classifier (daily)
04 · Pylon: Slack-connected accounts (weekly)
05 · Stonly: guided flows for repeat issues (monthly)

Fin does the deflection. Plain holds the ticket and routes the rest. The classifier decides which is which. Pylon and Stonly are how you handle the long tail.

How to apply it.

01 · bucket

Define the deflection bucket.

Export the last 60 days of tickets to CSV. Add a column called resolution_type. Tag every row with one of four values:

— docs — answered by linking to existing documentation
— faq — answered with a canned/templated response
— human — needed an agent to diagnose
— escalated — needed engineering or leadership

The deflection bucket is docs + faq. That's the bot's territory. Everything else routes to a person.

For most B2B products, the bucket lands at 30–45% of volume. If yours is higher than 50%, the labels are too loose — re-audit.
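A minimal sketch of the bucket check, assuming the tagged CSV has already been parsed into rows. The `TaggedTicket` shape and the `bucketReport` name are illustrative, not part of any tool in the stack; the 50% re-audit line comes from the rule above.

```typescript
// Hypothetical row shape: one entry per exported ticket, tagged by hand.
type ResolutionType = "docs" | "faq" | "human" | "escalated";

interface TaggedTicket {
  id: string;
  resolution_type: ResolutionType;
}

interface BucketReport {
  total: number;
  deflectable: number; // docs + faq
  share: number;       // deflectable / total, 0 when there are no rows
  reaudit: boolean;    // true when the share is above 50%
}

function bucketReport(rows: TaggedTicket[]): BucketReport {
  const deflectable = rows.filter(
    (r) => r.resolution_type === "docs" || r.resolution_type === "faq"
  ).length;
  const total = rows.length;
  const share = total === 0 ? 0 : deflectable / total;
  return { total, deflectable, share, reaudit: share > 0.5 };
}
```

If `reaudit` comes back true, the labels are probably too loose; re-tag before training anything on the bucket.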
02 · train

Train Fin on resolved tickets, not the FAQ.

In Intercom: Fin → Knowledge → Add content. Do NOT import the public FAQ. The FAQ is what marketing thinks customers ask. Use 200 actual resolved tickets from your deflection bucket.

For each ticket, paste two things:

— The customer question, verbatim
— The diagnostic move the agent used: the actual steps that led to resolution, not the polished close paragraph

The bot needs to learn how the agent diagnosed the problem, not the canned reply. Fin will compose its own close.
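One way to keep the 200 pasted tickets consistent is to format each one the same way before it goes in. The sketch below is an assumption about layout only: the `ResolvedTicket` shape and the section labels are made up for illustration, and nothing here reflects a format Fin requires.

```typescript
// Hypothetical shape for one resolved ticket from the deflection bucket.
interface ResolvedTicket {
  question: string;          // the customer question, verbatim
  diagnosticSteps: string[]; // what the agent actually did, in order
}

// Produces a plain-text snippet: the question, then the numbered steps.
function toTrainingSnippet(t: ResolvedTicket): string {
  const steps = t.diagnosticSteps
    .map((s, i) => `${i + 1}. ${s.trim()}`)
    .join("\n");
  return `CUSTOMER QUESTION\n${t.question.trim()}\n\nDIAGNOSTIC STEPS\n${steps}`;
}
```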
03 · classifier

Build the escalation classifier.

This sits in front of Fin. Every new ticket runs through the classifier first; only the routine ones pass to the bot.

Three setup steps:

— Pull 50 tickets from your last month, label each by hand: routine / urgent / on_fire. Save to labels.csv.
— Build a 30-line Anthropic prompt with the three buckets defined, the 50 labels as examples, and a hard JSON schema: { severity, route, confidence, reasoning }. Full template is in guide № 009.
— Wire it as a webhook handler that fires on every new Intercom ticket. If route ≠ "bot", route to a human queue and skip Fin entirely.

The classifier is the gate. The bot is the helper. This single architectural choice is what keeps NPS flat.
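A sketch of the gate itself, assuming the classifier's raw text response has already arrived in your webhook handler. It validates the JSON against the schema above and sends anything malformed to a human. The `parseTriage` and `queueFor` names, the `"fin"` / `"human_queue"` labels, and the fallback value are assumptions for illustration; the model call and the Intercom wiring are left out.

```typescript
type Severity = "routine" | "urgent" | "on_fire";
type Route = "bot" | "human" | "engineering";

interface Triage {
  severity: Severity;
  route: Route;
  confidence: number;
  reasoning: string;
}

const SEVERITIES: Severity[] = ["routine", "urgent", "on_fire"];
const ROUTES: Route[] = ["bot", "human", "engineering"];

// Assumed fallback: when the output cannot be trusted, a human gets the ticket.
const FALLBACK: Triage = {
  severity: "urgent",
  route: "human",
  confidence: 0,
  reasoning: "classifier output invalid",
};

function parseTriage(raw: string): Triage {
  let data: any;
  try {
    data = JSON.parse(raw);
  } catch (_err) {
    return FALLBACK;
  }
  const ok =
    data !== null &&
    typeof data === "object" &&
    SEVERITIES.indexOf(data.severity) !== -1 &&
    ROUTES.indexOf(data.route) !== -1 &&
    typeof data.confidence === "number" &&
    data.confidence >= 0 &&
    data.confidence <= 1 &&
    typeof data.reasoning === "string";
  return ok ? (data as Triage) : FALLBACK;
}

// Only an explicit, valid "bot" route reaches Fin; everything else skips it.
function queueFor(raw: string): "fin" | "human_queue" {
  return parseTriage(raw).route === "bot" ? "fin" : "human_queue";
}
```

Failing closed is the point: a parse error should cost an agent a minute, not cost a customer a bad bot answer.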
04 · signals

Three handoff signals.

Hand the ticket to a human the instant any one fires. Each signal is independent and instant — no confidence threshold magic. The customer never has to ask twice.

01 · Explicit ask
"human" or "agent"

Verbatim or close. Match also: "real person," "someone," "call us."

02 · Sentiment drift
two notches in three messages

Negative slope. Doesn't matter if the customer is being polite about it.

03 · Repeat question
asked twice

Semantic match, not literal. If they're re-asking, the bot's answer didn't land.
05 · audit

Weekly audit — twenty tickets.

Every week, pull 20 random "solved by bot" tickets. Read them. The question: did the customer get what they needed? Not "did the conversation close."

The audit catches drift before NPS does. Most teams skip it. That's the entire story of why CS bots end up with bad reputations.

What we stopped doing.

× Letting the bot handle everything by default. The classifier is the gate, not a suggestion.
× Measuring deflection without NPS in parallel. One number alone is a lie.
× Onboarding on the FAQ. Onboard on resolved tickets.
× Hiding the handoff. Show the customer when they're being routed and why. Trust climbs.
× Re-training weekly. The classifier is stable. The deflection bucket isn't. Re-tag tickets, don't retrain.
× Closing "solved" tickets without a follow-up question. One short check-in 24 hours later finds the silent failures.

The take.

Good deflection is a hand-off discipline, not a bot capability. The classifier decides. The bot helps. The human takes everything else, fast. NPS doesn't move because customers never feel pushed.

If you steal one thing — make it the weekly audit of 20 tickets. The number that matters is whether the resolution actually landed, not whether the ticket closed.

Don't touch the five upgrades below until you have 30 days of clean deflection and NPS data. They earn their place once you trust the baseline.

Sentiment drift as a second handoff signal.

Run a sentiment pass on every customer message in the conversation. Track the slope over the last three turns. A negative slope combined with any escalation keyword fires an immediate human handoff, before the customer asks.

This catches the quiet frustration that turns into a one-star review three weeks later.
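A sketch of the slope calculation, assuming each customer message has already been scored somewhere upstream on a -1.0 to 1.0 scale (the scoring model is not covered here). It fits a least-squares line through the last three scores; the -0.3 cutoff matches the value used in signals.ts. The function names are illustrative, and the keyword check is left out.

```typescript
// Least-squares slope of scores against their position (0, 1, 2, ...).
// Returns 0 when there are fewer than two points to fit.
function sentimentSlopeFromScores(scores: number[]): number {
  const n = scores.length;
  if (n < 2) return 0;
  const meanX = (n - 1) / 2;
  const meanY = scores.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (i - meanX) * (scores[i] - meanY);
    den += (i - meanX) * (i - meanX);
  }
  return num / den;
}

// Fires when the slope over the last three customer turns is at or below the cutoff.
function driftFires(scores: number[], threshold: number = -0.3): boolean {
  return sentimentSlopeFromScores(scores.slice(-3)) <= threshold;
}
```

A slope looks at direction, not level: a customer who starts cheerful and ends neutral can still trip it, which is the polite-but-sinking case the signal is meant to catch.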

Customer-tier weighting in the classifier.

Pass account ARR and contract tier into the classifier prompt. Enterprise accounts get a lower bar for "urgent." This is not unfair — it's contractual. Stay honest by logging the weight in the trace.
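A sketch of the logging half, under stated assumptions: the $50k ARR line is taken from the classifier prompt's "urgent" rule, while the `Account` shape, the `"standard"` / `"lowered"` labels, and the trace format are invented for illustration.

```typescript
// Hypothetical account fields passed alongside the ticket.
interface Account {
  tier: string;   // e.g. "free", "pro", "enterprise"
  arrUsd: number; // annual recurring revenue in USD
}

interface TierWeight {
  tier: string;
  arrUsd: number;
  urgentBar: "standard" | "lowered";
}

// Threshold borrowed from the classifier prompt: enterprise tier (ARR > $50k/yr).
const ENTERPRISE_ARR_USD = 50000;

function tierWeight(a: Account): TierWeight {
  const lowered = a.tier === "enterprise" || a.arrUsd > ENTERPRISE_ARR_USD;
  return { tier: a.tier, arrUsd: a.arrUsd, urgentBar: lowered ? "lowered" : "standard" };
}

// One JSON line per ticket, so the weighting is visible in the trace.
function traceLine(ticketId: string, w: TierWeight): string {
  return JSON.stringify({
    ticketId: ticketId,
    tier: w.tier,
    arrUsd: w.arrUsd,
    urgentBar: w.urgentBar,
  });
}
```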

Topic clustering on incoming tickets.

Weekly: cluster the last 7 days of new tickets by topic. Any cluster that's more than 10% of volume and isn't in the deflection bucket is your next docs project — not the bot's. Fix the product or the docs, then expand the bucket.
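A sketch of the 10% check, assuming each ticket from the last 7 days already carries a topic label from whatever clustering step you use. The function name and input shapes are illustrative; the clustering itself is out of scope.

```typescript
interface TopicShare {
  topic: string;
  share: number; // fraction of the week's tickets
}

// Topics above minShare of volume that are not yet in the deflection bucket.
function nextDocsProjects(
  ticketTopics: string[],
  bucketTopics: string[],
  minShare: number = 0.1
): TopicShare[] {
  const counts: Record<string, number> = {};
  for (const t of ticketTopics) {
    counts[t] = (counts[t] || 0) + 1;
  }
  return Object.keys(counts)
    .filter((t) => bucketTopics.indexOf(t) === -1)
    .map((t) => ({ topic: t, share: counts[t] / ticketTopics.length }))
    .filter((c) => c.share > minShare)
    .sort((a, b) => b.share - a.share);
}
```

Anything this returns is a docs or product backlog item first. It only becomes bot territory after the fix ships and the bucket is re-tagged.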

Shadow-mode the bot on new categories.

Before expanding the deflection bucket, run the bot in shadow mode on the candidate category for two weeks — it answers, but the answer goes to a human reviewer, not the customer. Promote to production only after 90% agreement.
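A sketch of the promotion check, assuming each shadow answer gets a yes/no verdict from the human reviewer. The `ShadowReview` shape and function names are illustrative; the 14-day and 90% defaults come from the rule above.

```typescript
// Hypothetical record: one reviewer verdict per shadow-mode answer.
interface ShadowReview {
  ticketId: string;
  reviewerAgrees: boolean;
}

function agreementRate(reviews: ShadowReview[]): number {
  if (reviews.length === 0) return 0;
  return reviews.filter((r) => r.reviewerAgrees).length / reviews.length;
}

// Promote only after the full shadow period AND enough reviewer agreement.
function readyToPromote(
  reviews: ShadowReview[],
  daysInShadow: number,
  minAgreement: number = 0.9,
  minDays: number = 14
): boolean {
  return daysInShadow >= minDays && agreementRate(reviews) >= minAgreement;
}
```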

Bot-bias audit, quarterly.

Compare deflection rate by customer tier, region, and product line. If the bot is deflecting more aggressively on free-tier users than enterprise, that's a problem the metrics won't surface — but the audit will.
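A sketch of the comparison, assuming you can export one record per resolved ticket with its segment fields and who resolved it. The `Outcome` shape and field names are illustrative, not a schema from any of the tools.

```typescript
// Hypothetical export row for the quarterly audit.
interface Outcome {
  tier: string;
  region: string;
  productLine: string;
  resolvedBy: "bot" | "human";
}

// Deflection rate (bot-resolved / total) for each segment value.
function deflectionRateBy(
  outcomes: Outcome[],
  segmentOf: (o: Outcome) => string
): Record<string, number> {
  const totals: Record<string, number> = {};
  const bot: Record<string, number> = {};
  for (const o of outcomes) {
    const key = segmentOf(o);
    totals[key] = (totals[key] || 0) + 1;
    if (o.resolvedBy === "bot") bot[key] = (bot[key] || 0) + 1;
  }
  const rates: Record<string, number> = {};
  for (const key of Object.keys(totals)) {
    rates[key] = (bot[key] || 0) / totals[key];
  }
  return rates;
}
```

Call it three times, once each with `(o) => o.tier`, `(o) => o.region`, and `(o) => o.productLine`, and read the gaps side by side.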

Deflection failures hide. NPS catches them six months late. Five symptoms that catch them faster, with the cause underneath and the fix.

№ 01 · NPS dropped five points or more.

Symptom: Quarterly survey shows real decline. No other obvious cause.

Cause: Bot started handling tickets outside the deflection bucket. Usually a classifier prompt change.

Fix: Roll back the classifier prompt. Re-audit the deflection bucket. Don't expand again for 30 days.

№ 02 · Same ticket re-opens repeatedly.

Symptom: Customer keeps coming back with the same question. Bot keeps "solving" it.

Cause: Fake resolution. The bot is closing the loop without addressing the actual issue.

Fix: Add the "asked twice" handoff signal if you don't have it. Audit the resolution path for that ticket type.

№ 03 · Deflection rate plateaued.

Symptom: Stuck at 30% when 38% was the target. Hasn't moved in two months.

Cause: Training data went stale. New product features changed what customers ask about.

Fix: Re-tag the last 60 days of tickets. Expand the deflection bucket with the new resolved ones. Shadow-mode new categories before promoting.

№ 04 · Frustration spikes in single threads.

Symptom: CSAT survey on individual tickets has 1-star outliers. Reviews mention "the bot."

Cause: The sentiment handoff signal isn't firing fast enough.

Fix: Tighten the threshold. Two negative messages in a row instead of three. Or any swear word, ever.

№ 05 · Pylon Slack queue grew silently.

Symptom: Enterprise Slack-connected accounts piling up in Pylon. Response time slipping.

Cause: Routing rule pointed at the wrong queue. Or the classifier marked an enterprise account as routine.

Fix: Pass account tier into the classifier as a top-level input, not a context field. Test with five enterprise sample tickets.

Three drop-ins. The classifier prompt, the handoff signal definitions, the weekly audit query. Paste them and you're shipping inside a week.

The escalation classifier prompt.

30-line prompt, JSON output. Wire it as the first call on every ticket.

You are a customer-service triage classifier. Given a ticket
and the customer's account tier, output JSON in this exact
shape — nothing else:

{
  "severity": "routine" | "urgent" | "on_fire",
  "route":    "bot" | "human" | "engineering",
  "confidence": 0.0 - 1.0,
  "reasoning": "<one sentence, max 20 words>"
}

RULES
─────
- "on_fire" if: production down, billing dispute over $1k,
  legal language, security report, or explicit "need human now."
- "urgent" if: enterprise tier (ARR > $50k/yr), churn risk
  signal, or repeat ticket within 24 hours.
- "routine" otherwise.
- "route" follows from severity:
   on_fire   -> engineering
   urgent    -> human
   routine   -> bot
- If unsure, route to human. Never to bot when unsure.

TICKET
──────
{{ticket_body}}

ACCOUNT TIER
────────────
{{tier}} · ARR {{arr_usd}} · age {{account_age_days}}d
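A sketch of filling the `{{...}}` placeholders before the prompt is sent, assuming plain string substitution is all you need. `renderPrompt` is an illustrative name; the placeholder names match the template above.

```typescript
// Replaces every {{name}} in the template with the matching value.
// Throws if the template references a variable that was not supplied.
function renderPrompt(
  template: string,
  vars: Record<string, string | number>
): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match: string, name: string) => {
    if (!Object.prototype.hasOwnProperty.call(vars, name)) {
      throw new Error(`missing template variable: ${name}`);
    }
    return String(vars[name]);
  });
}

// Usage with the ACCOUNT TIER line from the prompt (sample values are made up).
const tierLine = renderPrompt(
  "{{tier}} · ARR {{arr_usd}} · age {{account_age_days}}d",
  { tier: "enterprise", arr_usd: 120000, account_age_days: 412 }
);
```

Two design choices worth keeping: the template is scanned once, so a ticket body that happens to contain `{{tier}}` is inserted as literal text rather than expanded, and a missing variable fails loudly instead of shipping a prompt with a hole in it.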

The three handoff signals.

Run on every incoming customer message in a bot conversation. If any one fires, hand the ticket to a human.

// signals.ts: run on every customer message

type Message = { from: string; text: string };

// Helpers assumed to exist elsewhere in your codebase.
declare function sentimentSlope(msgs: Message[]): number; // -1.0 to 1.0
declare function semanticMatch(text: string, prior: string[], threshold: number): boolean;

export const handoffSignals = (history: Message[]) => {
  const reasons: string[] = [];
  const customerMsgs = history.filter(m => m.from === "customer");
  const latest = customerMsgs.at(-1);
  if (!latest) return { handoff: false, reasons }; // no customer message yet
  const last = latest.text.toLowerCase();

  // 01: explicit ask ("human", "agent", "real person", "someone", "call us")
  if (/\b(human|agent|person|someone|call us)\b/.test(last)) {
    reasons.push("explicit_ask");
  }

  // 02: sentiment drift (negative slope over last 3 customer turns)
  const slope = sentimentSlope(customerMsgs.slice(-3));
  if (slope <= -0.3) reasons.push("sentiment_drift");

  // 03: repeat question (semantic similarity, not literal)
  const prevCustomerMsgs = customerMsgs.slice(0, -1).map(m => m.text);
  if (semanticMatch(last, prevCustomerMsgs, 0.85)) {
    reasons.push("repeat_question");
  }

  return { handoff: reasons.length > 0, reasons };
};
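The `semanticMatch` call above is left undefined. One plausible shape for it, assuming each message has already been turned into an embedding vector by some model of your choice, is cosine similarity against the customer's earlier messages. Both function names and the vector inputs are assumptions; the 0.85 cutoff is the one used in signals.ts.

```typescript
// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// True when the current message is close enough to any earlier customer message.
function isRepeatQuestion(
  current: number[],
  previous: number[][],
  threshold: number = 0.85
): boolean {
  return previous.some((p) => cosine(current, p) >= threshold);
}
```

The right threshold depends on the embedding model. Check it against a handful of real re-asks from your own transcripts before trusting 0.85.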

The weekly audit query.

Pull 20 random "solved by bot" tickets each week. SQL against your warehouse — Postgres flavor shown.

-- weekly_audit.sql — run on the weekly audit

SELECT
  t.id,
  t.subject,
  t.customer_email,
  t.tier,
  t.resolved_at,
  t.resolution_text,
  c.transcript_url,
  fb.score        AS csat_after_resolve,
  fb.comment      AS csat_comment
FROM tickets t
JOIN conversations c   ON c.ticket_id = t.id
LEFT JOIN feedback fb  ON fb.ticket_id = t.id
WHERE t.resolved_by = 'bot'
  AND t.resolved_at >= now() - interval '7 days'
ORDER BY random()
LIMIT 20;

-- For each row: read the transcript. Did the customer get
-- what they needed? Not did the conversation close.
-- Tag the row in the audit_log table as: landed | drifted | fake.
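Once the 20 rows are tagged, a small tally turns the audit into a number you can track week over week. A sketch, assuming the tags are read back from the audit log as plain strings; `summarizeAudit` and the `landedRate` field are illustrative names.

```typescript
type AuditTag = "landed" | "drifted" | "fake";

interface AuditSummary {
  landed: number;
  drifted: number;
  fake: number;
  landedRate: number; // landed / total, 0 when nothing was audited
}

function summarizeAudit(tags: AuditTag[]): AuditSummary {
  const count = (tag: AuditTag): number => tags.filter((t) => t === tag).length;
  const landed = count("landed");
  return {
    landed: landed,
    drifted: count("drifted"),
    fake: count("fake"),
    landedRate: tags.length === 0 ? 0 : landed / tags.length,
  };
}
```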

Need this done for you? The author works on this exact thing with audit clients at austinaiguy.com.
