Isn't auto-replying to angry customers exactly the move that's going to blow up in my face one day?

Yes, if you let the system send unsupervised. No, if you treat it as a drafter. The version I run for clients never auto-sends an anger-4 or anger-5 reply, never auto-sends on an escalation flag, and never auto-sends on a repeat refund request from a customer who's already had one. That covers roughly the 6% of emails that can actually hurt you. The other 94% ('where is my order', 'wrong size', 'changed my mind') are fine to draft and send with a human glance. The real failure mode isn't the system getting one wrong; it's the founder getting bored and turning off the human-in-the-loop after a quiet month.

My brand voice is genuinely sarcastic and a bit irreverent. Won't the prompt sand that off?

It will if you describe the voice. It won't if you show the voice. Paste three actual emails or captions in the voice samples block, with all the personality intact, and the model will mirror it surprisingly well — including light sarcasm. The thing it can't do safely is escalate sarcasm in response to anger, so add a single line to the hard rules: 'When anger >= 3, drop sarcasm and play it straight.' Sarcasm against a calm customer reads like the brand. Sarcasm against an angry one reads like contempt. The model can't tell the difference reliably, so you make the rule explicit.

Should I disclose to customers that an AI drafted the reply?

Controversial answer: no, and I'd argue disclosing actively hurts the customer in this case. The whole point of the system is to deliver a reply that reads like a person wrote it, on time, in your voice, with a human approving before send. That IS a human reply: the AI is doing the keyboard work, the same way spell-check or a template macro would. Disclosing 'this reply was AI-drafted' invites a fight about authenticity that isn't actually relevant to whether the refund got processed. Where I draw the line: any reply that includes a substantive policy interpretation or an emotional acknowledgement of harm. Those should come from a named human, full stop, no AI in the loop.

Marketing·June 9, 2026·9 min read

Defusing Refund Rage: A Prompt System for Boutique Stores That Won't Sound Like a Robot

A field-tested prompt stack that turns furious refund emails into calm, on-brand replies in under a minute — without the dead-eyed corporate apology voice.

Refund emails sit on a different shelf than the rest of your inbox. They arrive in caps lock at 11:47 PM, they assume bad faith, and they expect you to apologise for the postal service, the fit model, and the weather. Handling them well is one of the highest-leverage things a small store can automate. Handling them badly is one of the fastest ways to earn a one-star review and a screenshot on Reddit.

I've been running prompt-driven customer service for boutique apparel and home goods brands for two years. What follows is the exact system I hand to founders — the structure, the prompt, the guardrails, and the bits that newer practitioners almost always get wrong.

[Image: warmly lit small boutique storefront with merchandise visible through the display window.] Boutique brands earn loyalty by sounding like a real person, not a corporate help desk.

Why most 'polite email' prompts fail

Asking ChatGPT to 'write a polite refund reply' produces something that reads like a hostage statement from a Fortune 500 PR team. You know the voice: 'We sincerely apologise for any inconvenience this may have caused. Your satisfaction is our top priority.' Nobody talks like that. Nobody believes it. And in a boutique context, where customers chose you specifically because you weren't Amazon, it actively breaks trust.

The fix is structural, not stylistic. You need to **separate diagnosis from response**, give the model a small catalog of voice samples to mirror, and explicitly ban the phrases that make replies sound auto-generated. Skip any of those three and you'll get bland output no matter how clever your tone instructions are.

The two-stage architecture

Stage one is a classifier. It reads the inbound email and returns a small JSON object: anger level, refund eligibility, missing information, and whether escalation is needed. Stage two takes that JSON plus the original email and drafts a reply. Two prompts, two model calls, total cost roughly a third of a cent per email on a mid-tier model.

Why bother splitting it? Because the moment you ask one prompt to both judge and respond, the model hedges. It softens the diagnosis to justify a softer reply, or hardens the reply to match a harsh diagnosis. Decoupling them gives you two sharper outputs and — more importantly — a clean handoff point where a human can review before anything goes out.
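To make the handoff point concrete, here is a minimal sketch of the two calls wired together. It assumes the model calls are passed in as plain functions; `handle_email`, `fake_classify`, and `fake_draft` are hypothetical names for illustration, not part of any help-desk or model API.

```python
import json
from typing import Callable


def handle_email(
    email: str,
    classify: Callable[[str], str],
    draft: Callable[[str, dict], str],
) -> dict:
    """Run stage one, then stage two, and return both for human review."""
    # Stage one: the classifier sees only the email and returns JSON text.
    triage = json.loads(classify(email))

    # Handoff point: escalations stop here and never reach the drafter.
    if triage.get("escalate"):
        return {"triage": triage, "draft": None, "status": "escalated"}

    # Stage two: the drafter gets the finished diagnosis plus the original email.
    reply = draft(email, triage)
    return {"triage": triage, "draft": reply, "status": "awaiting_review"}


# Stand-in model calls so the sketch runs without an API key (assumption).
def fake_classify(email: str) -> str:
    return json.dumps({"anger": 2, "refund_status": "eligible",
                       "missing_info": [], "escalate": False})


def fake_draft(email: str, triage: dict) -> str:
    return f"(draft for anger {triage['anger']})"


result = handle_email("The zip broke on day one.", fake_classify, fake_draft)
print(result["status"], "|", result["draft"])
```

Nothing in this sketch sends an email. The function only ever returns a draft and a status, which keeps the human review step outside the code path that talks to the model.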

What goes in each stage

| Stage | Input | Output | Notes |
| --- | --- | --- | --- |
| 1. Classifier | Raw customer email + order metadata | JSON: anger (1-5), refund_status, missing_info[], escalate (bool) | Run this even on calm emails: the anger score lets you sort the queue by who actually needs a human first. |
| 2. Drafter | Classifier JSON + original email + brand voice samples | A ready-to-send reply in your store's voice | Best for anger 1-3. Always route anger 4-5 through a human before sending; the cost of one bad auto-reply outweighs a week of saved time. |
| 3. Tone audit (optional) | Drafted reply | Pass / flag with reason | Overkill below ~50 tickets a week. Above that, it catches the 1 in 30 reply that quietly slips into corporate voice. |
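Sorting the queue by classifier output can be sketched in a few lines. The ticket shape here (an `id` plus a `triage` dict) is an assumption for illustration; your help desk will have its own fields.

```python
def review_priority(ticket: dict) -> tuple:
    """Sort key: escalations first, then highest anger first."""
    triage = ticket["triage"]
    # False sorts before True, so escalated tickets come out on top.
    return (not triage["escalate"], -triage["anger"])


def sort_queue(tickets: list[dict]) -> list[dict]:
    return sorted(tickets, key=review_priority)


def needs_full_review(triage: dict) -> bool:
    """Anger 4-5 and escalations get a proper human read, not a glance."""
    return triage["escalate"] or triage["anger"] >= 4


queue = [
    {"id": "A", "triage": {"anger": 1, "escalate": False}},
    {"id": "B", "triage": {"anger": 4, "escalate": False}},
    {"id": "C", "triage": {"anger": 2, "escalate": True}},
]
print([t["id"] for t in sort_queue(queue)])  # ['C', 'B', 'A']
```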

The production prompt

This is the stage-two drafter — the one that actually writes the reply. The classifier prompt is shorter and I'll show it after. Drop this into ChatGPT, Claude, or your help-desk integration. The variables in curly braces are what your help-desk macro or Zapier step fills in.

```text
ROLE
You are the customer service voice of {{BRAND_NAME}}, a small independent {{PRODUCT_CATEGORY}} brand. You are NOT a corporate support agent. You are a real person who cares about the product and the customer, and you've been on the other end of a bad delivery yourself.

INPUTS
- Original customer email:
"""
{{CUSTOMER_EMAIL}}
"""

- Classifier output:
{{CLASSIFIER_JSON}}

- Order facts (only state these if relevant):
Order #: {{ORDER_NUMBER}}
Item: {{ITEM_NAME}}
Ordered: {{ORDER_DATE}}
Shipped: {{SHIP_DATE}}
Return window status: {{RETURN_STATUS}}

- Voice samples (mirror the rhythm, not the wording):
"""
{{VOICE_SAMPLE_1}}

{{VOICE_SAMPLE_2}}
"""

TASK
Write a reply email of 90-160 words.

STRUCTURE (in this order, no headings)
1. Acknowledge the specific frustration in their own words (not paraphrased into corporate speak).
2. Take responsibility for the part that's actually our fault. Do not take responsibility for things outside our control (carrier delays, weather) — name them honestly instead.
3. State the concrete next action and the timeline. Be specific: 'refund hits your card in 3-5 business days' not 'we will process this shortly'.
4. Offer ONE small gesture if anger >= 3 (free shipping on next order, 15% code, hand-written note). Never offer cash or extra refunds without human approval.
5. Sign off with a real first name from this list: {{SIGNER_NAMES}}.

HARD RULES — DO NOT VIOLATE
- Banned phrases (do not use, do not paraphrase): 'we sincerely apologise', 'we apologise for any inconvenience', 'your satisfaction is our top priority', 'rest assured', 'kindly', 'as per our policy', 'unfortunately', 'we understand your frustration'.
- Banned structures: no bullet points, no numbered lists, no headers, no emojis.
- Sentence length: mix short (4-8 words) and medium (12-20 words). No sentence over 28 words.
- Contractions ON. 'We're', 'you'll', 'didn't'. Always.
- If the classifier flagged escalate=true OR refund_status='ineligible', DO NOT draft a final reply. Instead output: 'ESCALATE — reason: <one sentence>' and stop.

OUTPUT
Reply email body only. No subject line, no signature block beyond the first name.
```
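If you call the API from your own script rather than a help-desk macro, something has to fill the double-brace variables. Here is a minimal sketch that does it and refuses to run with a variable left blank. `render_prompt` and the sample brand values are hypothetical; the placeholder syntax is the one used in the prompt above.

```python
import re

# Matches {{UPPER_SNAKE_CASE}} placeholders like the ones in the prompt.
PLACEHOLDER = re.compile(r"\{\{([A-Z0-9_]+)\}\}")


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Fill every placeholder, or fail loudly if one has no value."""
    missing = sorted({name for name in PLACEHOLDER.findall(template)
                      if name not in values})
    if missing:
        raise KeyError(f"unfilled placeholders: {', '.join(missing)}")
    # A function replacement inserts values literally, so backslashes or
    # braces inside a customer email are not reinterpreted.
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


template = ("You are the customer service voice of {{BRAND_NAME}}, "
            "a small independent {{PRODUCT_CATEGORY}} brand.")
prompt = render_prompt(template, {
    "BRAND_NAME": "Thread & Kiln",      # made-up brand for the example
    "PRODUCT_CATEGORY": "ceramics",
})
print(prompt)
```

Failing on a missing value is deliberate: a prompt that goes out with a literal `{{ORDER_NUMBER}}` in it tends to produce a reply that invents one.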

The classifier (stage one)

This one's smaller and almost never needs tweaking once it's working. The JSON shape is what matters — keep it strict so your downstream automation doesn't break.

```text
You are a triage classifier for a small ecommerce store's support inbox.

Read the customer email below and return ONLY a JSON object, no prose.

Email:
"""
{{CUSTOMER_EMAIL}}
"""

Order metadata:
{{ORDER_METADATA}}

Return this exact shape:
{
  "anger": <integer 1-5, where 1 is neutral and 5 is hostile>,
  "refund_status": "eligible" | "ineligible" | "needs_review",
  "missing_info": [<list of fields the customer didn't provide that you'd need to act, e.g. 'order_number', 'photo_of_damage'>],
  "escalate": <true if any of: legal threat, chargeback mentioned, accessibility complaint, allegation of discrimination, third refund request from same customer>,
  "summary": "<one sentence, max 20 words, in neutral voice>"
}
```
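Keeping the shape strict is easier if something checks it before the drafter runs. Below is a minimal validator sketch for the five fields in the shape above. `parse_triage` is a hypothetical helper, and treating any deviation as a hard error is my assumption about how you'd want automation to fail.

```python
import json

EXPECTED_KEYS = {"anger", "refund_status", "missing_info", "escalate", "summary"}
ALLOWED_STATUS = {"eligible", "ineligible", "needs_review"}


def parse_triage(raw: str) -> dict:
    """Parse classifier output; raise ValueError on anything off-shape."""
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        raise ValueError("wrong or missing keys")
    anger = data["anger"]
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(anger, bool) or not isinstance(anger, int) or not 1 <= anger <= 5:
        raise ValueError("anger must be an integer 1-5")
    if data["refund_status"] not in ALLOWED_STATUS:
        raise ValueError("unknown refund_status")
    missing = data["missing_info"]
    if not isinstance(missing, list) or not all(isinstance(m, str) for m in missing):
        raise ValueError("missing_info must be a list of strings")
    if not isinstance(data["escalate"], bool):
        raise ValueError("escalate must be true or false")
    summary = data["summary"]
    if not isinstance(summary, str) or len(summary.split()) > 20:
        raise ValueError("summary must be a string of at most 20 words")
    return data


good = ('{"anger": 4, "refund_status": "eligible", "missing_info": [], '
        '"escalate": false, "summary": "Dress arrived with a snag."}')
print(parse_triage(good)["anger"])  # 4
```

When validation fails, routing the email to a human is safer than retrying silently.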

Tweaking the prompt for your brand

Three knobs do 90% of the work. The rest is fiddling.

1. Voice samples beat voice descriptions

Don't write 'warm, friendly, slightly sarcastic tone'. Models interpret those words wildly differently. Instead, paste two or three actual replies your founder has sent on a good day — typos preserved, em-dashes intact. The model will pattern-match the rhythm. If your founder doesn't write customer emails, mine the brand's Instagram captions. Same voice, shorter form.

2. Ban phrases, don't request tone

Every banned phrase in the prompt above is there because I caught it in production output. The list grows over time — keep a running note and add to the prompt monthly. 'Unfortunately' is the sneakiest one. It feels neutral but it primes every sentence after it to be a deflection.
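The ban list can also be enforced after the fact, which is roughly what the optional tone audit does. This sketch checks a draft against the eight phrases and the 28-word sentence cap from the prompt. `audit_reply` is a hypothetical helper, the sentence splitter is deliberately crude, and it only catches exact phrases (so 'apologize' with a z, or a paraphrase, would slip through).

```python
import re

BANNED_PHRASES = [
    "we sincerely apologise",
    "we apologise for any inconvenience",
    "your satisfaction is our top priority",
    "rest assured",
    "kindly",
    "as per our policy",
    "unfortunately",
    "we understand your frustration",
]
MAX_SENTENCE_WORDS = 28


def audit_reply(reply: str) -> list[str]:
    """Return a list of problems; an empty list means the draft passes."""
    problems = []
    lowered = reply.lower()
    for phrase in BANNED_PHRASES:
        # Word boundaries stop 'kindly' matching inside 'unkindly'.
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            problems.append(f"banned phrase: {phrase}")
    # Rough split on sentence-ending punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", reply.strip()):
        count = len(sentence.split())
        if count > MAX_SENTENCE_WORDS:
            problems.append(f"sentence too long: {count} words")
    return problems


print(audit_reply("Unfortunately, the parcel is late."))
```

Whenever you add a phrase to the prompt, add it to this list too so the two don't drift apart.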

3. The escalation guard is non-negotiable

The single line that says 'if escalate=true, stop and flag' is the difference between a system you can sleep at night with and one that will, eventually, send an auto-reply to a customer who mentioned a lawyer. Test this branch first. Send a fake email containing the word 'attorney' and confirm the system flags it. Do this every time you change the classifier.
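That test is easy to script so it actually gets run after every classifier change. Here is a minimal sketch: a short list of canary emails and a function that reports any the classifier failed to escalate. The canary wording and `stub_classify` are made up for illustration; in practice you'd pass in the function that calls your real classifier and returns the parsed JSON.

```python
from typing import Callable

# Emails that must always come back with escalate=true.
CANARY_EMAILS = [
    "I'm forwarding this to my attorney if I don't hear back today.",
    "Refund me or I'm filing a chargeback with my bank.",
]


def failed_canaries(classify: Callable[[str], dict]) -> list[str]:
    """Return every canary email the classifier did not escalate."""
    return [email for email in CANARY_EMAILS
            if classify(email).get("escalate") is not True]


# Keyword stub standing in for the real model call (assumption).
def stub_classify(email: str) -> dict:
    hit = any(word in email.lower() for word in ("attorney", "chargeback"))
    return {"escalate": hit}


missed = failed_canaries(stub_classify)
print("all canaries escalated" if not missed else f"MISSED: {missed}")
```

If `failed_canaries` ever returns a non-empty list, treat that as a reason to hold the prompt change until it's fixed.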

What the output looks like in practice

A real example. Customer ordered a linen dress, it arrived with a snag, she emailed at midnight calling the QC team 'a joke'. Classifier returned anger=4, refund_status=eligible, escalate=false. Here's what the drafter produced — published unchanged, signed by the founder:

Hi Sara — that snag on the side seam is on us. Full stop. Our finisher missed it and the QC check missed her missing it. I'm refunding the dress to your card now (you'll see it in 3-5 business days) and you don't need to send it back. Keep it or pass it on, whichever feels better. I've also dropped a free-shipping code into your account for whenever you want to give us another shot — no pressure. Sorry this landed in your hands. — Mia

Notice what's not in there. No 'we sincerely apologise'. No 'your satisfaction is our top priority'. No bullet list of next steps. It reads like Mia wrote it on her phone, because the prompt was tuned to make it read like Mia wrote it on her phone.

Where this fits in your stack

If you're running Shopify with Gorgias, Re:amaze, or Front, both prompts can be wired in as macros or Rules that call the OpenAI/Anthropic API on inbound emails. If you're running a one-person operation out of Gmail, a single Zapier zap with two GPT-4-class steps gets you 80% of the value. The classifier output becomes a label, the drafter output becomes a draft reply you eyeball before sending. That eyeball pass takes about 8 seconds per email and catches the rare weird one.

For more on building reusable prompt assets like this, see our earlier walkthrough on the client onboarding email prompt — same structural ideas, gentler use case.

Written by

Dani

AI Workflow Explorer

Dani writes SoloPrompt AI — a working notebook of copy-paste prompts, low-code automations, and field-tested workflows for solo operators. Equal parts skeptic and tinkerer, Dani road-tests every prompt against real micro-business problems before it ships.
