How a Missing Phone Call Cost Us $8,000

February 2025. A Saturday morning. Our monitoring picked up that a client's checkout page was returning intermittent 503 errors. The alert went to Slack. The on-call person saw it, assumed it was a temporary hosting blip, and made a note to check it on Monday.

It wasn't temporary. The hosting provider had throttled the account due to a traffic spike from a viral TikTok post. The checkout was down for 36 hours. The client lost roughly $8,000 in sales over the weekend.

Our incident escalation procedures at the time were basically "post in Slack and hope someone acts on it." That's not a procedure. That's a suggestion.

Why Most Marketing Teams Don't Have Real Incident Escalation Procedures

Engineering teams have runbooks. They have PagerDuty rotations. They have severity levels and response time SLAs.

Marketing teams? Most of them have a group chat.

I get why. Marketing doesn't think of itself as an operational discipline. You're creative people running campaigns, not SREs managing infrastructure. But your funnel IS infrastructure. It's the thing that turns ad dollars into revenue. And when it breaks, the cost is just as real as a server outage.

The gap between "we got an alert" and "the right person took action" is where money dies. Good incident escalation procedures close that gap.

What Good Incident Escalation Procedures Look Like

We rebuilt ours after the $8,000 weekend. Here's the structure:

Level 1: Automated alert. Monitoring catches the issue and sends a notification. At FunnelLeaks, this goes to Slack and email simultaneously. Response expectation: acknowledge within 15 minutes during business hours, 30 minutes off-hours.

Level 2: Human triage. Someone looks at the alert and determines severity. Is it affecting live traffic? Is the funnel partially broken or completely down? Are ads currently running to the affected page? This step should take five minutes, max.

Level 3: Fix or pause. If the issue is fixable quickly (bad deploy, DNS tweak, app conflict), fix it. If it's going to take longer, pause the ad campaigns pointing to the broken page immediately. Don't let ads keep running to a dead checkout while you troubleshoot.

Level 4: Escalate to the right person. If the on-call marketer can't fix it, there needs to be a clear path to the developer, the hosting provider, or whoever can. Phone call, not Slack. Slack messages get buried.
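
To make the handoffs concrete, here's a minimal sketch of how those four levels might hang together in code. Everything in it is a placeholder: the Alert fields, pause_campaigns, and call_on_call_phone stand in for whatever your monitoring tool, ads platform, and phone provider actually expose.

    from dataclasses import dataclass

    @dataclass
    class Alert:
        """Hypothetical alert shape; adapt the fields to whatever your monitoring emits."""
        affected_page: str
        checkout_down: bool
        ads_running: bool

    def pause_campaigns(page: str) -> None:
        # Placeholder: call your ads platform here to pause campaigns pointing at `page`.
        print(f"Pausing campaigns that target {page}")

    def call_on_call_phone(alert: Alert) -> None:
        # Placeholder: ring the on-call phone (see the Twilio sketch further down).
        print(f"Calling on-call about {alert.affected_page}")

    def triage(alert: Alert) -> str:
        """Level 2: the severity call, kept to a handful of questions."""
        if alert.checkout_down and alert.ads_running:
            return "critical"  # live ad spend pointed at a dead page
        if alert.checkout_down:
            return "high"
        return "low"

    def respond(alert: Alert, quick_fix_available: bool) -> None:
        """Levels 3 and 4: stop the bleeding, then fix or escalate."""
        if triage(alert) == "critical":
            pause_campaigns(alert.affected_page)
        if quick_fix_available:
            return  # bad deploy, DNS tweak, app conflict: fix it and move on
        call_on_call_phone(alert)

    respond(Alert("/checkout", checkout_down=True, ads_running=True), quick_fix_available=False)

The useful part isn't the code; it's that "pause spend before you troubleshoot" becomes the default instead of a judgment call someone makes at brunch.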

The 15-Minute Rule

We have one rule that's saved us more money than any tool or process: if an alert isn't acknowledged within 15 minutes, it automatically escalates. Not to a manager. To a phone call.

An automated phone call through a service like PagerDuty or even a simple Twilio integration will get someone's attention in a way that a Slack notification never will. Especially on a Saturday morning when your team is at brunch.
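
If that sounds like a heavy lift, it isn't. Here's a minimal sketch using Twilio's Python helper library. The environment variable names and phone numbers are placeholders, and the 15-minute timer that decides an alert has gone unacknowledged would live in your alerting tool or a small scheduled job, not shown here.

    import os
    from twilio.rest import Client  # pip install twilio

    def ring_on_call(message: str, on_call_number: str) -> None:
        """Place an automated voice call for an alert that's gone 15 minutes unacknowledged."""
        # Credentials and the outbound number come from environment variables
        # (the variable names are placeholders; set them however you manage secrets).
        client = Client(os.environ["TWILIO_ACCOUNT_SID"], os.environ["TWILIO_AUTH_TOKEN"])
        client.calls.create(
            to=on_call_number,
            from_=os.environ["TWILIO_FROM_NUMBER"],
            # TwiML reads the alert aloud when the call is answered.
            twiml=f"<Response><Say>{message}</Say></Response>",
        )

    ring_on_call("The checkout page is returning 503 errors and the alert is unacknowledged.",
                 "+15555550100")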

Since we added the 15-minute escalation rule, our average response time dropped from 4.2 hours to 22 minutes. That's the difference between a minor hiccup and a weekend-long disaster.

Build Yours This Week

Q1 is closing out. Before you head into Q2 campaigns, sit down with your team and write out your incident escalation procedures. It doesn't need to be a 20-page document. One page is fine. Cover these four questions:

  • Who gets the alert first?
  • What do they do within 15 minutes?
  • Who do they call if they can't fix it?
  • At what point do you pause ad spend?
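
One way to keep that page honest is to write it as data, so the scripts that act on it and the humans who read it share a single source. A sketch, with every value a placeholder:

    # One-page escalation policy, written as data so it's easy to read and easy to script against.
    # Every name and number below is an example; fill in your own.
    ESCALATION_POLICY = {
        "first_alert_goes_to": "on-call marketer, via Slack #alerts and email",
        "within_15_minutes": "acknowledge, run the triage questions, decide severity",
        "if_they_cannot_fix_it": [
            "call the developer on retainer",
            "then the hosting provider's support line",
        ],
        "pause_ad_spend_when": "checkout or lead form is down and a fix is more than 30 minutes out",
    }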

If you don't have automated monitoring yet, that's step zero. You can't escalate an incident you don't know about. FunnelLeaks handles the detection side. You handle the response.
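
If you need a stopgap while you get real monitoring in place, even a scheduled script that pings the checkout and posts to a Slack incoming webhook beats finding out from a customer. A bare-bones sketch, with placeholder URLs:

    import requests  # pip install requests

    CHECKOUT_URL = "https://example.com/checkout"  # the page you can't afford to lose
    SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # incoming webhook

    def check_checkout() -> None:
        """Bare-bones uptime check; run it every few minutes from cron or a scheduler."""
        try:
            healthy = requests.get(CHECKOUT_URL, timeout=10).status_code < 500
        except requests.RequestException:
            healthy = False

        if not healthy:
            requests.post(
                SLACK_WEBHOOK_URL,
                json={"text": f"{CHECKOUT_URL} is failing its health check."},
                timeout=10,
            )

    if __name__ == "__main__":
        check_checkout()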

Your funnel will break. The question is whether you'll find out in 15 minutes or 36 hours.