Runbook: CA Region Bidding Outage

1. Overview

This runbook describes the procedure to diagnose and recover a bidding outage in the CA (Canada) region.

Canada is currently the most critical region in the infrastructure, so this document uses it as the primary reference example.

2. Symptoms

  • No log entries with the CA prefix

  • High volume of HTTP 503 errors

3. Procedure

3.1. Step 1 — Check Logs (Confirm the Issue)

Run:

tail -f -n 1000 /var/log/dsp/backend.log | grep -i fuse

Problem indication: the output contains no entries with the CA prefix. Because tail -f keeps following the log, press Ctrl+C to stop it once you have confirmed this.
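
For a one-shot check that does not keep following the log, the same filter can be turned into a count. This is a minimal sketch: the helper name count_ca_fuse is hypothetical, and it assumes the CA prefix shows up as the standalone word "CA" in the log line (as in the sample entry in Step 3). Adjust the pattern if the prefix is formatted differently.

```shell
# Hypothetical helper: count fuse entries mentioning CA in the last 1000 lines.
# Assumption: the CA prefix appears as the standalone, upper-case word "CA".
count_ca_fuse() {
    log_file="${1:-/var/log/dsp/backend.log}"
    # Same filter as the manual check, without -f, plus a whole-word CA match.
    tail -n 1000 "$log_file" | grep -i fuse | grep -cw 'CA'
}
```

Running count_ca_fuse with no arguments reads /var/log/dsp/backend.log; a result of 0 corresponds to the problem indication above.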

3.2. Step 2 — Restart CA Bidder Processes

On the affected machine, run:

supervisorctl restart checking_metrics_accounting_fuse_ca save_incoming_rtb_metrics_ca update_limits_and_balances_ca
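
Before moving on, it may help to confirm that the restarted processes actually came up. The sketch below is an assumption-laden convenience, not part of the original procedure: all_running is a hypothetical helper that reads supervisorctl status output on standard input and assumes the usual layout in which the second column is the process state.

```shell
# The three processes restarted in this step (names taken from the command above).
CA_PROCS="checking_metrics_accounting_fuse_ca save_incoming_rtb_metrics_ca update_limits_and_balances_ca"

# Hypothetical helper: succeed only if every non-empty status line on stdin
# reports RUNNING in its second column; fail on empty input.
all_running() {
    awk 'NF { seen = 1; if ($2 != "RUNNING") bad = 1 } END { exit (bad || !seen) }'
}

# Usage on the affected machine:
#   supervisorctl status $CA_PROCS | all_running && echo "all CA processes RUNNING"
```

If any process shows a state other than RUNNING, inspect it with supervisorctl status before continuing to the next step.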

3.3. Step 3 — Wait Before Verification

The CA state calculation runs every 60 seconds, as the following log entry shows:

INFO [dsp.backend.management.commands.checking_metrics_accounting_fuse]
Calculate Total State of CA every 60 seconds

Wait at least 60 seconds, so that at least one calculation cycle can complete, before checking the logs again.
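
The wait can also be scripted as a polling loop that only looks at lines written after the loop started, so entries from before the restart are not counted. This is a sketch under assumptions: wait_for_ca is a hypothetical helper, the 120-second timeout and 10-second interval are illustrative defaults, the CA prefix is assumed to appear as the standalone word "CA", and the log is assumed not to be rotated while the loop runs.

```shell
# Hypothetical helper: wait until a new fuse entry mentioning CA is appended.
# Arguments: log file, timeout in seconds, polling interval in seconds.
wait_for_ca() {
    log_file="${1:-/var/log/dsp/backend.log}"
    timeout_s="${2:-120}"
    interval_s="${3:-10}"
    # Remember the current length so that only newer lines are inspected.
    start_lines=$(wc -l < "$log_file")
    elapsed=0
    while [ "$elapsed" -lt "$timeout_s" ]; do
        if tail -n "+$((start_lines + 1))" "$log_file" | grep -i fuse | grep -qw 'CA'; then
            return 0
        fi
        sleep "$interval_s"
        elapsed=$((elapsed + interval_s))
    done
    return 1
}
```

Called right after the restart, wait_for_ca returns success as soon as a fresh CA entry appears and returns failure if none appears before the timeout.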

3.4. Step 4 — Verify Recovery (CA Prefix Appears)

Run the log command again:

tail -f -n 1000 /var/log/dsp/backend.log | grep -i fuse

Expected result: Entries with the CA prefix start appearing again, confirming that CA bidding is alive.

4. Notes

If CA log entries still do not reappear after you have restarted the processes and waited at least 60 seconds, escalate and investigate the underlying cause (for example, resource exhaustion or connectivity issues).
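
When escalating, a small snapshot of disk usage can help rule out one common form of resource exhaustion. The sketch below is only a starting point and not part of the original procedure: triage_snapshot is a hypothetical helper, and checking the filesystem that holds /var/log/dsp is an assumption about where exhaustion would show up first.

```shell
# Hypothetical helper: print a timestamp plus disk space and inode usage
# for the filesystem holding the given directory.
triage_snapshot() {
    log_dir="${1:-/var/log/dsp}"
    echo "== time =="
    date
    echo "== disk space ($log_dir) =="
    df -h "$log_dir"
    echo "== inodes ($log_dir) =="
    df -i "$log_dir"
}
```

The output can be pasted into the escalation ticket alongside the relevant log excerpt.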