Runbooks

Table of Contents

1. Purpose

This directory contains operational runbooks for incident response and service recovery.

Each runbook gives step-by-step actions for diagnosing and resolving production issues in the DSP infrastructure.

These documents are intended for on-call engineers and system administrators.

Runbooks should cover:

Runbooks must NOT duplicate architectural documentation. System design details belong in docs/architecture/.

Each runbook should follow a consistent structure:

Clarity and precision are critical. Runbooks must be usable under incident pressure.

File names should:

Examples:

Avoid region-specific duplication unless procedures differ significantly.

Runbooks must:

If procedures differ only in region- or environment-specific values, use parameterized placeholders (e.g., <REGION>) instead of duplicating the runbook.
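As an illustration of how a parameterized step might be filled in before execution, here is a minimal Python sketch. It assumes placeholders are written as upper-case names in angle brackets, as in <REGION>. The helper name `render_step`, the `<SERVICE>` placeholder, the context naming scheme, and the sample values are hypothetical and not part of any existing tooling.

```python
import re

# Matches placeholders such as <REGION> or <ENV_NAME> (assumed convention).
PLACEHOLDER = re.compile(r"<([A-Z][A-Z0-9_]*)>")


def render_step(template: str, values: dict[str, str]) -> str:
    """Replace every <NAME> placeholder in a runbook step with its value.

    Raises KeyError if a placeholder has no value, so a half-filled
    command is never produced during an incident.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder <{name}>")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical runbook step; the context and deployment names are illustrative.
step = "kubectl --context prod-<REGION> rollout restart deployment/<SERVICE>"
print(render_step(step, {"REGION": "eu-west-1", "SERVICE": "bidder"}))
```

Failing loudly on a missing value is a deliberate choice here: under incident pressure, an error is safer than a command that still contains a literal <REGION>.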

Runbooks must be updated whenever: