This is the standard rollback procedure referenced from Change Management → Rollback. Use it when a deploy fails, a regression is detected, or monitoring alerts indicate a customer-impacting change.

Decision: rollback or roll-forward

Default to rollback if any of the following are true:
  • The error rate, latency, or saturation alert threshold has been exceeded.
  • A correctness defect is producing or could produce wrong customer-facing output.
  • The change involves a destructive or non-idempotent operation that is likely to compound on retry.
  • The on-call engineer has not yet identified a root cause.
Choose roll-forward only when the fix is small, well-understood, already prepared, and the rollback path itself is risky (e.g., a database migration that cannot be cleanly reversed). The decision is owned by the on-call engineer; the CTO is informed if the rollback affects customer-impacting services.
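
The decision rule above can be sketched as a small function. This is an illustration only, not tooling this procedure mandates: the ChangeStatus fields and the decide name are hypothetical. It assumes roll-forward is chosen only when all four roll-forward conditions hold and that every other case falls back to rollback; the text does not spell out precedence beyond that.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ChangeStatus:
    # Rollback triggers from the list above (hypothetical field names).
    alert_threshold_exceeded: bool = False
    wrong_customer_output: bool = False
    destructive_or_non_idempotent: bool = False
    root_cause_unknown: bool = False
    # Conditions that must ALL hold before roll-forward is considered.
    fix_small: bool = False
    fix_well_understood: bool = False
    fix_prepared: bool = False
    rollback_path_risky: bool = False


def decide(s: ChangeStatus) -> Tuple[str, List[str]]:
    """Return the decision plus the rollback triggers that are active."""
    triggers = [
        name
        for name in (
            "alert_threshold_exceeded",
            "wrong_customer_output",
            "destructive_or_non_idempotent",
            "root_cause_unknown",
        )
        if getattr(s, name)
    ]
    roll_forward_ok = all(
        (s.fix_small, s.fix_well_understood, s.fix_prepared, s.rollback_path_risky)
    )
    # Rollback is the default; roll-forward is the narrow exception.
    return ("roll-forward" if roll_forward_ok else "rollback", triggers)
```

The returned trigger list can be pasted into the incident message so the reason for the decision is on record. The decision itself remains with the on-call engineer.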

Rollback steps

1. Stop the bleed

  • Page on-call if you are not the on-call engineer — file the Better Stack incident-reporting form (production emergencies only).
  • Disable the failing surface if a feature flag exists.
  • Drain traffic from the failing instance / region if possible.
  • Communicate in #engineering-incidents — short message, time-stamped, no speculation.
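
A short, time-stamped channel message of the kind described above could be formatted as follows. This is a minimal sketch: the incident_message helper and its wording are hypothetical, and the UTC timestamp format is an assumption rather than a required convention.

```python
from datetime import datetime, timezone
from typing import Optional


def incident_message(summary: str, action: str, now: Optional[datetime] = None) -> str:
    """Format a short, time-stamped update: what is observed and what was done."""
    now = now or datetime.now(timezone.utc)
    # Normalise to UTC so readers in different time zones see one timeline.
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return f"[{stamp}] {summary}. Action: {action}."
```

Keeping the message to an observation and an action leaves no room for speculation about the cause.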

2. Revert the code

Production deploys go through GitHub Actions from master. To roll back:
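  1. Run git revert <merge-sha> on a branch cut from master. If <merge-sha> is a true merge commit, add -m 1, because git revert refuses to revert a merge unless it is told which parent is the mainline. Do not force-push or delete the original commit: the revert is its own commit, so the history stays auditable.
  2. Open a “Revert: …” PR. Review is expedited but follows the same Code Review rules; the on-call engineer can self-approve if no other reviewer is available.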
  3. Merge. The standard GitHub Actions deploy pipeline rolls the revert into production.
  4. Monitor recovery via the same alert dashboard that flagged the issue.
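
The revert steps above can be sketched as a helper that builds the git commands without running them, so they can be reviewed before execution. This is a sketch under assumptions: the revert_commands name, the branch naming, and the origin remote name are hypothetical, and it assumes the revert is prepared on a short-lived branch so that the PR in step 2 can be opened.

```python
from typing import List


def revert_commands(merge_sha: str, is_merge_commit: bool, branch: str) -> List[List[str]]:
    """Build (but do not run) the git commands for a revert branch."""
    revert = ["git", "revert", "--no-edit"]
    if is_merge_commit:
        # A merge commit has two parents; -m 1 keeps the first (mainline) parent.
        revert += ["-m", "1"]
    revert.append(merge_sha)
    return [
        ["git", "fetch", "origin", "master"],
        # Start the revert branch from the current tip of master.
        ["git", "switch", "-c", branch, "origin/master"],
        revert,
        # Plain push only; the history on master is never rewritten.
        ["git", "push", "-u", "origin", branch],
    ]
```

Each inner list can be passed to subprocess.run(cmd, check=True) once it has been reviewed. The helper deliberately emits no force-push or reset command, which matches the rule that the original commit stays in history.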

3. Schema / data changes

If the change includes a database migration:
  • Forward-compatible migrations (additive columns, new tables) — the revert is safe because the old code ignores the new schema.
  • Backwards-incompatible migrations (column rename, drop, type change) — do not run a reverse migration as part of rollback. Roll forward with a hotfix that tolerates the new schema, or isolate the failing surface behind a flag while the fix is prepared.
  • For any migration touching customer data, the CTO and the on-call DBA approve the rollback path before execution.
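
The migration rules above can be expressed as a small classifier. This is a sketch only: the MigrationOp taxonomy is hypothetical and covers just the examples named in the text, and the strategy labels are illustrative names, not terms defined by this procedure.

```python
from enum import Enum
from typing import Dict, Iterable, List, Union


class MigrationOp(Enum):
    ADD_COLUMN = "add_column"
    ADD_TABLE = "add_table"
    RENAME_COLUMN = "rename_column"
    DROP_COLUMN = "drop_column"
    CHANGE_TYPE = "change_type"


# Additive operations: old code ignores the new schema, so a code revert is safe.
FORWARD_COMPATIBLE = {MigrationOp.ADD_COLUMN, MigrationOp.ADD_TABLE}


def rollback_path(
    ops: Iterable[MigrationOp], touches_customer_data: bool
) -> Dict[str, Union[str, List[str]]]:
    """Pick a rollback strategy and list the approvals it needs."""
    incompatible = sorted(op.value for op in set(ops) - FORWARD_COMPATIBLE)
    # Any incompatible operation rules out a plain revert: no reverse migration.
    strategy = "hotfix-or-flag" if incompatible else "revert-code"
    approvers = ["CTO", "on-call DBA"] if touches_customer_data else []
    return {
        "strategy": strategy,
        "incompatible_ops": incompatible,
        "approvers": approvers,
    }
```

A single backwards-incompatible operation is enough to switch the whole change to the hotfix-or-flag path, which is the conservative reading of the rule above.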

4. Secrets and credentials

If rollback restores a previous credential surface (e.g., the change rotated a secret), confirm with the Secrets Management procedure that the previous credential is still valid; if not, rotate forward to a fresh credential rather than rolling back.

5. Customer comms

For changes affecting customers, post a status-page update per Customer Communications. The on-call lead drafts; the CTO or CISO approves before posting.

6. Post-rollback retrospective

Open a retrospective entry via the Emergency Change Retro intake within 24 hours. Document: timeline, blast radius, the trigger, the rollback action taken, and follow-ups (test gaps, monitoring gaps, runbook updates).

When the change cannot be rolled back

If rollback is not feasible (e.g., destructive migration, irreversible third-party API call, customer-side state change), treat this as an incident under the Incident Response Plan and escalate to the CISO and CTO. The IR plan governs from that point.

Cross-references

Version history

Version | Date | Description | Author | Approved by
1.0 | May 8, 2026 | Initial version | Cameron Wolfe | Ishan Jadhwani