Support & On-Call

Source: content/manual/04-platform-engineering/chapters/06-support-and-oncall.md

Purpose and scope

Define how teams get help, who is paged, and how incidents are resolved.

Outcomes

  • Clear intake and triage.
  • Published SLAs for platform capabilities and for incident response.
  • Lower MTTR through maintained runbooks.
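
To make the MTTR outcome measurable, one option is to compute it from incident open and resolve timestamps. The following is a minimal sketch, not a prescribed method: the function name `mean_time_to_resolve` and the sample timestamps are hypothetical, and it assumes MTTR here means mean time to resolve.

```python
from datetime import datetime, timedelta


def mean_time_to_resolve(incidents):
    """Return the mean of (resolved - opened) over all incidents as a timedelta.

    `incidents` is a list of (opened, resolved) datetime pairs.
    """
    if not incidents:
        raise ValueError("need at least one resolved incident")
    total = sum((resolved - opened for opened, resolved in incidents), timedelta())
    return total / len(incidents)


# Hypothetical sample data, for illustration only.
sample_incidents = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 10, 0)),    # 60 minutes
    (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 14, 30)),  # 30 minutes
]
mttr = mean_time_to_resolve(sample_incidents)
```

Tracking this value per quarter, before and after a runbook is introduced, gives a rough indication of whether the runbook is helping.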

Signals of trouble

  • Ping-pong escalations across teams.
  • Unclear ownership during incidents.
  • Runbooks missing or outdated.
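
Two of the signals above can be checked mechanically if ticket and runbook metadata are available. The sketch below is illustrative only: `count_bounces`, `stale_runbooks`, the team and runbook names, and the 180-day review threshold are all assumptions, not part of any existing tool or policy.

```python
from datetime import date, timedelta


def count_bounces(assignment_history):
    """Count how often a ticket returned to a team that had already held it."""
    seen = set()
    bounces = 0
    previous = None
    for team in assignment_history:
        if team == previous:
            continue  # the same team twice in a row is not a handoff
        if team in seen:
            bounces += 1
        seen.add(team)
        previous = team
    return bounces


def stale_runbooks(last_reviewed, today, max_age_days=180):
    """Return sorted names of runbooks not reviewed within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, reviewed in last_reviewed.items() if reviewed < cutoff)


# Hypothetical data, for illustration only.
history = ["platform", "network", "platform", "network"]
reviews = {"db-failover": date(2023, 1, 1), "cache-flush": date(2024, 5, 1)}
bounces = count_bounces(history)
stale = stale_runbooks(reviews, today=date(2024, 6, 1))
```

A ticket with more than one bounce, or a non-empty stale list, is a prompt to review ownership or schedule a runbook review, not a verdict on its own.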

Remediation steps

  1. Publish support tiers and on-call rotations.
  2. Keep runbooks current; rehearse them in game days.
  3. Integrate post-incident actions into the roadmap.
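
As an illustration of step 1, a published rotation can be as simple as an ordered list of engineers plus a start date, from which anyone can work out who is on call. This is a sketch under stated assumptions: the function `on_call_for`, the names, and the weekly shift length are hypothetical, and a real paging tool would normally own this logic.

```python
from datetime import date


def on_call_for(day, rotation, start, shift_days=7):
    """Return who is on call on `day` for a round-robin rotation.

    `rotation` is the ordered list of engineers, `start` is the first day of
    the first shift, and each shift lasts `shift_days` days.
    """
    if not rotation:
        raise ValueError("rotation must not be empty")
    if day < start:
        raise ValueError("day is before the rotation start")
    shifts_elapsed = (day - start).days // shift_days
    return rotation[shifts_elapsed % len(rotation)]


# Hypothetical rotation, for illustration only.
rotation = ["alice", "bob", "carol"]
start = date(2024, 1, 1)
first_week = on_call_for(date(2024, 1, 1), rotation, start)
second_week = on_call_for(date(2024, 1, 8), rotation, start)
fourth_week = on_call_for(date(2024, 1, 22), rotation, start)
```

Keeping the rotation as plain data makes it easy to publish alongside the support tiers and to review when team membership changes.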

Checklists and assets

References

  • Incident policy; paging procedures.