Postmortem Culture means getting more actionable followups out of each Postmortem, and a very frequent action item we look for is, "where did we not have the visibility that we wanted to have?" Adding metrics, log messages, and trace points are all fine action items to take after an outage. Surely there is something which could have been more clear and reduced the time to remediate.
Observability vendors already provide help in instrumenting a codebase, such as Honeycomb's Agent skills and MCP service. I'd like learnings from a Postmortem to flow back into the tooling like a profile-driven optimization: take our handling of recent incidents into account when suggesting instrumentation. Even if the tooling might not otherwise suggest adding visibility in a particular area, if it is something we have struggled with then it is worth adding more help.