Incident Summary:
The Chaos dashboard on the Prod2 cluster was inaccessible to users, returning a 404 error code when attempting to access it.
Timeline
Time (UTC) | Event |
---|---|
November 5, 6:58 AM | An alert was triggered. |
November 5, 7:08 AM | We identified that the issue was related to ingress. |
November 5, 7:16 AM | The ingress class issue caused by the helm migration was fixed and started monitoring. |
November 5, 7:19 AM | The incident has been successfully resolved. |
Root Cause Analysis:
During the Helm switchover, all module ingresses were duplicated under a new Ingress Class. Automation was created to ensure that each service's ingress rule was duplicated. However, the Chaos ingress rules followed an older method of implementing the Ingress Class which uses annotations, which was missed by the automation.
Immediate Resolution:
The Chaos ingress rules were recreated with valid Ingress Class.
Action Items: