The 'Harness Overview Page' intermittently failed to load in the Prod-2 cluster. Critical functionality remained unaffected.
Time (UTC) | Event |
---|---|
01:33 PM | Received internal alerts for synthetic monitoring failure. |
01:47 PM | An internal incident was raised. |
01:48 PM | The root cause was identified. |
02:16 PM | The incident was resolved. |
The CPU-intensive queries were terminated in the backend database to restore normal operations.
The Overview dashboard failed to retrieve data from the backend database because CPU utilisation reached critical levels, which significantly delayed the processing of regular queries. We have isolated configurations within the database which, in combination with the application's retry mechanism, placed undue load on the database server and left it in an unhealthy state.
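
The report does not describe how the retry mechanism works, so the following is only an illustrative sketch of why retries against an already slow database can multiply its load, and of one common way to spread retries out (capped exponential backoff with jitter). The function names, client counts, and timing parameters are all hypothetical and are not taken from the Harness application.

```python
import random


def worst_case_queries(clients, max_attempts):
    """Queries reaching the database if every attempt from every client times out.

    Hypothetical model: each client issues one query per attempt.
    """
    return clients * max_attempts


def backoff_delays(max_attempts, base=0.5, cap=30.0, rng=None):
    """Wait times (in seconds) before each retry, using capped exponential
    backoff with full jitter. `base` and `cap` are assumed values."""
    if rng is None:
        rng = random.Random()
    delays = []
    # The first attempt is immediate, so there are max_attempts - 1 retries.
    for retry in range(max_attempts - 1):
        ceiling = min(cap, base * 2 ** retry)
        # Full jitter: pick a random wait between 0 and the current ceiling
        # so that clients do not all retry at the same moment.
        delays.append(rng.uniform(0, ceiling))
    return delays


if __name__ == "__main__":
    # 100 hypothetical dashboard clients, each allowed 5 attempts in total.
    print(worst_case_queries(100, 5))
    print(backoff_delays(5, rng=random.Random(0)))
```

In this model, allowing five attempts per client means a struggling database can receive up to five times the original query volume. Backoff with jitter does not reduce that total, but it spreads the retries over time instead of delivering them all at once.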