Prod 2: NG Dashboards are not loading as expected
Incident Report for Harness
Postmortem

Impact:
Harness custom dashboards were intermittently inaccessible during the incident.

Root Cause:

The Custom Dashboards microservice in the Harness cluster of services was overloaded with requests. The resulting CPU spike crashed the service and forced a restart, and the continued high volume of requests left the UI inaccessible.

Timeline:

Time Event
May 16 at 12:32 PM PDT Internal team noticed slowness in custom dashboards
May 16 at 12:32 PM PDT Internal call was initiated and engineering started triaging
May 16 at 12:41 PM PDT Functionality returned to normal after a pod restart

Action items:

Adopt the Gunicorn server to handle more requests per second and add resiliency.

Posted May 18, 2023 - 20:51 PDT

Resolved
We can confirm normal operation. Get Ship Done!
We will continue to monitor to ensure stability.
Posted May 16, 2023 - 13:15 PDT
Monitoring
Harness service issues have been addressed and normal operations have been resumed. We are monitoring the service to ensure normal performance continues.
Posted May 16, 2023 - 12:41 PDT
Investigating
Harness NG Dashboards in Prod 2 cluster are unable to load. We are working to identify the cause and restore normal operations as soon as possible.
Posted May 16, 2023 - 12:32 PDT
This incident affected: Prod 2 (Continuous Delivery - Next Generation (CDNG), Cloud Cost Management (CCM), Continuous Integration Enterprise (CIE) - Cloud Builds, Feature Flags (FF), Security Testing Orchestration (STO)).