Overview
Harness customers in the Prod-2 cluster reported that the Project Overview dashboard was down. The impact was limited to the Dashboard API for all Prod-2 customers; all other critical functions remained unaffected.
Timeline
Time | Event |
---|---|
Nov 27, 9:30 AM UTC | Issue reported on customer Slack channels. |
Nov 27, 9:49 AM UTC | Incident acknowledged internally and investigation started. |
Nov 27, 9:57 AM UTC | Rolled back the system deployment, which immediately resolved the issue. |
Nov 27, 10:08 AM UTC | Incident marked as resolved on the Harness Status Page. |
Resolution
The latest deployment was rolled back, restoring the Project Overview dashboard within 8 minutes of the incident being acknowledged (27 minutes after the issue was first reported).
Root Cause Analysis (RCA)
The issue appeared after the manager service release on Prod-2 on Nov 27. The release included an enhancement to the dashboard service that was incompatible with the service from which the dashboard service consumes data. Regrettably, due to a coordination oversight in release orchestration, an API contract incompatibility between these services was introduced.
Action Items
Our Architecture board will review how deployments of inter-dependent services, and of services that share common libraries, are managed, in order to avoid similar issues.
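
As an illustration only, and not a description of Harness's actual tooling, one kind of safeguard against this class of failure is a pre-deployment check that compares the fields a consuming service expects with the fields the currently deployed upstream service provides. The sketch below is a minimal version of such a check in Python. The `Contract` class, the `find_incompatibilities` function, the service name, and all field names are hypothetical and chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Contract:
    """Response fields a service version provides, mapped to type names (hypothetical model)."""
    service: str
    version: str
    fields: Dict[str, str]


def find_incompatibilities(provider: Contract, consumer_needs: Dict[str, str]) -> List[str]:
    """Return one message per field the consumer needs but the provider does not satisfy."""
    problems = []
    for name, expected_type in sorted(consumer_needs.items()):
        actual_type = provider.fields.get(name)
        if actual_type is None:
            problems.append(f"missing field '{name}'")
        elif actual_type != expected_type:
            problems.append(f"field '{name}' is {actual_type}, expected {expected_type}")
    return problems


# Hypothetical example: the new dashboard build expects a field that the
# deployed upstream service does not return yet.
deployed_upstream = Contract(
    service="upstream",
    version="v1",
    fields={"projectId": "string", "deployments": "array"},
)
dashboard_needs = {
    "projectId": "string",
    "deployments": "array",
    "failureRate": "number",
}

problems = find_incompatibilities(deployed_upstream, dashboard_needs)
if problems:
    # In a release pipeline, a non-empty list would block the rollout.
    print("Blocking release:", "; ".join(problems))
```

A check of this shape only covers field presence and type names. It would not catch semantic changes, such as a field keeping its name but changing its meaning, so it complements rather than replaces coordinated release ordering.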