Overview
Multiple Harness customers in the Prod-2 cluster reported 500 errors when accessing or running pipelines. Licensing information was also inaccessible during the incident.
Timeline
Time | Event |
---|---|
8 Jan 7:23 AM UTC | Issue was reported internally and by customers. |
8 Jan 7:23 AM UTC | Internal Incident created. |
8 Jan 7:23 AM UTC | Rolled back the latest system deployment, which immediately resolved the issue. |
8 Jan 7:28 AM UTC | Internal Incident Resolved. |
Resolution
We rolled back our latest system deployment, which resolved the issue within 5 minutes of it being reported.
Root Cause Analysis
After our latest manager service release, a change to the licensing resource caused cache failures. The License API is called to fetch license information and check service entitlements. The addition of new fields to the license resource caused these calls to fail with unhandled exceptions, which in turn produced the 500 errors described above.
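As an illustration of this failure class only, the sketch below shows one way a license lookup could tolerate cached entries whose shape has changed. It is not the actual Harness implementation: the report does not describe the cache format or the license schema, so the `License` fields, the JSON encoding, and the `decode_license` and `get_license` helpers are all hypothetical. The idea is to ignore fields the reader does not recognize, and to treat any entry that cannot be decoded as a cache miss rather than letting the exception propagate.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Callable, Dict


@dataclass
class License:
    # Hypothetical license shape; the real resource is not described in the report.
    account_id: str
    edition: str
    max_users: int = 0


def decode_license(raw: str) -> License:
    """Decode a cached entry, dropping fields this version does not know about."""
    data = json.loads(raw)
    known = {f.name for f in fields(License)}
    return License(**{k: v for k, v in data.items() if k in known})


def get_license(
    account_id: str,
    cache: Dict[str, str],
    load_from_source: Callable[[str], License],
) -> License:
    """Return the license, falling back to the source if the cache entry is unusable."""
    raw = cache.get(account_id)
    if raw is not None:
        try:
            return decode_license(raw)
        except (ValueError, TypeError, AttributeError):
            # Assumption: an undecodable entry is safe to discard and reload.
            cache.pop(account_id, None)
    lic = load_from_source(account_id)
    cache[account_id] = json.dumps(asdict(lic))
    return lic
```

With this approach, an entry written by a newer release that carries extra fields still decodes, and a corrupt or incompatible entry costs one extra read from the source instead of an unhandled exception.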
Action Item