Summary
Requests to https://events.ff.harness.io where failing for all customers in Prod1 accounts, preventing metrics data from being updated.
What was the issue?
Testing being preformed on the cluster resulted in unexpected behaviour on one of the load balancing backend instances that is used to route traffic for the Feature Flag metrics service.
Resolution
Load balancer backend was re-synced with the running applications, and the traffic resumed.
Timeline
Time(UTC) | Event |
---|---|
13 Nov 14:55 | Testing service was brought down, causing a cascading tear down of the backend instance |
13 Nov 14:56 | On call engineer alerted to the traffic errors on Prod1 |
13 Nov 15:02 | Issue identified, and team started to determine the cause |
13 Nov 15:10 | Issue was fixed, and system sync started |
13 Nov 15:14 | Traffic resumed |
RCA
On Nov 13, 2024, between 14:55 and 15:14, traffic going into the Feature Flag metrics service on Prod1 received a 500 error.
Action Items