Between 4:20 PM and 5:16 PM EST on Thursday, March 5, 2026, customers using the Harness Code modules experienced a production outage in Harness production clusters Prod1, Prod2, and Prod3. Git repositories were unreachable during this outage.
We experienced a surge in metrics that overwhelmed the metric collectors on the Kubernetes pods. As a result, the Git pods were impacted. The StatefulSet became unschedulable, and resizing of the metric collectors was required to remedy the situation.
All code repositories were offline during the event across all three production clusters.
Engineering increased the memory allocated to the metric collectors and redeployed the configuration. After redeployment, the Git pods were rescheduled and service was restored.
To prevent such issues from happening, we are implementing the following: