Harness Platform was briefly Unavailable
Incident Report for Harness
Postmortem

Summary

Customers were unable to access https://app.harness.io/ for 2 minutes.

What was the issue?

A recent deployment of the gateway component in the prod-1 environment carried an incorrect configuration that scaled down all of the gateway pods.

Resolution

The configuration was reverted, which restored service availability.

Time (UTC) | Event
5 Nov 12:52:50 PM | Service deployment downscaled the gateway pods.
5 Nov 12:54:50 PM | Gateway pods were scaled up. New pods were up and running to serve traffic.

RCA

On Nov 5, 2024, users received HTTP 503 (Service Unavailable) errors for 2 minutes when attempting to access https://app.harness.io. The errors occurred because the gateway service had been scaled down. The downscaling originated from a recent deployment that applied an incorrect configuration. The configuration was immediately reverted to restore service availability.

Action Items

Improve Pre-Deployment Checks: Enhance pre-deployment checks to validate critical service configurations and prevent unintended downscaling.
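
As an illustration only, a pre-deployment check of this kind could look like the sketch below. It is not the check Harness runs. It assumes the configuration is a Kubernetes-style Deployment manifest loaded into a dictionary, and that the error was a too-low replica count, which this report does not state. The names CRITICAL_MIN_REPLICAS and check_replicas, and the minimum of 2 for the gateway, are hypothetical.

```python
from typing import Any

# Hypothetical policy: minimum replica count for each critical service.
CRITICAL_MIN_REPLICAS = {"gateway": 2}


def check_replicas(manifest: dict[str, Any]) -> list[str]:
    """Return a list of violations; an empty list means the manifest passes."""
    name = manifest.get("metadata", {}).get("name", "")
    minimum = CRITICAL_MIN_REPLICAS.get(name)
    if minimum is None:
        return []  # not a critical service, nothing to enforce

    # Kubernetes defaults spec.replicas to 1 when the field is omitted.
    replicas = manifest.get("spec", {}).get("replicas", 1)

    # Reject non-integers (bool is a subclass of int, so exclude it explicitly).
    if not isinstance(replicas, int) or isinstance(replicas, bool):
        return [f"{name}: spec.replicas must be an integer, got {replicas!r}"]
    if replicas < minimum:
        return [
            f"{name}: spec.replicas={replicas} is below the required minimum of {minimum}"
        ]
    return []


if __name__ == "__main__":
    # Assumed example of a bad manifest: the gateway scaled to zero replicas.
    bad = {"kind": "Deployment", "metadata": {"name": "gateway"}, "spec": {"replicas": 0}}
    for problem in check_replicas(bad):
        print(problem)
```

A deployment pipeline could run such a check as a gating step and stop the rollout whenever the returned list is not empty.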

Posted Nov 10, 2024 - 21:08 PST

Resolved
This incident has been resolved.
Posted Nov 05, 2024 - 06:20 PST
Investigating
We would like to notify you of a disruption to the Harness Platform that took place at 12:53 PM UTC today. This was a temporary glitch, and the Platform is now operating normally. Further details regarding the precise impact and underlying cause of this disruption will be provided in a postmortem report here. We appreciate your understanding and patience.
Posted Nov 05, 2024 - 06:19 PST
This incident affected: Prod 1 (Continuous Delivery (CD) - FirstGen - EOS, Continuous Delivery - Next Generation (CDNG), Cloud Cost Management (CCM), Continuous Error Tracking (CET), Chaos Engineering, Continuous Integration Enterprise(CIE) - Self Hosted Runners, Continuous Integration Enterprise(CIE) - Mac Cloud Builds, Custom Dashboards, Feature Flags (FF), Security Testing Orchestration (STO), Service Reliability Management (SRM), Internal Developer Portal (IDP), Infrastructure as Code Management (IaCM), Software Supply Chain Assurance (SSCA)) and Prod 2 (Continuous Delivery (CD) - FirstGen - EOS, Continuous Delivery - Next Generation (CDNG), Cloud Cost Management (CCM), Continuous Error Tracking (CET), Chaos Engineering, Continuous Integration Enterprise(CIE) - Self Hosted Runners, Continuous Integration Enterprise(CIE) - Mac Cloud Builds, Custom Dashboards, Feature Flags (FF), Security Testing Orchestration (STO), Service Reliability Management (SRM), Internal Developer Portal (IDP), Infrastructure as Code Management (IaCM), Software Supply Chain Assurance (SSCA)).