On Monday, March 31, 2025, at 6:53 PM UTC, some customers experienced authentication issues, including getting logged out of Harness. This incident affected users with accounts hosted on our Prod-4 cluster. This issue was resolved by 7:06 PM UTC, resulting in approximately 13 minutes of downtime.
Our engineering team identified the issue and took immediate action:
The incident was caused by a routing configuration error in our Global Gateway service. During a planned deployment, a change to our routing logic inadvertently prevented requests from being correctly directed to the Prod-4 cluster. As a result, authentication sessions for affected customers could not be appropriately maintained.
To prevent similar incidents in the future, we are implementing the following improvements: