On 17th April, between 10:42 AM UTC - 11:12 AM UTC, customers experienced intermittent errors when trying to access app3.harness.io on our Prod-3 cluster. The issue was caused by a configuration change on a failover cluster in the backend ingress-controller service setup associated with app3.harness.io.
Our monitoring system alerted us to the issue , we identified and reverted the change to mitigate the issue which restored all the functionality in Prod-3 cluster.
As part of preparation work for a planned Disaster Recovery (DR) activity, we introduced a new configuration in the Prod-3 cluster. This change unintentionally made the Prod-3 DR environment eligible to receive live customer traffic. Since this environment was not fully operational some of the requests were returned with 503 Errors.