The UI is not loading in Prod-3 after the deployment.
Incident Report for Harness
Postmortem

After the deployment to the prod-3 cluster, the Nextgen UI got stuck on the initial loading screen. The issue was observed immediately during post-deployment sanity checks.

We traced the problem to required static resources failing to load. This release included a change to how we build and load the UI for different environments: the source for static files became configurable per environment. An incompatible configuration for the prod-3 cluster prevented the correct URL from being formed, resulting in 404 responses for our JS resources.
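The failure mode described above can be sketched as follows. This is a minimal illustration, not the actual Nextgen UI build code: the `STATIC_SOURCES` mapping, the hostnames, and the function names are all hypothetical, and the report does not say what the prod-3 incompatibility was, so the missing URL scheme below is an assumption chosen only to show how a per-environment value can be checked before any asset URL is built from it.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical per-environment settings; the real keys and values are not
# given in this report.
STATIC_SOURCES = {
    "prod-1": "https://static.example.com/ng-ui/",
    # Assumed example of an incompatible value: no scheme, no trailing slash.
    "prod-3": "static.example.com/ng-ui",
}


def static_base(env: str) -> str:
    """Return a validated base URL for static files, or raise ValueError."""
    base = STATIC_SOURCES.get(env, "")
    parsed = urlparse(base)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(
            f"{env}: static source {base!r} is not an absolute http(s) URL"
        )
    # urljoin drops the last path segment unless the base ends with a slash.
    return base if base.endswith("/") else base + "/"


def asset_url(env: str, path: str) -> str:
    """Build the URL of one static asset for the given environment."""
    return urljoin(static_base(env), path.lstrip("/"))
```

With a check like this, a bad value fails loudly at build or startup time instead of producing URLs that only return 404 once the UI is in front of users.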

We mitigated the incident by updating the service configuration for this environment and re-deploying the Nextgen UI service. With the new configuration, the UI service was able to generate the correct URLs, and the issue was resolved.

Timeline

Time (UTC)   Event
12:44 AM     Incident was first detected after the new deployment. An internal incident was raised, and the team started looking into the issue.
12:46 AM     Root cause identified and the fix was deployed.
12:47 AM     Incident resolved.

Action Items

  • Audit the service configurations for all environments, with the aim of minimizing the differences between them.
  • Improve the Nextgen UI build process to handle incompatible configurations.
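
One possible shape for the configuration audit in the first action item is a report of every setting whose value is not identical across environments. The sketch below assumes each environment's configuration can be loaded as a flat mapping of strings; the function name and that data shape are hypothetical and do not describe the actual tooling.

```python
from __future__ import annotations


def config_differences(
    configs: dict[str, dict[str, str]],
) -> dict[str, dict[str, str | None]]:
    """Map each key that differs between environments to its per-environment values.

    A key missing from an environment is reported with the value None.
    """
    all_keys = set().union(*(cfg.keys() for cfg in configs.values()))
    diffs: dict[str, dict[str, str | None]] = {}
    for key in sorted(all_keys):
        values = {env: cfg.get(key) for env, cfg in configs.items()}
        if len(set(values.values())) > 1:
            diffs[key] = values
    return diffs
```

An empty result means the environments are configured identically; every remaining entry is a difference to justify or remove.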
Posted Jan 31, 2024 - 13:00 PST

Resolved
The incident has been resolved. We will provide a postmortem once we have gathered all the details.
Posted Jan 29, 2024 - 16:56 PST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jan 29, 2024 - 16:47 PST
Investigating
We are currently investigating this issue.
Posted Jan 29, 2024 - 16:44 PST
This incident affected: Prod 3 (Continuous Delivery (CD) - FirstGen - EOS, Continuous Delivery - Next Generation (CDNG), Cloud Cost Management (CCM), Continuous Error Tracking (CET), Continuous Integration Enterprise (CIE) - Cloud Builds, Continuous Integration Enterprise (CIE) - Self Hosted Runners, Custom Dashboards, Feature Flags (FF), Security Testing Orchestration (STO), Service Reliability Management (SRM), Chaos Engineering).