PROD3: UI not loading

Incident Report for Harness

Postmortem

Summary

On April 1, 2025, the Prod-3 cluster experienced performance degradation, causing access check API calls to time out intermittently. The cause was traced to degraded performance in the underlying MongoDB database, which is critical for access control validation.

Resolution

The database cluster was scaled up to restore performance.

RCA

The issue was caused by temporarily degraded performance in our database, which handles access validation for API calls. A memory optimization activity briefly reduced system capacity, and an unexpected increase in traffic during this window delayed the system's return to full performance. As a result, some access check operations timed out, degrading overall request performance.

Action Items

  • Use database cluster scale-up to address memory fragmentation issues.
  • Optimize queries and indexes to improve database efficiency.
  • Delete stale data to reduce memory usage.
  • Optimize retry mechanisms to avoid overwhelming the system during failures.
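The report does not describe how the retry mechanisms will change. As a hedged illustration only, the sketch below shows one common way to keep retries from piling onto a degraded database: capped exponential backoff with "full jitter", retrying only errors assumed to be transient. The names `call_with_retry`, `backoff_delays`, and `RetryExhausted`, the default parameters, and the choice of `TimeoutError` as the only retryable error are all hypothetical and are not taken from Harness's code.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class RetryExhausted(Exception):
    """Hypothetical error raised when every attempt has failed."""


def backoff_delays(base: float, cap: float, count: int, rng: random.Random) -> List[float]:
    # Ceiling for retry n is min(cap, base * 2**n); the actual delay is drawn
    # uniformly from [0, ceiling] ("full jitter") so that many clients retrying
    # at once spread out instead of hitting the database in synchronized waves.
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(count)]


def call_with_retry(
    op: Callable[[], T],
    max_attempts: int = 4,
    base: float = 0.1,   # seconds; assumed value for illustration
    cap: float = 2.0,    # seconds; upper bound on any single wait
    is_retryable: Callable[[Exception], bool] = lambda e: isinstance(e, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    rng = rng or random.Random()
    delays = backoff_delays(base, cap, max_attempts - 1, rng)
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception as exc:
            if not is_retryable(exc):
                raise  # non-transient errors fail fast instead of adding load
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(delays[attempt])
    raise RetryExhausted(f"gave up after {max_attempts} attempts") from last_error
```

Bounding both the number of attempts and the per-attempt wait limits how much extra traffic a single failing request can generate. A production version would likely also cap retries across all callers (a retry budget), which this sketch does not attempt.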
Posted Apr 23, 2025 - 10:35 PDT

Resolved

This incident has been resolved.
Posted Apr 01, 2025 - 00:41 PDT

Monitoring

A fix has been implemented and we are monitoring.
Posted Apr 01, 2025 - 00:05 PDT
This incident affected: Prod 3 (Continuous Delivery (CD) - FirstGen - EOS, Continuous Delivery - Next Generation (CDNG), Cloud Cost Management (CCM), Continuous Error Tracking (CET), Continuous Integration Enterprise (CIE) - Self Hosted Runners, Continuous Integration Enterprise (CIE) - Mac Cloud Builds, Continuous Integration Enterprise (CIE) - Windows Cloud Builds, Continuous Integration Enterprise (CIE) - Linux Cloud Builds, Custom Dashboards, Feature Flags (FF), Security Testing Orchestration (STO), Service Reliability Management (SRM), Chaos Engineering, Internal Developer Portal (IDP), Infrastructure as Code Management (IaCM), Software Supply Chain Assurance (SSCA), Software Engineering Insights (SEI), Code Repository).