On April 1, 2025, the Prod-3 cluster experienced performance degradation that caused access check API calls to time out intermittently. The impact was traced to degraded performance in the underlying MongoDB database, which is critical for access control validation.
The database cluster was scaled up to address the issue.
The issue was caused by temporarily degraded performance in our database, which handles access validation for API calls. A memory optimization activity briefly reduced system capacity. During this window, traffic increased unexpectedly, which delayed the system's return to full performance. As a result, some access check operations timed out, degrading overall request performance.