On April 30, 2026, between approximately 15:29 UTC and 17:00 UTC, customers in Prod3 experienced degradation impacting delegate connectivity, instance synchronization, pipeline executions, and connector operations due to a spike in load on one of our services.
Service stability was restored through service scaling, infrastructure capacity increases, and database resource expansion.
Customer Impact:
Duration:
The incident was caused by a load spike that led to thread exhaustion and elevated request contention between internal services during a period of increased synchronization and delegate activity.
The following actions were taken to restore service stability:
Services recovered progressively beginning at approximately 15:47 UTC, with full stability restored by approximately 17:00 UTC.
To prevent such issues from happening again, we are implementing the following improvements: