On January 9, 2026, some customers experienced intermittent errors and slow responses while accessing pipeline execution details and execution lists. The issue was identified promptly and mitigated by the engineering team. Service functionality was restored within a short period.
Impact
During the incident window, a subset of users may have encountered intermittent errors or slow responses when viewing pipeline execution details and execution lists.
Pipeline executions themselves continued to run as expected. There was no data loss and no long-term impact to customer environments.
Root Cause
The issue was caused by elevated memory usage in a subset of service instances under load. When available memory dropped below required thresholds, certain requests related to loading execution data could not be processed successfully. Because the service instances remained partially healthy, they were not immediately recycled, resulting in intermittent request failures until mitigation was applied.
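The failure mode described above, where instances stay "partially healthy" and continue receiving traffic, is commonly addressed with a memory-aware health check that fails before the instance starts rejecting requests. The sketch below is illustrative only and is not the actual service code; the function name and 10% headroom threshold are assumptions.

```python
# Illustrative sketch only (not the actual service code): a health check
# that reports unhealthy once free memory headroom drops below a required
# threshold, so the orchestrator recycles the instance instead of leaving
# it partially healthy while requests fail.

def is_healthy(used_bytes: int, limit_bytes: int, headroom_ratio: float = 0.10) -> bool:
    """Return False when less than `headroom_ratio` of the memory limit is free."""
    free = limit_bytes - used_bytes
    return free >= limit_bytes * headroom_ratio
```

Wired into a liveness or readiness probe, a check like this would remove degraded instances from rotation automatically rather than leaving them to fail requests intermittently.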
Mitigation
The engineering team performed a rolling restart of all affected pods, which immediately restored normal service. As a preventive measure, the pod heap size was also increased to provide additional memory headroom.
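For context, a heap increase of this kind is typically applied by raising the container memory limit together with the runtime heap setting, keeping headroom between the two. The fragment below is an illustrative Kubernetes/JVM example; the values and the JVM assumption are not from the incident itself.

```yaml
# Illustrative only: values and variable names are placeholders,
# not the actual deployment configuration.
resources:
  limits:
    memory: "2Gi"        # container memory limit, raised as a preventive measure
env:
  - name: JAVA_OPTS
    value: "-Xmx1536m"   # heap kept below the container limit to leave headroom
```

Keeping the heap comfortably below the container limit reduces the chance of the out-of-memory condition recurring under similar load.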
Next Steps
To prevent recurrence and improve resiliency, the following actions are being implemented: