Pipeline executions for multiple customers were queued with the message "Current execution is queued as another execution is running with a given resource key".
Pipelines scheduled for execution experienced prolonged queuing delays, and in some cases they stayed queued long enough to expire. The issue affected deployment pipelines and any other pipeline that includes a queue step, causing execution delays and timeouts.
We found that an unusually large number of resource restraint entries had been created during pipeline runs. The resulting backlog slowed the processing of new pipeline executions. To mitigate the issue, we manually drained the queue and added processing capacity so the system can better absorb similar load and reduce the risk of recurrence.
Harness pipelines use resource restraint instances to limit the number of concurrent pipeline executions. During the incident, an unexpected spike in load created significantly more instances than usual. Because these instances are processed in the background at scheduled intervals, the surge could not be worked through quickly: pipelines remained queued longer and executions slowed down.
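To illustrate why a burst of restraint instances turns into queuing delays and expirations, here is a toy simulation. It is a sketch under stated assumptions, not Harness's implementation: it assumes a background job that runs once per tick, handles at most a fixed number of waiting instances per run (`batch_size`), serves them oldest first, and lets an instance expire after `expiry_ticks` of waiting. The names `RestraintInstance`, `simulate`, `batch_size`, and `expiry_ticks` are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class RestraintInstance:
    """Hypothetical record for one execution waiting on a resource key."""
    execution_id: int
    created_tick: int


def simulate(arrivals_per_tick, batch_size, expiry_ticks):
    """Toy model of a scheduled background job draining restraint instances.

    arrivals_per_tick: number of new instances created at each tick.
    batch_size: assumed max instances the job can handle per scheduled run.
    expiry_ticks: an instance still waiting this long after creation expires.
    Returns (processed, expired, backlog_per_tick).
    """
    queue = deque()
    processed = expired = 0
    backlog_per_tick = []
    next_id = 0
    for tick, arrivals in enumerate(arrivals_per_tick):
        # New executions register instances for the shared resource key.
        for _ in range(arrivals):
            queue.append(RestraintInstance(next_id, tick))
            next_id += 1
        # Waiters that have been queued too long expire. The queue is FIFO,
        # so the oldest instances are always at the front.
        while queue and tick - queue[0].created_tick >= expiry_ticks:
            queue.popleft()
            expired += 1
        # Scheduled background run: admit up to batch_size instances, oldest first.
        for _ in range(min(batch_size, len(queue))):
            queue.popleft()
            processed += 1
        # Whatever is left waits for the next scheduled run.
        backlog_per_tick.append(len(queue))
    return processed, expired, backlog_per_tick


if __name__ == "__main__":
    # Steady load within capacity: nothing waits between runs.
    print(simulate([5] * 6, batch_size=10, expiry_ticks=3))
    # → (30, 0, [0, 0, 0, 0, 0, 0])
    # A one-off spike exceeds what each run can absorb: a backlog forms
    # and the oldest waiters expire before they are processed.
    print(simulate([5, 50, 5, 5, 5, 5], batch_size=10, expiry_ticks=3))
    # → (55, 20, [0, 40, 35, 30, 5, 0])
    # The same spike with more processing capacity per run.
    print(simulate([5, 50, 5, 5, 5, 5], batch_size=30, expiry_ticks=3))
    # → (75, 0, [0, 20, 0, 0, 0, 0])
```

In this model, raising `batch_size` plays the role of the added capacity described above: the same spike is absorbed within one extra run and no executions expire. Draining the queue manually would correspond to clearing the backlog between runs.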
Action Items