Pipelines had failures due to delegates and build pods got disconnected. This also impacted our hosted CI operations and all hosted build pipelines failed.
Root Cause Analysis:
For performance improvement and agility of our development our engineering team had been making changes to get the legacy delegates which are no longer being used by customers but still running disconnected from the platform. This change was behind a feature flag which was enabled and resulted in an incident. The code change had an unintended affect of not accepting connection request from build containers. The feature flag was disabled which restored normal pipeline operations.
Action Items:
Improve feature flag operations: Our engineering team operated in silo while enabling this feature which resulted in wrong implementation and ineffective rollout of this functionality. We are improving out process internally to templatize and manage feature flag rollout by external operations team.
Improve Automation: Adding steps in our QA process to catch the dependency of delegates and build pods connection requests so any change in this area is validated internally.
Posted Mar 24, 2025 - 09:58 PDT
Resolved
This incident has been resolved.
Posted Mar 18, 2025 - 10:30 PDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Mar 18, 2025 - 10:25 PDT
Identified
The issue has been identified and a fix is being implemented.
Posted Mar 18, 2025 - 10:05 PDT
Investigating
We are currently investigating this issue.
Posted Mar 18, 2025 - 09:56 PDT
This incident affected: Prod 1 (Continuous Integration Enterprise(CIE) - Mac Cloud Builds, Continuous Integration Enterprise(CIE) - Windows Cloud Builds, Continuous Integration Enterprise(CIE) - Linux Cloud Builds).