CI Pipelines which had step group in them started to fail for customers with plan creation failure error.
Each module needs to register with the pipeline service on the steps/step groups/stages it can execute. If the registration is not done, the pipeline service broadcasts events to identify the service which can execute the steps/step groups/stages. The first service which acknowledges the event is provided the handle to execute the steps/step groups/stages. During the incident new module, IACM was deployed to production. When the next execution for step group came in, the pipeline service broadcasted an event to identify the service to execute the step group. Since IACM responded first it was given control to execute the step group. Further IACM did not have the context to execute CI step group causing the failure.
All times are in PDT on March 14, 2023.
10:14 - We got the P1 issue from the customer regarding pipeline failures and the Harness status page has been updated accordingly.
10:16 - The issue has been identified and started working on fixing the issue.
10:25 - Fix has been implemented and started monitoring the result.
10:29 - The incident has been resolved in the Harness status page after validating that the issue has been fixed.