The Controller is required to initialise the VMs and manage their state, so while it was down the Mac pipelines were not functional.
After debugging, we identified network configuration issues that had to be corrected before the Controller was accessible again.
Timeline:
Time | Event
22nd Feb 2024 8:13 AM IST | We started a deployment of the Controller to fix an issue.
22nd Feb 2024 8:24 AM IST | We noticed the Controller was not coming up and started a revert of the release.
22nd Feb 2024 8:30 AM IST | The revert did not work, so an internal incident was created and an investigation started.
22nd Feb 2024 12:11 PM IST | Dlite deployment: bounce completed with the new network changes on all clusters.
Resolution:
We resolved the network configuration issues for the Controller, after which it was accessible again.
RCA & Action Items:
As part of the improvements, we will move the Controller to a high-availability setup. We will also update the alerting and monitoring around this workflow so that such issues are detected immediately.
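To illustrate the kind of check the monitoring action item describes, here is a minimal sketch in Python. It is not the actual monitoring used in production: the names `tcp_reachable` and `ReachabilityMonitor`, the probe type (a plain TCP connect), and the threshold of three consecutive failures are all assumptions chosen for illustration.

```python
import socket
from typing import Callable


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


class ReachabilityMonitor:
    """Signals an alert after N consecutive failed probes (hypothetical helper)."""

    def __init__(self, probe: Callable[[], bool], failure_threshold: int = 3):
        self.probe = probe
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def check(self) -> bool:
        """Run one probe; return True when an alert should fire."""
        if self.probe():
            # Any success resets the failure streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.failure_threshold
```

A scheduler would call `check()` at a fixed interval, for example with `probe=lambda: tcp_reachable(host, port)`, where `host` and `port` are placeholders for the Controller endpoint. Requiring several consecutive failures avoids alerting on a single dropped probe while still flagging an outage within a few intervals.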
Posted Feb 26, 2024 - 06:55 PST
Resolved
This incident has been resolved.
Posted Feb 21, 2024 - 22:44 PST
Monitoring
A fix has been implemented and we are monitoring the environment.
Posted Feb 21, 2024 - 22:39 PST
Identified
We have identified the issue and are working on a fix now.
Posted Feb 21, 2024 - 19:12 PST
Investigating
We are currently investigating this issue.
Posted Feb 21, 2024 - 19:03 PST
This incident affected: Prod 3 (Continuous Integration Enterprise (CIE) - Mac Cloud Builds), Prod 1 (Continuous Integration Enterprise (CIE) - Mac Cloud Builds), and Prod 2 (Continuous Integration Enterprise (CIE) - Mac Cloud Builds).