Our hosted builds run on virtual machines (VMs) in a multicloud environment. We have a fallback in place for cases where a VM fails to initialize in a specific cloud or region. We were notified that a customer pipeline was experiencing initialization failures. The failures occurred in the specific cases where VM initialization failed on our primary and the fallback to our secondary failed as well.
| Time (PST) | Event |
|---|---|
| 12/07/2023 11:40 AM | Notified of initialization failures |
| 12/07/2023 2:40 PM | Added additional fallback to another region to fix initialization failures |
We added multiple levels of fallback to mitigate the VM initialization issue.
A total of 15 customers were affected. Because only a few pipeline executions failed, we are classifying this as a partial outage of 3 hours for Prod1 and 2 hours 38 minutes for Prod2.
All of our fallbacks for initializing the VMs used by Harness CI build pipelines were failing. We added additional fallbacks to mitigate the VM initialization issue and worked on fixing the issues that were causing a higher failure rate in the existing fallbacks.
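
To illustrate the idea of multiple fallback levels, here is a minimal sketch of an ordered fallback chain. It is not our actual implementation: the names `Pool`, `VMInitError`, `init_vm_with_fallback`, and the `provision` callback are hypothetical and exist only for this example. The sketch tries each cloud/region pool in order and fails only when every level has failed.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Pool:
    """Hypothetical (cloud, region) target where a build VM can be requested."""
    cloud: str
    region: str


class VMInitError(Exception):
    """Raised when no pool in the chain could initialize a VM."""


def init_vm_with_fallback(
    pools: Sequence[Pool],
    provision: Callable[[Pool], str],
) -> str:
    """Try each pool in order and return the first VM that initializes.

    `provision` is an assumed callback that returns a VM identifier on
    success and raises an exception on failure.
    """
    errors: List[str] = []
    for pool in pools:
        try:
            return provision(pool)
        except Exception as exc:
            # Record the failure and move on to the next fallback level.
            errors.append(f"{pool.cloud}/{pool.region}: {exc}")
    raise VMInitError("all pools failed: " + "; ".join(errors))
```

In this sketch, adding another fallback level amounts to appending one more `Pool` to the list. A chain with only a primary and a secondary fails as soon as both are unavailable, which is the situation described above.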