Harness cloud builds failing at initialise step for MAC users
Incident Report for Harness
Postmortem

Summary

CI-hosted MacOS pipelines were failing during the initialisation step, impacting specific customers using our MacOS-hosted service.

What was the issue?

We tightened a firewall rule for our Mac VM registry that was previously too permissive. As a result, another component couldn’t access the registry, leading to pipeline failures.

Resolution

Time Event
Sept 1st, 17:00 UTC Restricted the firewall rule.
Sept 04, 06:03 UTC Issue reported by the customer.
Sept 04, 08:39 UTC We re-created the firewall rule and validated that the issue was fixed.

RCA

Our MacOS production setup includes several components. When we restricted the permissive firewall rule, the new rule did not account for the NAT IP address of one of these components. After the change, we ran a full sanity pipeline on the Mac machines, which passed successfully. The issue didn’t surface immediately as the affected component maintains a persistent socket connection, unaffected by the firewall until the connection is re-established or restarted. This explains why the failure didn’t occur immediately after we removed the permissive rule on September 1st. We restored the rule, and the issue was resolved.

Action Items

  1. Restrict the firewall rule again, ensuring that necessary NAT IPs are included.
  2. Restart all relevant services when applying firewall rule restrictions.
  3. Ensure that all connections are properly drained and re-established when the change is implemented.
Posted Sep 17, 2024 - 03:42 PDT

Resolved
We apologise for the inconvenience caused by this outage. We will make sure to provide the root cause analysis soon.
Posted Sep 03, 2024 - 23:47 PDT
Monitoring
The issue is resolved now. We will be sharing RCA for the problem as soon as possible.
Posted Sep 03, 2024 - 23:39 PDT
Investigating
We are currently investigating this issue.
Posted Sep 03, 2024 - 23:33 PDT
This incident affected: Prod 3 (Continuous Integration Enterprise(CIE) - Mac Cloud Builds), Prod 1 (Continuous Integration Enterprise(CIE) - Mac Cloud Builds), and Prod 2 (Continuous Integration Enterprise(CIE) - Mac Cloud Builds).