Mac Hosted CI builds observing failures

Incident Report for Harness

Postmortem

Summary

An overloaded internal monitoring script caused our build scheduler to run out of memory, briefly failing a few Mac pipelines. The script has been removed, and safeguards are in progress.

Root Cause

A watcher script deployed to Mac clients ran commands every few minutes that flooded the server’s memory until it hit an out-of-memory (OOM) condition. This led to pipeline execution failures for Mac Hosted builds. The script was removed, deployment pipelines were fixed, and a log-based alternative plus additional safeguards and alerts are being implemented.
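
The report does not describe how the log-based alternative will work. As a purely illustrative sketch, assuming each Mac client records its own status in a size-capped local log instead of issuing commands against the scheduler, the idea might look like this in Python. The function names, record fields, and file path are hypothetical and do not come from the incident report.

```python
import json
import logging
import time
from collections import deque
from logging.handlers import RotatingFileHandler


def make_status_logger(path, max_bytes=1_000_000, backups=3):
    """Return a logger that appends status lines to a size-capped local file.

    Hypothetical helper: rotation bounds disk use on the client.
    """
    logger = logging.getLogger(f"mac_client_status:{path}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger
    if not logger.handlers:
        handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
        handler.setFormatter(logging.Formatter("%(message)s"))
        logger.addHandler(handler)
    return logger


def record_status(logger, client_id, healthy, now=None):
    """Append one JSON status record locally; no call is made to the scheduler."""
    record = {
        "ts": time.time() if now is None else now,
        "client": client_id,  # assumed identifier for a Mac client
        "healthy": healthy,
    }
    logger.info(json.dumps(record))
    return record


def read_recent(path, limit=100):
    """Read at most `limit` of the newest records, so reader memory stays bounded."""
    with open(path, encoding="utf-8") as fh:
        tail = deque(fh, maxlen=limit)
    return [json.loads(line) for line in tail if line.strip()]
```

The design point in this sketch is that both the writer (rotating file) and the reader (fixed-length tail) have explicit upper bounds, so a monitoring loop cannot grow without limit the way the removed watcher script did.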

Timeline

Timeline Event
2025-07-03 18:37 UTC FH started to investigate the internal alerts and customer-reported failures
2025-07-03 18:53:50 UTC Issue identified to be related to a monitoring script
2025-07-03 19:49:30 UTC Issue mitigated and monitoring script stopped

Resolution

We immediately removed the watcher script issuing the heavy client calls, which stopped the memory surge and stabilized the server. Following the scheduler restart, the Mac pipelines recovered and the system was functional.

Action Items

  • Implement alternate monitoring mechanism using logging
  • Update runbook for the monitoring changes
Posted Jul 22, 2025 - 04:47 PDT

Resolved

The issue is resolved.
Posted Jul 03, 2025 - 12:56 PDT

Monitoring

We have applied the fix. We are monitoring pipelines for failures.
Posted Jul 03, 2025 - 12:50 PDT

Identified

We have identified the issue and are fixing the problem.
Posted Jul 03, 2025 - 12:50 PDT

Investigating

We are currently investigating this issue.
Posted Jul 03, 2025 - 12:37 PDT
This incident affected: Prod 4 (Continuous Integration Enterprise(CIE) - Mac Cloud Builds), Prod Eu1 (Continuous Integration Enterprise(CIE) - Self Hosted Runners), Prod 3 (Continuous Integration Enterprise(CIE) - Mac Cloud Builds), and Prod 2 (Continuous Integration Enterprise(CIE) - Mac Cloud Builds).