Hosted CI mac builds are impacted

Incident Report for Harness

Postmortem

Summary

On February 24, 2026, Harness experienced an incident that temporarily affected Mac Cloud build scheduling across production environments. During the incident window, new Mac build jobs could not be scheduled due to an orchestration control-plane disruption. The issue was detected immediately through monitoring alerts, investigated by the on-call engineering team, and resolved after restoring the affected control-plane node. Full service was restored shortly thereafter.

Root Cause

The incident was caused by a loss of quorum in the orchestration control plane responsible for scheduling Mac builds. The quorum loss resulted from two separate issues occurring at the same time: memory pressure and a traffic spike. Mac build scheduling was interrupted until the degraded node was restored and quorum was re-established.
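
To illustrate why a single degraded node can halt scheduling, here is a minimal sketch of majority-quorum arithmetic. It assumes a majority-based consensus scheme (for example, Raft-style), which this report does not confirm. The report also does not state the cluster size, so the node counts are illustrative only, and the function names are hypothetical.

```python
def quorum_size(total_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority (assumed quorum rule)."""
    if total_nodes < 1:
        raise ValueError("cluster must have at least one node")
    return total_nodes // 2 + 1


def has_quorum(total_nodes: int, healthy_nodes: int) -> bool:
    """True when enough nodes are healthy to elect a leader and accept scheduling work."""
    return healthy_nodes >= quorum_size(total_nodes)


def tolerated_failures(total_nodes: int) -> int:
    """Number of nodes that can be lost before quorum is lost."""
    return total_nodes - quorum_size(total_nodes)


if __name__ == "__main__":
    # Hypothetical cluster sizes; the actual control-plane size is not stated in the report.
    for nodes in (3, 5):
        print(nodes, quorum_size(nodes), tolerated_failures(nodes))
```

Under this assumption, a cluster that is already one node short of its failure budget loses quorum as soon as one more node degrades, and restoring that node is enough to allow a leader to be elected again.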

Impact

  • Affected Service: Mac Cloud builds
  • Affected Regions: Production clusters (Prod1, Prod2, Prod3, Prod4)
  • Customer Impact:
    • New Mac build jobs could not be scheduled during the incident window
    • Existing running builds were not impacted
  • Services Not Impacted:
    • Linux Cloud builds
    • Windows Cloud builds
    • Self-hosted build infrastructure
    • Other Harness CI/CD services (pipelines, artifacts, deployments)

Mitigation

Engineering teams restored the degraded orchestration node, allowing the cluster to re-establish quorum and elect a leader.

Once leader election completed, Mac build scheduling resumed and services returned to normal operation.

Prevention and Improvements

To reduce the likelihood of similar incidents in the future, the following actions are being implemented:

  • Reducing scheduling load on the orchestration layer by optimizing the infrastructure for reliability.
  • Improving monitoring and health checks for control-plane nodes.
Posted Mar 03, 2026 - 17:11 PST

Resolved

This incident has been resolved.
Posted Feb 24, 2026 - 09:11 PST

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Feb 24, 2026 - 09:01 PST

Identified

The issue has been identified and a fix is being implemented.
Posted Feb 24, 2026 - 08:46 PST

Investigating

We are currently investigating this issue.
Posted Feb 24, 2026 - 08:37 PST
This incident affected: Prod 4 (Continuous Integration Enterprise(CIE) - Mac Cloud Builds), Prod 3 (Continuous Integration Enterprise(CIE) - Mac Cloud Builds), Prod 1 (Continuous Integration Enterprise(CIE) - Mac Cloud Builds), and Prod 2 (Continuous Integration Enterprise(CIE) - Mac Cloud Builds).