Issue with sending Git Status for PR
Incident Report for Harness
Postmortem

Summary

CI pipelines using git connectors with a Delegate as the mode of connection did not update the status back to SCM/Git providers. This did not impact those pipelines whose git connectors are connected via the Harness Platform.

Mitigation

Steps to resolve the issue immediately

  1. Rolled back delegate for rings to which customers belong.
  2. Rolled back delegate for all rings.

Detailed Timeline(PST

Time Event
10/05/2023  
3:30 AM Delegate deployment
10:53 AM First customer ticket for status check not reporting back
11-12 AM Two more customer ticket for the same issue
12:20 PM Incident channel created
2:20 PM Reproduced the issue PR checks reporting not working via delegate
2:40 PM Rolled back delegate in rings
3:30 PM Informed customer about rollback
3:40PM Customer confirmed restoration

RCA

Why didn't certain CI pipelines update SCM/Git status?

  • The delegate task created to update the status on Git failed causing the status to be not reflected on Git provider.

Why did the delegate task fail?

  • A change made to support the new Harness Code module introduced a missing dependency that failed during run-time.

Why didn't it impact all the CI pipelines?

  • Harness provides two ways to connect to Git providers - via Harness Platform or via Harness Delegate. Only the CI pipelines using git connectors with a Delegate as the mode of connection did not update the status back to SCM/Git providers. This did not impact those pipelines whose git connectors are connected via the Harness Platform. This is because the missing dependency issue was in the Delegate.

Why was the missing dependency not predicted in the testing phase?

  • We have automated tests that check running pipelines via Harness Delegate but the tests to check Git status updates in this mode of connectivity were missing.

Steps taken

Harness CI engineers tried to reproduce in-house with various combinations of infrastructure such as Kubernetes, Harness Cloud,  Virtual Machines, etc but it took some time to realize this happens only when the git connector is set up to be connected via Harness Delegate instead of Harness Platform. As soon as we realized this, we engaged the delegate engineering team and they helped with reverting the delegate version to a previous one that did not have this code.

Follow-up actions

Add automation to catch this case and also set up internal alerts when the issue happens so things can be handled proactively.

Posted Oct 06, 2023 - 16:27 PDT

Resolved
This incident has been resolved.
Posted Oct 04, 2023 - 15:18 PDT
Update
The incident has been resolved. Please reach out to Harness Support if you continue to see this behavior.
Posted Oct 04, 2023 - 15:18 PDT
Identified
We have identified the issue and are currently rolling back to a previous version of the delegate.
Posted Oct 04, 2023 - 14:51 PDT
Update
Impact: CI pipelines running on self-hosted infra, and using the latest delegate (v23.09.80804) that use Git Connectors with '''executeOnDelegate: true''''' will fail to see Git status updates.
Workaround: Use '''executeOnDelegate: false''' (Connect via Harness Platform).
Posted Oct 04, 2023 - 14:25 PDT
Investigating
The team has been notified of an issue and is currently investigating.
Posted Oct 04, 2023 - 12:20 PDT
This incident affected: Prod 2 (Continuous Integration Enterprise(CIE) - Self Hosted Runners) and Prod 1 (Continuous Integration Enterprise(CIE) - Self Hosted Runners).