GitOps agents using mTLS are failing to connect to the GitOps service

Incident Report for Harness

Postmortem

Summary:

All GitOps agents configured to use mTLS authentication were disconnected.

Ticket: #83615

What was the issue?

The disconnection was caused by a misconfiguration in the gateway component, introduced during a recent configuration update. This resulted in traffic being routed to a non-existent endpoint, blocking communication with the GitOps service.

The issue was not identified in lower environments because of the absence of automated tests for mTLS-based scenarios.
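
As an illustration of the kind of check that could catch a route pointing at a non-existent endpoint before release, here is a minimal sketch. The gateway's real configuration format is not described in this report, so the `Route` shape, the route names, and the backend names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """One gateway routing rule (hypothetical shape, for illustration only)."""
    name: str
    auth_mode: str  # e.g. "mtls" or "token"
    backend: str    # name of the backend the route forwards to


def find_dangling_routes(routes, known_backends):
    """Return the routes whose backend is not among the deployed backends."""
    known = set(known_backends)
    return [r for r in routes if r.backend not in known]


if __name__ == "__main__":
    # Hypothetical example data: the mTLS route points at a backend
    # that is not deployed, so it is reported as dangling.
    routes = [
        Route("agent-token", "token", "gitops-service"),
        Route("agent-mtls", "mtls", "gitops-service-old"),
    ]
    for route in find_dangling_routes(routes, {"gitops-service"}):
        print(f"route {route.name!r} ({route.auth_mode}) -> missing backend {route.backend!r}")
```

A check like this only validates that a target exists; it does not replace an end-to-end test with a real mTLS-authenticated agent.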

Timeline

Time (UTC)                       Event
Tuesday, 29th April, 05:00 PM    Incident started
Tuesday, 29th April, 05:30 PM    Issue identified
Tuesday, 29th April, 06:30 PM    Fix validated in QA
Tuesday, 29th April, 06:45 PM    Fix released to the Prod environments
Tuesday, 29th April, 07:00 PM    System operational again; agents began reconnecting

Resolution

The incorrect gateway configuration was corrected, which restored communication between mTLS agents and the GitOps service.

Next Steps

  1. Expand our release testing to include mTLS-authenticated agents to ensure better coverage and early detection of similar issues.
  2. Enhance monitoring and alerting based on agent connectivity patterns, particularly for mTLS-based agents, to enable faster response and resolution.
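
To illustrate the second step, here is a minimal sketch of an alert rule that evaluates connectivity per authentication mode, so that an outage affecting only mTLS agents is not hidden by healthy agents of other types. The `AgentStatus` record, the `"mtls"`/`"token"` labels, and the 50% threshold are assumptions for illustration and do not describe the actual monitoring implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentStatus:
    """Connectivity snapshot for one agent (hypothetical shape)."""
    agent_id: str
    auth_mode: str   # e.g. "mtls" or "token"
    connected: bool


def disconnected_ratio(statuses, auth_mode):
    """Fraction of agents with the given auth mode that are disconnected."""
    group = [s for s in statuses if s.auth_mode == auth_mode]
    if not group:
        return 0.0  # no agents of this type, nothing to alert on
    return sum(not s.connected for s in group) / len(group)


def should_alert(statuses, auth_mode="mtls", threshold=0.5):
    """Alert when the disconnected share for one auth mode reaches the threshold."""
    return disconnected_ratio(statuses, auth_mode) >= threshold
```

Grouping by authentication mode is the design choice that matters here: a fleet-wide average could stay below the threshold even when every mTLS agent is disconnected.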
Posted May 13, 2025 - 00:07 PDT

Resolved

We have deployed a fix for this issue, and the GitOps service is working correctly again.
Posted Apr 29, 2025 - 12:00 PDT

Identified

We noticed that GitOps agents using mTLS are failing to connect to the GitOps service in the Prod-1 environment. The issue has been identified, and we are working on a resolution. Thank you for your patience.
Posted Apr 29, 2025 - 10:00 PDT