Pipeline failures due to secret decryption in Prod2

Incident Report for Harness

Postmortem

Summary:

Pipelines failed to resolve secrets when more than one secret from a custom secret manager was used. The issue was isolated to secrets associated with custom secret managers.

Root Cause Analysis:

Pipelines failed because the system could not resolve secrets correctly. A code change intended to improve the performance of secret decryption was deployed, and it caused failures for secrets stored in custom secret managers. The change was behind a feature flag; disabling the flag restored normal pipeline operations.
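
For readers unfamiliar with the pattern, the following is a minimal sketch of gating a new code path behind a feature flag so that it can be switched off without a redeploy. It is not the actual Harness implementation. Every name in it (FLAGS, OPTIMIZED_SECRET_DECRYPTION, decrypt_one, decrypt_batch, resolve_secrets) is hypothetical, and the failure condition is invented only to make the example fail in a visible way.

```python
from typing import Dict, List

# Hypothetical in-memory flag store; a real system would query a flag service.
FLAGS: Dict[str, bool] = {"OPTIMIZED_SECRET_DECRYPTION": True}


def is_enabled(flag: str) -> bool:
    """Return the flag state, treating unknown flags as disabled."""
    return FLAGS.get(flag, False)


def decrypt_one(ref: str) -> str:
    """Stand-in for the established per-secret decryption path."""
    return f"value-of-{ref}"


def decrypt_batch(refs: List[str]) -> Dict[str, str]:
    """Stand-in for a newer, faster path.

    The defect below is injected for illustration only: it rejects any
    request that contains more than one secret.
    """
    if len(refs) > 1:
        raise RuntimeError("batch decryption failed")
    return {ref: decrypt_one(ref) for ref in refs}


def resolve_secrets(refs: List[str]) -> Dict[str, str]:
    """Use the new path only while the flag is on; otherwise fall back."""
    if is_enabled("OPTIMIZED_SECRET_DECRYPTION"):
        return decrypt_batch(refs)
    return {ref: decrypt_one(ref) for ref in refs}
```

With the flag on, resolve_secrets(["a", "b"]) raises in this sketch. Setting FLAGS["OPTIMIZED_SECRET_DECRYPTION"] = False routes the same call through the established path, which is the kind of mitigation described above.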

Action Items:

  1. Add New Test Cases: Extend the automation suite to cover different configuration combinations for custom secret managers.
  2. Add Metrics and Alerts: Implement metrics and alerts that proactively detect secret/expression resolution failures so they can be mitigated quickly.
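
As a rough illustration of both action items, the sketch below enumerates a test matrix of configuration combinations and tracks a per-manager failure rate that could drive an alert. It is an assumption-laden example, not the planned implementation: the dimensions of the matrix, the ResolutionMetrics class, and the threshold value are all hypothetical.

```python
from collections import Counter
from itertools import product
from typing import List, Tuple


def config_matrix() -> List[Tuple[str, int, bool]]:
    """Enumerate hypothetical test dimensions as (manager, secrets, flag)."""
    manager_types = ["builtin", "custom"]   # assumed manager categories
    secret_counts = [1, 2, 5]               # secrets referenced per pipeline
    flag_states = [False, True]             # new code path off / on
    return list(product(manager_types, secret_counts, flag_states))


class ResolutionMetrics:
    """Counts secret resolution outcomes per secret manager type."""

    def __init__(self, alert_threshold: float) -> None:
        self.alert_threshold = alert_threshold
        self.counts: Counter = Counter()

    def record(self, manager_type: str, ok: bool) -> None:
        outcome = "success" if ok else "failure"
        self.counts[(manager_type, outcome)] += 1

    def failure_rate(self, manager_type: str) -> float:
        failures = self.counts[(manager_type, "failure")]
        total = failures + self.counts[(manager_type, "success")]
        return failures / total if total else 0.0

    def should_alert(self, manager_type: str) -> bool:
        """Alert when the failure rate exceeds the configured threshold."""
        return self.failure_rate(manager_type) > self.alert_threshold
```

Each tuple from config_matrix() could feed one parametrized test case, and should_alert("custom") could be evaluated on a schedule to page on-call staff when failures for custom secret managers climb.
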
Posted Mar 24, 2025 - 10:09 PDT

Resolved

This incident has been resolved.
Posted Mar 19, 2025 - 14:55 PDT

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Mar 19, 2025 - 13:20 PDT

Update

We are continuing to work on a fix for this issue.
Posted Mar 19, 2025 - 13:03 PDT

Identified

The issue has been identified and a fix is being implemented.
Posted Mar 19, 2025 - 13:01 PDT

Investigating

We are currently investigating this issue.
Posted Mar 19, 2025 - 12:43 PDT
This incident affected: Prod 2 (Continuous Delivery - Next Generation (CDNG)) and Prod 1 (Continuous Delivery - Next Generation (CDNG)).