Some accounts facing degraded functionality: PROD1

Incident Report for Harness

Postmortem

Impact

Between 05:45 AM and 09:58 AM UTC on Thursday, 24 July 2025, customers using the DevOps Essentials tier in Production 1 encountered “Failed to fetch (500)” errors when attempting to start build pipelines in the UI. All other product areas and tiers remained fully functional.

Root Cause

A recently introduced configuration change created a mismatch between the DevOps Essentials entitlement state and one of the platform’s entitlement-validation endpoints. When that endpoint processed DevOps Essentials requests, it hit an unhandled error path and returned HTTP 500. Because the UI requires a successful response from this endpoint before running pipelines, build execution was blocked for affected customers.
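
The failure mode described above can be illustrated with a minimal sketch. This is not the platform's actual code: the tier names, the `limits` mapping, the `ValidationResult` type, and the choice of a 422 response are all hypothetical, chosen only to show how an unguarded lookup turns an entitlement mismatch into an unhandled error while a guarded lookup returns a handled response.

```python
from dataclasses import dataclass


@dataclass
class ValidationResult:
    status: int      # HTTP-style status code the endpoint would return
    allowed: bool    # whether the pipeline may start
    detail: str


def validate_strict(tier: str, limits: dict) -> ValidationResult:
    # Unguarded lookup: a tier missing from `limits` raises KeyError,
    # which a web framework would typically surface as HTTP 500.
    return ValidationResult(200, limits[tier]["builds_enabled"], "ok")


def validate_defensive(tier: str, limits: dict) -> ValidationResult:
    # Guarded lookup: a missing tier becomes an explicit, handled response.
    entry = limits.get(tier)
    if entry is None:
        return ValidationResult(422, False, f"unrecognized entitlement tier: {tier}")
    return ValidationResult(200, bool(entry.get("builds_enabled", False)), "ok")


# Hypothetical configuration that does not know about the tier being requested.
limits = {"enterprise": {"builds_enabled": True}}
result = validate_defensive("devops-essentials", limits)
```

With this hypothetical configuration, `validate_strict("devops-essentials", limits)` raises `KeyError`, while `validate_defensive` returns a result with status 422 that a caller can handle and report clearly.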

Timeline (UTC)

Time      Event
05:45 AM  New version deployed to Production 1
09:06 AM  First customer report received; incident opened
09:20 AM  Issue identified: validation-endpoint mismatch
09:58 AM  Rollback to previous stable release completed; functionality restored and incident mitigated
01:26 PM  Corrected build promoted to QA
02:43 PM  QA verification complete
03:02 PM  Corrected build deployed to production environments
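
The intervals implied by the timeline can be computed directly. The sketch below uses only the timestamps listed above; the variable names (`time_to_detect`, `time_to_mitigate`, `impact_window`) are labels chosen for illustration and are not terms from the report.

```python
from datetime import datetime, timezone


def utc(hour: int, minute: int) -> datetime:
    # All timeline entries fall on Thursday, 24 July 2025 (UTC).
    return datetime(2025, 7, 24, hour, minute, tzinfo=timezone.utc)


deployed = utc(5, 45)    # new version deployed to Production 1
reported = utc(9, 6)     # first customer report; incident opened
mitigated = utc(9, 58)   # rollback completed

time_to_detect = reported - deployed      # 3:21:00
time_to_mitigate = mitigated - reported   # 0:52:00
impact_window = mitigated - deployed      # 4:13:00
```

On these figures, most of the 4 hour 13 minute impact window elapsed before the first customer report, which is consistent with the monitoring and alerting action item.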

Remediation

  • Immediate: Rolled back to the last known-good release, restoring UI pipeline functionality for DevOps Essentials customers.
  • Permanent: Deployed an updated build with improved entitlement handling logic.

Action Items

  1. Strengthen defensive checks – Introduce guardrails that prevent entitlement mismatches from returning unhandled errors.
  2. Enhance monitoring & alerting – Add targeted health checks to detect similar discrepancies before they affect customers.
  3. Review deployment safeguards – Refine release procedures to ensure configuration changes and validation logic remain in sync.
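
One possible shape for the targeted health check in item 2 is a synthetic probe that exercises the validation path once per entitlement tier and reports any tier that fails. This is a sketch under assumptions: the `probe` callable, the tier list, and the rule that any status of 500 or above counts as a failure are hypothetical and not taken from the platform.

```python
from typing import Callable, Iterable, List


def find_failing_tiers(tiers: Iterable[str], probe: Callable[[str], int]) -> List[str]:
    """Return the tiers whose probe raised or returned a 5xx status."""
    failing = []
    for tier in tiers:
        try:
            status = probe(tier)
        except Exception:
            # A probe that raises is treated the same as a server error.
            failing.append(tier)
            continue
        if status >= 500:
            failing.append(tier)
    return failing


def fake_probe(tier: str) -> int:
    # Stand-in for a real HTTP call; simulates the failure seen in this incident.
    return 500 if tier == "devops-essentials" else 200


failing = find_failing_tiers(["free", "devops-essentials", "enterprise"], fake_probe)
```

Run after each deployment with one synthetic account per tier, a check along these lines would flag a tier-specific failure without waiting for a customer report.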
Posted Jul 30, 2025 - 17:25 PDT

Resolved

This incident has been resolved.
Posted Jul 24, 2025 - 04:01 PDT

Update

We are continuing to monitor for any further issues.
Posted Jul 24, 2025 - 03:43 PDT

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Jul 24, 2025 - 02:48 PDT

Identified

Accounts with the devops-essentials license type are encountering degraded functionality.
Posted Jul 24, 2025 - 02:36 PDT
This incident affected: Prod 1 (Continuous Delivery (CD) - FirstGen - EOS, Continuous Delivery - Next Generation (CDNG), Continuous Integration Enterprise(CIE) - Self Hosted Runners, Continuous Integration Enterprise(CIE) - Mac Cloud Builds, Continuous Integration Enterprise(CIE) - Windows Cloud Builds, Continuous Integration Enterprise(CIE) - Linux Cloud Builds, Security Testing Orchestration (STO), Infrastructure as Code Management (IaCM), Code Repository).