Scheduled change request delayed for a subset of flags

Incident Report for Harness

Postmortem

Summary

Between April 28 and May 1, 2026, a subset of scheduled change requests failed to execute at their scheduled times. The issue was introduced by a service update on April 28 and resolved on May 1. All affected scheduled jobs that had not been manually withdrawn by customers have since been executed successfully.

Root Cause

A service update exposed a latent bug in the code path that retrieves the governance policies applicable to a change request. As a result, execution of scheduled change requests failed, and the affected change requests remained stuck in their scheduled state with no automatic retry.

Impact

  • Scheduled change request execution failed silently for a subset of flags on affected accounts
  • Affected change requests were not applied at their scheduled time; they remained visible as scheduled in the UI
  • SDKs and runtime flag evaluation were not impacted
  • No data loss occurred; all backlogged jobs have since been processed

Timeline

Time (UTC)       Event
Apr 28, 18:21    Production deployment introduced the regression
Apr 28, 20:31    First scheduled change request execution failed
May 1, 08:35     First customer report received
May 1, 09:43     Escalated to engineering
May 1, 19:43     Fix deployed to production
May 1, 22:28     All stuck scheduled jobs confirmed published

Remediation

  • Deployed a forward fix that addresses the underlying latent bug and adds further protections
  • Reset scheduler tasks, allowing the scheduler to re-execute them; all have since been confirmed published

Action Items

  • Enhance test coverage for scheduled change request execution edge cases
  • Add alerting for scheduled changes that have not executed within a threshold of their scheduled time, enabling proactive detection ahead of customer reports
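
The alerting item above could take roughly the following shape. This is a minimal sketch and not the actual implementation: the `ScheduledChange` record, its field names, the `find_overdue` helper, and the 15-minute threshold are all hypothetical, chosen only to illustrate the check "scheduled time has passed by more than a threshold, and the change has neither executed nor been withdrawn".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass
class ScheduledChange:
    # Hypothetical record shape; field names are illustrative only.
    change_id: str
    scheduled_at: datetime
    executed_at: Optional[datetime] = None
    withdrawn: bool = False


def find_overdue(
    changes: Iterable[ScheduledChange],
    now: datetime,
    threshold: timedelta,
) -> List[ScheduledChange]:
    """Return changes still pending more than `threshold` past their scheduled time."""
    return [
        c
        for c in changes
        if c.executed_at is None          # never executed
        and not c.withdrawn               # not manually withdrawn by the customer
        and now - c.scheduled_at > threshold
    ]


if __name__ == "__main__":
    # Illustrative data only, not taken from the incident.
    utc = timezone.utc
    now = datetime(2026, 4, 28, 21, 0, tzinfo=utc)
    changes = [
        ScheduledChange("cr-1", datetime(2026, 4, 28, 20, 30, tzinfo=utc)),
        ScheduledChange("cr-2", datetime(2026, 4, 28, 20, 55, tzinfo=utc)),
        ScheduledChange("cr-3", datetime(2026, 4, 28, 20, 0, tzinfo=utc), withdrawn=True),
    ]
    for change in find_overdue(changes, now, timedelta(minutes=15)):
        # A real system would page or raise an alert here instead of printing.
        print(f"overdue: {change.change_id}")
```

Running a check like this on a short interval would surface a stuck change within minutes of its scheduled time, independent of whether the execution path itself reports a failure.
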
Posted May 05, 2026 - 21:02 PDT

Resolved

Between April 28 and May 1, 2026, a subset of scheduled change requests failed to execute at their scheduled times. The issue was introduced by a production deployment on April 28 and fixed with a new deployment on May 1. All affected scheduled jobs that had not been manually withdrawn by customers have since been executed successfully.
Posted Apr 28, 2026 - 10:21 PDT