Issue Affecting Feature Flag Updates - PROD2

Incident Report for Harness

Postmortem

Summary

On Feb 3rd, 2026, customers using the Feature Flags module in the production environment (Prod2) observed delays in seeing their updates reflected in the user interface. The issue was caused by lag in a database read replica, which resulted in stale data being served for read operations. The issue was identified, mitigated, and fully resolved. 

Impact

During the incident window:

  • Updates made to Feature Flags Classic in Prod2 were not immediately reflected in the UI.
  • Read operations returned stale data due to replication lag between the primary database and a read replica.
  • All Feature Flags Classic customers in Prod2 were affected.

There was no data loss, and write operations continued to be processed successfully. The impact was limited to delayed visibility of updates and temporary confusion regarding the status of recent changes. Overall service availability was slightly degraded during the incident.

Root Cause

Feature Flags Classic relies on a primary database for write operations and read replicas for read operations. During the incident, a long-running database query caused a read replica to fall significantly behind the primary database.

As a result, while customer updates were successfully written to the primary database, reads served from the lagging replica returned outdated data. Because replication lag alerts were not enabled at the time, the issue was not detected immediately through automated monitoring.
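
The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the actual Feature Flags Classic data path: the `Primary` and `Replica` classes, the `new-checkout` flag name, and the idea of counting lag in unapplied log entries are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Hypothetical primary: accepts writes and keeps them in an ordered log."""
    log: list = field(default_factory=list)

    def write(self, flag: str, enabled: bool) -> None:
        self.log.append((flag, enabled))


@dataclass
class Replica:
    """Hypothetical read replica: serves reads from the log entries replayed so far."""
    applied: int = 0
    data: dict = field(default_factory=dict)

    def replay(self, primary: Primary, max_entries: int) -> None:
        # Apply at most max_entries pending changes; 0 models a replica
        # whose replay is stalled, for example behind a long-running query.
        pending = primary.log[self.applied:self.applied + max_entries]
        for flag, enabled in pending:
            self.data[flag] = enabled
            self.applied += 1

    def lag(self, primary: Primary) -> int:
        # Lag expressed as the number of changes not yet applied.
        return len(primary.log) - self.applied

    def read(self, flag: str):
        return self.data.get(flag)


primary = Primary()
replica = Replica()

primary.write("new-checkout", False)
replica.replay(primary, max_entries=10)   # replica is caught up

primary.write("new-checkout", True)       # write succeeds on the primary
replica.replay(primary, max_entries=0)    # replay is stalled

stale_value = replica.read("new-checkout")  # still the old value
lag_during = replica.lag(primary)           # one change behind

replica.replay(primary, max_entries=10)     # replica catches up
fresh_value = replica.read("new-checkout")  # now the new value
```

In this model the write is never lost; it is simply invisible to readers until the replica replays it, which matches the delayed visibility customers observed.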

Mitigation

As immediate mitigation steps:

  • The long-running query on the affected read replica was terminated.
  • The replica was restarted, allowing it to catch up with the primary database and resume normal operation.

These actions restored data consistency between the primary and replica databases and resolved the customer-facing impact.

Action Items

To reduce the risk of recurrence and improve detection, the following actions are being implemented:

  • Enable proactive monitoring and alerting for database replication lag with defined thresholds.
  • Configure query timeouts to prevent long-running queries from impacting database replicas.
  • Establish clearer operational guidelines for executing resource-intensive queries.
  • Review and periodically validate database alerting configurations to ensure early detection of similar issues.
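
As a rough illustration of the first action item, replication lag alerting with defined thresholds could be sketched as follows. This is a minimal sketch, not the monitoring configuration actually deployed: the `LagThresholds` values (30 and 120 seconds), the `Severity` levels, and the `classify_lag` helper are hypothetical, and the lag measurement itself is assumed to come from whatever metric the database or monitoring system exposes.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


@dataclass(frozen=True)
class LagThresholds:
    # Hypothetical values; real thresholds depend on the workload.
    warning_seconds: float = 30.0
    critical_seconds: float = 120.0


def classify_lag(lag_seconds: float,
                 thresholds: LagThresholds = LagThresholds()) -> Severity:
    """Map a measured replication lag (in seconds) to an alert severity."""
    if lag_seconds < 0:
        raise ValueError("replication lag cannot be negative")
    if lag_seconds >= thresholds.critical_seconds:
        return Severity.CRITICAL
    if lag_seconds >= thresholds.warning_seconds:
        return Severity.WARNING
    return Severity.OK
```

A periodic check would call `classify_lag` with the latest lag reading and page on `Severity.CRITICAL`. Keeping the thresholds in one immutable object makes them easy to review alongside the rest of the alerting configuration.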
Posted Feb 16, 2026 - 10:06 PST

Resolved

This incident has been resolved.
Posted Feb 04, 2026 - 05:28 PST

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Feb 04, 2026 - 04:56 PST

Identified

The issue has been identified and a fix is being implemented.
Posted Feb 04, 2026 - 04:54 PST

Investigating

We are currently investigating this issue.
Posted Feb 04, 2026 - 04:53 PST
This incident affected: Prod 2 (Feature Flags (FF)).