Degraded Performance — Feature Flags in PROD2

Incident Report for Harness

Postmortem

Summary

On April 24, 2026, a large non-batched bulk DELETE operation on the prod-2 primary database triggered lock contention, causing Feature Flag API latency and hung queries across multiple customer SDKs.

Impact

  1. Slow SDK auth/init — SDKs took longer than expected to complete evaluations
  2. Elevated latency across many FF APIs
  3. Limited to Feature Flag module, prod-2
  4. No Data loss

Root Cause

A background cleanup job executed a non-batched, single-transaction delete causing lock contention and API latency spikes

Mitigation

Immediately terminated the offending queries.

Next Steps / Action Items

To prevent such issues from happening again. we are working on

  1. Enhanced alerting and observability on long running queries.
  2. Permanently replace large single-transaction delete pattern with smaller batched deletes
Posted Apr 30, 2026 - 12:24 PDT

Resolved

This incident has been resolved.
Posted Apr 24, 2026 - 11:59 PDT

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Apr 24, 2026 - 11:58 PDT

Investigating

We are currently investigating this issue.
Posted Apr 24, 2026 - 09:06 PDT
This incident affected: Prod 2 (Feature Flags (FF)).