Some Feature Flag customers are experiencing intermittent issues with evaluating target groups on Prod2

Incident Report for Harness

Postmortem

What was the issue?

After a recent change to the Feature Flag authentication gateway, some evaluations failed for target groups whose rules use a custom attribute.

Once the cause was identified, the team reverted the configuration change and evaluations returned to normal.

Time (UTC) Event
09:02 Feature Flag authentication gateway configuration change applied
16:49 First report of issues with evaluations of custom rules on target groups
18:34 Feature Flag authentication gateway configuration reverted
18:39 Evaluations of target groups return to normal

RCA

As part of improvements to our disaster recovery strategy, a change was made to make the Feature Flag authentication gateway more robust.

Initial testing did not account for client-side SDKs evaluating target groups whose rules use a custom attribute (rather than a core attribute such as identifier).

Client SDKs generally make two types of request.

  1. An auth request
  2. An evaluation request to get flag values.

During the auth flow, the provided target and its attributes are stored in a DB and in Redis, e.g.

{
  identifier : "123-456-789",
  name : "bob",
  custom_attribute_1 : "value1"
}

During the evaluation flow, the target is retrieved from the cache if it is still present; if not, it is retrieved from the DB and then stored in the cache.
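
The lookup described above follows a cache-aside pattern. A minimal sketch, assuming plain dictionaries stand in for Redis and the DB; the function name `get_target` and the data shapes are hypothetical and not taken from the actual service:

```python
from typing import Dict, Optional

Target = Dict[str, str]


def get_target(
    identifier: str,
    cache: Dict[str, Target],  # stand-in for Redis (assumption)
    db: Dict[str, Target],     # stand-in for the DB (assumption)
) -> Optional[Target]:
    """Return the cached target if present, else load it from the DB and cache it."""
    cached = cache.get(identifier)
    if cached is not None:
        # Cache hit: the DB is not consulted at all.
        return cached
    stored = db.get(identifier)
    if stored is not None:
        # Cache miss: populate the cache for later evaluations.
        cache[identifier] = stored
    return stored
```

Note that a cache hit is trusted as-is, so the result is only as fresh as the cached entry.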

After the Feature Flag authentication gateway change, targets were written to a different Redis instance. They were still persisted to the DB, but if the Redis instance used during evaluations held an older version of the target without the attributes, the code never fell through to the DB. For example, that Redis instance might contain:

{
  identifier : "123-456-789",
  name : "bob"
}

In this case, a rule that used identifier or name worked as expected, but a rule that used the custom attribute failed because that attribute was missing during the evaluation.
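
The failure mode can be reproduced in a small simulation. This is a sketch under stated assumptions: dictionaries stand in for the two Redis instances and the DB, and the names `authenticate`, `load_target`, and `rule_matches` are hypothetical, not the service's real functions.

```python
from typing import Dict, Optional

Target = Dict[str, str]


def authenticate(target: Target, auth_cache: Dict[str, Target], db: Dict[str, Target]) -> None:
    """Auth flow: persist the target to the DB and to the cache the gateway writes to."""
    db[target["identifier"]] = dict(target)
    auth_cache[target["identifier"]] = dict(target)


def load_target(identifier: str, eval_cache: Dict[str, Target], db: Dict[str, Target]) -> Optional[Target]:
    """Evaluation flow: prefer the cache, fall back to the DB only on a miss."""
    cached = eval_cache.get(identifier)
    if cached is not None:
        return cached  # a stale hit never reaches the DB
    stored = db.get(identifier)
    if stored is not None:
        eval_cache[identifier] = stored
    return stored


def rule_matches(target: Optional[Target], attribute: str, expected: str) -> bool:
    """A simplified equality rule on a single target attribute."""
    return target is not None and target.get(attribute) == expected


db: Dict[str, Target] = {}
auth_cache: Dict[str, Target] = {}  # the instance written to after the change
# The instance read during evaluations still holds an older copy without attributes.
eval_cache: Dict[str, Target] = {"123-456-789": {"identifier": "123-456-789", "name": "bob"}}

authenticate(
    {"identifier": "123-456-789", "name": "bob", "custom_attribute_1": "value1"},
    auth_cache,
    db,
)
target = load_target("123-456-789", eval_cache, db)

name_rule = rule_matches(target, "name", "bob")                      # True
custom_rule = rule_matches(target, "custom_attribute_1", "value1")   # False
```

The DB holds the full target throughout; the rule on the custom attribute fails only because the stale cache entry is returned before the DB is consulted.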

Action Items

  1. Update our test suite to include additional user authentication flows that cover the impacted use case
  2. Update the Feature Flag authentication gateway to use distinct Redis instances for each environment
  3. Review functionality that supports additional reading of target attributes, to provide a failsafe that ensures the correct evaluation is returned
Posted Mar 06, 2025 - 03:52 PST

Resolved

The issue has now been resolved, and we will share the RCA shortly.
Posted Mar 03, 2025 - 11:07 PST

Update

We are continuing to monitor for any further issues.
Posted Mar 03, 2025 - 11:05 PST

Monitoring

The issue has been identified, and a fix has been put in place.

The team is continuing to monitor the issue.
Posted Mar 03, 2025 - 10:52 PST

Investigating

We are currently receiving reports of some customers experiencing intermittent issues with Feature Flags when evaluating target groups in the prod2 environment.

The team is actively diagnosing the issue and will keep you updated.
Posted Mar 03, 2025 - 10:48 PST
This incident affected: Prod 2 (Feature Flags (FF)).