Customers with no target groups configured were being returned null instead of [] for the /target-segments request when their SDKs started up. This could lead to null pointer exceptions and a failure to initialise for some SDKs.
The issue affected a number of SDK versions, so we tested both those versions and the latest versions to ascertain impact.
Java
1.3.1: The waitForInitialization call never unblocks once the exception is thrown and caught in the polling thread. This would have caused user code to “freeze” while the SDK was initialising.
1.6.0 - latest version: The waitForInitialization call unblocks and evaluation calls return the correct variation.
Node.js
1.3.1: An UnhandledPromiseRejection causes the SDK and application to crash. If the SDK client and waitForInitialization were used in a try-catch block, then an error would be logged and the SDK would serve the correct evaluations.
1.8.1 - latest version: Same behaviour and impact as 1.3.1.
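The Java 1.3.1 hang described above is a general failure mode: a background task swallows an error and never signals the caller that is waiting on initialisation. The following is a minimal, language-neutral sketch of the safer pattern, written in Go. It is not taken from any SDK; the client type, newClient, waitForInit, runInit and errBadBody are all hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errBadBody is a hypothetical error standing in for a failed parse of a
// null target-segments response.
var errBadBody = errors.New("unexpected null target-segments body")

// client is a hypothetical SDK client, not a real SDK type.
type client struct {
	once    sync.Once
	done    chan struct{}
	initErr error
}

// newClient starts the first fetch in the background and always signals
// waiters when it finishes, whether it succeeded or failed.
func newClient(fetch func() error) *client {
	c := &client{done: make(chan struct{})}
	go func() {
		c.finish(fetch())
	}()
	return c
}

// finish records the outcome once and releases every waiter.
func (c *client) finish(err error) {
	c.once.Do(func() {
		c.initErr = err
		close(c.done)
	})
}

// waitForInit blocks until initialisation completes, fails, or times out.
func (c *client) waitForInit(timeout time.Duration) error {
	select {
	case <-c.done:
		return c.initErr
	case <-time.After(timeout):
		return errors.New("initialisation timed out")
	}
}

// runInit is a small helper that creates a client and waits for it.
func runInit(fetch func() error) error {
	return newClient(fetch).waitForInit(time.Second)
}

func main() {
	// A failing first fetch returns an error to the caller instead of hanging.
	fmt.Println(runInit(func() error { return errBadBody }))
	// A successful first fetch returns nil.
	fmt.Println(runInit(func() error { return nil }))
}
```

The point of the design is that the failure path and the success path both release the waiter, and the timeout bounds the wait even if neither path runs.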
Other SDK impact
The remaining server SDKs have been tested on their latest versions to ascertain impact. There were no direct customer reports of issues with these SDKs, but the results help establish the scope of the issue.
In the scenario where a customer had 0 target groups, the /client/target-segments endpoint returned the value null instead of an empty array [].
A change was made in the db layer of the backend to return an empty array rather than a not found error when no target groups exist for a customer. As a result, these requests took a different code path. This code path copies all groups into a new array and changes some data before marshalling and returning the JSON response. Because no groups existed, the copy mistakenly returned a nil object instead of an empty array, which was then marshalled into a null JSON response.
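The copy behaviour described above can be reproduced in a few lines. This is a minimal sketch that assumes the backend is written in Go (suggested by the "nil" and "marshalling" wording, but not stated in this document). The Segment type and both copy functions are hypothetical and are not the actual backend code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Segment is a hypothetical stand-in for a target group record.
type Segment struct {
	Identifier string `json:"identifier"`
	Name       string `json:"name"`
}

// copySegmentsBuggy only allocates the result through append, so when the
// input has no elements the result stays a nil slice.
func copySegmentsBuggy(in []Segment) []Segment {
	var out []Segment // nil until the first append
	for _, s := range in {
		out = append(out, s)
	}
	return out
}

// copySegmentsFixed allocates a non-nil slice up front, so an empty input
// yields an empty (not nil) result.
func copySegmentsFixed(in []Segment) []Segment {
	out := make([]Segment, 0, len(in))
	for _, s := range in {
		out = append(out, s)
	}
	return out
}

// mustJSON marshals v and panics on error, to keep the example short.
func mustJSON(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	empty := []Segment{}
	fmt.Println(mustJSON(copySegmentsBuggy(empty))) // prints null
	fmt.Println(mustJSON(copySegmentsFixed(empty))) // prints []
}
```

In Go, encoding/json marshals a nil slice as null and an empty non-nil slice as [], so the difference comes entirely from how the result slice is allocated.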
Because of our high request rates we use many layers of caching. A side effect of returning errors from the db layer when no target groups exist was that we wouldn’t cache that response. With some high-volume customers having no target groups, this led to tens of millions of unnecessary requests hitting the database per week when flags were evaluated, which is what the change was attempting to avoid.
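The caching side effect can be illustrated with a small model. This sketch assumes a cache that stores successful lookups only, which is what the paragraph above implies. The store and cache types, the emptyAsError flag and the account key are hypothetical and do not reflect the real caching layers.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("no target groups found")

// store is a hypothetical db layer that counts how often it is queried.
type store struct {
	queries      int
	emptyAsError bool // true models the old behaviour (error when empty)
}

// segments models a lookup for a customer that has no target groups.
func (s *store) segments(account string) ([]string, error) {
	s.queries++
	if s.emptyAsError {
		return nil, errNotFound
	}
	return []string{}, nil
}

// cache stores successful lookups only; errors always fall through to the db.
type cache struct {
	entries map[string][]string
	db      *store
}

func (c *cache) segments(account string) ([]string, error) {
	if v, ok := c.entries[account]; ok {
		return v, nil
	}
	v, err := c.db.segments(account)
	if err != nil {
		return nil, err // error responses are not cached
	}
	c.entries[account] = v
	return v, nil
}

// queriesAfter returns how many db queries n identical lookups cause.
func queriesAfter(n int, emptyAsError bool) int {
	db := &store{emptyAsError: emptyAsError}
	c := &cache{entries: map[string][]string{}, db: db}
	for i := 0; i < n; i++ {
		_, _ = c.segments("acct-1")
	}
	return db.queries
}

func main() {
	fmt.Println(queriesAfter(1000, true))  // prints 1000: every lookup hits the db
	fmt.Println(queriesAfter(1000, false)) // prints 1: the empty result is cached
}
```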
Unit tests, end-to-end tests and SDK-specific tests exist for this endpoint; however, the case where target groups are empty wasn’t fully covered. This change was primarily meant to improve performance for the /client/evaluations endpoint, which uses this code path and which was manually tested and confirmed to work correctly. The side effects of this change on the /client/target-segments code path weren’t anticipated or caught by automated testing.
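A check for the missing empty case could look like the sketch below. It is only an illustration, again assuming a Go backend. segmentsHandler and bodyFor are hypothetical, the lookup function stands in for the db layer, and the response is simplified to a list of names rather than the real payload.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// segmentsHandler is a hypothetical handler; lookup stands in for the db layer.
func segmentsHandler(lookup func() []string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		segments := lookup()
		if segments == nil {
			segments = []string{} // never serialise a nil slice as null
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(segments)
	}
}

// bodyFor calls the handler in-process and returns the trimmed response body.
func bodyFor(lookup func() []string) string {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/client/target-segments", nil)
	segmentsHandler(lookup)(rec, req)
	return strings.TrimSpace(rec.Body.String())
}

func main() {
	// Table of cases, including the empty and nil cases that were not covered.
	cases := []struct {
		name   string
		lookup func() []string
		want   string
	}{
		{"nil from db layer", func() []string { return nil }, "[]"},
		{"empty from db layer", func() []string { return []string{} }, "[]"},
		{"one group", func() []string { return []string{"beta-users"} }, `["beta-users"]`},
	}
	for _, tc := range cases {
		got := bodyFor(tc.lookup)
		fmt.Printf("%s: got %s, ok=%v\n", tc.name, got, got == tc.want)
	}
}
```

Asserting on the serialised body rather than on the in-memory value is deliberate: the difference between nil and an empty array only becomes visible to SDKs after marshalling.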
Follow-up actions can cover the following, based on the different issues faced, with Jira IDs linked for tracking their completion. These follow-up items must also be linked in the RCA ticket.
Test enhancements