Overview: Security Testing Orchestration (STO) and IaCM modules impacted
What was the issue?
The STO and IaCM modules could not complete execution, which caused pipeline executions to time out. The Redis keys had been rotated, but the two microservices responsible for these modules were still using the old keys.
Timeline:
Time | Event |
---|---|
25th Apr 2024 7:03 AM PDT | Issue was noticed and investigation started. |
25th Apr 2024 7:35 AM PDT | Issue was identified. |
25th Apr 2024 7:43 AM PDT | Issue was resolved for STO. We continued monitoring. |
25th Apr 2024 7:49 AM PDT | Issue was resolved for IaCM. We continued monitoring. |
25th Apr 2024 8:00 AM PDT | All modules were declared operational. |
Resolution:
The STO and IaCM modules were updated to use the new keys.
RCA & Action Items:
Two microservices were missed in the key update because their configuration formats differed between QA and Production. Our change management process did not account for this discrepancy. To prevent a recurrence, we will standardize the configurations across environments and add key rotation checks to the change management process.
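As an illustration only, the two kinds of checks described above could look roughly like the Python sketch below. It is not the actual tooling: the function names, service names, key identifiers, and config field names are all hypothetical, and it assumes each service's configuration can be read into a flat dictionary and that keys are compared by identifier rather than by secret value.

```python
from typing import Dict, List


def find_stale_services(current_key_id: str,
                        service_key_ids: Dict[str, str]) -> List[str]:
    """Return the services whose configured key ID differs from the current one.

    Only key identifiers are compared, never the secret values themselves.
    """
    return sorted(
        name for name, key_id in service_key_ids.items()
        if key_id != current_key_id
    )


def find_config_drift(qa_config: Dict[str, str],
                      prod_config: Dict[str, str]) -> List[str]:
    """Return config field names that exist in only one of the two environments."""
    return sorted(set(qa_config) ^ set(prod_config))


if __name__ == "__main__":
    # Hypothetical inventory gathered after a rotation.
    in_use = {
        "sto-service": "redis-key-v1",
        "iacm-service": "redis-key-v1",
        "other-service": "redis-key-v2",
    }
    print(find_stale_services("redis-key-v2", in_use))

    # Hypothetical field names showing a format mismatch between environments.
    qa = {"redis.key_id": "redis-key-v2", "redis.host": "qa-host"}
    prod = {"REDIS_KEY_ID": "redis-key-v2", "redis.host": "prod-host"}
    print(find_config_drift(qa, prod))
```

A check like `find_stale_services` could run as a gate after each rotation, while `find_config_drift` could run before it to flag services whose QA and Production configurations are not shaped the same way.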