Summary
On June 6th, our SCM service experienced degraded performance that briefly impacted infrastructure pipeline operations. This was triggered by a high-volume onboarding of modules using SSH-based connectors via our Terraform provider, which caused an unexpected resource spike in core services. This effected all services that use the SCM service including CI, CD & STO.
Root Cause
The primary cause was a surge in SSH metadata validation commands executed concurrently by the one of the services. This overwhelmed the system’s ability to process requests, leading to service disruption.
Mitigation
Preventive and Long-Term Improvements