Some pipelines are running slowly, with some failures

Incident Report for Harness

Postmortem

Summary

Due to capacity issues on our primary database, pipeline latency degraded for a couple of our customers, and some pipelines errored out.

Root Cause:
One of our databases was underprovisioned for the high workload it experienced.

Mitigation:
We scaled out the database and ensured it has enough headroom and capacity for the workload.

Next Steps:

Add more granular monitoring for our databases.
Update alerting thresholds to warn us of rising utilization or insufficient capacity.

Posted Sep 08, 2025 - 22:42 PDT

This incident has been resolved.

Posted Aug 27, 2025 - 20:32 PDT

A fix has been implemented and we are monitoring the results.

Posted Aug 27, 2025 - 20:30 PDT

We have identified and mitigated the issue.

Posted Aug 27, 2025 - 20:30 PDT

Internal monitoring identified a potential issue with pipelines.

Posted Aug 27, 2025 - 20:29 PDT

This incident affected: Prod 2 (Continuous Delivery - Next Generation (CDNG)).