Pipeline Services are having degraded performance
Incident Report for Harness
Postmortem

Summary

After the Redis isolation maintenance on Prod 1, internal monitoring tools showed that pipelines were running slower than normal.

What was the issue?

The Harness platform uses a set of services, including producers and consumers for Redis streams. The order in which these services were brought up after the maintenance left some of the streams without being consumed.
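
As an illustration of how a condition like this could be detected, the sketch below flags streams that have no consumer group with an active consumer. It is not Harness's monitoring code. The stream and group names are hypothetical, and the snapshot is assumed to be shaped like the reply of the Redis `XINFO GROUPS <stream>` command, which reports a `consumers` count per group.

```python
from typing import Any, Dict, List

# Hypothetical snapshot: stream name -> list of consumer-group records,
# assumed to mirror the fields returned by `XINFO GROUPS <stream>`.
Snapshot = Dict[str, List[Dict[str, Any]]]


def unconsumed_streams(snapshot: Snapshot) -> List[str]:
    """Return the streams that have no group with at least one consumer."""
    stalled = []
    for stream, groups in snapshot.items():
        # A stream counts as consumed if any of its groups has a live consumer.
        has_consumer = any(int(g.get("consumers", 0)) > 0 for g in groups)
        if not has_consumer:
            stalled.append(stream)
    return sorted(stalled)


# Illustrative data only; these names do not come from the incident.
snapshot = {
    "pipeline-events": [{"name": "executor", "consumers": 2, "pending": 0}],
    "step-updates": [{"name": "notifier", "consumers": 0, "pending": 41}],
    "audit-log": [],
}

print(unconsumed_streams(snapshot))  # ['audit-log', 'step-updates']
```

A check of this kind, run after a maintenance window, would surface streams whose consumers did not come back, before the backlog shows up as pipeline latency.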

Timeline

Time          Event
9:55 AM PT    Noticed intermittent slowness in pipelines
10:00 AM PT   Core services were rolled out again
10:10 AM PT   Pipeline performance improved and services were running well

Resolution

Restarting the services in the correct order made the Redis producers and consumers available again. Pipeline performance then improved and latency returned to normal.
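
One way to make a restart order explicit is to derive it from declared dependencies instead of relying on manual sequencing. The following is a minimal sketch using Python's standard `graphlib` module. The service names and the dependency map are hypothetical, and the assumption that consumers start before producers is for illustration only; the report does not state what the correct order was.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependency map: service -> services that must be running first.
# The consumer-before-producer ordering is an assumption, not from the report.
DEPENDS_ON: Dict[str, Set[str]] = {
    "redis": set(),
    "stream-consumer": {"redis"},
    "stream-producer": {"redis", "stream-consumer"},
}


def startup_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every service starts after its dependencies.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


print(startup_order(DEPENDS_ON))
# ['redis', 'stream-consumer', 'stream-producer']
```

Encoding the order as data means a rollout script can compute the sequence each time, and a circular dependency fails loudly instead of producing a partial start.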

Posted Sep 04, 2024 - 16:13 PDT

Resolved
We can confirm normal operation. Get Ship Done!
We will continue to monitor and ensure stability.
Posted Jul 20, 2024 - 11:43 PDT
Monitoring
Harness service issues have been addressed and normal operations have been resumed. We are monitoring the service to ensure normal performance continues.
Posted Jul 20, 2024 - 10:10 PDT
Identified
We have identified a potential cause of the service issues and are working hard to address it. Please continue to monitor this page for updates.
Posted Jul 20, 2024 - 09:40 PDT
This incident affected: Prod 1 (Continuous Delivery - Next Generation (CDNG), Security Testing Orchestration (STO)).