Pipeline Services are experiencing degraded performance

Incident Report for Harness

Postmortem

Summary

After the Redis isolation maintenance on Prod 1, internal monitoring tools showed that pipelines were running slower than usual.

What was the issue?

The Harness platform relies on a set of services that act as producers and consumers for Redis streams. The order in which these services were brought up after the maintenance caused some of the streams not to be consumed.

Timeline

Time	Event
9:55AM PT	Noticed intermittent slowness in Pipelines
10:00AM PT	Core services were rolled out again
10:10AM PT	Pipeline performance improved and services were running well

Resolution

Restarting the services in the correct order brought the Redis stream producers and consumers back online, and pipeline latency returned to normal.

Posted Sep 04, 2024 - 16:13 PDT

Resolved

We can confirm normal operation. Get Ship Done!
We will continue to monitor and ensure stability.

Posted Jul 20, 2024 - 11:43 PDT

Monitoring

Harness service issues have been addressed and normal operations have been resumed. We are monitoring the service to ensure normal performance continues.

Posted Jul 20, 2024 - 10:10 PDT

Identified

We have identified a potential cause of the service issues and are working hard to address it. Please continue to monitor this page for updates.

Posted Jul 20, 2024 - 09:40 PDT

This incident affected: Prod 1 (Continuous Delivery - Next Generation (CDNG), Security Testing Orchestration (STO)).