On May 27, 2026, customers using Harness Pipelines on the legacy log-storage backend experienced intermittent failures and missing pipeline logs in the UI. No log data was permanently lost
The root cause was CPU saturation of the log service's underlying cache infrastructure, triggered by an automation script from one enterprise customer that opened a large number of long-lived log-streaming connections without closing them.
Immediate actions taken during the incident:
The following actions are in progress or planned to prevent recurrence:
Complete migration to new log storage infra : The primary long-term fix is finishing the migration of all remaining accounts
Enhance monitoring alerts: A broader audit of all monitoring alert rules is underway to identify any other critical detection paths that may be silently disabled.
Extend streaming connection timeout reduction globally: The 10-minute maximum streaming connection lifetime is being rolled out to all remaining environments (including development and free-tier clusters) to ensure consistent protection.
Per-account rate limiting on streaming connections: We are adding an application-level cap on concurrent streaming connections per account. The initial implementation will be observe-only (logging warnings when thresholds are approached) to gather data on real-world usage before converting to hard enforcement.
Improved error messaging: When log operations fail due to underlying infrastructure issues (cache timeouts, I/O errors), the error surfaces to users and logs will be updated to accurately reflect the infrastructure cause, rather than showing a generic "stream not found" message that obscures the true reason for failure.