Custom Dashboards [Unified View Explores] are experiencing delays in updating in Prod2

Incident Report for Harness

Postmortem

Summary

For 26 hours, customers on Prod-2 observed stale data on the following custom dashboards: pipeline executions, stage executions, and step executions. The metadata state tables managing the ETL process were corrupted during a plan application upgrade, requiring a rebuild of the customer-facing data marts for the dashboards. No data was lost during this process.

Resolution

The metadata state was reset to trigger data mart updates.

RCA

The plan application errors were caused by metadata corruption. No data was lost, but the dashboards showed stale data because the data marts were not updated with the latest ETL intervals while the metadata was being recreated.
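The staleness condition described in this RCA can be illustrated with a minimal sketch: a data mart counts as stale when the newest ETL interval it has loaded lags behind the newest interval the ETL has produced. This is not Harness's implementation. The MartState record, the find_stale_marts function, the one-hour tolerance, the table names, and all timestamps are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MartState:
    """Hypothetical metadata record for one customer-facing data mart."""
    name: str
    # End of the newest ETL interval that has been applied to this mart.
    last_loaded_interval_end: datetime


def find_stale_marts(marts, latest_etl_interval_end, tolerance=timedelta(hours=1)):
    """Return the names of marts whose newest loaded interval lags the
    latest ETL interval by more than `tolerance` (an assumed threshold)."""
    return [
        m.name
        for m in marts
        if latest_etl_interval_end - m.last_loaded_interval_end > tolerance
    ]


# Illustrative values only; these are not taken from the incident.
latest = datetime(2025, 1, 1, 12, 0)
marts = [
    MartState("pipeline_executions", datetime(2024, 12, 31, 20, 0)),
    MartState("stage_executions", datetime(2025, 1, 1, 11, 30)),
    MartState("step_executions", datetime(2024, 12, 31, 20, 0)),
]

stale = find_stale_marts(marts, latest)
print(stale)
```

A check of this kind could run on a schedule and flag marts that fall behind, independent of whether the ETL job itself reports success.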

Action Items

  • The ETL framework will be updated more frequently. Harness will set a regular cadence for testing new updates and deploying them into production to reduce drift in metadata rollbacks.
  • Metadata tables will be decoupled from raw data storage to better manage state effects. Decoupling state from raw ingestion will allow faster iteration loops if a database rollback is needed.
Posted Apr 23, 2025 - 12:02 PDT

Resolved

Custom Dashboards [Unified View Explores] are updating normally again, and the issue is resolved. We appreciate your patience.
Posted Jan 24, 2025 - 18:07 PST

Update

We are still working hard to address the problem and aim to resolve it by 5 PM PST. We understand the inconvenience this may cause and appreciate your patience.
Posted Jan 24, 2025 - 14:45 PST

Update

We are still working hard to address the problem and aim to resolve it by 3 PM PST. We understand the inconvenience this may cause and appreciate your patience.
Posted Jan 24, 2025 - 12:26 PST

Update

We are still working hard to address the problem and aim to resolve it by 12 PM PST. We understand the inconvenience this may cause and appreciate your patience.
Posted Jan 24, 2025 - 09:57 PST

Identified

We are still working hard to address the problem and aim to resolve it by 10 AM PST. We understand the inconvenience this may cause and appreciate your patience.
Posted Jan 24, 2025 - 08:26 PST

Investigating

We are experiencing an issue where Custom Dashboards [Unified View Explores] are not updating as expected. We have identified the problem and aim to resolve it by 8 AM PST. We understand the inconvenience this may cause and appreciate your patience.
Posted Jan 23, 2025 - 22:32 PST
This incident affected: Prod 2 (Custom Dashboards).