On March 25, 2025, for 2 hours and 22 minutes, customers in the prod-1 production environment observed stale data on the following custom dashboards: pipeline executions, stage executions, and step executions.
The metadata state tables that manage the ETL process were corrupted during a version upgrade and had to be repaired. No data was lost during this process.
The metadata state was reset to trigger data mart updates.
Time (UTC) | Event |
---|---|
26 Mar 2:04 AM | We identified the ETL process that timed out after the upgrade. |
26 Mar 3:18 AM | We redeployed the ETL process, applied the plan, and recreated the views. |
26 Mar 4:22 AM | The metadata schema was rebuilt, and all data quality checks were confirmed to be passing. |
26 Mar 4:25 AM | The incident was resolved. |
Plan application errors occurred because an upgrade of the ETL process timed out after running for two hours. The timeout corrupted the metadata state, which then required data fixes. No data was lost, but customers saw stale data because the data marts were not updated with the latest ETL intervals while the metadata was being recreated.
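
To illustrate the staleness described above, the following is a minimal sketch of a check that compares the intervals the ETL process is expected to have produced against the intervals a data mart has applied. It is not the tooling used during this incident: the function names, the hourly interval size, and the sample data are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def hourly_intervals(start, end):
    """Yield the start time of each hourly interval in [start, end).

    Hourly granularity is an assumption; the real interval size is not
    stated in this report.
    """
    current = start
    while current < end:
        yield current
        current += timedelta(hours=1)


def missing_intervals(expected, applied):
    """Return the expected interval starts that the data mart has not applied."""
    applied_set = set(applied)
    return [interval for interval in expected if interval not in applied_set]


# Hypothetical sample data: five expected intervals, of which the data
# mart applied only the first two before updates stopped.
start = datetime(2025, 3, 26, 0, 0, tzinfo=timezone.utc)
end = datetime(2025, 3, 26, 5, 0, tzinfo=timezone.utc)
expected = list(hourly_intervals(start, end))
applied = expected[:2]

stale = missing_intervals(expected, applied)
print(f"{len(stale)} interval(s) not yet applied to the data mart")
```

A non-empty result means the data mart lags behind the ETL output, which is the condition that surfaced as stale dashboards here. A check of this kind could run after a metadata rebuild to confirm that the data marts have caught up.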