Summary
- Between Dec 28, 2025 00:04 UTC and Jan 8, 2026 11:48 UTC, delivery of impressions data through integrations was delayed to varying degrees.
- Amazon S3 integrations were impacted from Dec 28 through Jan 8, with peak delays of up to 36 hours.
- Amplitude, Segment, and custom webhook integrations were impacted from Jan 2 through Jan 7, with peak delays of up to 14 hours.
- A small number of customers lost impressions data during recovery because their destinations applied rate limiting. The vast majority of customers received all of their impressions data.
Root Cause
A significant increase in impressions volume pushed our integration pipelines to their maximum throughput. Volume growth exceeded the processing capacity of the S3 integration, and as traffic continued to rise, the Amplitude, Segment, and webhook integrations hit the same throughput limits.
Impact
- Outbound impressions data to S3, Amplitude, Segment, and custom webhook destinations was delayed.
- Customers using these integrations saw impressions data arrive later than expected.
What was not impacted?
- SDK feature flag evaluations and targeting
- FME flag delivery network
- Events integrations
- Admin API and UI access
- Customer flag configuration data
Remediation
For S3 integrations, we reordered and regrouped jobs to prioritize larger integrations, giving them more time to complete. For Amplitude, Segment, and webhook integrations, we increased throughput by changing the data pipeline configuration.
Action Items
- Rebuild webhook integration architecture: We are implementing a new architecture for Amplitude, Segment, and webhook integrations that provides better isolation from noisy neighbors and higher maximum throughput.
- Improve S3 batch processing: We are separating batch workloads so that a single slow job cannot delay others, and larger jobs are now prioritized.
- Enhance monitoring and alerting: New alerts have been deployed for both systems so that engineering teams are engaged earlier when delays occur, enabling faster recovery.