CI Plugin Image Retrieval Failure from the Docker Hub

Incident Report for Harness

Postmortem

Summary

Certain CI pipelines that use Harness CI steps, such as PluginStep and Setup Build Intelligence, failed with the error "failed to get image entrypoint".
CI Build Intelligence is enabled by default to improve build caching. This feature adds a background step to each CI stage that runs a cache proxy server, which fetches images from Docker Hub.

What was the issue?

A recent outage at Docker Hub, as reported on the Docker Systems Status Page, caused the "Setup Build Intelligence" step and the CI PluginStep in the CI stage to fail because they could not retrieve the image entrypoint. According to Docker Hub, the outage was limited to unauthenticated (anonymous) clients.

Timeline

Timestamp | Event | Action
6th Feb 8:48 AM UTC | A customer reported that the buildIntelligence step is failing | Initiated a SWAT call to address the issue.
6th Feb 9:00 AM UTC | Issue identified as stemming from Docker Hub downtime | Docker Systems Status Page
6th Feb 9:20 AM UTC | Docker Hub issue resolved and pipeline failures stopped |

Action Items

  1. To mitigate such issues, Harness recommends that customers configure the built-in Harness Image Docker connector to use credentials instead of anonymous access, and to pull images from GCR or ECR rather than Docker Hub. For detailed instructions, please refer to "Configure Harness to always use credentials to pull Harness images".
  2. We are actively working to eliminate dependencies on external systems to enhance our reliability even further.
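
As an illustration of the first action item, the following minimal Python sketch shows how an image reference that resolves to Docker Hub could be rewritten to point at a private mirror. It is not Harness code and does not configure the Harness connector: the mirror prefix registry.example.com/mirror, the function names, and the image tag are all hypothetical, and the registry detection only approximates the usual Docker convention (the first path component is treated as a registry host when it contains "." or ":" or is "localhost").

```python
# Hypothetical mirror prefix (an assumption for illustration only).
# Replace it with the GCR or ECR path that your organisation actually uses.
MIRROR_PREFIX = "registry.example.com/mirror"

# Host names that all refer to Docker Hub.
DOCKER_HUB_HOSTS = {"docker.io", "index.docker.io", "registry-1.docker.io"}


def split_registry(image: str) -> tuple[str, str]:
    """Split an image reference into (registry host, remainder).

    If the first path component does not look like a host name, the image
    is implicitly hosted on Docker Hub.
    """
    first, sep, rest = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest
    return "docker.io", image


def to_mirror(image: str, mirror_prefix: str = MIRROR_PREFIX) -> str:
    """Rewrite Docker Hub references to the mirror; leave others untouched."""
    registry, remainder = split_registry(image)
    if registry not in DOCKER_HUB_HOSTS:
        return image
    # Official images such as "alpine" live under the implicit "library/" namespace.
    if "/" not in remainder:
        remainder = "library/" + remainder
    return f"{mirror_prefix}/{remainder}"


if __name__ == "__main__":
    # The tag below is a placeholder, not a real release.
    for ref in ("harness/ci-addon:example-tag", "alpine:3.19", "gcr.io/my-project/app:1"):
        print(ref, "->", to_mirror(ref))
```

A rewrite like this only helps if the mirror already holds copies of the images and the pull is authenticated, which is why the recommendation pairs the registry change with credentials.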
Posted Feb 06, 2025 - 20:20 PST

Resolved

This incident has been resolved.
Posted Feb 06, 2025 - 01:29 PST

Monitoring

The Docker Hub incident has been resolved, and we are continuing to monitor on our side.
Posted Feb 06, 2025 - 01:26 PST

Identified

We are seeing an increase in failed pipelines that use Build Intelligence, as it pulls from Docker Hub.
Posted Feb 06, 2025 - 01:24 PST

Investigating

Docker Hub is experiencing an incident and degraded performance: https://www.dockerstatus.com/
Posted Feb 06, 2025 - 01:00 PST
This incident affected: Prod 3 (Continuous Integration Enterprise(CIE) - Self Hosted Runners, Continuous Integration Enterprise(CIE) - Windows Cloud Builds, Continuous Integration Enterprise(CIE) - Linux Cloud Builds), Prod 2 (Continuous Integration Enterprise(CIE) - Self Hosted Runners, Continuous Integration Enterprise(CIE) - Windows Cloud Builds, Continuous Integration Enterprise(CIE) - Linux Cloud Builds), and Prod 1 (Continuous Integration Enterprise(CIE) - Self Hosted Runners, Continuous Integration Enterprise(CIE) - Windows Cloud Builds, Continuous Integration Enterprise(CIE) - Linux Cloud Builds).