Git API rate limit errors

Incident Report for Harness

Postmortem

Summary

On February 18, 2026, between 7:30 AM and 12:15 PM PST, two enterprise customers experienced Git API rate limit errors when using remote entities backed by Git repositories in the prod1 environment. The errors prevented services, environments, and other Git-stored entities from loading, and blocked pipeline executions that depended on remote YAML. Service was fully restored after the release was rolled back.

Root Cause

A regression introduced in a recent release caused incorrect cache invalidation in the GitX bidirectional sync mechanism.

  • Webhook event processing cleared the GitX cache on every event instead of only when relevant changes occurred
  • The two affected enterprise customers generated a high volume of webhook events and had a large number of remote entities
  • The cache was invalidated continuously, forcing all entity fetches to query GitHub directly
  • This behavior rapidly exhausted GitHub API rate limits
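The failure mode described in the bullets above can be illustrated with a small simulation. This is a minimal sketch, not Harness's GitX code: the `EntityCache` class, the `fetch_from_git` stand-in, and the alternating event/fetch pattern are hypothetical, assuming a read-through cache keyed by file path where every miss costs one Git API request.

```python
from dataclasses import dataclass, field


@dataclass
class EntityCache:
    """Hypothetical read-through cache for remote entity YAML, keyed by file path."""
    entries: dict = field(default_factory=dict)
    hits: int = 0
    git_calls: int = 0  # stand-in for GitHub API requests consumed

    def fetch_from_git(self, path: str) -> str:
        # Placeholder for a real Git provider call; each miss costs one request.
        self.git_calls += 1
        return f"yaml-for-{path}"

    def get(self, path: str) -> str:
        if path in self.entries:
            self.hits += 1
            return self.entries[path]
        value = self.fetch_from_git(path)
        self.entries[path] = value
        return value


def simulate(clear_on_every_event: bool, rounds: int, paths: list[str]) -> EntityCache:
    """Alternate one webhook event with one fetch of every entity, `rounds` times."""
    cache = EntityCache()
    for _ in range(rounds):
        if clear_on_every_event:
            cache.entries.clear()  # the regression: wipe everything on each event
        for p in paths:
            cache.get(p)
    return cache


paths = [f"services/svc{i}.yaml" for i in range(50)]
buggy = simulate(True, rounds=100, paths=paths)
healthy = simulate(False, rounds=100, paths=paths)
print(buggy.git_calls, healthy.git_calls)  # 5000 50
```

With the cache cleared on every event, Git requests grow with every fetch instead of once per entity, which is how a high webhook volume translates directly into rate limit exhaustion.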

The impact was amplified by repeated calls to the API that retrieves all service definitions from Git. With the cache continuously invalidated, each of these calls had to fetch every service definition from Git, so the large number of remote services sharply increased API consumption.
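A back-of-the-envelope way to see the amplification: when a "list all services" call runs against an empty cache, it costs roughly one Git request per remote service. The sketch below assumes exactly one request per service definition on a miss; the function name and all inputs are hypothetical and are not figures from this incident.

```python
def estimated_git_requests(list_calls: int, remote_services: int, cache_hit_rate: float) -> float:
    """Hypothetical estimate of Git requests caused by 'list all services' calls.

    Assumes each listing reads every remote service definition and that each
    cache miss costs one Git API request.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return list_calls * remote_services * (1.0 - cache_hit_rate)


# Hypothetical inputs: 200 listing calls against 300 remote services.
print(estimated_git_requests(200, 300, 0.99))  # mostly warm cache
print(estimated_git_requests(200, 300, 0.0))   # cache cleared continuously: 60000.0
```

The cost is multiplicative: the remote-service count scales every listing call, so losing the cache multiplies API consumption rather than adding a fixed overhead.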

Impact

  • Remote entities (pipelines, templates, services, environments) failed to load
  • Pipeline executions depending on remote YAML were blocked

Remediation

  • Immediate: Rolled back the prod1 release, which restored normal caching behavior and resolved the rate limit errors.
  • Permanent: Implemented a fix to the cache invalidation logic so that the cache is cleared only when necessary, rather than on every processed webhook push or PR event.
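One way to express the corrected behavior is to evict only the cache entries whose files a push actually changed, on the branch the cache tracks. This is a sketch under stated assumptions: the payload fields (`ref`, and `added`/`modified`/`removed` file lists under each entry of `commits`) follow GitHub's push webhook, while the cache layout keyed by `(branch, path)` and the `invalidate_for_push` name are hypothetical. The report does not describe how the actual fix is implemented.

```python
def invalidate_for_push(cache: dict, payload: dict) -> set:
    """Evict only cache entries touched by a push; return the evicted keys.

    Assumes cache keys are (branch, file_path) tuples (hypothetical layout).
    """
    ref = payload.get("ref", "")
    prefix = "refs/heads/"
    if not ref.startswith(prefix):
        return set()  # tag pushes and other refs leave branch-backed entries alone
    branch = ref[len(prefix):]

    # Collect every file path the push touched across all of its commits.
    changed = set()
    for commit in payload.get("commits", []):
        for kind in ("added", "modified", "removed"):
            changed.update(commit.get(kind, []))

    # Build the eviction set first, then delete, so the dict is not mutated mid-iteration.
    evicted = {key for key in cache if key[0] == branch and key[1] in changed}
    for key in evicted:
        del cache[key]
    return evicted


cache = {
    ("main", "services/a.yaml"): "...",
    ("main", "services/b.yaml"): "...",
    ("dev", "services/a.yaml"): "...",
}
payload = {"ref": "refs/heads/main", "commits": [{"modified": ["services/a.yaml"]}]}
print(invalidate_for_push(cache, payload))  # {('main', 'services/a.yaml')}
```

A push that touches no cached file evicts nothing, so cache hits keep absorbing reads even under high webhook volume.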

Action Items

  1. Add proactive alerting for cache health - Implement monitoring and alerts that trigger when cache hit rates drop below expected thresholds, enabling faster detection of cache-related issues
  2. Move to GitHub App–based authentication - Adopt GitHub App authentication to significantly increase API rate limits and reduce the risk of throttling
  3. Improve cache observability - Add comprehensive metrics for all cache operations to enable better monitoring, troubleshooting, and debugging of cache-related issues
  4. Enhance automated testing - Expand test coverage to include cache behavior validation as part of automated sanity and regression testing
  5. Holistic review of GitX flows - Review all GitX flows, identify P0 and P1 paths, and ensure full automation coverage and observability across these critical workflows
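The cache-health alert in action item 1 could take the form of a sliding-window hit-rate check. A minimal sketch, assuming hit and miss counts are sampled once per interval; the `HitRateMonitor` name, window size, threshold, and minimum-traffic guard are hypothetical placeholders, not Harness's monitoring configuration.

```python
from collections import deque
from typing import Optional


class HitRateMonitor:
    """Hypothetical sliding-window cache hit-rate check for alerting."""

    def __init__(self, window: int = 5, threshold: float = 0.8, min_requests: int = 100):
        self.samples = deque(maxlen=window)  # (hits, misses) per interval
        self.threshold = threshold
        self.min_requests = min_requests  # avoid alerting on very low traffic

    def record(self, hits: int, misses: int) -> None:
        self.samples.append((hits, misses))

    def hit_rate(self) -> Optional[float]:
        hits = sum(h for h, _ in self.samples)
        total = hits + sum(m for _, m in self.samples)
        if total < self.min_requests:
            return None  # not enough data to judge
        return hits / total

    def should_alert(self) -> bool:
        rate = self.hit_rate()
        return rate is not None and rate < self.threshold


monitor = HitRateMonitor(window=3, threshold=0.8, min_requests=100)
monitor.record(hits=95, misses=5)
print(monitor.should_alert())  # False
monitor.record(hits=10, misses=90)
monitor.record(hits=0, misses=100)
print(monitor.should_alert())  # True
```

A windowed rate smooths out single noisy intervals, while the minimum-traffic guard keeps quiet periods from paging anyone.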
Posted Feb 20, 2026 - 11:28 PST

Resolved

Two enterprise customers experienced Git API rate limit errors when using remote entities backed by Git repositories in the prod1 environment. This prevented services, environments, and other entities stored in Git from loading, and blocked pipeline executions that depended on remote YAML.
Posted Feb 18, 2026 - 07:30 PST