@JesperTerkelsen

Summary

FIX: Detect and automatically handle ArgoCD ComparisonError with hard refresh to resolve cache corruption.

Problem

When ArgoCD's repo cache is corrupted, the application reports a ComparisonError, its sync status becomes Unknown, and the deployment fails.

Error Example: https://github.com/monta-app/service-grid-data/actions/runs/21033316677/job/60475088418

Error Message:

CONDITION        MESSAGE
ComparisonError  Failed to load target state: failed to generate manifest for source 1 of 1: 
                 rpc error: code = Unknown desc = failed to walk for symlinks in <path to cached source> 
                 lstat <path to cached source>/apps/data-sharing-event-processor/staging/infra/tmpcharts/elasticache-0.2.2.tgz: 
                 no such file or directory

Sync Status:     Unknown

Root Cause: ArgoCD's repo cache becomes corrupted and references files that belong to other applications, so manifest generation fails. The sync status then stays Unknown, and the deployment fails once the 30s timeout for Unknown states expires.

Solution

This PR improves error handling to automatically recover from ComparisonError:

  1. Detect ComparisonError in the status.conditions[] array
  2. Trigger a hard refresh at 5s and 10s to clear the corrupted cache
  3. Extend the timeout to 60s for ComparisonError (vs 30s for generic Unknown)
  4. Emit a clear error message if ComparisonError persists after the recovery attempts

Changes

wait-sync.sh

New ComparisonError handling:

# Check for ComparisonError condition
COMPARISON_ERROR=$(echo "$APP_INFO" | jq -r '.status.conditions[] | select(.type == "ComparisonError") | .message // ""' 2>/dev/null || echo "")

if [ -n "$COMPARISON_ERROR" ]; then
    echo "::warning::ArgoCD ComparisonError detected: $COMPARISON_ERROR"

    # Attempt hard refresh to clear corrupted cache
    if [ "$ELAPSED" -eq 5 ] || [ "$ELAPSED" -eq 10 ]; then
        echo "  Attempting hard refresh to clear ArgoCD cache..."
        argocd app get "$APP_NAME" --hard-refresh "${ARGOCD_FLAGS[@]}" &>/dev/null || true
        echo "  Hard refresh triggered, waiting for ArgoCD to recompute state..."
    fi

    if [ "$ELAPSED" -gt 60 ]; then
        # Give it 60 seconds to resolve ComparisonError with hard refresh
        echo "::error::ArgoCD ComparisonError persists after 60s and hard refresh attempts"
        echo "::error::This indicates a server-side issue with ArgoCD's repository cache"
        echo "::error::ComparisonError: $COMPARISON_ERROR"
        argocd app get "$APP_NAME" "${ARGOCD_FLAGS[@]}" || true
        exit 1
    fi
fi
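
The snippet above requests a hard refresh only when ELAPSED equals exactly 5 or 10, which presumes the surrounding polling loop (not shown in this description) ticks once per second. As a minimal sketch, assuming the interval could differ, the same policy can be expressed with thresholds instead of exact matches. The helper name next_comparison_error_action and the variables NEXT_REFRESH_AT, COMPARISON_ERROR_TIMEOUT and ACTION are hypothetical and are not part of wait-sync.sh as merged.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from wait-sync.sh.
# Sets ACTION to "refresh", "fail", or "wait" for the current poll.

COMPARISON_ERROR_TIMEOUT=60  # seconds, as stated in this PR
NEXT_REFRESH_AT=5            # first refresh threshold; advances 5 -> 10 -> none

next_comparison_error_action() {
    local elapsed=$1

    if [ "$elapsed" -gt "$COMPARISON_ERROR_TIMEOUT" ]; then
        # Past the extended timeout: give up.
        ACTION="fail"
    elif [ -n "$NEXT_REFRESH_AT" ] && [ "$elapsed" -ge "$NEXT_REFRESH_AT" ]; then
        # A refresh threshold was reached or passed: fire once, then advance.
        case "$NEXT_REFRESH_AT" in
            5) NEXT_REFRESH_AT=10 ;;
            *) NEXT_REFRESH_AT="" ;;
        esac
        ACTION="refresh"
    else
        ACTION="wait"
    fi
}
```

With a 3-second poll interval, for example, this would request the refreshes at 6s and 12s rather than skipping them. The function has to be called in the current shell (not inside `$(...)`), because it records progress in NEXT_REFRESH_AT.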

How Hard Refresh Works

argocd app get --hard-refresh forces ArgoCD to:

  • Invalidate the cached manifests for the application
  • Regenerate the manifests from the repository instead of reusing cached output
  • Recompute the desired state from the regenerated manifests
  • Drop any stale references held in the manifest cache

This typically resolves cache corruption issues without manual intervention.
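
For illustration, the refresh call can be wrapped so that a failed request is reported instead of silently discarded. This is a sketch under assumptions, not the code merged here: the function name hard_refresh_app is hypothetical, and any extra arguments stand in for the script's ARGOCD_FLAGS. The only CLI call used is `argocd app get <app> --hard-refresh`, as in the snippet in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the hard refresh request.
# Usage: hard_refresh_app <app-name> [extra argocd flags...]

hard_refresh_app() {
    local app_name=$1
    shift

    # The command output itself is not needed here, only its exit status.
    if argocd app get "$app_name" --hard-refresh "$@" >/dev/null 2>&1; then
        echo "  Hard refresh triggered for $app_name, waiting for ArgoCD to recompute state..."
    else
        # Do not abort: the caller keeps polling until its own timeout.
        echo "::warning::Hard refresh request for $app_name failed; will keep polling"
    fi
    return 0
}
```

A call such as `hard_refresh_app "$APP_NAME" "${ARGOCD_FLAGS[@]}"` would then leave a warning in the job log whenever the request itself could not be sent, which may help tell a CLI or connectivity problem apart from a cache problem.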

Impact

Automatic recovery - ComparisonError resolved automatically via hard refresh
Better error messages - Clear indication when ComparisonError is detected
Extended timeout - 60s for ComparisonError gives time for recovery
No breaking changes - Existing behavior preserved for other Unknown states

Testing

After merge, deployments encountering ComparisonError will:

  1. Detect the issue immediately
  2. Attempt automatic recovery with hard refresh
  3. Continue monitoring for up to 60s
  4. Fail with clear error if issue persists

🤖 Generated with Claude Code

When ArgoCD has a ComparisonError (e.g., corrupted repo cache referencing
wrong files), the sync status becomes Unknown and deployments fail.

This change:
- Detects ComparisonError conditions from status.conditions
- Attempts hard refresh at 5s and 10s to clear corrupted cache
- Extends timeout to 60s for ComparisonError (vs 30s for generic Unknown)
- Provides clear error message if ComparisonError persists

Hard refresh forces ArgoCD to discard its cached manifests and recompute the
desired state, which typically resolves cache corruption issues.
@JesperTerkelsen JesperTerkelsen merged commit 3bdc5e5 into main Jan 15, 2026
1 check passed
@JesperTerkelsen JesperTerkelsen deleted the fix/handle-argocd-comparison-error branch January 15, 2026 13:57