Epic: CCPM v0.1 Hardening
Overview
Implement reliability and safety improvements to Claude Code PM through four core infrastructure enhancements: YAML frontmatter validation, file-based locking for parallel operations, rate-limit-aware GitHub API access, and hook-based logging. All improvements maintain existing UX and command signatures while preventing common failure scenarios.
Architecture Decisions
1. Shell-Based Implementation
- Decision: Use POSIX shell scripts for maximum compatibility
- Rationale: Existing system is shell-based; maintains consistency and minimal dependencies
- Pattern: Place all scripts in the `.claude/scripts/pm/` directory
2. File-Based Locking with TTL
- Decision: Simple filesystem locks with automatic expiration
- Rationale: Avoids complex locking mechanisms; self-healing through TTL
- Implementation: Lock files in `.claude/locks/` with timestamp-based expiration
3. Non-Invasive Integration
- Decision: Add validation calls to existing commands without changing signatures
- Rationale: Maintains backward compatibility and existing workflows
- Pattern: Prepend validation calls to existing command implementations
4. Hook-Based Logging
- Decision: Use Claude Code's hook system for transparent logging
- Rationale: Zero impact on command behavior; pure observability enhancement
- Format: NDJSON for structured log analysis
Technical Approach
Core Components
1. Frontmatter Validation (validate-frontmatter.sh)
- Purpose: Validate YAML frontmatter and internal links before GitHub sync
- Input: Directory path for recursive validation
- Validation Rules:
  - Presence of YAML frontmatter block
  - Required `title` field exists
  - Files referenced in `depends_on` exist within the same epic
- Exit Codes: 0 (OK), 2 (FAILED)
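The three rules above can be sketched as a small set of shell functions. This is a minimal illustration, not the final `validate-frontmatter.sh`: it assumes only `*.md` files are checked, that `depends_on` is written as a one-line list of unquoted names such as `depends_on: [001, 002.md]`, and that each entry resolves to a file (with or without the `.md` suffix) in the same directory. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch of validate-frontmatter.sh. Assumptions (not from the epic): only *.md
# files are checked, and depends_on is a one-line list of unquoted names.

# Print the frontmatter body (lines between the opening and closing '---').
# Exits non-zero when the block is missing or never closed.
frontmatter() {
  awk '
    NR == 1 { if ($0 != "---") exit 1; next }
    $0 == "---" { closed = 1; exit }
    { print }
    END { if (!closed) exit 1 }
  ' "$1"
}

# Validate one file; returns 0 when all rules pass, 1 otherwise.
validate_file() {
  file=$1
  dir=$(dirname "$file")
  if ! fm=$(frontmatter "$file"); then
    echo "FAIL $file: missing YAML frontmatter block" >&2
    return 1
  fi
  if ! printf '%s\n' "$fm" | grep -q '^title:[[:space:]]*[^[:space:]]'; then
    echo "FAIL $file: missing required title field" >&2
    return 1
  fi
  # Split the bracketed list into one trimmed entry per line.
  deps=$(printf '%s\n' "$fm" |
    sed -n 's/^depends_on:[[:space:]]*\[\(.*\)\].*$/\1/p' |
    tr ',' '\n' |
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
  rc=0
  while IFS= read -r dep; do
    [ -n "$dep" ] || continue
    if [ ! -e "$dir/$dep" ] && [ ! -e "$dir/$dep.md" ]; then
      echo "FAIL $file: depends_on entry '$dep' not found in $dir" >&2
      rc=1
    fi
  done <<EOF
$deps
EOF
  return "$rc"
}

# Walk a directory recursively; returns 0 (OK) or 2 (FAILED).
validate_dir() {
  find "$1" -type f -name '*.md' | (
    status=0
    while IFS= read -r path; do
      validate_file "$path" || status=2
    done
    exit "$status"
  )
}
```

Run as a script, the last lines would be something like `validate_dir "$1"` followed by `exit $?`, so that callers see the documented 0 and 2 exit codes.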
2. File Locking System (lock)
- Purpose: Prevent concurrent access to same resources
- Interface: `lock acquire|release <name> [ttl_s]`
- Default TTL: 1800 seconds (30 minutes)
- Lock Storage: `.claude/locks/<name>.lock` with timestamp
- Cleanup: Automatic expiration via TTL check
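One way the acquire/release interface and TTL check could look is sketched below. It is an illustration under stated assumptions: the lock file stores its expiry time as epoch seconds (the epic only says "timestamp"), `LOCK_DIR` is a hypothetical override variable, and `date +%s` is available. Creation relies on the shell's noclobber option (`set -C`), which makes the redirect fail when the file already exists.

```shell
#!/bin/sh
# Sketch of the lock helper. LOCK_DIR and the stored-expiry format are assumptions.
LOCK_DIR="${LOCK_DIR:-.claude/locks}"

lock_acquire() {
  name=$1
  ttl=${2:-1800}                      # default TTL: 30 minutes
  file="$LOCK_DIR/$name.lock"
  mkdir -p "$LOCK_DIR" || return 1
  now=$(date +%s)

  # Self-healing: drop the lock if its recorded expiry time has passed.
  if [ -f "$file" ]; then
    expires=$(cat "$file" 2>/dev/null)
    case $expires in
      ''|*[!0-9]*) expires=0 ;;       # unreadable or corrupt lock counts as expired
    esac
    if [ "$expires" -le "$now" ]; then
      rm -f "$file"
    fi
  fi

  # With noclobber set, the redirect fails if the file exists,
  # so only one caller can create the lock.
  ( set -C; echo "$((now + ttl))" > "$file" ) 2>/dev/null
}

lock_release() {
  rm -f "$LOCK_DIR/$1.lock"
}

# Dispatcher matching the documented interface: lock acquire|release <name> [ttl_s]
lock() {
  action=$1
  shift
  case $action in
    acquire) lock_acquire "$@" ;;
    release) lock_release "$@" ;;
    *) echo "usage: lock acquire|release <name> [ttl_s]" >&2; return 2 ;;
  esac
}
```

A caller would typically run `lock acquire issue-1234 || exit 1` before its work and `lock release issue-1234` afterwards. Note one limitation of this sketch: two callers that both observe an expired lock can race on the removal step, so a production version may want a stricter takeover protocol.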
3. Rate-Limit-Aware GitHub Wrapper (gh_safe)
- Purpose: Prevent GitHub API rate limit failures
- Interface: Drop-in replacement for the `gh` command
- Behavior: Check rate limit before each call; sleep until reset if needed
- Integration: Replace bulk GitHub operations in sync commands
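The check-then-sleep behavior might be sketched as follows. The wrapper queries GitHub's REST `rate_limit` endpoint through `gh api` and reads the `core` resource; the threshold variable `GH_SAFE_MIN_REMAINING`, its default of 50, and the helper name `wait_seconds` are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of gh_safe. GH_SAFE_MIN_REMAINING and its default are illustrative assumptions.
GH_SAFE_MIN_REMAINING="${GH_SAFE_MIN_REMAINING:-50}"

# Print how many seconds to wait, given: remaining calls, reset epoch, current epoch.
wait_seconds() {
  remaining=$1 reset=$2 now=$3
  if [ "$remaining" -gt "$GH_SAFE_MIN_REMAINING" ] || [ "$reset" -le "$now" ]; then
    echo 0
  else
    echo $((reset - now + 1))         # one extra second past the reset time
  fi
}

gh_safe() {
  # One call returns "remaining reset" for the core REST limit.
  limits=$(gh api rate_limit \
    --jq '"\(.resources.core.remaining) \(.resources.core.reset)"' 2>/dev/null) || limits=
  if [ -n "$limits" ]; then
    delay=$(wait_seconds "${limits% *}" "${limits#* }" "$(date +%s)")
    if [ "$delay" -gt 0 ]; then
      echo "gh_safe: rate limit low, sleeping ${delay}s" >&2
      sleep "$delay"
    fi
  fi
  # If the check itself failed, fall through and let gh report the real error.
  gh "$@"
}
```

Usage mirrors `gh` itself, for example `gh_safe issue list --limit 100`. This sketch only watches the `core` resource; search and GraphQL calls have separate limits that would need their own checks.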
4. Hook-Based Logging (pre_tool_use.sh)
- Purpose: Log all tool usage for observability
- Hook Type: PreToolUse in Claude Code settings
- Output: `.claude/logs/hooks.ndjson` with structured events
- Fields: timestamp, tool_name, command, working_directory
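The logging core of the hook could be as small as the sketch below, which appends one JSON object per line with the four listed fields. How the hook receives its payload is not specified in this epic, so the sketch takes the three values as plain arguments; `HOOK_LOG`, `json_escape`, and `log_event` are hypothetical names. The escaping is deliberately simple: backslashes and double quotes are escaped, newlines, tabs, and carriage returns are flattened to spaces, and other control characters are not handled.

```shell
#!/bin/sh
# Sketch of the logging core of pre_tool_use.sh. HOOK_LOG is an assumed override variable.
HOOK_LOG="${HOOK_LOG:-.claude/logs/hooks.ndjson}"

# Escape a string for use inside a JSON string literal (simplified, lossy for newlines/tabs).
json_escape() {
  printf '%s' "$1" | tr '\n\t\r' '   ' | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Append one NDJSON event: log_event <tool_name> <command> <working_directory>
log_event() {
  mkdir -p "$(dirname "$HOOK_LOG")" 2>/dev/null || return 0
  printf '{"timestamp":"%s","tool_name":"%s","command":"%s","working_directory":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")" \
    >> "$HOOK_LOG" 2>/dev/null
  return 0                            # logging failures must never affect the command
}
```

The function always returns 0, in keeping with the goal that logging is pure observability and cannot change command behavior.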
Infrastructure Components
Lock Directory Structure
.claude/locks/
├── issue-1234.lock # Issue-specific locks
├── epic-sync.lock # Epic synchronization lock
└── bulk-github.lock # GitHub API bulk operations lock
Log Directory Structure
.claude/logs/
├── hooks.ndjson # Tool usage events
└── gh_safe.log # Rate limit wait events (optional)
Implementation Strategy
Development Phases
Phase 1: Core Scripts (2-3 hours)
- Implement `validate-frontmatter.sh` with comprehensive validation
- Implement `lock` script with acquire/release/TTL functionality
- Implement `gh_safe` wrapper with rate limit checking
- Create `pre_tool_use.sh` hook for logging
Phase 2: Integration (1-2 hours)
- Add validation calls to `/pm:epic-sync`, `/pm:sync`, `/pm:validate`
- Wrap `/pm:issue-start` with locking mechanism
- Replace `gh` calls with `gh_safe` in bulk operations
- Configure hook in `.claude/settings.json`
Phase 3: Testing & Validation (1 hour)
- Test positive/negative validation scenarios
- Test concurrent lock acquisition
- Test rate limit handling with forced low limits
- Verify NDJSON log generation
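For the concurrent lock acquisition test, one possible harness is sketched below: it starts several background workers that race to create the same lock file and counts how many succeed. The helper names `try_lock` and `race` are hypothetical, and the lock here is a bare noclobber create without TTL, standing in for whatever the final `lock` script does.

```shell
#!/bin/sh
# Sketch: N background workers race for one lock file; exactly one should win.

# Create the lock file only if it does not exist yet (noclobber redirect).
try_lock() {
  ( set -C; echo "$$" > "$1" ) 2>/dev/null
}

# race <lockfile> <workers> <results_file>: prints the number of winners.
race() {
  lockfile=$1 workers=$2 results=$3
  i=0
  while [ "$i" -lt "$workers" ]; do
    ( if try_lock "$lockfile"; then echo won; else echo lost; fi >> "$results" ) &
    i=$((i + 1))
  done
  wait                                # let every worker finish before counting
  grep -c '^won$' "$results"
}
```

A test would call `race "$tmp/x.lock" 8 "$tmp/results"` in a fresh temporary directory and assert that the printed count is 1.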
Risk Mitigation
Script Permissions: Document chmod +x requirements clearly
Backward Compatibility: All existing commands maintain identical interfaces
Rollback Strategy: Simple removal of prolog lines and hook configuration
Lock Deadlocks: TTL-based automatic cleanup prevents permanent locks
Testing Approach
Unit Testing:
- Individual script testing with edge cases
- Mock GitHub API responses for rate limit testing
- Filesystem permission testing
Integration Testing:
- End-to-end command flows with validation
- Concurrent session testing for locking
- Bulk operation testing with rate limits
Task Breakdown Preview
High-level task categories that will be created:
- Validation Infrastructure: Implement `validate-frontmatter.sh` with YAML parsing and link checking
- Locking System: Implement file-based locking with TTL and cleanup mechanisms
- GitHub Rate Limiting: Implement `gh_safe` wrapper with intelligent wait logic
- Logging Infrastructure: Implement hook-based logging with NDJSON output
- Command Integration: Wire validation and locking into existing PM commands
- Configuration Setup: Update settings and ensure proper permissions
- Testing & Documentation: Comprehensive testing and usage documentation
Dependencies
External Dependencies
- GitHub CLI: Already present in system
- POSIX Shell: Standard Unix environment
- Claude Code Hook System: For logging integration
Internal Dependencies
- Existing PM Commands: Integration points for validation and locking
- Epic Directory Structure: For frontmatter validation scope
- Settings Configuration: For hook system integration
Prerequisite Work
- None - builds on existing CCPM infrastructure
Success Criteria (Technical)
Functionality
- `/pm:validate` fails appropriately on invalid frontmatter or broken links
- Concurrent `/pm:issue-start` operations are safely serialized
- GitHub API operations never fail due to rate limits
- All tool usage is logged without affecting command behavior
Performance
- Validation adds <2 seconds to sync operations
- Lock operations complete in <100ms
- Rate limit checks add minimal overhead to GitHub calls
- Logging has zero impact on command execution time
Reliability
- No false positives in frontmatter validation
- Locks are automatically cleaned up after TTL expiration
- Rate limit detection is accurate and responsive
- Log files rotate or remain bounded in size
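The epic does not say how log size is kept bounded. One simple option, sketched here as an assumption rather than a decided design, is to rotate the log to a single `.1` backup once it passes a size cap; the function name and the 1 MiB default are illustrative.

```shell
#!/bin/sh
# Sketch: keep a log bounded by rotating it once it exceeds a size cap.
# The 1 MiB default and the single ".1" backup are illustrative choices.

# rotate_if_large <log_file> [max_bytes]
rotate_if_large() {
  log=$1
  max_bytes=${2:-1048576}
  [ -f "$log" ] || return 0
  size=$(wc -c < "$log" | tr -d ' ')
  if [ "$size" -gt "$max_bytes" ]; then
    mv -f "$log" "$log.1"             # replaces any previous backup, so at most two files exist
  fi
}
```

Calling `rotate_if_large .claude/logs/hooks.ndjson` before each append would cap disk usage at roughly twice the limit.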
Estimated Effort
Overall Timeline
- Total Development: 4-6 hours
- Testing & Integration: 2 hours
- Documentation: 1 hour
- Total: 7-9 hours (can be completed in 1 development session)
Resource Requirements
- Single developer with shell scripting experience
- Access to GitHub API for testing rate limit scenarios
- Local CCPM installation for integration testing
Critical Path Items
- `validate-frontmatter.sh` implementation (foundation for other components)
- Integration with existing commands (affects all workflows)
- Permission and configuration setup (required for deployment)
This epic delivers significant reliability improvements while maintaining the existing user experience and can be implemented as a focused, single-session development effort.
Stats
Total tasks: 6
Parallel tasks: 4 (can be worked on simultaneously)
Sequential tasks: 2 (have dependencies)
Estimated total effort: 11 hours