implement CoverageTrack with RLE compression for #62 #99

Closed
crashfrog wants to merge 2 commits into main from worktree-agent-afaa7e47

@crashfrog
Member

Summary

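Implement run-length encoding (RLE) compression for CoverageTrack in phraya-core. Coverage values are quantized to the nearest multiple of 5 before encoding, which merges near-equal neighbors into longer runs and reduces storage overhead. Each stored value stays within ±2 of the original, which is intended to be precise enough for biological applications.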

Key Features

  • RLE Encoding: Compresses runs of identical quantized values as (value, length) pairs
  • Quantization: Coverage values rounded to nearest 5 (7 → 5, 8 → 10, 13 → 15, etc.)
  • Random Access: O(log runs) binary search via coverage_at(pos)
  • Full Reference Length: Stores complete coverage including zero-coverage regions
  • Iteration: Decompress via iterator over (position, coverage) pairs
  • Serialization: Full serde support for .phraya file format
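
The quantization, encoding, and random-access features above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the `Run` struct, its `start` field, `run_count`, and the `(depth + 2) / 5 * 5` rounding formula are assumptions chosen to match the documented behavior (7 → 5, 8 → 10, 13 → 15). Storing each run's start position is one possible way to make binary search straightforward; the PR may store only (value, length) pairs plus some other index.

```rust
// Hypothetical sketch; names and layout are assumptions, not phraya-core's API.

/// Round a depth to the nearest multiple of 5 (7 -> 5, 8 -> 10, 13 -> 15).
/// Integer depths never tie: remainders 0..=2 round down, 3..=4 round up.
pub fn quantize(depth: usize) -> usize {
    (depth + 2) / 5 * 5
}

/// One run of identical quantized values.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Run {
    pub start: usize,  // first position covered by this run (assumed field)
    pub value: usize,  // quantized coverage shared by the whole run
    pub length: usize, // number of consecutive positions in the run
}

#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct CoverageTrack {
    runs: Vec<Run>,
    len: usize, // full reference length, including zero-coverage regions
}

impl CoverageTrack {
    /// Quantize each per-position depth, then merge equal neighbors into runs.
    pub fn from_vec(coverage: &[usize]) -> Self {
        let mut runs: Vec<Run> = Vec::new();
        for (pos, &depth) in coverage.iter().enumerate() {
            let value = quantize(depth);
            if let Some(run) = runs.last_mut() {
                if run.value == value {
                    run.length += 1;
                    continue;
                }
            }
            runs.push(Run { start: pos, value, length: 1 });
        }
        CoverageTrack { runs, len: coverage.len() }
    }

    /// Number of runs stored after compression.
    pub fn run_count(&self) -> usize {
        self.runs.len()
    }

    /// Binary search over run starts: O(log runs). Returns None out of bounds.
    pub fn coverage_at(&self, pos: usize) -> Option<usize> {
        if pos >= self.len {
            return None;
        }
        // Index of the first run starting after `pos`; the run before it contains `pos`.
        let idx = self.runs.partition_point(|run| run.start <= pos);
        Some(self.runs[idx - 1].value)
    }
}
```

With this layout, `CoverageTrack::from_vec(&[7, 8, 13, 13, 2, 3])` stores five runs, and `coverage_at(3)` resolves to the run holding the quantized value 15.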

Test Coverage

  • 14 unit tests: uniform coverage (single run), alternating patterns (many runs), zero-coverage regions, quantization boundaries
  • 5 property tests: round-trip encode/decode, quantization idempotence, value consistency
  • Realistic bacterial genome simulation: 4.6 Mbp E. coli with 5-region coverage variation
  • Serialization round-trip validation
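
For illustration, the quantization properties listed above can be checked exhaustively over a small range without a property-testing crate. The sketch below is an assumption-laden stand-in for the PR's property tests: `quantize` and `first_violation` are hypothetical names, and the rounding formula is inferred from the documented examples.

```rust
// Hypothetical sketch; the PR's property tests likely use a property-testing
// framework, which this exhaustive loop only approximates.

/// Assumed rounding rule: nearest multiple of 5.
pub fn quantize(depth: usize) -> usize {
    (depth + 2) / 5 * 5
}

/// Return the first depth below `limit` that breaks any of the three
/// properties, or None if every depth satisfies all of them.
pub fn first_violation(limit: usize) -> Option<usize> {
    (0..limit).find(|&depth| {
        let q = quantize(depth);
        let idempotent = quantize(q) == q;       // quantize twice == quantize once
        let multiple_of_five = q % 5 == 0;       // stored values are multiples of 5
        let within_two = q.abs_diff(depth) <= 2; // error bounded by ±2
        !(idempotent && multiple_of_five && within_two)
    })
}
```

A check such as `first_violation(10_000)` returning `None` would indicate that the assumed rounding rule satisfies all three properties over that range.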

Acceptance Criteria Met

  • RLE encoding: compress runs of identical values as (value, length) pairs
  • RLE decoding: decompress to full coverage array for validation
  • Quantization: round coverage to nearest 5 (0, 5, 10, 15, ...)
  • Random access: coverage_at(pos) returns value at position via binary search
  • Construction from Vec<usize> (coverage per position)
  • Iterator over (position, coverage) for decompression
  • Property tests: round-trip (encode → decode == original, modulo quantization)
  • Property tests: quantization idempotence (quantize twice == quantize once)
  • Unit tests: uniform coverage, alternating coverage, zero coverage regions
  • Benchmark: compression ratio on realistic bacterial genome (included in benches/)
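
The decoding and iteration criteria can be pictured with a short sketch that expands (value, length) pairs back into per-position values. The free functions `iter_positions` and `to_vec` are hypothetical; in the PR these are presumably methods on CoverageTrack with possibly different signatures.

```rust
// Hypothetical sketch; function names and the tuple layout are assumptions.

/// Expand (value, length) runs into (position, coverage) pairs.
pub fn iter_positions(runs: &[(usize, usize)]) -> impl Iterator<Item = (usize, usize)> + '_ {
    runs.iter()
        // Repeat each run's value `length` times...
        .flat_map(|&(value, length)| std::iter::repeat(value).take(length))
        // ...then number the expanded values from position 0.
        .enumerate()
}

/// Decompress to a full per-position coverage array, e.g. for validation.
pub fn to_vec(runs: &[(usize, usize)]) -> Vec<usize> {
    iter_positions(runs).map(|(_, coverage)| coverage).collect()
}
```

Because the iterator is lazy, a caller can scan a region without materializing the whole array, while `to_vec` supports the round-trip comparison against the quantized input.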

Test Plan

  • Run cargo test -p phraya-core types to validate all 46 tests pass
  • Run cargo test -p phraya-core coverage_track --lib to focus on 14 CoverageTrack tests
  • Run cargo test -p phraya-core property_ --lib to validate 5 property tests
  • Run cargo clippy -p phraya-core to ensure no linting issues
  • Run cargo fmt -p phraya-core -- --check to verify formatting

Generated with Claude Code

crashfrog and others added 2 commits May 29, 2026 09:23
Comprehensive test suite for CoverageTrack RLE compression with quantization:

Unit tests (14):
- Uniform coverage (single run)
- Alternating coverage (many runs)
- Zero coverage regions
- Quantization to nearest 5 (exact multiples, boundaries, rounding)
- Random access via binary search (O(log runs))
- Out-of-bounds access
- Iterator over positions
- Empty coverage edge case
- Single position edge case
- Realistic bacterial genome (4.6Mbp with variation)
- High depth sequencing (100x)
- Serialization round-trip

Property tests (5):
- Round-trip encode/decode equality (modulo quantization)
- Quantization idempotence (quantize twice == quantize once)
- coverage_at(i) matches to_vec()[i]
- All decompressed values are multiples of 5
- Quantized values within ±2 of original

Benchmarks:
- Compression ratio on uniform, realistic, high-variation, and random coverage
- Random access performance via binary search
- Full decompression performance
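
One way a compression ratio could be computed is sketched below. This is an assumption about the metric, not the benchmark code in benches/: it compares one stored integer per position against two stored integers (value, length) per run, and the names `count_runs` and `compression_ratio` are hypothetical.

```rust
// Hypothetical sketch; the actual benchmark may define the ratio differently.

/// Assumed rounding rule: nearest multiple of 5.
pub fn quantize(depth: usize) -> usize {
    (depth + 2) / 5 * 5
}

/// Count runs of identical quantized values without building the track.
pub fn count_runs(coverage: &[usize]) -> usize {
    let mut runs = 0;
    let mut previous: Option<usize> = None;
    for &depth in coverage {
        let q = quantize(depth);
        if previous != Some(q) {
            runs += 1;
            previous = Some(q);
        }
    }
    runs
}

/// Raw integers stored divided by encoded integers stored.
/// Returns None for empty input, where the ratio is undefined.
pub fn compression_ratio(coverage: &[usize]) -> Option<f64> {
    let runs = count_runs(coverage);
    if runs == 0 {
        return None;
    }
    // Raw: one integer per position. Encoded: (value, length) per run.
    Some(coverage.len() as f64 / (2 * runs) as f64)
}
```

Under this definition, uniform coverage compresses best (a single run), while strictly alternating quantized values give a ratio below 1, meaning the encoding is larger than the raw array.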

All tests FAIL as expected (RED phase) - implementation deferred to separate agent.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implement run-length encoding (RLE) compression for CoverageTrack with:
- Quantization to nearest 5 (e.g., 7 → 5, 13 → 15)
- RLE encoding: compress runs of identical quantized values as (value, length) pairs
- RLE decoding: decompress to full coverage array for validation
- Random access: coverage_at(pos) returns value via binary search in O(log runs) time
- Construction from Vec<usize> (coverage per position)
- Iterator over (position, coverage) for decompression

All 46 tests pass:
- 14 unit tests covering uniform/alternating/zero-coverage/realistic scenarios
- 5 property tests covering round-trip encode/decode, quantization idempotence, etc.
- Quantization boundary tests (handles 2→0, 3→5, 7→5, 8→10 correctly)
- Serialization via serde (Serialize/Deserialize traits)
- Random access and iterator correctness

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@crashfrog
Member Author

Closing stale PR. Work has been superseded or merged via alternative approach.

@crashfrog crashfrog closed this May 31, 2026