Implement CoverageTrack with RLE compression for #62 #99
Closed
crashfrog wants to merge 2 commits into
Comprehensive test suite for CoverageTrack RLE compression with quantization:

Unit tests (14):
- Uniform coverage (single run)
- Alternating coverage (many runs)
- Zero coverage regions
- Quantization to nearest 5 (exact multiples, boundaries, rounding)
- Random access via binary search (O(log n))
- Out-of-bounds access
- Iterator over positions
- Empty coverage edge case
- Single position edge case
- Realistic bacterial genome (4.6 Mbp with variation)
- High depth sequencing (100x)
- Serialization round-trip

Property tests (5):
- Round-trip encode/decode equality (modulo quantization)
- Quantization idempotence (quantize twice == quantize once)
- coverage_at(i) matches to_vec()[i]
- All decompressed values are multiples of 5
- Quantized values within ±2 of original

Benchmarks:
- Compression ratio on uniform, realistic, high-variation, and random coverage
- Random access performance via binary search
- Full decompression performance

All tests FAIL as expected (RED phase); implementation deferred to a separate agent.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
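The three quantization properties above (idempotence, multiples of 5, within ±2) can be checked without a property-testing crate by sweeping a range of depths. This is a minimal sketch: the helper names quantize and check_quantization_properties are hypothetical, and the rounding rule (remainders 0 to 2 round down, 3 to 4 round up) is inferred from the boundary cases listed in the implementation commit (2 → 0, 3 → 5, 7 → 5, 8 → 10).

```rust
/// Hypothetical helper: round a depth to the nearest multiple of 5.
/// Remainders 0..=2 round down and 3..=4 round up (e.g. 7 -> 5, 13 -> 15).
/// Assumes depth is far below usize::MAX, so `depth + 2` cannot overflow.
fn quantize(depth: usize) -> usize {
    (depth + 2) / 5 * 5
}

/// Hypothetical check of the quantization property tests for one input.
fn check_quantization_properties(depth: usize) -> bool {
    let q = quantize(depth);
    let idempotent = quantize(q) == q; // quantize twice == quantize once
    let multiple_of_5 = q % 5 == 0; // every stored value is a multiple of 5
    let within_2 = q.abs_diff(depth) <= 2; // quantization error is at most 2
    idempotent && multiple_of_5 && within_2
}

fn main() {
    // Exhaustive sweep of a small range instead of randomly generated inputs.
    let all_hold = (0..=10_000).all(check_quantization_properties);
    println!("all properties hold: {all_hold}"); // prints "all properties hold: true"
}
```

A randomized framework such as proptest explores a wider input space; the exhaustive sweep is simply a dependency-free way to show what each property asserts.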
Implement run-length encoding (RLE) compression for CoverageTrack with:
- Quantization to nearest 5 (e.g., 7 → 5, 13 → 15)
- RLE encoding: compress runs of identical quantized values as (value, length) pairs
- RLE decoding: decompress to full coverage array for validation
- Random access: coverage_at(pos) returns value via binary search in O(log runs) time
- Construction from Vec<usize> (coverage per position)
- Iterator over (position, coverage) for decompression

All 46 tests pass:
- 14 unit tests covering uniform/alternating/zero-coverage/realistic scenarios
- 5 property tests covering round-trip encode/decode, quantization idempotence, etc.
- Quantization boundary tests (handles 2→0, 3→5, 7→5, 8→10 correctly)
- Serialization via serde (Serialize/Deserialize traits)
- Random access and iterator correctness

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
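The encode, decode, and random-access steps described above can be sketched as follows. This is an illustrative sketch, not the PR's code: the struct layout, from_coverage, and num_runs are hypothetical. The commit describes runs as (value, length) pairs; this sketch instead stores each run's start position, which is the prefix sum of the lengths, so that coverage_at can binary-search the starts directly with slice::partition_point. The serde derives mentioned in the commit are omitted to keep the example dependency-free.

```rust
/// Hypothetical helper: round a depth to the nearest multiple of 5.
fn quantize(depth: usize) -> usize {
    (depth + 2) / 5 * 5
}

/// Hypothetical RLE coverage track: parallel vectors of run values and run
/// start positions (an assumed layout, not necessarily the PR's).
#[derive(Debug, Clone, PartialEq)]
pub struct CoverageTrack {
    values: Vec<usize>, // quantized depth of each run
    starts: Vec<usize>, // first position of each run, strictly increasing
    len: usize,         // total number of positions
}

impl CoverageTrack {
    /// Quantize each position and merge adjacent equal values into runs.
    pub fn from_coverage(coverage: &[usize]) -> Self {
        let mut values: Vec<usize> = Vec::new();
        let mut starts: Vec<usize> = Vec::new();
        for (pos, &depth) in coverage.iter().enumerate() {
            let q = quantize(depth);
            // Open a new run only when the quantized value changes.
            if values.last() != Some(&q) {
                values.push(q);
                starts.push(pos);
            }
        }
        CoverageTrack { values, starts, len: coverage.len() }
    }

    pub fn num_runs(&self) -> usize {
        self.values.len()
    }

    /// Random access in O(log runs): find the last run starting at or before `pos`.
    pub fn coverage_at(&self, pos: usize) -> Option<usize> {
        if pos >= self.len {
            return None; // out of bounds, which also covers an empty track
        }
        // Count of run starts <= pos; starts[0] == 0, so idx >= 1 here.
        let idx = self.starts.partition_point(|&s| s <= pos);
        Some(self.values[idx - 1])
    }

    /// Decode: expand every run back into one quantized value per position.
    pub fn to_vec(&self) -> Vec<usize> {
        let mut out = Vec::with_capacity(self.len);
        for (i, &value) in self.values.iter().enumerate() {
            let end = self.starts.get(i + 1).copied().unwrap_or(self.len);
            out.extend(std::iter::repeat(value).take(end - self.starts[i]));
        }
        out
    }
}

fn main() {
    let track = CoverageTrack::from_coverage(&[30, 31, 29, 28, 0, 0, 0, 52, 48]);
    println!("runs: {}", track.num_runs()); // prints "runs: 3"
    println!("{:?}", track.coverage_at(5)); // prints "Some(0)"
    println!("{:?}", track.to_vec()); // prints "[30, 30, 30, 30, 0, 0, 0, 50, 50]"
}
```

Storing start offsets trades one extra integer per run for not having to recompute prefix sums on every lookup; a (value, length) layout would need such a prefix-sum index to reach O(log runs) random access.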
Closing stale PR. This work has been superseded or merged via an alternative approach.
Summary
Implement run-length encoding (RLE) compression for CoverageTrack in phraya-core. Coverage values are quantized to the nearest multiple of 5 before encoding, which merges small depth fluctuations into longer runs and so reduces storage overhead, while keeping every stored value within ±2 of the original depth.
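Why quantizing before encoding saves space can be seen by counting runs on a short, jittery stretch of coverage. In this sketch the coverage values are made up for illustration, and quantize and count_runs are hypothetical helpers rather than phraya-core functions.

```rust
/// Hypothetical helper: round a depth to the nearest multiple of 5.
fn quantize(depth: usize) -> usize {
    (depth + 2) / 5 * 5
}

/// Count maximal runs of equal adjacent values (the number of RLE entries).
fn count_runs(values: &[usize]) -> usize {
    if values.is_empty() {
        return 0;
    }
    1 + values.windows(2).filter(|w| w[0] != w[1]).count()
}

fn main() {
    // Made-up ~30x region with per-base jitter of about ±2.
    let raw = [30, 31, 29, 30, 32, 28, 31, 30, 29, 31];
    let quantized: Vec<usize> = raw.iter().map(|&d| quantize(d)).collect();
    // prints "runs before: 10, after: 1"
    println!("runs before: {}, after: {}", count_runs(&raw), count_runs(&quantized));
}
```

The gain depends on where the jitter falls: values straddling a rounding boundary (for example 32 and 33, which quantize to 30 and 35) still split runs, so real compression ratios vary with the coverage profile, which is what the uniform, realistic, high-variation, and random benchmarks measure.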
Key Features
- coverage_at(pos).
- .phraya file format

Test Coverage
Acceptance Criteria Met
Test Plan
- cargo test -p phraya-core types to validate that all 46 tests pass
- cargo test -p phraya-core coverage_track --lib to focus on the 14 CoverageTrack tests
- cargo test -p phraya-core property_ --lib to validate the 5 property tests
- cargo clippy -p phraya-core to ensure there are no linting issues
- cargo fmt -p phraya-core -- --check to verify formatting

Generated with Claude Code