Add mean and median sequence length to Basic Statistics (#203) #204
Closed
ewels wants to merge 1 commit into
Implements s-andrews#203.

The Basic Statistics module currently reports only the range of read lengths; this adds "Mean sequence length" and "Median sequence length" rows so users (and downstream tools such as MultiQC) get accurate values straight from the source rather than estimating them.

The mean is total bases / total sequences (2 d.p.); the median is derived from a per-length histogram over non-filtered sequences (for an even count, the two central values averaged and rounded up).

Both are added to the ResultsTable model, so they appear in the interactive results panel, the HTML report, and fastqc_data.txt together. Integration test approved files updated accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
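For reviewers, the calculation described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the class name `LengthStatsSketch`, the method names, and the choice to return 0 for an empty input are all assumptions made for the example.

```java
public final class LengthStatsSketch {

    /** Mean read length: total bases divided by total sequences. */
    static double mean(long totalBases, long totalSequences) {
        // Assumption for this sketch: an empty file reports a mean of 0.
        if (totalSequences == 0) return 0.0;
        return (double) totalBases / totalSequences;
    }

    /**
     * Median read length from a histogram where lengthCounts[i] is the
     * number of reads of length i. For an even number of reads the two
     * central lengths are averaged and rounded up.
     */
    static long median(long[] lengthCounts) {
        long total = 0;
        for (long c : lengthCounts) total += c;
        // Assumption for this sketch: an empty histogram reports a median of 0.
        if (total == 0) return 0;

        // 1-based ranks of the central value(s); equal when total is odd.
        long lowerRank = (total + 1) / 2;
        long upperRank = total / 2 + 1;

        long lower = -1;
        long upper = -1;
        long seen = 0;
        for (int len = 0; len < lengthCounts.length; len++) {
            seen += lengthCounts[len];
            if (lower < 0 && seen >= lowerRank) lower = len;
            if (seen >= upperRank) {
                upper = len;
                break;
            }
        }
        // Integer division of (sum + 1) by 2 rounds a .5 average up.
        return (lower + upper + 1) / 2;
    }

    public static void main(String[] args) {
        // Four reads of lengths 2, 3, 6 and 6 (17 bases in total).
        long[] counts = {0, 0, 1, 1, 0, 0, 2};
        System.out.println(mean(17, 4));
        System.out.println(median(counts));
    }
}
```

Walking the histogram once avoids storing or sorting individual read lengths, so memory scales with the longest read rather than with the number of reads.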
ewels (Contributor, Author):
Hah, you beat me to it in 54b336e 😆 Comparing implementations now..
ewels (Contributor, Author):
Closing to build on what you pushed already instead. See #205
Closes #203.

The Basic Statistics module currently reports only the range of read lengths (min-max). This adds two rows, "Mean sequence length" and "Median sequence length", so users get accurate mean and median read lengths straight from the source, rather than having downstream tools like MultiQC estimate them from the binned length distribution.

Implementation

- The mean is total bases / total sequences, to 2 d.p. (formatted with Locale.ROOT, so the decimal separator is locale-independent).
- The median is derived from a per-length histogram over non-filtered sequences; for an even count, the two central values are averaged and rounded up.
- Both rows are added to the ResultsTable model, so they appear consistently in the interactive results panel, the HTML report, and fastqc_data.txt.

The histogram grows on demand using the same idiom as SequenceLengthDistribution.

Testing

- Built with ant and ran the compiled FastQC on the minimal and complex test files; confirmed the new rows appear correctly in both fastqc_data.txt and the HTML report.
- Updated the FileContentsTest approved files (data + HTML, for minimal and complex). The HTML snapshots add only the two new table rows; the embedded chart images are unchanged.

🤖 Generated with Claude Code
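As a rough illustration of the two implementation details mentioned in the description, a histogram that grows on demand and locale-independent formatting of the mean, here is a small sketch. The class `LengthHistogramSketch` and its methods are hypothetical names for this example and are not claimed to match the code in FastQC or in SequenceLengthDistribution.

```java
import java.util.Arrays;
import java.util.Locale;

public final class LengthHistogramSketch {

    // counts[i] holds the number of reads of length i; starts empty.
    private long[] counts = new long[0];

    /** Record one read, enlarging the array if the length is not yet covered. */
    void add(int length) {
        if (length >= counts.length) {
            // Arrays.copyOf pads the new tail with zeros.
            counts = Arrays.copyOf(counts, length + 1);
        }
        counts[length]++;
    }

    /** Number of reads seen with this length (0 if beyond the current array). */
    long countAt(int length) {
        return length < counts.length ? counts[length] : 0;
    }

    /** Format to 2 d.p. with Locale.ROOT so the separator is always '.'. */
    static String formatMean(double mean) {
        return String.format(Locale.ROOT, "%.2f", mean);
    }

    public static void main(String[] args) {
        LengthHistogramSketch h = new LengthHistogramSketch();
        h.add(5);
        h.add(5);
        h.add(2);
        System.out.println(h.countAt(5));
        System.out.println(formatMean(4.25));
    }
}
```

Passing an explicit locale matters because `String.format` without one uses the JVM default, which in some locales writes the decimal separator as a comma and would make the report text differ between machines.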