Mean + median read length

FastQC reports only a range for read length in _Basic Statistics_, but people often ask for the mean + median read lengths too. MultiQC does its best to estimate these, but it'd be better if FastQC calculated the exact numbers itself.

### History / context

Back in 2022, @jchorl added [some code](https://github.com/MultiQC/MultiQC/blob/00d7f33df8cec2be307454119b2a8714e33a94f9/multiqc/modules/fastqc/fastqc.py#L417-L435) to the FastQC module in MultiQC that attempts to calculate the median read length.

This works relatively well in most cases, but includes a bit of a fudge:

```python
# if the distribution-entry is a range, we use the average of the range.
# this isn't technically correct, because we can't know what the distribution
# is within that range. Probably good enough though.
```
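To illustrate why the midpoint shortcut can go wrong, here is a minimal sketch. It is not the MultiQC code: the function names and the `(start, end, count)` bin layout are hypothetical, and the read lengths are made up for the example. It compares midpoint-based estimates from a binned distribution against the exact values computed from the raw lengths.

```python
import statistics
from typing import List, Tuple

# Hypothetical bin layout: (range_start, range_end, read_count).
Bin = Tuple[int, int, int]


def mean_from_bins(bins: List[Bin]) -> float:
    """Estimate the mean by treating every read in a bin as the bin midpoint."""
    total = sum(count for _, _, count in bins)
    if total == 0:
        raise ValueError("no reads in distribution")
    return sum((start + end) / 2 * count for start, end, count in bins) / total


def median_from_bins(bins: List[Bin]) -> float:
    """Estimate the median as the midpoint of the bin holding the middle read."""
    total = sum(count for _, _, count in bins)
    if total == 0:
        raise ValueError("no reads in distribution")
    cumulative = 0
    for start, end, count in sorted(bins):
        cumulative += count
        if cumulative >= total / 2:
            return (start + end) / 2
    raise AssertionError("unreachable: cumulative count always reaches total")


# Made-up data: 900 reads of length 100 and 1100 reads of length 149,
# all of which fall into a single 100-149 bin.
lengths = [100] * 900 + [149] * 1100
bins = [(100, 149, len(lengths))]

print("estimated median:", median_from_bins(bins))
print("exact median:    ", statistics.median(lengths))
print("estimated mean:  ", mean_from_bins(bins))
print("exact mean:      ", statistics.mean(lengths))
```

With this input the binned median estimate is 124.5, a length that no read actually has, while the exact median is 149. The wider the bins and the more skewed the reads are within them, the further the estimate can drift, which is why the exact values can only come from a tool that sees the individual read lengths.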

I [suggested](https://github.com/MultiQC/MultiQC/pull/1745/changes#r1049840721) that we try to raise this upstream and get FastQC to generate the metrics itself, since it has the required numbers, but I don't think that ever happened.

Anyway, 4 years later I'm getting reports from folks (cc @jfy133) that maybe the approach _isn't_ really good enough and that it's "generating crazy numbers that do not correspond to anything". I figure better late than never - maybe we can look into getting FastQC to report this.
