Mean + median read length #203

Description

@ewels

FastQC reports a range for read length in Basic Statistics, but people commonly want the mean and median read lengths too. MultiQC tries its best to estimate these, but it'd be better if FastQC could calculate the exact numbers itself.

History / context

Back in 2022, @jchorl added some code to the FastQC module in MultiQC that attempts to calculate the median read length.

This works relatively well in most cases, but includes a bit of a fudge:

# if the distribution-entry is a range, we use the average of the range.
# this isn't technically correct, because we can't know what the distribution
# is within that range. Probably good enough though.
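A simplified sketch of the midpoint idea described in that comment is below. This is not MultiQC's actual code. It assumes the length distribution is a dict mapping a label such as `"150"` or `"140-149"` to a read count, and the function names `parse_length_key` and `estimate_mean_median` are hypothetical. Every read in a range is treated as if it had the midpoint length.

```python
from typing import Dict, Tuple


def parse_length_key(key: str) -> float:
    """Return a representative length for one distribution entry.

    Single lengths ("150") are returned as-is. Ranges ("140-149") are
    replaced by the midpoint of the range: the approximation described
    in the MultiQC comment, since the true lengths inside are unknown.
    """
    if "-" in key:
        low, high = key.split("-", 1)
        return (float(low) + float(high)) / 2
    return float(key)


def estimate_mean_median(dist: Dict[str, float]) -> Tuple[float, float]:
    """Estimate mean and (lower) median read length from a binned distribution."""
    points = sorted((parse_length_key(k), c) for k, c in dist.items())
    total = sum(c for _, c in points)
    if total == 0:
        raise ValueError("empty distribution")
    mean = sum(length * c for length, c in points) / total

    # Walk the cumulative counts; the first representative length at which
    # half of the reads are covered is reported as the (lower) median.
    half = total / 2
    cumulative = 0.0
    for length, count in points:
        cumulative += count
        if cumulative >= half:
            return mean, length
    raise AssertionError("unreachable: cumulative count reaches total")
```

For example, `estimate_mean_median({"140-149": 900, "150": 100})` gives a mean of 145.05 and a median of 144.5. If those 900 reads were in fact all 149 bp long, the true mean and median would be 149.1 and 149, so the estimate is off by several bases purely because of the binning.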

I suggested that we raise this upstream and get FastQC to generate these metrics itself, since it has the required numbers, but I don't think that request ever made it upstream.
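To illustrate why this is easier upstream: a tool that sees every read can keep a count of reads at each exact length, and then the mean and median need no approximation at all. The sketch below assumes such per-length counts are available as a mapping from length to read count; the function name `exact_mean_median` is hypothetical and is not part of FastQC's API.

```python
from typing import Mapping, Tuple


def exact_mean_median(counts: Mapping[int, int]) -> Tuple[float, float]:
    """Exact mean and median read length from unbinned per-length counts.

    `counts` maps each exact read length to the number of reads with that
    length, so no midpoint approximation is needed.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads")
    mean = sum(length * n for length, n in counts.items()) / total

    lengths = sorted(counts)

    def nth_length(index: int) -> int:
        # Length of the read at 0-based position `index` when all reads
        # are sorted by length.
        seen = 0
        for length in lengths:
            seen += counts[length]
            if index < seen:
                return length
        raise IndexError(index)

    if total % 2 == 1:
        median = float(nth_length(total // 2))
    else:
        # Even number of reads: average the two middle reads.
        median = (nth_length(total // 2 - 1) + nth_length(total // 2)) / 2
    return mean, median
```

With `{149: 900, 150: 100}` this returns a mean of 149.1 and a median of 149.0. Memory stays proportional to the number of distinct read lengths rather than the number of reads.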

Anyway, 4 years later I'm getting reports from folks (cc @jfy133) that the approach maybe isn't good enough and is "generating crazy numbers that do not correspond to anything". Better late than never: maybe we can look into getting FastQC to report this.
