FastQC reports a range for read length in Basic Statistics, but it's a common ask to know the mean + median read lengths too. MultiQC tries its best to estimate this, but it'd be better if FastQC could calculate the accurate numbers.
History / context
Back in 2022, @jchorl added some code to the FastQC module in MultiQC that attempts to calculate the median read length.
This works relatively well in most cases, but includes a bit of a fudge:
# if the distribution-entry is a range, we use the average of the range.
# this isn't technically correct, because we can't know what the distribution
# is within that range. Probably good enough though.
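To make the problem concrete, here is a rough sketch of the midpoint approach. This is not the actual MultiQC code: the function names, the `(label, count)` input shape, and the example counts are all made up for illustration. It assumes bins look like `"30-39"` or `"50"`.

```python
def bin_midpoint(label: str) -> float:
    """Midpoint of a length bin such as "30-39", or the value itself for "50"."""
    if "-" in label:
        low, high = label.split("-", 1)
        return (float(low) + float(high)) / 2
    return float(label)


def estimate_length_stats(dist):
    """Estimate (mean, median) read length from (bin_label, count) pairs.

    Every read in a ranged bin is treated as having the bin's midpoint
    length, which is the "fudge": the real lengths inside the bin are unknown.
    """
    bins = sorted((bin_midpoint(label), count) for label, count in dist)
    total = sum(count for _, count in bins)
    if total == 0:
        raise ValueError("empty length distribution")

    mean = sum(mid * count for mid, count in bins) / total

    # Walk the cumulative counts until we reach the halfway point.
    # (Simplified: no averaging of the two middle reads for even totals.)
    half = total / 2
    cumulative = 0
    median = bins[-1][0]
    for mid, count in bins:
        cumulative += count
        if cumulative >= half:
            median = mid
            break
    return mean, median


# Hypothetical distribution: most reads fall in the 40-49 bin.
example = [("30-39", 10), ("40-49", 80), ("50", 10)]
mean, median = estimate_length_stats(example)
```

With these made-up counts the estimated median is the midpoint of the `40-49` bin. If all 80 of those reads were really 49 bp long, the true median would be 49, and the estimate has no way of knowing that. The wider the bins, the larger this error can get, whereas FastQC sees every read and could report the exact values.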
I suggested that we try to raise this upstream and get FastQC to generate the metrics as it has the required numbers, but I don't think it ever made it.
Anyway, 4 years later I'm getting reports from folks (cc @jfy133) that maybe the approach isn't really good enough and it's "generating crazy numbers that do not correspond to anything". I figure better late than never - maybe we can look into getting FastQC to report this.