Greetings CAMI team,
Thank you for creating and sharing this invaluable resource and initiative!
I am excited to use the rhizosphere database from the second challenge to benchmark a metagenomics pipeline.
I understand we can compare our pipeline results (such as MAGs) to the input ref files used to create the simulated datasets, such as https://frl.publisso.de/data/frl:6425521/plant_associated/rhimgCAMI2_genomes.tar.gz.
We can also compare our results to the gold standard, as described on the CAMI challenge website and in the CAMI publications, such as https://www.nature.com/articles/nmeth.4458.
I wanted to understand better how the gold standard was created, but I am unable to find that aspect of the Methods.
I found this information: “The gold standard includes all genomic regions covered by at least one read in the metagenome data set.”
Is there also a description of the software and parameters used to create the gold standard files (contigs.tar.gz, for example)? I imagine that the "genomic regions covered by at least one read" could vary quite a bit depending on the assembly software and parameters used to create the gold standard contigs.
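To make my reading of that sentence concrete, here is a minimal sketch of how I currently picture it. It assumes the simulated reads have known positions on a reference genome, given as half-open (start, end) intervals. The function name and the toy coordinates are hypothetical, and this is only my interpretation, not the actual CAMI procedure.

```python
def covered_regions(read_intervals):
    """Merge half-open (start, end) read intervals into regions with coverage >= 1.

    Hypothetical helper: illustrates one reading of "genomic regions covered
    by at least one read"; it is not taken from the CAMI code base.
    """
    regions = []
    for start, end in sorted(read_intervals):
        if regions and start <= regions[-1][1]:
            # Read overlaps or directly abuts the current region: extend it.
            regions[-1][1] = max(regions[-1][1], end)
        else:
            # Gap with zero coverage: start a new region.
            regions.append([start, end])
    return [tuple(region) for region in regions]


# Toy read positions on a single genome (made-up coordinates).
reads = [(0, 150), (100, 250), (400, 550), (550, 700)]
print(covered_regions(reads))  # [(0, 250), (400, 700)]
```

Under this reading, the gold standard contigs would follow directly from the read positions rather than from an assembler, which is the part I would like to confirm.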
Thanks very much for any info you could provide!