Allow per phenotype p-value thresholds #87

@edg1983

The idea

Currently, the p-value thresholds for SNP selection and the locus breaker are configured per study, directly in the summary stat input table (p_thresh1 and p_thresh2 columns). As a result, the same thresholds apply to every phenotype in a given study.

However, this is not ideal for molecular traits (such as eQTL or pQTL results), where one may prefer to set different thresholds for each gene or protein.

A possible implementation

  • We allow an additional, optional column in the summary stat input table, e.g. p_thresh_table. It must point to a TSV table with the columns study_id, pheno_id (aka gene_id), p_thresh1, p_thresh2.
  • We update the locus-breaker R script to accept an extra argument, --p_thresh_table.
  • When we load the summary stats, we add two columns holding the thresholds. These are populated with the fixed values from --p_thres1 and --p_thres2 when --p_thresh_table is not set; otherwise we merge by (study_id, pheno_id) to populate them from the p-value threshold table.
  • SNPs are filtered based on the p_thresh columns (which can be dropped afterwards).
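
To make the proposed logic concrete, here is a minimal sketch of the threshold lookup and filtering. It is written in plain Python only for illustration; the actual change would live in the locus-breaker R script. The function names, the `snp` and `p` field names, the example values, and the strict `<` comparison are all hypothetical. Two further assumptions that the proposal does not settle: a phenotype missing from the threshold table falls back to the fixed values, and p_thresh1 is the column used for SNP selection.

```python
import csv
import io


def load_threshold_table(handle):
    """Read the per-phenotype threshold TSV into a dict keyed by (study_id, pheno_id)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        (row["study_id"], row["pheno_id"]): (float(row["p_thresh1"]), float(row["p_thresh2"]))
        for row in reader
    }


def attach_thresholds(snps, p_thresh1, p_thresh2, table=None):
    """Add p_thresh1 / p_thresh2 to each SNP record.

    Table entries take precedence over the fixed values. Falling back to the
    fixed values for phenotypes absent from the table is an assumption.
    """
    table = table or {}
    annotated = []
    for snp in snps:
        key = (snp["study_id"], snp["pheno_id"])
        t1, t2 = table.get(key, (p_thresh1, p_thresh2))
        annotated.append({**snp, "p_thresh1": t1, "p_thresh2": t2})
    return annotated


def filter_snps(snps, column="p_thresh1"):
    """Keep SNPs whose p-value is below their own per-row threshold."""
    return [s for s in snps if s["p"] < s[column]]


# Hypothetical threshold table with the four columns from the proposal.
THRESHOLDS_TSV = (
    "study_id\tpheno_id\tp_thresh1\tp_thresh2\n"
    "study_A\tGENE1\t1e-6\t1e-4\n"
)

# Hypothetical summary-stat rows; GENE2 has no entry in the table.
snps = [
    {"study_id": "study_A", "pheno_id": "GENE1", "snp": "rs1", "p": 5e-7},
    {"study_id": "study_A", "pheno_id": "GENE2", "snp": "rs2", "p": 5e-7},
]

table = load_threshold_table(io.StringIO(THRESHOLDS_TSV))
annotated = attach_thresholds(snps, p_thresh1=5e-8, p_thresh2=1e-5, table=table)
kept = filter_snps(annotated)
```

In this example rs1 passes because GENE1 has a relaxed per-gene threshold, while rs2 is held to the fixed study-level value and is dropped. Calling `attach_thresholds` without a table reproduces the current behaviour, where every row gets the fixed thresholds.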

We have to point out in the docs that when the p-value threshold table is provided, it takes precedence over the fixed values.
