Problem
Related to #953
In malariagen_data/veff.py, the _get_within_cds_effect() function currently
classifies all missense SNPs that don't match START_LOST, STOP_LOST, or
STOP_GAINED as NON_SYNONYMOUS_CODING, regardless of whether the mutation
is at a start or stop codon position.
A # TODO NON_SYNONYMOUS_START and NON_SYNONYMOUS_STOP comment at line 359
already acknowledges this gap.
As a result, two classifications are missing:
- A missense SNP at CDS position 0 (non-canonical start codon) is labeled
NON_SYNONYMOUS_CODING but should be NON_SYNONYMOUS_START
- A SNP that changes one stop codon triplet into a different stop codon
triplet is labeled NON_SYNONYMOUS_CODING but should be NON_SYNONYMOUS_STOP
Proposed Changes
- malariagen_data/veff.py: Split the else branch into three cases
- tests/anoph/test_snp_frq.py: Add new effects to expected_effects allowlist
- tests/test_veff.py: Add unit tests covering the full SNP effect taxonomy
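
For reviewers, here is a minimal sketch of how the three-way split could look. It is standalone and illustrative only: the function name classify_remaining_snp, its parameters, the STOP_CODONS set (standard genetic code), and the order of the checks are all assumptions made for this example, not the actual code in malariagen_data/veff.py.

```python
# Hypothetical standalone sketch; names and check order are assumptions,
# not the real implementation in malariagen_data/veff.py.

STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})  # standard genetic code


def classify_remaining_snp(ref_codon, alt_codon, codon_cds_start):
    """Label a coding SNP not already called START_LOST, STOP_LOST or STOP_GAINED.

    ref_codon, alt_codon: reference and alternate triplets (uppercase DNA).
    codon_cds_start: 0-based CDS position of the first base of the codon.
    """
    # Case 1: one stop codon triplet replaced by a different stop codon triplet.
    if (
        ref_codon in STOP_CODONS
        and alt_codon in STOP_CODONS
        and ref_codon != alt_codon
    ):
        return "NON_SYNONYMOUS_STOP"
    # Case 2: the change falls in the first codon of the CDS.
    if codon_cds_start == 0:
        return "NON_SYNONYMOUS_START"
    # Case 3: everything else keeps the previous label.
    return "NON_SYNONYMOUS_CODING"


# Example calls with made-up codons and positions.
start_label = classify_remaining_snp("CTG", "GTG", 0)    # first codon, Leu to Val
stop_label = classify_remaining_snp("TAA", "TAG", 300)   # stop to a different stop
other_label = classify_remaining_snp("GCT", "GAT", 12)   # Ala to Asp mid-CDS
```

The new unit tests in tests/test_veff.py could assert one label per case in the same way, so that each branch of the split is exercised at least once.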