
veff.py: Add NON_SYNONYMOUS_START and NON_SYNONYMOUS_STOP effect classifications #1045

@ZiadXI

Problem

Related to #953

In malariagen_data/veff.py, the _get_within_cds_effect() function currently
classifies all missense SNPs that don't match START_LOST, STOP_LOST, or
STOP_GAINED as NON_SYNONYMOUS_CODING, regardless of whether the mutation
is at a start or stop codon position.

There was already a # TODO NON_SYNONYMOUS_START and NON_SYNONYMOUS_STOP
comment at line 359 acknowledging this gap.

This leads to two missing classifications:

  1. A missense SNP at CDS position 0 (non-canonical start codon) is labeled
    NON_SYNONYMOUS_CODING; it should be NON_SYNONYMOUS_START
  2. A SNP where two distinct stop codon triplets are involved is labeled
    NON_SYNONYMOUS_CODING; it should be NON_SYNONYMOUS_STOP

Proposed Changes

  • malariagen_data/veff.py: Split the else branch into three cases
    (NON_SYNONYMOUS_START, NON_SYNONYMOUS_STOP, NON_SYNONYMOUS_CODING)
  • tests/anoph/test_snp_frq.py: Add new effects to expected_effects allowlist
  • tests/test_veff.py: Add unit tests covering the full SNP effect taxonomy
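
For illustration only, here is a rough stand-alone sketch of how the three-way split might look. It is not the code in veff.py: the function name classify_else_branch, its parameters, and the STOP_CODONS set are hypothetical, and the conditions simply mirror the two cases listed under Problem (a SNP at CDS position 0, and a SNP whose reference and alternate codons are two different stop triplets under the standard genetic code). How these conditions interact with the existing START_LOST, STOP_LOST, and STOP_GAINED checks would need to be confirmed against the real function.

```python
# Hypothetical sketch; names and signature are assumptions, not veff.py's API.
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})  # standard genetic code


def classify_else_branch(cds_pos: int, ref_codon: str, alt_codon: str) -> str:
    """Pick an effect label for a SNP that reached the final fallback branch.

    cds_pos is assumed to be the 0-based position within the CDS that the
    issue calls "CDS position 0"; ref_codon and alt_codon are DNA triplets.
    """
    ref_codon = ref_codon.upper()
    alt_codon = alt_codon.upper()

    # Case 1: the change is at the start of the CDS.
    if cds_pos == 0:
        return "NON_SYNONYMOUS_START"

    # Case 2: both triplets are stop codons, but they are different triplets.
    if (
        ref_codon != alt_codon
        and ref_codon in STOP_CODONS
        and alt_codon in STOP_CODONS
    ):
        return "NON_SYNONYMOUS_STOP"

    # Case 3: everything else keeps the existing label.
    return "NON_SYNONYMOUS_CODING"
```

A unit test in tests/test_veff.py could then assert one representative input per label, so that each of the three branches is exercised at least once.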
