Feature extractors #82

@crashfrog

Description

Parent

#58 (Phase 1 MVP: Evidence-Informed Alignment with Deferred Filtering)

What to build

Implement derived-metric computation in phraya-filter for features that are not stored directly in VariantObservation. Three metrics are computed: cigar_ops (the number of CIGAR operations, used as a complexity measure), allele_frequency (derived from the all_alleles counts), and multi_map_fraction (which requires loading the .phraya.queries sidecar).

These metrics enable filtering on computed properties such as "CIGAR complexity > 10" or "allele frequency < 0.05".

Acceptance criteria

  • extract_cigar_ops(cigar: &str) → usize (count M/I/D operations)
  • extract_allele_frequency(all_alleles: &HashMap<u8, usize>, allele: u8) → f64
  • extract_multi_map_fraction(position: usize, query_index: &QueryIndex) → f64
  • Tests: CIGAR "50M" → 1 op, "10M5I10M5D25M" → 5 ops
  • Tests: allele_frequency with counts {A:90, C:10} → 0.9 for A, 0.1 for C
  • Tests: multi_map_fraction at position with 10 reads, 3 multi-mappers → 0.3
  • Unit test: empty all_alleles → handle gracefully
  • Unit test: query not in query_index → return 0.0 (assume unique mapping)
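
A minimal sketch of the first two extractors, written against the signatures above. It is illustrative only and makes assumptions the issue does not settle: CIGAR op codes other than M, I, and D are skipped without validating the string, and "handle gracefully" for an empty all_alleles map is taken to mean returning 0.0 rather than NaN.

```rust
use std::collections::HashMap;

/// Count CIGAR operations whose op code is M, I, or D.
/// Assumption: other op codes (S, H, N, P, =, X) are skipped and
/// malformed input is not rejected.
pub fn extract_cigar_ops(cigar: &str) -> usize {
    cigar
        .chars()
        .filter(|c| matches!(c, 'M' | 'I' | 'D'))
        .count()
}

/// Fraction of the observations at a site that support `allele`.
/// Assumption: an empty map (or a zero total) yields 0.0 instead of NaN.
pub fn extract_allele_frequency(all_alleles: &HashMap<u8, usize>, allele: u8) -> f64 {
    let total: usize = all_alleles.values().sum();
    if total == 0 {
        return 0.0;
    }
    // An allele that was never observed counts as zero.
    let count = all_alleles.get(&allele).copied().unwrap_or(0);
    count as f64 / total as f64
}
```

With these in place, a filter such as "CIGAR complexity > 10" reduces to comparing the result of extract_cigar_ops against a threshold. Whether the u8 keys are ASCII base bytes (for example b'A') is also an assumption here.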

Blocked by

#81 (filter library), #77 (.queries format for multi-map)
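
Because the .queries format (#77) is not yet defined, the shape of QueryIndex is unknown. The sketch below uses a hypothetical stand-in struct (the fields reads_at and hit_counts are invented for illustration) to show how extract_multi_map_fraction could satisfy the acceptance criteria once the real index exists. A position with no reads, or a read missing from the index, is treated as uniquely mapped.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the index loaded from the .phraya.queries
/// sidecar. The real layout depends on #77; both fields are assumptions.
pub struct QueryIndex {
    /// Reference position -> names of the reads observed at that position.
    pub reads_at: HashMap<usize, Vec<String>>,
    /// Read name -> number of locations the read mapped to.
    pub hit_counts: HashMap<String, usize>,
}

/// Fraction of reads at `position` that mapped to more than one location.
pub fn extract_multi_map_fraction(position: usize, query_index: &QueryIndex) -> f64 {
    // No reads recorded at this position: report 0.0 (assume unique mapping).
    let reads = match query_index.reads_at.get(&position) {
        Some(r) if !r.is_empty() => r,
        _ => return 0.0,
    };
    // A read absent from hit_counts is assumed to be uniquely mapped.
    let multi = reads
        .iter()
        .filter(|name| query_index.hit_counts.get(*name).map_or(false, |&n| n > 1))
        .count();
    multi as f64 / reads.len() as f64
}
```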
