Skip to content

feat: register protein sequence consequence/canonical allele catvars#470

Merged
jsstevenson merged 1 commit into
mainfrom
feat/453-register-catvars
Jun 24, 2026
Merged

feat: register protein sequence consequence/canonical allele catvars#470
jsstevenson merged 1 commit into
mainfrom
feat/453-register-catvars

Conversation

@jsstevenson

@jsstevenson jsstevenson commented May 28, 2026

Copy link
Copy Markdown
Contributor

close #453

Decisions

  • /categorical_variants not /categorical_variations. I don't even know which is correct at this point
  • the "reference" to the external resource is just the ID of the catvar itself. So rather than making a catvar with an "xref" to a clingen allele record, the id field is just clingen.allele:CA12345. And similarly, fetching that specific catvar is GET /categorical_variants/canonical_alleles/clingen.allele:CA12345
  • Rather than a general "drop in a catvar, have some indication of its type, let anyvar figure it out" endpoint, I just made resource type-specific registration endpoints, I think that's better for our sanity and avoids inventing any new conventions. Similarly, there are separate sql tables for each categorical variant type. I had wanted to be slick but it probably makes more sense to just use a new table for each type.
  • Make new classes for PSQ's/CA's instead of reusing ga4gh.cat_vrs recipes, a couple reasons
    1. we want to be stricter about some validation things (ID is required, len(constraints) == 1)
    2. we might want to interpret relations differently than the recipes do, now or in the future
  • Validating molecule type is done outside of the pydantic field_validator because it might require seqrepo access. Alternatives here could be 1) requiring the submitter to explicitly declare molecule type and trusting it, or 2) injecting validation context into the fastapi route validation logic, but that looked like it was going to be insane. It's also possible that this function will duplicate work from the projection end point so whichever one gets merged second can resolve that.
  • Request paths below -- also add basic GET by ID for round-trip testing and also because it just seems reasonable and enabled roundtrip testing
  • Haven't added any relations to the input or output objects yet, pending final determination on how we implement our search functions. But broadly, I think you could just make sure to add in the right values when the object is coming back out to the user. Not sure if it's worth validating them on the way in.
  • Haven't updated docs yet, pending final search endpoint behavior
  • Didn't add a Snowflake implementation because a) scary and b) maybe we are ending snowflake support?

Routes

PUT /categorical_variants/canonical_alleles -- to register a CA
GET /categorical_variants/canonical_alleles/<id> -- to fetch a CA by ID
PUT /categorical_variants/protein_sequence_consequences -- to register a PSQ
GET /categorical_variants/protein_sequence_consequences/<id> -- to fetch a PSQ by ID

@jsstevenson jsstevenson force-pushed the feat/453-register-catvars branch from ca48aec to ec123d5 Compare June 4, 2026 15:26
@jsstevenson jsstevenson marked this pull request as ready for review June 4, 2026 15:28
@jsstevenson jsstevenson requested a review from a team as a code owner June 4, 2026 15:28
Comment thread src/anyvar/core/categorical_variants.py
Comment thread src/anyvar/core/categorical_variants.py Outdated
Comment thread tests/integration/restapi/vcf_router/test_ingest_vcf.py Outdated
@jsstevenson

Copy link
Copy Markdown
Contributor Author

pulled in latest commits to main + added more tests

Comment on lines +39 to +41
sequenceReference=vrs.SequenceReference(
refgetAccession="SQ.cQvw4UsHHRRlogxbWCB8W-mKD4AraM9y",
),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a moleculeType="protein" or moleculeType=vrs.MoleculeType.PROTEIN here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I meant to do this before, but also updated the molecule type validation function to trust moleculeType blindly.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add more around this if we do #478

Comment thread src/anyvar/storage/sqlalchemy.py Outdated
Comment thread pyproject.toml Outdated
@jsstevenson jsstevenson requested a review from theferrit32 June 23, 2026 18:56
@jsstevenson jsstevenson force-pushed the feat/453-register-catvars branch from 79cab79 to 33ae7d7 Compare June 24, 2026 13:46
@jsstevenson jsstevenson force-pushed the feat/453-register-catvars branch from 33ae7d7 to b93be23 Compare June 24, 2026 13:48
@jsstevenson jsstevenson merged commit 8f6f797 into main Jun 24, 2026
17 checks passed
@jsstevenson jsstevenson deleted the feat/453-register-catvars branch June 24, 2026 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Build registration endpoint for ProteinSequenceConsequence and CanonicalAllele categorical variations

3 participants