Reduce N+1 / per-row queries on page loads #1590 #1591
Open
davmlaw wants to merge 2 commits into
- OntologyVersion.latest(): one query for all candidate imports instead of one per field
- OntologyVersion.get_ontology_imports(): lazy QuerySet (subquery in __in filters) instead of 5 FK lazy-loads
- related_data_for_samples: batch cohort sample / trio / pedigree queries, select_related trio members
- load_genome_fasta_index: bulk_create GenomeFastaContig rows
- URLTestCase: opt-in query profiling via VG_QUERY_PROFILE / VG_QUERY_TRACE
- Query-count regression tests
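For readers unfamiliar with the per-row pattern these commits remove, here is a minimal sketch outside Django, using only the standard-library sqlite3 module. The `sample` / `trio` tables, the `CountingDB` wrapper and both functions are hypothetical stand-ins, not code from this PR. The JOIN plays the role that select_related plays in the ORM.

```python
import sqlite3


class CountingDB:
    """Hypothetical helper: in-memory DB that counts SELECTs sent via query()."""

    def __init__(self, n_trios):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT)")
        self.conn.execute("CREATE TABLE trio (id INTEGER PRIMARY KEY, proband_id INTEGER)")
        for i in range(1, n_trios + 1):
            self.conn.execute("INSERT INTO sample VALUES (?, ?)", (i, f"sample_{i}"))
            self.conn.execute("INSERT INTO trio VALUES (?, ?)", (i, i))
        self.count = 0  # setup inserts are not counted

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def proband_names_per_row(db):
    """N+1 shape: one query for the trios, then one more query per trio."""
    names = []
    for (proband_id,) in db.query("SELECT proband_id FROM trio ORDER BY id"):
        rows = db.query("SELECT name FROM sample WHERE id = ?", (proband_id,))
        names.append(rows[0][0])
    return names


def proband_names_batched(db):
    """Batched shape: a single JOIN, so the query count does not depend on rows."""
    rows = db.query(
        "SELECT sample.name FROM trio "
        "JOIN sample ON sample.id = trio.proband_id ORDER BY trio.id"
    )
    return [name for (name,) in rows]
```

With five trios, the per-row version issues six queries and the joined version issues one, while both return the same names.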
Page/grid query counts must stay flat as row counts grow; this catches per-row N+1 patterns that small fixtures hide. production_query_count() excludes lookups on models whose object managers cache in production but not under UNIT_TEST, so the tests match production behaviour.
- view_sample: flat with 1 vs 11 related trios
- classification datatable: flat with 2 vs 10 classification rows
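The shape of such a scaling check can be sketched in plain Python. This is an illustration under stated assumptions, not the PR's implementation: the real production_query_count() works on models whose managers cache, whereas the version below simply filters SQL strings by table name, and `assert_flat`, `flat_page` and `n_plus_one_page` are hypothetical names. In a Django test the SQL strings would typically come from `django.test.utils.CaptureQueriesContext`.

```python
import re


def production_query_count(queries, cached_tables=()):
    """Count SQL strings, skipping reads of tables assumed cached in production."""
    skip = [
        re.compile(rf"\bFROM\s+{re.escape(table)}\b", re.IGNORECASE)
        for table in cached_tables
    ]
    return sum(1 for sql in queries if not any(p.search(sql) for p in skip))


def assert_flat(run_page, small, large, cached_tables=()):
    """run_page(n) must return the SQL issued when rendering n rows."""
    few = production_query_count(run_page(small), cached_tables)
    many = production_query_count(run_page(large), cached_tables)
    if few != many:
        raise AssertionError(
            f"query count grew with rows: {few} at {small} rows, {many} at {large}"
        )
    return few


def flat_page(n):
    # Hypothetical page: one cached lookup plus one batched query, whatever n is.
    return ["SELECT * FROM genome_build", "SELECT * FROM trio WHERE sample_id IN (...)"]


def n_plus_one_page(n):
    # Hypothetical page: one list query plus one query per row.
    return ["SELECT * FROM trio"] + ["SELECT * FROM sample WHERE id = %s"] * n
```

Calling `assert_flat(flat_page, 1, 11, cached_tables=["genome_build"])` passes, while `assert_flat(n_plus_one_page, 1, 11)` raises, which is the failure a tiny fixture with a single row would never surface.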
🤖 Written by Claude Added query-count scaling tests (f5715f8): load a page, multiply the row count ~10×, assert the production-relevant query count stays flat. This catches the per-row N+1 class that the small
🤖 Written by Claude
Addresses #1590 — see the issue for full profiling methodology and numbers.
Changes
Query fixes (steady-state)
- OntologyVersion.latest() fetches all candidate OntologyImports in one query (was one query per import field, called multiple times per gene page via the gene_disease tag).
- OntologyVersion.get_ontology_imports() returns a lazy QuerySet so __in filters use a subquery (was 5 individual FK lazy-loads, 3× per gene symbol page).
- related_data_for_samples template tag batches cohort-sample, trio and pedigree lookups and applies select_related to trio members, so query count is constant regardless of how many samples/trios are shown (was ~6 queries per rendered trio, plus 3 queries per cohort-sample). Cohort page trio list gets the same treatment.
One-time initialisation
- load_genome_fasta_index uses bulk_create (was ~640 individual INSERTs). This runs once per genome build per system (lazily, from the first request needing the fasta index). It dominated test profiles only because test transactions roll back, so it re-ran every time. Not a steady-state page cost.
Tooling
- URLTestCase gains opt-in profiling: VG_QUERY_PROFILE=<file> records per-URL query counts, request/SQL ms and duplicate-query groups; VG_QUERY_TRACE=<sql regex> records stack traces for matching queries. Inactive unless the env vars are set.
Regression + scaling tests
- snpdb/tests/test_query_counts.py: locks the related-data tag at 4 queries and trio rendering at 0; view_sample query count flat at 1 vs 11 related trios.
- classification/tests/views/test_query_scaling.py: classification datatable query count flat at 2 vs 10 rows.
- ontology/tests/test_query_counts.py: locks latest() at 2 queries, get_ontology_imports() at 0/1.
- snpdb/tests/test_fasta_index.py: exercises the rebuilt bulk-create path.
Headline numbers (test fixtures)
Steady-state:
- view_gene_symbol 67 → 49 queries, ontology gene_list API 16 → 7, view_sample / view_vcf 61 → 54 (and now constant as trios grow; the per-trio cost scales with real data).
First-hit-per-system initialisation (now bulk):
- view_allele 719 → 81 queries / 649 → 386 ms, view_transcript 318 → 22 / 206 → 69 ms; steady-state for these two pages is unchanged.
All 71 URL tests pass, plus annotation/genes/pedigree/ontology suites (the two TestAnnotationVCFCNV4 failures are pre-existing on master, verified via stash).
🤖 Generated with Claude Code
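As a rough picture of the opt-in profiling described under Tooling, the sketch below shows one way env-var gating, duplicate-query grouping and regex-triggered stack traces could fit together. Only the two variable names come from this PR. The `QueryRecorder` class, its methods and the record layout are assumptions made for illustration; timing and writing to the profile file are left out.

```python
import os
import re
import traceback
from collections import Counter


class QueryRecorder:
    """Hypothetical recorder: does nothing unless the env vars are set."""

    def __init__(self, environ=None):
        environ = os.environ if environ is None else environ
        self.enabled = bool(environ.get("VG_QUERY_PROFILE"))
        pattern = environ.get("VG_QUERY_TRACE")
        self.trace_re = re.compile(pattern) if pattern else None
        self.queries = []
        self.traces = []

    def record(self, sql):
        # Capture a short stack for queries matching the trace regex.
        if self.trace_re and self.trace_re.search(sql):
            self.traces.append((sql, traceback.format_stack(limit=5)))
        if self.enabled:
            self.queries.append(sql)

    def summary(self, url):
        """Per-URL record, or None when profiling is switched off."""
        if not self.enabled:
            return None
        counts = Counter(self.queries)
        return {
            "url": url,
            "query_count": len(self.queries),
            # Identical SQL issued more than once is a candidate N+1 group.
            "duplicates": {sql: n for sql, n in counts.items() if n > 1},
        }
```

Passing the environment in as a plain dict keeps the recorder testable without touching the real process environment.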