Reduce N+1 / per-row queries on page loads #1590 #1591

Open
davmlaw wants to merge 2 commits into master from page_load_n_plus_one_1590

davmlaw commented Jun 10, 2026

Contributor

🤖 Written by Claude

Addresses #1590 — see the issue for full profiling methodology and numbers.

Changes

Query fixes (steady-state)

  • OntologyVersion.latest() fetches all candidate OntologyImports in one query (was one query per import field, called multiple times per gene page via the gene_disease tag).
  • OntologyVersion.get_ontology_imports() returns a lazy QuerySet so __in filters use a subquery (was 5 individual FK lazy-loads, 3× per gene symbol page).
  • related_data_for_samples template tag batches cohort-sample, trio and pedigree lookups and uses select_related for trio members, so the query count is constant regardless of how many samples/trios are shown (was ~6 queries per rendered trio, plus 3 queries per cohort-sample). The cohort page trio list gets the same treatment.
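The per-row versus batched shape behind these fixes can be sketched without Django. This is a minimal illustration using the standard-library sqlite3 module, not the PR's code: the `trio` table layout, `CountingConnection`, and both lookup functions are hypothetical names invented for the example.

```python
import sqlite3
from collections import defaultdict


class CountingConnection:
    """Thin wrapper that counts every SELECT it runs (illustrative only)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.select_count = 0

    def select(self, sql, params=()):
        self.select_count += 1
        return self.conn.execute(sql, params).fetchall()


def make_db(num_samples):
    # Hypothetical schema: one trio row per sample.
    db = CountingConnection()
    db.conn.execute(
        "CREATE TABLE trio (id INTEGER PRIMARY KEY, sample_id INTEGER, name TEXT)"
    )
    db.conn.executemany(
        "INSERT INTO trio (sample_id, name) VALUES (?, ?)",
        [(s, f"trio-{s}") for s in range(num_samples)],
    )
    return db


def trios_per_row(db, sample_ids):
    # N+1 shape: one SELECT per sample.
    return {
        s: [name for (name,) in db.select(
            "SELECT name FROM trio WHERE sample_id = ?", (s,))]
        for s in sample_ids
    }


def trios_batched(db, sample_ids):
    # Batched shape: one SELECT with IN (...), grouped in Python afterwards.
    placeholders = ",".join("?" for _ in sample_ids)
    rows = db.select(
        f"SELECT sample_id, name FROM trio WHERE sample_id IN ({placeholders})",
        tuple(sample_ids),
    )
    grouped = defaultdict(list)
    for sample_id, name in rows:
        grouped[sample_id].append(name)
    return {s: grouped[s] for s in sample_ids}


sample_ids = list(range(10))
db = make_db(10)
slow = trios_per_row(db, sample_ids)
per_row_queries = db.select_count

db.select_count = 0
fast = trios_batched(db, sample_ids)
batched_queries = db.select_count
```

Both functions return the same mapping; only the number of round trips differs, which is why the batched form stays constant as the number of samples grows.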

One-time initialisation

  • load_genome_fasta_index uses bulk_create (was ~640 individual INSERTs). This runs once per genome build per system (lazily, from the first request needing the fasta index) — it dominated test profiles only because test transactions roll back, so it re-ran every time. Not a steady-state page cost.
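As a rough picture of why bulk insertion cuts the statement count, here is a sketch with sqlite3 that issues one multi-row INSERT per batch. It only approximates what an ORM bulk insert does; the `contig` table, the `insert_contigs_bulk` helper, the batch size and the row data are all assumptions made for illustration.

```python
import sqlite3


def insert_contigs_bulk(conn, contigs, batch_size=200):
    """Insert (name, length) rows with one multi-row INSERT per batch.

    Returns the number of INSERT statements issued.
    """
    statements = 0
    for start in range(0, len(contigs), batch_size):
        batch = contigs[start:start + batch_size]
        # One "(?, ?)" group per row, flattened parameters to match.
        placeholders = ",".join("(?, ?)" for _ in batch)
        params = [value for row in batch for value in row]
        conn.execute(
            f"INSERT INTO contig (name, length) VALUES {placeholders}", params
        )
        statements += 1
    return statements


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contig (name TEXT, length INTEGER)")
contigs = [(f"contig_{i}", 1000 + i) for i in range(640)]  # made-up rows
statements = insert_contigs_bulk(conn, contigs)
row_count = conn.execute("SELECT COUNT(*) FROM contig").fetchone()[0]
```

With 640 rows and a batch size of 200, the loop issues 4 statements instead of 640.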

Tooling

  • URLTestCase gains opt-in profiling: VG_QUERY_PROFILE=<file> records per-URL query counts, request/SQL ms and duplicate-query groups; VG_QUERY_TRACE=<sql regex> records stack traces for matching queries. Inactive unless the env vars are set.
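One way such opt-in profiling could be structured is sketched below. This is not the URLTestCase implementation: `QueryRecorder` and `recorder_from_env` are hypothetical, and only the two environment variable names come from this PR.

```python
import os
import re
import traceback
from collections import Counter


class QueryRecorder:
    """Collects SQL strings for one request (hypothetical helper)."""

    def __init__(self, trace_pattern=None):
        self.queries = []
        self.traces = []
        self.trace_re = re.compile(trace_pattern) if trace_pattern else None

    def record(self, sql):
        self.queries.append(sql)
        if self.trace_re and self.trace_re.search(sql):
            # Keep the call stack so the query can be traced back to its caller.
            self.traces.append((sql, traceback.format_stack()))

    def duplicate_groups(self):
        # Any SQL string seen more than once is a candidate N+1 pattern.
        counts = Counter(self.queries)
        return {sql: n for sql, n in counts.items() if n > 1}


def recorder_from_env(environ=os.environ):
    """Return a recorder only when profiling is switched on, else None."""
    if not environ.get("VG_QUERY_PROFILE"):
        return None
    return QueryRecorder(trace_pattern=environ.get("VG_QUERY_TRACE"))


# Profiling on: both variables set (values here are examples only).
recorder = recorder_from_env(
    {"VG_QUERY_PROFILE": "profile.json", "VG_QUERY_TRACE": "FROM trio"}
)
for sql in [
    "SELECT * FROM trio WHERE id = %s",
    "SELECT * FROM trio WHERE id = %s",
    "SELECT 1",
]:
    recorder.record(sql)

duplicates = recorder.duplicate_groups()
disabled = recorder_from_env({})  # profiling off: no recorder is created
```

Returning `None` when the variable is unset keeps the default test run free of any recording overhead.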

Regression + scaling tests

  • snpdb/tests/test_query_counts.py — locks the related-data tag at 4 queries and trio rendering at 0; view_sample query count flat at 1 vs 11 related trios.
  • classification/tests/views/test_query_scaling.py — classification datatable query count flat at 2 vs 10 rows.
  • ontology/tests/test_query_counts.py — locks latest() at 2 queries, get_ontology_imports() at 0/1.
  • snpdb/tests/test_fasta_index.py — exercises the rebuilt bulk-create path.

Headline numbers (test fixtures)

Steady-state: view_gene_symbol 67 → 49 queries, ontology gene_list API 16 → 7, view_sample/view_vcf 61 → 54 (and now constant as trios grow; the old per-trio cost scaled with the number of trios in real data).

First-hit-per-system initialisation (now bulk): view_allele 719 → 81 queries / 649 → 386 ms, view_transcript 318 → 22 / 206 → 69 ms; steady-state for these two pages is unchanged.

All 71 URL tests pass, plus annotation/genes/pedigree/ontology suites (the two TestAnnotationVCFCNV4 failures are pre-existing on master — verified via stash).

🤖 Generated with Claude Code

davmlaw added 2 commits June 11, 2026 07:12
- OntologyVersion.latest(): one query for all candidate imports instead of one per field
- OntologyVersion.get_ontology_imports(): lazy QuerySet (subquery in __in filters) instead of 5 FK lazy-loads
- related_data_for_samples: batch cohort sample / trio / pedigree queries, select_related trio members
- load_genome_fasta_index: bulk_create GenomeFastaContig rows
- URLTestCase: opt-in query profiling via VG_QUERY_PROFILE / VG_QUERY_TRACE
- Query-count regression tests

Page/grid query counts must stay flat as row counts grow - catches per-row
N+1 patterns that small fixtures hide. production_query_count() excludes
lookups on models whose object managers cache in production but not under
UNIT_TEST, so the tests match production behaviour.

- view_sample: flat with 1 vs 11 related trios
- classification datatable: flat with 2 vs 10 classification rows

davmlaw commented Jun 10, 2026

Contributor Author

🤖 Written by Claude

Added query-count scaling tests (f5715f8): load a page, multiply the row count ~10×, assert the production-relevant query count stays flat. This catches the per-row N+1 class that the small test_urls fixtures hide (a 6-queries-per-trio pattern looks like a harmless ×2 with 2 rows). production_query_count() excludes lookups on models cached by ObjectManagerCachingImmutable/Request in production (caching is disabled under UNIT_TEST), so the assertions match production behaviour.

  • view_sample: flat at 1 vs 11 related trios
  • classification datatable: flat at 2 vs 10 classification rows (passed first try — row data comes via .values(), renderers are clean)
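The shape of a scaling assertion, including the idea of ignoring lookups that would be cached in production, can be sketched in plain Python. Everything here is hypothetical: `production_relevant_count` only mimics the role described for production_query_count(), and `render_page` fakes the SQL a page would issue rather than hitting a database.

```python
def production_relevant_count(queries, cached_tables):
    """Count queries, skipping those on tables assumed cached in production."""
    return sum(
        1 for sql in queries
        if not any(f"FROM {table}" in sql for table in cached_tables)
    )


def render_page(num_rows, batched):
    """Return the SQL a fake page would issue for num_rows rows."""
    queries = [
        "SELECT * FROM sample WHERE id = 1",
        "SELECT * FROM genome_build",  # stand-in for a cached lookup
    ]
    if batched:
        queries.append("SELECT * FROM trio WHERE sample_id IN (?)")
    else:
        # Per-row pattern: one query for every row shown.
        queries.extend(
            f"SELECT * FROM trio WHERE id = {i}" for i in range(num_rows)
        )
    return queries


def is_flat(render, small, large, cached_tables):
    """True when the relevant query count is the same at both row counts."""
    small_count = production_relevant_count(render(small), cached_tables)
    large_count = production_relevant_count(render(large), cached_tables)
    return small_count == large_count


cached = ["genome_build"]
batched_flat = is_flat(lambda n: render_page(n, batched=True), 1, 11, cached)
per_row_flat = is_flat(lambda n: render_page(n, batched=False), 1, 11, cached)
per_row_count = production_relevant_count(render_page(11, batched=False), cached)
```

Comparing two row counts rather than pinning an absolute number means the check fails only when cost grows with the data, which is the per-row pattern these tests target.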
