perf(datasource-active-record): JOIN same-DB to-one relations and select only projected columns #323

Merged
PMerlet merged 10 commits into main from perf/ar-datasource-join-to-one-same-db on Jul 2, 2026

@PMerlet PMerlet commented Jul 1, 2026

Member

Context

When rendering a record (or a list) through the ActiveRecord datasource, every relation in the projection is resolved with .includes → ActiveRecord preload (one extra query per relation hop). Displaying a belongs_to chain such as income → bank_account → organization therefore issues 3 sequential queries (a round-trip waterfall). And because relations are preloaded (or, with eager_load, joined), the related rows are always read in full (SELECT relation.*) — costly when the relation is a wide table or a heavy DB view.

Change

Utils::Query#apply_select now splits the projection's relations:

  • to-one relations on the same database (belongs_to / has_one, non-polymorphic) are resolved with a single LEFT OUTER JOIN, selecting only the projected columns of each joined table (aliased on the flat row), plus the target primary key for NULL detection. The serializer rebuilds the nested hash from those aliases instead of reading the ActiveRecord association.
  • everything else keeps its current preload behaviour and association-based serialization.
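
The alias-based hydration described above can be sketched in plain Ruby. This is only an illustration, not the datasource's code: the flat row is a plain Hash rather than an ActiveRecord object, and the `JOINED_RELATIONS` layout (relation path mapped to `pk_alias` and `columns`) and the `hydrate_relation` helper are assumptions made for this example.

```ruby
# Hypothetical alias map: relation path => primary-key alias and column aliases.
# Paths use ':' between hops; this layout is an assumption for the sketch.
JOINED_RELATIONS = {
  'bank_account' => { pk_alias: 'fa_join_1', columns: {} },
  'bank_account:organization' => { pk_alias: 'fa_join_3', columns: { 'name' => 'fa_join_2' } }
}.freeze

# Rebuild the nested hash of one relation from a flat row.
# Returns nil when the target primary key alias is NULL (the LEFT OUTER JOIN matched nothing).
def hydrate_relation(row, path, joined_relations)
  meta = joined_relations.fetch(path)
  return nil if row[meta[:pk_alias]].nil?

  hash = meta[:columns].to_h { |column, join_alias| [column, row[join_alias]] }
  # Recurse into the direct children of this path only.
  joined_relations.each_key do |child|
    next unless child.start_with?("#{path}:") && child.count(':') == path.count(':') + 1

    hash[child.split(':').last] = hydrate_relation(row, child, joined_relations)
  end
  hash
end

row = { 'id' => 7, 'bank_account_id' => 3, 'fa_join_1' => 3, 'fa_join_2' => 'Acme', 'fa_join_3' => 12 }
orphan = { 'id' => 8, 'bank_account_id' => nil, 'fa_join_1' => nil, 'fa_join_2' => nil, 'fa_join_3' => nil }

p hydrate_relation(row, 'bank_account', JOINED_RELATIONS)
p hydrate_relation(orphan, 'bank_account', JOINED_RELATIONS)
```

Selecting the target primary key even when it is not projected is what lets the sketch tell "relation absent" apart from "relation present with NULL columns".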

Measured effect

income → bank_account → organization, displaying only organization.name:

Before — 3 sequential queries, each SELECT relation.*:

SELECT incomes.id, incomes.bank_account_id FROM incomes WHERE id = $1;
SELECT bank_accounts.*      FROM bank_accounts      WHERE id = $1;
SELECT organizations_view.* FROM organizations_view WHERE id = $1;  -- ~150 columns of a view

After — 1 query, only the projected column of the view:

SELECT incomes.id, incomes.bank_account_id,
       bank_accounts.id        AS fa_join_1,
       organizations_view.name AS fa_join_2,
       organizations_view.id   AS fa_join_3
FROM incomes
LEFT OUTER JOIN bank_accounts     ON bank_accounts.id = incomes.bank_account_id
LEFT OUTER JOIN organizations_view ON organizations_view.id = bank_accounts.organizations_view_id
WHERE incomes.id = $1;

On a list, the query count stays constant regardless of the number of rows.

Safety — nothing is JOINed unless it is provably safe

fully_joinable? walks the whole relation subtree and falls back to preload if any of these do not hold:

  • to-many relations → preload (a JOIN multiplies rows and breaks pagination);
  • polymorphic to-one → preload (target table varies per row);
  • different database connection (connects_to) → preload (cannot JOIN across connections);
  • default_scope on the target → preload (may inject unqualifiable raw SQL, e.g. where('id > ?', 10), that becomes ambiguous once joined);
  • target not resolvable as an AR-backed collection belonging to this very datasource → preload (defensive against cross-datasource / name-collision cases — checks the concrete class and the datasource object identity, not just the name).
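
As a rough illustration of the all-or-nothing walk, the guards can be modelled on a plain Struct. Everything here is hypothetical: the `Relation` struct, its flags, and the `relation` builder stand in for the ActiveRecord reflections and datasource collections the real code inspects. The sketch also assumes the belongs_to-only rule that a later commit in this PR introduces.

```ruby
# Hypothetical, simplified description of one relation in the projection.
Relation = Struct.new(:name, :kind, :polymorphic, :same_database, :default_scope,
                      :local_target, :children, keyword_init: true) do
  # One relation passes only if every guard holds.
  def joinable?
    kind == :belongs_to && !polymorphic && same_database && !default_scope && local_target
  end
end

DEFAULTS = { kind: :belongs_to, polymorphic: false, same_database: true,
             default_scope: false, local_target: true, children: [] }.freeze

# Builder with "safe" defaults so each example only states what differs.
def relation(name, **overrides)
  Relation.new(**DEFAULTS.merge(overrides), name: name)
end

# A subtree is JOINed only if the relation and all of its nested relations are joinable;
# otherwise the whole subtree keeps the preload path.
def fully_joinable?(relation)
  relation.joinable? && relation.children.all? { |child| fully_joinable?(child) }
end

chain = relation('bank_account', children: [relation('organization')])
scoped = relation('bank_account', children: [relation('organization', default_scope: true)])

p fully_joinable?(chain)
p fully_joinable?(scoped)
p fully_joinable?(relation('payments', kind: :has_many))
```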

Tests

New spec utils/join_to_one_optimization_spec.rb:

  • a two-hop chain resolves in a single JOINed query (was 3 with preload);
  • the JOIN selects only the projected columns of the joined tables (not table.*);
  • query count stays constant regardless of row count;
  • each guard (default_scope, absent collection, non-AR target, foreign datasource) falls back to preload safely;
  • to-many relations still preload.

query_spec.rb updated to reflect that nested to-one relations are now JOINed.

Full forest_admin_datasource_active_record suite: 156 examples, 0 failures. RuboCop clean.

🤖 Generated with Claude Code

Note

JOIN same-database belongs_to relations and select only projected columns in ActiveRecord datasource queries

  • Utils::Query now splits belongs_to relations into two groups: eligible ones (same DB, no scopes, no duplicate tables) are LEFT OUTER JOINed with aliased column selects; the rest are preloaded via includes.
  • Utils::ActiveRecordSerializer accepts a joined_relations map and hydrates JOINed relations from aliased columns on the root row instead of traversing preloaded AR associations.
  • Filter/sort JOINs are tracked in @filter_joined_tables to prevent duplicate conflicting JOINs when apply_select runs.
  • Nested JOINable relations are supported recursively.
  • Behavioral Change: queries that previously issued N+1 preloads for eligible belongs_to relations now issue a single JOIN query with an explicit SELECT list, changing the SQL shape and column count of results.

Changes since #323 opened

  • Modified ForestAdminDatasourceActiveRecord::Utils::ActiveRecordSerializer.hash_object to conditionally serialize related records with only projected columns instead of full attributes, and introduced ForestAdminDatasourceActiveRecord::Utils::ActiveRecordSerializer.projected_columns helper method [6115cfa]
  • Added test coverage verifying that related records serialized after JOIN or preload operations expose exactly the projected columns [6115cfa]
  • Updated inline documentation and comments in ForestAdminDatasourceActiveRecord::Utils::ActiveRecordSerializer and ForestAdminDatasourceActiveRecord::Utils::Query [0a762a6]
  • Changed ForestAdminDatasourceActiveRecord::Utils::ActiveRecordSerializer.join_aliases private method to use array deduplication instead of Set conversion [43e0e92]

Macroscope summarized ac04d57.

… of preload

Resolve non-polymorphic to-one relation chains (belongs_to / has_one) that live
on the same database with a single LEFT OUTER JOIN (eager_load) instead of one
extra preload query per relation hop. Collapses the round-trip cascade on record
show and keeps a constant query count on lists.

Guards preserve existing behaviour wherever a JOIN would be unsafe or impossible:
- to-many relations keep preload (a JOIN multiplies rows and breaks pagination)
- polymorphic to-one relations keep preload (target table varies per row)
- targets on a different database connection keep preload (connects_to)
- targets carrying a default_scope keep preload (may inject unqualifiable SQL,
  e.g. `where('id > ?', 10)`, that becomes ambiguous once joined)
- targets that cannot be resolved as an AR-backed collection belonging to this
  very datasource instance keep preload (defensive against cross-datasource /
  name-collision cases): checks concrete class + datasource object identity

Adds a spec proving a two-hop chain collapses to one JOINed query, query count
stays constant regardless of row count, and every guard falls back safely.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

qltysh Bot commented Jul 1, 2026

9 new issues

Tool | Category | Rule | Count
qlty | Structure | Function with high complexity (count = 9): serialize_associations | 4
qlty | Structure | Function with many parameters (count = 4): serialize_associations | 3
qlty | Structure | High total complexity (count = 78) | 1
qlty | Structure | Function with many returns (count = 5): joinable_target | 1

@query = @query.select(@select.join(', ')) if @select
@query = @query.includes(format_relation_projection(@projection)) unless @projection.nil?

@query

Function with high complexity (count = 10): apply_select [qlty:function-complexity]

…ed to-one relations

Builds on the JOIN optimization: instead of eager_load (which forces `table.*` for
every joined association), to-one relations are now resolved with left_outer_joins +
an explicit SELECT of ONLY their projected columns, aliased on the flat row. The
serializer rebuilds the nested hash from those aliases (and detects NULL relations via
the target primary key) rather than reading the ActiveRecord association.

Effect: displaying a single field of a wide relation (e.g. a heavy DB view with ~150
columns) reads exactly that column plus the join keys, in one query — no full-row read
and no per-hop round-trip. to-many / guarded relations keep their preload path and the
existing association-based serialization untouched.

156 examples, 0 failures; rubocop clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@PMerlet PMerlet changed the title from "perf(datasource-active-record): JOIN same-DB to-one relations instead of preload" to "perf(datasource-active-record): JOIN same-DB to-one relations and select only projected columns" on Jul 2, 2026

object.attributes.except(*join_aliases)
end

def serialize_associations(object, projection, hash, path)

Function with many parameters (count = 4): serialize_associations [qlty:function-parameters]

hash_object(item, projection.relations[association_name], path: relation_path)
end
end
end

Function with high complexity (count = 9): serialize_associations [qlty:function-complexity]

# joined to-one relation (recursively), and records the aliases in @joined_relations
# so the serializer can rebuild the nested hash from the flat row. The target primary
# key is always selected to let the serializer detect a NULL (absent) relation.
def collect_joined_selects(collection, relation_name, sub_projection, path)

Function with many parameters (count = 4): collect_joined_selects [qlty:function-parameters]

…le only

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…acroscope review)

- Only belongs_to (ManyToOne) relations are JOINed now. has_one (OneToOne) does not
  guarantee a unique child row, so a JOIN could duplicate the parent and break list
  results / pagination; has_one falls back to preload.
- Never JOIN a table already present in the query (base or a sibling/nested join).
  ActiveRecord would alias a table joined twice, and collect_joined_selects references
  the plain table name; such relations fall back to preload instead.

joinable_tables replaces fully_joinable?: it returns the set of tables a subtree would
add via JOIN (or nil), threading the used-tables set so collisions are detected across
the whole query.
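
A minimal sketch of that "set of tables or nil" contract, assuming a hypothetical `Node` struct in place of the real collection and projection arguments (the actual method takes `collection, relation_name, sub_projection, used_tables`):

```ruby
require 'set'

# Hypothetical relation node: the table it would JOIN, whether its guards pass,
# and its nested relations.
Node = Struct.new(:table, :joinable, :children, keyword_init: true)

# Returns the Set of tables the subtree would add via JOIN, or nil when any relation
# in it must fall back to preload (guard failure, or table already in the query).
def joinable_tables(node, used_tables)
  return nil unless node.joinable
  return nil if used_tables.include?(node.table)

  tables = Set[node.table]
  node.children.each do |child|
    # Thread the tables collected so far, so nested collisions are detected too.
    nested = joinable_tables(child, used_tables | tables)
    return nil if nested.nil?

    tables |= nested
  end
  tables
end

org = Node.new(table: 'organizations', joinable: true, children: [])
account = Node.new(table: 'bank_accounts', joinable: true, children: [org])

p joinable_tables(account, Set['incomes'])
p joinable_tables(account, Set['incomes', 'organizations'])
```

Returning nil for the whole subtree, rather than a partial set, keeps the decision simple: a relation is either fully JOINed or fully preloaded.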

Specs cover: belongs_to collapses to one JOIN with only projected columns and constant
query count; has_one, to-many, default_scope, already-used table, and non-local targets
all fall back to preload.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
# primary key, so the JOIN cannot duplicate the parent row (a has_one child may not be
# unique). used_tables already covers the base + sibling joins: a table joined twice would
# be aliased by ActiveRecord, which collect_joined_selects cannot reference — so bail out.
def joinable_tables(collection, relation_name, sub_projection, used_tables)

Function with many parameters (count = 4): joinable_tables [qlty:function-parameters]

…ed selects (Macroscope review)

target.model.primary_key returns an array for composite-key models; iterate Array(pk)
so each key column gets its own aliased select instead of embedding the array as a
single, invalid column reference. NULL detection uses the first key column's alias.
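
To illustrate the `Array(pk)` normalization, here is a small sketch. The `pk_selects` helper and its alias numbering are hypothetical, and identifier quoting is left out for brevity (the PR quotes through the adapter).

```ruby
# Build one aliased select per primary-key column.
# primary_key may be a String (single key) or an Array (composite key);
# Array() normalizes both to a list of column names.
def pk_selects(table, primary_key, start_index)
  Array(primary_key).each_with_index.map do |column, offset|
    join_alias = "fa_join_#{start_index + offset}"
    { select: "#{table}.#{column} AS #{join_alias}", alias: join_alias }
  end
end

single = pk_selects('organizations_view', 'id', 3)
composite = pk_selects('order_lines', %w[order_id line_no], 1)

puts single.map { |s| s[:select] }
puts composite.map { |s| s[:select] }

# NULL detection only needs one key column; the first alias is used here.
pk_alias = composite.first[:alias]
puts pk_alias
```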

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ting, real pool check (Macroscope review)

- Do not add a second JOIN for a relation already joined by a filter/sort: resolve_field
  records the joined table in @filter_joined_tables, and apply_select seeds used_tables
  with it so joinable_tables rejects it (falls back to preload).
- Quote joined-select identifiers via the adapter (quote_table_name / quote_column_name)
  instead of ANSI double quotes, which are string literals on MySQL's default sql_mode.
- same_database? compares connection_pool instead of connection_specification_name, which
  is only the owner class name and can be shared across different databases/shards.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…connection & quoting (Macroscope review)

- Fall back to preload when the belongs_to itself carries a scope (e.g.
  `belongs_to :x, -> { where('id > ?', 1) }`): the scope is applied to the JOIN and can
  inject raw/unqualified SQL or extra joins. joinable_target now also checks the
  association reflection's scope, not only the target model's default_scope.
- Obtain the connection via connection_pool.with_connection (connection is deprecated on
  Rails 8) and quote joined-select identifiers through the adapter.
- Split guards into joinable_target and the relation partitioning into split_relations for
  clarity; drop the always-true with_associations parameter.

161 examples, 0 failures; rubocop clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

tables |= nested
end
tables

Function with high complexity (count = 5): joinable_tables [qlty:function-complexity]

return nil if object[meta[:pk_alias]].nil?

hash = {}
projection.columns.each { |column| hash[column] = object[meta[:columns][column]] }

Member

Joined and preloaded to-one relations now return different column sets for the same projection.

A preloaded to-one is still hydrated from the full row (base_attributes -> object.attributes, the pre-PR behavior), whereas a JOINed one is built here from only projection.columns (+ the pk). So the shape of a nested to-one hash now depends on how the relation happened to be resolved.

Concretely: if a consumer reads a column off a to-one relation that isn't in the projection (relying on the old over-fetch), it's present when the relation falls back to preload (default_scope / duplicate table / cross-db) and nil when it JOINs.

The projected-subset behavior is arguably the more correct one — but the divergence is latent and adapter/schema-dependent. Worth either making the preload path match (serialize only projected columns there too) or confirming that every consumer only reads projected columns.

Member Author

Good catch, aligned in 6115cfa. The preload path now serializes a related record to exactly its projected columns too (via projected_columns), matching the JOINed hydration — so a to-one's shape no longer depends on how it was resolved. The root record still keeps all its selected columns (own attributes + FKs); only related records are projection-restricted. Added specs asserting the same column set for a JOINed (account → supplier) and a preloaded (supplier → account) to-one. Thanks for the review!
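
The alignment can be pictured with plain hashes. This is a sketch under stated assumptions, not the serializer itself: `restrict_to_projection` is a hypothetical helper, and the two input hashes stand in for a fully loaded preloaded record and a record rebuilt from JOIN aliases.

```ruby
# Keep only the projected columns of a related record's attributes.
def restrict_to_projection(attributes, projected_columns)
  attributes.slice(*projected_columns)
end

projection = %w[name]

# A preloaded record carries its full row; a JOINed one only what was selected.
preloaded = { 'id' => 12, 'name' => 'Acme', 'vat_number' => 'FR123', 'created_at' => '2026-01-01' }
joined    = { 'name' => 'Acme' }

p restrict_to_projection(preloaded, projection)
p restrict_to_projection(joined, projection)
```

With both paths restricted the same way, a consumer sees the same keys regardless of whether the relation was JOINed or fell back to preload.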

bexchauveto previously approved these changes Jul 2, 2026
…r projected columns (review: bexchauveto)

Previously a preloaded to-one was hydrated from its full row while a JOINed one was built
from only the projected columns, so a related record's shape depended on how it was
resolved (preload fallback vs JOIN). Now the preload path is restricted to the projected
columns too, matching the JOINed hydration. The root record still exposes all its selected
columns (own attributes + foreign keys); only related records are projection-restricted.

Adds specs asserting a to-one relation returns exactly the projected columns whether it is
JOINed (account -> supplier) or preloaded (supplier -> account).

163 examples, 0 failures; rubocop clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…le only

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
return unless same_database?(collection.model, target.model)
return if used_tables.include?(target.model.table_name) # a table joined twice would be aliased by AR

target

Found 2 issues:

1. Function with many returns (count = 5): joinable_target [qlty:return-statements]


2. Function with high complexity (count = 7): joinable_target [qlty:function-complexity]

…roscope review)

join_aliases used Enumerable#to_set without requiring 'set', which raises NoMethodError on
Ruby 3.0/3.1 for any serialized record. Use uniq (an Array) instead — the value is only
splatted into except(*...) and checked with empty?, so a Set is not needed.
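
A short sketch of why an Array is enough here. The `strip_join_aliases` helper is hypothetical and works on a plain Hash instead of `object.attributes`; `Hash#except` is built into Ruby 3.0+ (and provided by ActiveSupport before that).

```ruby
# Remove the alias columns added by the JOIN from a root record's attributes.
# join_aliases may contain duplicates; uniq returns a plain Array, which is all
# the splat and the empty? check need, so no Set (and no require 'set') is involved.
def strip_join_aliases(attributes, join_aliases)
  aliases = join_aliases.uniq
  return attributes if aliases.empty?

  attributes.except(*aliases)
end

flat_row = { 'id' => 7, 'bank_account_id' => 3, 'fa_join_1' => 3, 'fa_join_2' => 'Acme' }

p strip_join_aliases(flat_row, %w[fa_join_1 fa_join_2 fa_join_1])
p strip_join_aliases(flat_row, [])
```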

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@PMerlet PMerlet merged commit 4b121ec into main Jul 2, 2026
48 checks passed
@PMerlet PMerlet deleted the perf/ar-datasource-join-to-one-same-db branch July 2, 2026 15:43

2 participants