Skip to content

Add operation-based indexing pipeline#798

Open
Morriar wants to merge 9 commits intomainfrom
at-index-operations
Open

Add operation-based indexing pipeline#798
Morriar wants to merge 9 commits intomainfrom
at-index-operations

Conversation

@Morriar
Copy link
Copy Markdown
Contributor

@Morriar Morriar commented May 8, 2026

What

Introduces an intermediate representation (IR) between Ruby parsing and graph construction. Instead of the indexer writing directly to a LocalGraph, a new pipeline produces an ordered Vec<Operation> (an explicit list of what each file contributes to the graph), then applies those operations to build the graph.

Current pipeline:

graph LR
    AST --> RubyIndexer --> LocalGraph --> Graph --> Resolution
Loading

New pipeline:

graph LR
    AST --> RubyOperationBuilder --> Operations["Vec&lt;Operation&gt;"] --> OperationApplier --> LocalGraph --> Graph --> Resolution
Loading

(This is an intermediate step to ensure parity with the current indexer / local graph and reduce the scope of this PR.)

Target pipeline:

graph LR
    AST --> Operations["Vec&lt;Operation&gt;"] --> Graph["Graph (resolution applies operations directly)"]
Loading

Once we commit to this direction, we can remove the LocalGraph and feed the operations directly to the resolver to let it create both the definitions and the declarations.

Why

Four motivations that reinforce each other:

DSL plugin API

Plugins need to contribute definitions to the graph. Without operations, plugins mutate LocalGraph internals: fragile, tightly coupled, and bad for parallelism (Ruby plugins calling into the graph would require synchronization).

With operations, a plugin just produces Vec<Operation>, a pure data structure that can be built in parallel and applied later. Operations become the public contract between Rubydex and plugins.

class Post < ApplicationRecord
  has_many :comments
end

An ActiveRecord plugin sees has_many :comments and emits operations. It doesn't touch graph internals, it speaks the same operation protocol as the built-in indexer.

EnterMethod(comments)
ExitScope
EnterMethod(comments=)
ExitScope
EnterMethod(comment_ids)
ExitScope
EnterMethod(comment_ids=)
ExitScope

Unloading

Each file's contribution is its operation list. Negating the operations (walk in reverse, opposite effect) cleanly unloads the file without graph diffing.

# user.rb produces:
# 1. EnterClass(User)
# 2.   EnterMethod(name)
# 3.   ExitScope
# 4.   EnterMethod(email)
# 5.   ExitScope
# 6. ExitScope

# To unload user.rb: negate in reverse order
# 6. (no-op)
# 5. (no-op)
# 4. RemoveMethod(email)
# 3. (no-op)
# 2. RemoveMethod(name)
# 1. RemoveClass(User)

Because DSL plugins produce the same IR primitives as the built-in indexer, we know how to undo their contributions for free: no plugin-specific cleanup logic, no tracking which definitions in the graph came from which plugin.

Incremental re-indexing

Because operations are a flat list, we can diff the old and new operation lists when a file changes. Only the delta gets applied: remove operations that disappeared, add new ones. Most edits change very few operations, so the diff is small.

# user.rb: rename `name` to `full_name`

Old operations:                    New operations:
  EnterClass(User)                   EnterClass(User)
    EnterMethod(name)          →       EnterMethod(full_name)
    ExitScope                          ExitScope
    EnterMethod(email)                 EnterMethod(email)
    ExitScope                          ExitScope
  ExitScope                          ExitScope

# Diff: remove EnterMethod(name), add EnterMethod(full_name)

No full rebuild, no negating the entire file, just the minimal set of changes.

Order-dependent diagnostics

Operations preserve source order, which the current graph doesn't. Consider:

class Foo
  private_constant :BAR
  BAR = 42
end

Ruby raises NameError because private_constant :BAR runs before BAR exists. Today Rubydex can't catch this because both the constant and the visibility change become unordered definitions in the graph, and resolution wires them together regardless of order. With operations, the ordering is explicit (SetConstantVisibility before DefineConstant), and a diagnostic pass can detect the forward reference.

How

Operation enum (operation/mod.rs): ~20 variants covering namespace scopes (EnterClass, EnterModule, EnterSingletonClass, ExitScope), method operations (EnterMethod, DefineAttribute, AliasMethod, SetMethodVisibility, SetDefaultVisibility), constants (DefineConstant, AliasConstant, SetConstantVisibility), variables (DefineInstanceVariable, DefineClassVariable, DefineGlobalVariable), mixins (Mixin), and references (ReferenceConstant, ReferenceMethod). Each variant is its own struct so consumers can take fields by value or by reference.

OperationPrinter (operation/printer.rs): Debug formatter that pretty-prints Vec<Operation> with indentation reflecting scope nesting. Used for debugging and testing the builder output.

RubyOperationBuilder (operation/ruby_builder.rs): Walks the Ruby AST and produces Vec<Operation>. Same visitor pattern as RubyIndexer, but emits operations instead of mutating a graph. ~3100 lines, 1:1 correspondence with every RubyIndexer behavior.

OperationApplier (operation/applier.rs): Converts Vec<Operation>LocalGraph. Consumes operations by value (zero clones). This is scaffolding that exists only because the current merge pipeline expects a LocalGraph. In the target architecture, resolution consumes operations directly and both OperationApplier and LocalGraph go away.

IndexerBackend enum (indexing.rs, main.rs): --indexer ruby_indexer|operation_builder CLI flag to switch backends. Both backends run the same shared test suite (166 test functions × 2 backends).

Parity verification

Both the indexer test suite (166 tests) and the resolution test suite (208 tests) run with both backends using a #[path] module trick — each parent module provides a backend() function via super::backend(), so the same test file exercises both RubyIndexer and OperationBuilder.

Both backends produce identical output on Shopify core:

RubyIndexer OperationBuilder
Total definitions 926,926 926,926
Total declarations 749,360 749,360

Performance

RubyIndexer OperationBuilder Delta
Indexing time 3.98s 4.02s +1.0%
Total time 29.3s 29.9s +1.9%
Memory (RSS) 3,063 MB 3,143 MB +2.6%

Essentially identical performance. The intermediate Vec<Operation> per file adds a very small overhead.

@Morriar Morriar requested a review from a team as a code owner May 8, 2026 16:52
Morriar added 9 commits May 8, 2026 12:55
Move definition ID computation (namespace_definition_id, method_definition_id)
from inline format strings in each definition type into model::ids so they can
be reused by the operation applier.
Introduce the Operation enum with 18 struct variants (EnterClass, EnterModule,
EnterMethod, DefineConstant, etc.) plus ExitScope. Each struct carries its own
data; scope context is provided by surrounding Enter/Exit operations.
Implement OperationVisitor to produce a human-readable, indented text
representation of operations. Used by builder tests to assert on output.
Walk the Ruby AST (via ruby_prism) to produce a Vec<Operation>. This is the
first phase of the two-phase indexing pipeline: source code to operations.
Implement OperationVisitor to convert operations into definitions in a LocalGraph.
This is the second phase of the pipeline: operations to indexed definitions.

LocalGraph::from_parts allows constructing a graph from pre-built parts (strings,
names, document) so the applier can build graphs without going through the indexer.
Introduce IndexerBackend to switch between RubyIndexer and the operation
pipeline. Add build_local_graph to dispatch indexing based on the backend.
Both RubyIndexer and OperationApplier now run the same tests via a shared
ruby_indexer_tests.rs included with #[path]. Each parent provides a backend()
function to select which pipeline to use.
Wire up the IndexerBackend enum to the CLI so users can choose between
the original RubyIndexer and the operation pipeline.
@Morriar Morriar force-pushed the at-index-operations branch from e4e719f to fd915fa Compare May 8, 2026 16:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant