Skip to content

LSP: expression_types map can hold the wrong type when expression spans collide πŸ—ΊοΈΒ #137

@timfennis

Description

@timfennis

Background β€” what is expression_types?

The LSP keeps a side table called expression_types in ndc_lsp/src/features/inlay_hints.rs:

pub expression_types: HashMap<usize, StaticType>,

It maps byte offsets in the source file to the static type of an expression that ends at that offset. Dot-completion uses it: when the user types expr., the LSP looks up the byte just after expr and asks "what's the type sitting here?" so it can suggest methods that accept that type. See completion.rs:22:

s.expression_types.get(&dot_offset).cloned().or_else(...)

The map is built by HintCollector::on_expression, which is called by walk_expression in ndc_lsp/src/visitor.rs as it walks every node in the AST. For each node, it inserts (node.span.end() β†’ node.type).

The map's implicit invariant should be: at any byte offset, the value is the type of the largest enclosing expression that ends at that offset (because that's what the user means when they type . right after it).

The problem

Two patterns in the AST cause multiple expressions to end at the same byte:

1. Binary operator calls

For 1 < 2, the parser lowers < to a function call (parser.rs:196):

let new_span = left.span.merge(right.span);
// ...
Expression::Call { function: <op>, arguments: vec![left, right] }.to_location(new_span);

So the outer Call's span ends at the same byte as its right argument. Three expressions all end at byte 5 in 1 < 2:

  • Call(<, [1, 2]) β€” type Bool
  • the right argument 2 β€” type Int
  • (and the Statement wrapper, see below)

2. The Statement wrapper

Top-level expressions are wrapped in Expression::Statement, which copies the inner expression's span verbatim (expression.rs:235-241):

pub fn to_statement(self) -> Self {
    Self { id: NodeId::next(), span: self.span, expression: Expression::Statement(Box::new(self)) }
}

So Statement(Call(<, [1, 2])) has the same span.end() as the inner Call, with a different type: () (unit), the statement type.

What actually happens for 1 < 2;

walk_expression walks parent-before-children (pre-order) and inserts with HashMap::insert (last-writer-wins). Trace:

  1. on_expression(Statement) β€” inserts unit at byte 5.
  2. recurse into Call. on_expression(Call) β€” overwrites byte 5 with Bool.
  3. recurse into arguments.
  4. on_expression(<literal 2>) β€” overwrites byte 5 with Int.

Final map: {1: Int, 5: Int}. The expected invariant says byte 5 should hold Bool (the Call).

Why no user-facing bug today

To actually trigger dot-completion on a binary operator result, you have to write something like (1 < 2).method. The parens are mandatory because 1 < 2.method would lex as 1 < (2.0) (2. starts a float literal). With parens, the outer node is Grouping, whose span includes the parens β€” so Grouping.span.end() is one byte after the inner Call.span.end(). No collision, no wrong lookup.

So today's only consumer (dot-completion) never queries the corrupted byte in any source a user can actually write. The map data is technically wrong, but no feature reads from the wrong slot.

It will start mattering as soon as any LSP feature queries expression_types more broadly β€” hover-on-expression is the obvious next one (e.g. hovering on 2 in 1 < 2 would surface Int instead of any contextual type), and any future feature that scans the map.

Why the obvious fix doesn't quite work

The natural fix is to switch insert to entry().or_insert_with(...) so the first writer wins. Since the visitor is pre-order, the largest enclosing expression gets there first.

But because Statement shares its inner's span, Statement(unit) would now win the byte-5 slot over the Call(Bool). Result: unit instead of Bool β€” still wrong, just wrong in a different way.

A correct fix needs both:

  1. Switch to entry().or_insert_with(...) (first-writer-wins).
  2. Skip Statement in on_expression (and audit any other "wrapper" expressions that copy their inner's span). Statement is a parse-time wrapper with no meaningful type of its own β€” its type is the inner's type for any consumer that cares.

Both pieces are needed: without (2), Statement steals the slot; without (1), the right operand steals it after the Call.

Reproduction

let info = collect_hints("1 < 2;");
assert_eq!(info.expression_types.get(&5), Some(&StaticType::Bool)); // fails today

(See the test pattern in ndc_lsp/src/features/inlay_hints.rs:107-177.)

Background

Flagged by Codex in review of #136. That PR added recursion into Call arguments to fix a separate inlay-hint bug, which exposed the latent map-overwrite issue. Pre-PR the map happened to read Bool at byte 5 β€” but only because the visitor was missing arms entirely and processing Call after Statement overwrote it, not because of any principled design.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions