Incremental flow re-detection never finds new entry points (relative vs absolute path mismatch) #569

@fzully

Summary

After build_or_update_graph_tool (or the CLI incremental build) re-parses files containing a new entry point (e.g. a brand-new main() in a newly-added source file), the flow-detection step silently finds nothing: flows_detected is always 0, even though detect_entry_points does see the new entry point. Calling run_postprocess_tool (which uses the non-incremental trace_flows/store_flows path) immediately afterward picks it up correctly, which is what pointed me at the divergence between the two code paths.

Environment

  • Package: code-review-graph v2.3.6 (pipx install), also confirmed present on main at the same line numbers
  • OS: Linux
  • Trigger: MCP tool build_or_update_graph_tool with default postprocess="full", on a repo where a new file with a new top-level function (acting as a flow entry point) was just staged/committed

Root Cause

In code_review_graph/flows.py, incremental_trace_flows() filters re-detected entry points by comparing against the changed_files set:

# flows.py:463
changed_file_set = set(changed_files)
...
# flows.py:513-517
entry_points = detect_entry_points(store)
relevant_eps = [
    ep for ep in entry_points
    if ep.file_path in changed_file_set or ep.id in entry_point_ids
]

changed_files comes from get_changed_files() (incremental.py:508-525), which runs git diff --name-only — this always returns paths relative to the repo root (e.g. src/reflection_demo.cpp).

However, node file_path values stored in the graph (and thus ep.file_path here) are absolute paths (e.g. /home/user/repo/src/reflection_demo.cpp), as can be confirmed via any node query (query_graph_tool pattern file_summary returns absolute file_path).

Because of this format mismatch, ep.file_path in changed_file_set is False for every entry point, so the filter can only admit an entry point through the or ep.id in entry_point_ids branch. That branch only covers entry points belonging to a flow that is being re-traced because it touched a changed file via flow_memberships, and the membership lookup at flows.py:469-474 has the same absolute-vs-relative problem in its n.file_path IN (...) SQL clause. As a result, brand-new entry points in changed files are never traced by the incremental path, and flows_detected reports 0 even on a successful incremental update.
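The mismatch can be reproduced in isolation. The snippet below is a minimal sketch, not the project's code: EntryPoint and filter_relevant are hypothetical stand-ins that copy only the list comprehension quoted above, and the paths are the example paths from this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntryPoint:
    """Hypothetical stand-in for an entry-point node returned by detect_entry_points."""
    id: int
    file_path: str


def filter_relevant(entry_points, changed_files, entry_point_ids):
    """Same filter expression as the snippet quoted from incremental_trace_flows()."""
    changed_file_set = set(changed_files)
    return [
        ep for ep in entry_points
        if ep.file_path in changed_file_set or ep.id in entry_point_ids
    ]


# Graph nodes store absolute paths.
entry_points = [EntryPoint(1, "/home/user/repo/src/reflection_demo.cpp")]
# git diff --name-only reports paths relative to the repo root.
changed_files = ["src/reflection_demo.cpp"]

# No existing flow references the new entry point, so entry_point_ids is empty.
relevant = filter_relevant(entry_points, changed_files, entry_point_ids=set())
print(relevant)  # the new entry point has been filtered out
```

Passing the absolute form of the same path to filter_relevant returns the entry point, which suggests the string format is the only thing blocking the match.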

The standalone run_postprocess_tool (tools/build.py:502-509) is unaffected because it calls the full trace_flows(store) (no path filtering at all), which is why running it manually after an incremental build "fixes" the missing flow.

Repro

  1. Add a new .cpp file with a new top-level function that qualifies as an entry point (e.g. a main()).
  2. git add it so it's visible to git diff --name-only HEAD~1.
  3. Call build_or_update_graph_tool(full_rebuild=false) (default postprocess="full").
  4. Observe flows_detected: 0 in the result, and list_flows_tool does not include the new entry point.
  5. Call run_postprocess_tool(flows=true) — the new flow now appears correctly.

Suggested Fix

Normalize both sides to the same representation before comparing — either resolve changed_files entries to absolute paths (relative to repo_root) before building changed_file_set, or store/compare relative paths consistently. The same fix needs to be applied to the SQL file_path IN (...) lookup at flows.py:469-474, which has the identical relative-vs-absolute mismatch.
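One way the first option could look is sketched below. Everything here is an assumption made for illustration: to_absolute and nodes_in_files are hypothetical helpers, the nodes table is a simplified schema rather than the real one, and SQLite is assumed as the backing store. The example uses POSIX paths, matching the Linux environment above.

```python
import sqlite3
from pathlib import Path


def to_absolute(changed_files, repo_root):
    """Resolve repo-relative paths against repo_root (hypothetical helper).

    Joining an already-absolute path onto repo_root leaves it unchanged,
    so mixed input is handled as well.
    """
    root = Path(repo_root)
    return {str(root / f) for f in changed_files}


def nodes_in_files(conn, paths):
    """Return node ids whose file_path is in paths, via a parameterized IN clause."""
    paths = sorted(paths)
    if not paths:
        return []
    placeholders = ",".join("?" for _ in paths)
    query = f"SELECT id FROM nodes WHERE file_path IN ({placeholders}) ORDER BY id"
    return [row[0] for row in conn.execute(query, paths)]


# Simplified, assumed schema: the graph stores absolute file paths.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, file_path TEXT)")
conn.execute(
    "INSERT INTO nodes (file_path) VALUES (?)",
    ("/home/user/repo/src/reflection_demo.cpp",),
)

changed_files = ["src/reflection_demo.cpp"]  # as reported by git diff --name-only
changed_file_set = to_absolute(changed_files, "/home/user/repo")

before = nodes_in_files(conn, changed_files)     # relative paths: no rows match
after = nodes_in_files(conn, changed_file_set)   # normalized paths: the node matches
print(before, after)
```

Building the normalized set once and reusing it for both the Python membership test and the SQL lookup would keep the two comparisons consistent.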

As a workaround, calling run_postprocess_tool(flows=true) after every incremental update reliably re-syncs flows, at the cost of doing a full (non-incremental) flow trace each time.
