Summary
After build_or_update_graph_tool (or the CLI incremental build) re-parses files containing a new entry point (e.g. a brand-new main() in a newly-added source file), the flow-detection step silently finds nothing: flows_detected is always 0, even though detect_entry_points does see the new entry point. Calling run_postprocess_tool (which uses the non-incremental trace_flows/store_flows path) immediately afterward picks it up correctly, which is what pointed me at the divergence between the two code paths.
Environment
- Package: code-review-graph v2.3.6 (pipx install), also confirmed present on main at the same line numbers
- OS: Linux
- Trigger: MCP tool build_or_update_graph_tool with default postprocess="full", on a repo where a new file with a new top-level function (acting as a flow entry point) was just staged/committed
Root Cause
In code_review_graph/flows.py, incremental_trace_flows() filters re-detected entry points by comparing against the changed_files set:
# flows.py:463
changed_file_set = set(changed_files)
...
# flows.py:513-517
entry_points = detect_entry_points(store)
relevant_eps = [
    ep for ep in entry_points
    if ep.file_path in changed_file_set or ep.id in entry_point_ids
]
changed_files comes from get_changed_files() (incremental.py:508-525), which runs git diff --name-only. By default that command prints paths relative to the repo root (e.g. src/reflection_demo.cpp).
However, node file_path values stored in the graph (and thus ep.file_path here) are absolute paths (e.g. /home/user/repo/src/reflection_demo.cpp), as can be confirmed via any node query (query_graph_tool pattern file_summary returns absolute file_path).
Because of this format mismatch, ep.file_path in changed_file_set is always False for every entry point whose flow doesn't already exist. The or ep.id in entry_point_ids branch only covers entry points belonging to a flow that's being re-traced because it touched a changed file via flow_memberships, and the membership lookup at flows.py:469-474 has the same absolute-vs-relative issue through its n.file_path IN (...) SQL clause. So brand-new entry points in changed files are never traced by the incremental path, and flows_detected reports 0 even on a successful incremental update.
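The mismatch can be reproduced in isolation. The snippet below is a minimal sketch, not code from flows.py: the paths are the example values from this report, and repo_root is a hypothetical stand-in for wherever the repository lives.

```python
from pathlib import PurePosixPath

# Hypothetical values mirroring the examples in this report.
repo_root = PurePosixPath("/home/user/repo")
changed_files = ["src/reflection_demo.cpp"]                # repo-relative, as printed by git diff --name-only
ep_file_path = "/home/user/repo/src/reflection_demo.cpp"   # absolute, as stored on graph nodes

# Same construction as the filter in incremental_trace_flows().
changed_file_set = set(changed_files)

# Comparing an absolute path against a set of relative paths never matches.
matches_as_is = ep_file_path in changed_file_set

# Resolving the relative paths against the repo root makes the lookup succeed.
normalized = {str(repo_root / p) for p in changed_files}
matches_normalized = ep_file_path in normalized

print(matches_as_is, matches_normalized)
```

The first lookup is the one the incremental path effectively performs today; the second shows that the same entry point is found once both sides use absolute paths.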
The standalone run_postprocess_tool (tools/build.py:502-509) is unaffected because it calls the full trace_flows(store) (no path filtering at all), which is why running it manually after an incremental build "fixes" the missing flow.
Repro
- Add a new .cpp file with a new top-level function that qualifies as an entry point (e.g. a main()). git add it so it's visible to git diff --name-only HEAD~1.
- Call build_or_update_graph_tool(full_rebuild=false) (default postprocess="full").
- Observe flows_detected: 0 in the result, and list_flows_tool does not include the new entry point.
- Call run_postprocess_tool(flows=true); the new flow now appears correctly.
Suggested Fix
Normalize both sides to the same representation before comparing: either resolve changed_files entries to absolute paths (by joining them onto repo_root) before building changed_file_set, or store and compare relative paths consistently. The same fix needs to be applied to the SQL file_path IN (...) lookup at flows.py:469-474, which has the identical relative-vs-absolute mismatch.
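A minimal sketch of the first option (resolving to absolute paths) might look like the following. It is an illustration under assumptions, not a patch against the real code: to_absolute_paths is a hypothetical helper, the nodes table and its columns are a stand-in for the actual graph schema, and repo_root is assumed to be available at the call site.

```python
import os
import sqlite3
from typing import Iterable, List


def to_absolute_paths(changed_files: Iterable[str], repo_root: str) -> List[str]:
    """Hypothetical helper: resolve repo-relative paths against repo_root.

    Paths that are already absolute are kept, so callers passing either
    form end up with the same representation.
    """
    root = os.path.abspath(repo_root)
    resolved = []
    for path in changed_files:
        if not os.path.isabs(path):
            path = os.path.join(root, path)
        resolved.append(os.path.normpath(path))
    return resolved


# Example values from this report.
repo_root = "/home/user/repo"
changed_files = ["src/reflection_demo.cpp"]

abs_changed = to_absolute_paths(changed_files, repo_root)
changed_file_set = set(abs_changed)  # now comparable with ep.file_path

# Stand-in for an absolute file_path stored on a graph node.
stored_path = os.path.join(os.path.abspath(repo_root), "src", "reflection_demo.cpp")

# Assumed schema, only to show the IN (...) lookup with normalized parameters.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, file_path TEXT)")
conn.execute("INSERT INTO nodes (file_path) VALUES (?)", (stored_path,))

placeholders = ",".join("?" for _ in abs_changed)
rows = conn.execute(
    f"SELECT id FROM nodes WHERE file_path IN ({placeholders})", abs_changed
).fetchall()

print(stored_path in changed_file_set, len(rows))
```

Using one normalized list for both the in-memory set and the SQL parameters keeps the two lookups from drifting apart again. If the graph stores symlink-resolved paths, os.path.realpath may be the better match than os.path.abspath; whichever form the stored file_path values use is the one to normalize to.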
As a workaround, calling run_postprocess_tool(flows=true) after every incremental update reliably re-syncs flows, at the cost of doing a full (non-incremental) flow trace each time.