
perf(gfql): explore expression pushdown vs intermediate materialization #887

@lmeyerov

Description


Context

The GFQL executor codebase currently uses intermediate materializations throughout:

# Example from forward WHERE pruning
left_values = series_values(left_frame[left_col])    # materialize left keys as a Python set
right_values = series_values(right_frame[right_col]) # materialize right keys as a Python set
common = left_values & right_values                  # host-side set intersection
new_left = left_frame[left_frame[left_col].isin(common)]  # filter via boolean mask

This pattern appears throughout the hop, pruning, and filtering logic. It creates multiple intermediate objects and, for cuDF, triggers multiple kernel launches with potential CPU roundtrips.
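
For context, series_values() builds a host-side Python set from a column (as noted under Research Questions below). A rough sketch of that behavior, not the actual implementation:

def series_values(s):
    # Rough sketch only: collecting into a Python set forces a
    # device-to-host copy when s is a cuDF Series
    return set(s.to_pandas()) if hasattr(s, "to_pandas") else set(s)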

Research Questions

  1. Expression pushdown: Can we use .query() or expression APIs to push filtering down to the engine? (See the first sketch after this list.)

    • pandas: .query(f"{col} in @other_df['{col}']")
    • cuDF: similar, but the supported query syntax may differ
  2. Merge-based filtering: Would a merge-based semi-join be more efficient?

    # dedupe the right-side keys first so the inner join filters without row fan-out
    filtered = left_frame.merge(
        right_frame[[right_col]].drop_duplicates().rename(columns={right_col: left_col}),
        on=left_col,
        how='inner'
    )
  3. cuDF-specific optimizations (see the second sketch after this list):

    • Fused kernels for GPU
    • Avoiding CPU roundtrips (e.g., series_values() converts to a Python set)
    • Leveraging cuDF's native join/filter primitives
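
A minimal sketch of question 1 for pandas, reusing the illustrative names from Context (left_frame, right_frame, left_col, right_col). Backticks guard against non-identifier column names and @keys binds a local variable inside the query string; whether cuDF's more restricted .query() parser accepts an in expression is one of the things to verify here:

import pandas as pd

def semi_join_query(left_frame: pd.DataFrame, left_col: str,
                    right_frame: pd.DataFrame, right_col: str) -> pd.DataFrame:
    # Push the membership test into the query engine instead of
    # materializing Python sets
    keys = right_frame[right_col]
    return left_frame.query(f"`{left_col}` in @keys")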
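
For question 3's roundtrip concern, a hedged alternative to the set-based pattern in Context: .unique() and .isin() both run inside the frame engine, so on cuDF the keys never leave the GPU, and the same two calls work unchanged on pandas:

def semi_join_isin(left_frame, left_col, right_frame, right_col):
    # Stays on-device for cuDF: no Python set, no CPU roundtrip
    keys = right_frame[right_col].unique()
    return left_frame[left_frame[left_col].isin(keys)]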

Scope

This applies broadly across the GFQL codebase:

  • graphistry/compute/gfql/df_executor.py: Forward/backward pruning, materialization
  • graphistry/compute/gfql/same_path/where_filter.py: WHERE clause filtering
  • graphistry/compute/gfql/same_path/post_prune.py: Post-prune constraint propagation
  • graphistry/compute/gfql/same_path/multihop.py: Multi-hop edge filtering
  • graphistry/compute/gfql/same_path/bfs.py: BFS reachability
  • graphistry/compute/hop.py: Core hop operations

Tradeoffs to Evaluate

Factor               Intermediate Materialization   Query/Expression       Merge-based
Memory               Creates intermediate sets      May avoid copies       Single join
GPU perf             Multiple kernels               Single fused kernel?   Native join
CPU perf             Fine for small data            Parsing overhead       Join overhead
pandas/cuDF parity   Works identically              Syntax may differ      Identical API
Debuggability        Explicit steps                 String-based           Implicit

Acceptance Criteria

  • Audit codebase for intermediate materialization patterns
  • Benchmark current approach vs alternatives on representative graphs (see the harness sketch after this list)
  • Measure GPU memory and kernel launch overhead for cuDF path
  • Document recommended approach with rationale
  • Implement if a significant improvement is found (>20% speedup or memory reduction)
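
A minimal timing-harness sketch for the benchmark item, assuming the candidate filters are callables like the sketches above; for the cuDF path, pass a device-synchronize callable so queued GPU work is included in the measurement:

import time

def bench(fn, *args, repeat: int = 5, sync=None) -> float:
    # Best-of-N wall time; pass sync=cupy.cuda.Device().synchronize (or
    # similar) when timing cuDF so async kernel launches are counted
    best = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn(*args)
        if sync is not None:
            sync()
        best = min(best, time.perf_counter() - t0)
    return best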
