
perf(gfql): explore expression pushdown vs intermediate materialization #887

@lmeyerov

Description


Context

The GFQL executor codebase currently uses intermediate materializations throughout:

# Example from forward WHERE pruning
left_values = series_values(left_frame[left_col])    # materialize left keys as a Python set
right_values = series_values(right_frame[right_col]) # materialize right keys as a Python set
common = left_values & right_values                  # host-side set intersection
new_left = left_frame[left_frame[left_col].isin(common)]  # filter via boolean mask

This pattern appears throughout the hop, pruning, and filtering logic. It creates multiple intermediate objects and, for cuDF, triggers multiple kernel launches with potential CPU roundtrips.
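
For context, series_values() builds a host-side Python set from a column (as noted under Research Questions below). A rough sketch of that behavior, not the actual implementation:

def series_values(s):
    # Rough sketch only: collecting into a Python set forces a
    # device-to-host copy when s is a cuDF Series
    return set(s.to_pandas()) if hasattr(s, "to_pandas") else set(s)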

Research Questions

  1. Expression pushdown: Can we use .query() or expression APIs to push filtering down to the engine? (See the first sketch after this list.)

    • pandas: .query(f"{col} in @other_df['{col}']")
    • cuDF: similar, but the supported query syntax may differ
  2. Merge-based filtering: Would a merge-based semi-join be more efficient?

    # dedupe the right-side keys first so the inner join filters without row fan-out
    filtered = left_frame.merge(
        right_frame[[right_col]].drop_duplicates().rename(columns={right_col: left_col}),
        on=left_col,
        how='inner'
    )
  3. cuDF-specific optimizations (see the second sketch after this list):

    • Fused kernels for GPU
    • Avoiding CPU roundtrips (e.g., series_values() converts to a Python set)
    • Leveraging cuDF's native join/filter primitives
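
A minimal sketch of question 1 for pandas, reusing the illustrative names from Context (left_frame, right_frame, left_col, right_col). Backticks guard against non-identifier column names and @keys binds a local variable inside the query string; whether cuDF's more restricted .query() parser accepts an in expression is one of the things to verify here:

import pandas as pd

def semi_join_query(left_frame: pd.DataFrame, left_col: str,
                    right_frame: pd.DataFrame, right_col: str) -> pd.DataFrame:
    # Push the membership test into the query engine instead of
    # materializing Python sets
    keys = right_frame[right_col]
    return left_frame.query(f"`{left_col}` in @keys")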
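
For question 3's roundtrip concern, a hedged alternative to the set-based pattern in Context: .unique() and .isin() both run inside the frame engine, so on cuDF the keys never leave the GPU, and the same two calls work unchanged on pandas:

def semi_join_isin(left_frame, left_col, right_frame, right_col):
    # Stays on-device for cuDF: no Python set, no CPU roundtrip
    keys = right_frame[right_col].unique()
    return left_frame[left_frame[left_col].isin(keys)]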

Scope

This applies broadly across the GFQL codebase:

  • graphistry/compute/gfql/df_executor.py: Forward/backward pruning, materialization
  • graphistry/compute/gfql/same_path/where_filter.py: WHERE clause filtering
  • graphistry/compute/gfql/same_path/post_prune.py: Post-prune constraint propagation
  • graphistry/compute/gfql/same_path/multihop.py: Multi-hop edge filtering
  • graphistry/compute/gfql/same_path/bfs.py: BFS reachability
  • graphistry/compute/hop.py: Core hop operations

Tradeoffs to Evaluate

Factor               Intermediate Materialization   Query/Expression       Merge-based
Memory               Creates intermediate sets      May avoid copies       Single join
GPU perf             Multiple kernels               Single fused kernel?   Native join
CPU perf             Fine for small data            Parsing overhead       Join overhead
pandas/cuDF parity   Works identically              Syntax may differ      Identical API
Debuggability        Explicit steps                 String-based           Implicit

Acceptance Criteria

  • Audit codebase for intermediate materialization patterns
  • Benchmark current approach vs alternatives on representative graphs (see the harness sketch after this list)
  • Measure GPU memory and kernel launch overhead for cuDF path
  • Document recommended approach with rationale
  • Implement if a significant improvement is found (>20% speedup or memory reduction)
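
A minimal timing-harness sketch for the benchmark item, assuming the candidate filters are callables like the sketches above; for the cuDF path, pass a device-synchronize callable so queued GPU work is included in the measurement:

import time

def bench(fn, *args, repeat: int = 5, sync=None) -> float:
    # Best-of-N wall time; pass sync=cupy.cuda.Device().synchronize (or
    # similar) when timing cuDF so async kernel launches are counted
    best = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn(*args)
        if sync is not None:
            sync()
        best = min(best, time.perf_counter() - t0)
    return best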
