Improve execution performance in RequestManager validation and Execution.has_ended #60

Open: pedroigorjs wants to merge 5 commits into main from f/improve-execution-performance

@pedroigorjs (Contributor)

Summary

This PR improves rule execution performance by reducing overhead in two hot paths:

  • replaces pandera-based DataFrame validation in RequestManager.validate() with a lightweight pandas-based validation flow
  • optimizes Execution.has_ended() by switching from a full null-count scan to a short-circuit null check

What Changed

  • RequestManager
    • precomputes a lightweight schema as a plain dict instead of a pandera.DataFrameSchema
    • validates required columns manually
    • handles null/default substitution with vectorized pandas operations
    • coerces validated input columns to str
  • Execution
    • changes has_ended() from .isna().sum() == 0 to not .isna().any()
  • Tests
    • updates the RequestManager test to match the new lightweight schema representation
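
The RequestManager flow listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `SCHEMA` dict layout, the `validate` function, and the column names are hypothetical, and the sketch assumes that a column without a default is required while a column with a default may be absent or contain nulls.

```python
import pandas as pd

# Hypothetical lightweight schema: column name -> default value.
# A default of None marks the column as required (an assumption of this sketch).
SCHEMA = {"age": None, "country": "BR"}


def validate(df: pd.DataFrame, schema: dict) -> pd.DataFrame:
    """Check required columns, substitute defaults for nulls, coerce to str."""
    required = [name for name, default in schema.items() if default is None]
    missing = [name for name in required if name not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    out = pd.DataFrame(index=df.index)
    for name, default in schema.items():
        if name not in df.columns:
            # Optional column that was not supplied: fill it with the default.
            out[name] = default
            continue
        column = df[name].astype(object)
        if default is None:
            if column.isna().any():
                raise ValueError(f"Column {name!r} contains null values")
            out[name] = column
        else:
            # Vectorized substitution: keep non-null values, replace the rest.
            out[name] = column.where(column.notna(), default)

    # Every validated input column is coerced to str.
    return out.astype(str)


frame = pd.DataFrame({"age": [30, 41], "country": ["US", None]})
validated = validate(frame, SCHEMA)
```

The point of the sketch is that each step is a plain dict lookup or a single vectorized pandas call, so no schema object has to be constructed or interpreted per request.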

Motivation

The goal is to reduce validation overhead during rule execution, especially in smaller and latency-sensitive workloads where fixed framework costs dominate total runtime.

Local performance notes in the repository indicate that input validation was one of the main execution bottlenecks, and that has_ended() is called repeatedly during execution. These changes target both paths directly.
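
For the has_ended() path, the two checks can be compared with a small sketch. The `states` series and both function names are hypothetical stand-ins, not the actual Execution attributes. Both forms return the same answer; the second avoids summing the boolean mask, and how early it stops depends on the underlying pandas/NumPy implementation.

```python
import pandas as pd


def has_ended_full_scan(states: pd.Series) -> bool:
    # Previous form: count every null, then compare the count to zero.
    return bool(states.isna().sum() == 0)


def has_ended_short_circuit(states: pd.Series) -> bool:
    # New form: only ask whether at least one null exists.
    return not states.isna().any()


# Hypothetical per-row state column; None marks a row that is still pending.
pending = pd.Series(["done", None, "done"])
finished = pd.Series(["done", "done", "done"])
```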

Testing

  • poetry run pytest tests/test_engine/test_request_manager.py

Risks

  • This PR changes validation semantics for DataFrame inputs:
    • "", "null" and "None" are now treated as null-like values during DataFrame validation
    • previously, under pandera, those values were accepted as valid strings
  • If existing rules or clients rely on those literal string values, this change may be behavior-breaking
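
The semantic change described above can be illustrated with a short sketch. `NULL_LIKE` and `null_mask` are hypothetical names; only the three literal strings come from this PR description, and the real implementation may detect them differently.

```python
import pandas as pd

# Literal strings that the new validation treats as null-like.
NULL_LIKE = {"", "null", "None"}


def null_mask(column: pd.Series) -> pd.Series:
    """Mark real nulls and null-like string literals as null."""
    return column.isna() | column.astype(str).isin(NULL_LIKE)


values = pd.Series(["abc", "", "null", "None", None])
mask = null_mask(values)
```

Under the previous pandera-based validation, only the last entry would have counted as null; in this sketch the mask also flags the three literal strings, so they would receive the default or fail a required-column check.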

Notes

This PR is intentionally small and focused on committed performance changes only:

  • lightweight RequestManager validation
  • has_ended() short-circuit optimization

pedroigorjs self-assigned this on May 22, 2026
pedroigorjs added the enhancement (New feature or request) label on May 22, 2026
pedroigorjs marked this pull request as ready for review on May 22, 2026 21:33
@github-actions

Coverage Report
File                          Stmts  Miss  Cover  Missing
retrack/engine
   base.py                       70     5    93%  48, 62–64, 152
   executor.py                  131    10    92%  62, 74–75, 153, 225, 271, 289, 293, 305, 313
   request_manager.py            80    11    86%  16, 35, 54, 69–70, 149, 152, 171–176
   rule.py                       50     3    94%  87, 109, 116
retrack/nodes
   check.py                      41     3    93%  23, 26, 89
   constants.py                  93     3    97%  123, 126, 171
   math.py                       43     1    98%  75
retrack/nodes/dynamic
   base.py                       16     1    94%  26
   csv_table.py                  44     2    95%  60, 63
   flow.py                       44     1    98%  18
   flow_connector.py             25    12    52%  24–48
retrack/utils
   component_registry.py        105    18    83%  77–78, 81–82, 85–86, 91–92, 101, 112–123
   exceptions.py                 28     5    82%  17–18, 106–127
   graph.py                      60     7    88%  35, 46, 59, 61, 63, 84, 86
   registry.py                   34     6    82%  23–26, 38, 43, 51
   transformers.py              156    16    90%  17, 37–38, 52, 57–58, 63, 66, 69, 72, 75, 109, 169, 178, 227, 353
retrack/validators
   base.py                        4     1    75%  14
   node_exists.py                15     2    87%  36, 38
   node_validator.py             33     3    91%  31, 56–57
TOTAL                          1636   110    93%

Tests  Skipped  Failures  Errors  Time
92     0 💤     0 ❌      0 🔥    4.206s ⏱️

@github-actions

Coverage Report
File                          Stmts  Miss  Cover  Missing
retrack/engine
   base.py                       70     5    93%  48, 62–64, 152
   executor.py                  131    10    92%  62, 74–75, 153, 225, 271, 289, 293, 305, 313
   request_manager.py            80    11    86%  16, 35, 54, 69–70, 149, 152, 171–176
   rule.py                       50     3    94%  87, 109, 116
retrack/nodes
   check.py                      41     3    93%  23, 26, 89
   constants.py                  93     3    97%  123, 126, 171
   math.py                       43     1    98%  75
retrack/nodes/dynamic
   base.py                       16     1    94%  26
   csv_table.py                  44     2    95%  60, 63
   flow.py                       44     1    98%  18
   flow_connector.py             25    12    52%  24–48
retrack/utils
   component_registry.py        105    18    83%  77–78, 81–82, 85–86, 91–92, 101, 112–123
   exceptions.py                 28     3    89%  106–127
   graph.py                      60     7    88%  35, 46, 59, 61, 63, 84, 86
   registry.py                   34     6    82%  23–26, 38, 43, 51
   transformers.py              156    16    90%  17, 37–38, 52, 57–58, 63, 66, 69, 72, 75, 109, 169, 178, 227, 353
retrack/validators
   base.py                        4     1    75%  14
   node_exists.py                15     2    87%  36, 38
   node_validator.py             33     3    91%  31, 56–57
TOTAL                          1636   108    93%

Tests  Skipped  Failures  Errors  Time
92     0 💤     0 ❌      0 🔥    3.810s ⏱️
