Skip to content

fix(engine): actionable error when a Jinja field is missing/None/empty#633

Open
nabinchha wants to merge 3 commits into
mainfrom
fix/template-render-errors-629
Open

fix(engine): actionable error when a Jinja field is missing/None/empty#633
nabinchha wants to merge 3 commits into
mainfrom
fix/template-render-errors-629

Conversation

@nabinchha
Copy link
Copy Markdown
Contributor

Summary

Closes #629.

Empty-render and missing-attribute Jinja failures used to surface as the same unhelpful message:

🛑 Failed to process column 'preferred_english_name': User provided prompt generation template is invalid.

Two distinct failure modes were collapsing into that generic string:

  • Bug 1 — empty render (e.g. {{ person.preferred_english_name }} where the key is missing): _assert_rendered_text_not_empty raised a descriptive UserTemplateError, but sanitize_user_exceptions then replaced the message with the generic one.
  • Bug 3 — raw UndefinedError (e.g. {{ person.address.street }} where address is missing): a jinja2.exceptions.UndefinedError leaked all the way up because the sanitizer only caught UserTemplateError / TemplateSyntaxError.

Both paths now raise a new EmptyTemplateRenderError(UserTemplateError) subclass that the sanitizer is explicitly told to pass through. The exception carries an actionable, row-level message naming the offending chain plus copy-pasteable Jinja conditional and SkipConfig remediation patterns:

Template rendered to empty text. This usually happens when one or more
referenced fields are missing, None, or empty in this row.

Likely culprits in this row:
  - person.address.street (missing from record)

To handle missing values, you can:

  1. Provide a fallback in your template using a Jinja conditional:
       {{ person.address.street if person.address else 'N/A' }}

  2. Skip rows where required fields are missing using SkipConfig:
       skip=SkipConfig(when="{{ not person.address }}")

The "gate" expression in the suggestions stops one accessor past the resolvable prefix so it stays safe to evaluate (avoiding a fresh UndefinedError on the suggested fix itself).

What changed

packages/data-designer-engine/src/data_designer/engine/processing/ginja/:

  • exceptions.py — new EmptyTemplateRenderError subclass.
  • ast.py — new public helpers ast_extract_access_chains and resolve_access_chain (plus an AccessChain type alias).
  • environment.py:
    • _assert_rendered_text_not_empty / validate_rendered_text thread the template + record through and raise the new error with the per-row diagnostic.
    • UserTemplateSandboxEnvironment.safe_render catches UndefinedError and converts it to the same EmptyTemplateRenderError. NativeJinjaSandboxEnvironment.render_template mirrors the conversion for consistency.
    • sanitize_user_exceptions bypasses EmptyTemplateRenderError (same treatment as UserTemplateUnsupportedFiltersError).

End-to-end propagation already prepends the column name via _run_batch, so no other layers needed changes.

Test plan

  • New regression tests for both bugs (test_empty_render_raises_empty_template_render_error_with_culprit_chain, test_undefined_nested_attr_raises_empty_template_render_error_with_safe_gate).
  • Coverage for the "resolved-to-None" / "resolved-to-empty-string" / multi-culprit branches.
  • Test that the sanitizer no longer collapses the new error to the generic message.
  • Test that NativeJinjaSandboxEnvironment also converts UndefinedError.
  • Parametrized coverage for ast_extract_access_chains (plain text, single name, nested attr, subscript, mixed, dynamic subscript skipped, loop/conditional context) and resolve_access_chain (resolved, partially-resolved, type-mismatch cases).
  • Happy-path render still works; one image-cell-generator test was updated from match="invalid" to match="empty" to reflect the new (more informative) message.
  • uv run pytest packages/data-designer-engine/tests/engine/processing/ginja/ — 95 passed.
  • uv run ruff format + uv run ruff check — clean.

Empty-render and missing-attribute failures used to surface as the
generic "User provided prompt generation template is invalid." either
because `sanitize_user_exceptions` stripped the detail or because
Jinja's raw `UndefinedError` leaked through. Both now raise a new
`EmptyTemplateRenderError` carrying a row-level diagnostic that names
the offending chain and includes copy-pasteable Jinja conditional and
SkipConfig fix patterns.

Closes #629.
@nabinchha nabinchha requested a review from a team as a code owner May 11, 2026 17:30
@github-actions
Copy link
Copy Markdown
Contributor

PR #633 Review — fix(engine): actionable error when a Jinja field is missing/None/empty

Summary

Closes #629. Two distinct failure modes in Jinja rendering used to collapse into the same unhelpful "User provided prompt generation template is invalid." message:

  1. Empty render — template like {{ person.preferred_english_name }} where the key is missing on the record.
  2. Raw UndefinedError — template like {{ person.address.street }} where person.address is missing, leaking Jinja's internal "'dict object' has no attribute 'address'".

The PR introduces EmptyTemplateRenderError(UserTemplateError) and teaches sanitize_user_exceptions to pass it through unaltered. Both the secure (UserTemplateSandboxEnvironment.safe_render) and native (NativeJinjaSandboxEnvironment.render_template) engines now convert UndefinedError + empty-render cases to this new error, which carries a row-level diagnostic naming culprit chains and giving copy-pasteable Jinja-conditional and SkipConfig remediation patterns. Two new AST helpers (ast_extract_access_chains, resolve_access_chain) back the culprit detection.

Findings

Correctness

  • Exception chaining inconsistency (minor). NativeJinjaSandboxEnvironment.render_template uses raise EmptyTemplateRenderError(...) from exception (environment.py:248-251), preserving the original UndefinedError in __cause__. UserTemplateSandboxEnvironment.safe_render re-raises without from exception (environment.py:402). The messages are curated so there's no user-visible difference, but __cause__ is useful when someone is debugging from a traceback. Recommend raise EmptyTemplateRenderError(...) from exception in safe_render too.
  • _build_empty_render_message fallback path is robust. Wrapping parser(user_template) + ast_extract_access_chains(...) in try/except Exception and falling back to a generic message avoids masking the original render failure. Good defensive posture.
  • resolve_access_chain list bounds. Correctly rejects out-of-bounds indices (acc >= len(current) or acc < -len(current)). Negative literal subscripts (e.g. {{ items[-1] }}) would reach resolve_access_chain as int accessors through _build_access_chain, and Python's native negative indexing on current[acc] handles them. Good.
  • Dict/list/scalar branching is complete. isinstance(current, dict), isinstance(current, list), fallthrough else — the three cases cover the sanitized record value types.
  • Gate expression derivation is safe. The one-past-resolvable-prefix slice (sample_accessors[: len(sample_prefix) + 1]) yields a Jinja Undefined leaf on evaluation, which stringifies to "" and is falsy — doesn't re-trigger UndefinedError. This is subtle; the inline comment at environment.py explains the reasoning clearly.
  • culprits[0] for the remediation suggestion. Only the first culprit becomes the template for the remediation example. With multiple culprits listed in the bullets, a user might fix only the first and rerun. Arguably OK (they'll iterate), but consider noting "(showing fix for first culprit)" if user feedback reports confusion.
  • Sanitizer order. sanitize_user_exceptions now catches (UserTemplateUnsupportedFiltersError, EmptyTemplateRenderError) before (UserTemplateError, TemplateSyntaxError). Since Python matches except clauses in order and both are UserTemplateError subclasses, the ordering is load-bearing. Correct as written. A short comment noting "ordering matters — these are UserTemplateError subclasses" would help future maintainers.
  • Dynamic subscripts. _build_access_chain returns None for a[b]-style chains, and the visitor descends into the subscript expression with in_chain=False, extracting b as its own chain. Matches the docstring behavior and the parametrized test coverage.

Project conventions

  • from __future__ import annotations present in all new/modified files ✓
  • Absolute imports only ✓
  • Modern type syntax (list[str], str | None, AccessChain = tuple[str, list[str | int]]) ✓
  • Type annotations on all new public callables ✓
  • EmptyTemplateRenderError subclasses UserTemplateError, preserving the canonical-error-at-boundaries invariant from AGENTS.md ✓
  • New AST helpers land in ginja/ast.py alongside existing helpers; _build_access_chain is private — good separation.
  • Module-level _CULPRIT_* sentinels are fine, but a small Enum or Literal type alias would make the classifier signature self-documenting. Not worth blocking on.

Tests

  • 95 tests in ginja/ passing per the PR description; diff shows thorough new coverage:
    • test_ast.py: parametrized chain-extraction (plain, nested, subscript, mixed, dynamic-subscript skip, loop/conditional context), parametrized resolve_access_chain (fully resolved, partially resolved, type-mismatch, list bounds).
    • test_environment.py: both bug regressions, culprit-classification branches (missing / None / empty string), multi-culprit listing, sanitizer-bypass assertion, native-engine parity, happy path, skip_template_validation=True path.
  • The image-cell-generator test was updated from match="invalid" to match="empty" with an explanatory comment pointing at issue Cryptic "User provided prompt generation template is invalid" error when a referenced template field is empty or missing #629. Good — this is exactly the kind of test churn worth annotating.
  • Gap (small): No test asserts the fallback message when _build_empty_render_message's parse branch fails (e.g., a monkey-patched parser that throws). The code path exists (environment.py:346-352) but is not covered. Arguably not worth chasing given how defensive it is.
  • Gap (small): No test for a chain with a non-identifier string key rendered via [...] bracket notation (e.g. {{ data["weird key"] }}). _format_access_chain has the branch; a parametrized case would close the loop.

Performance

  • Culprit detection runs only on the error path. It reparses user_template once per failure. Negligible.
  • ast_extract_access_chains is O(n) in AST size; visit does no backtracking.

Security

  • No new attack surface. The helpers operate on already-sanitized records and the existing trusted parse path.
  • The curated error message embeds user data (record keys in the access chain) into the exception string. This is the same blast radius as prior UserTemplateError messages — acceptable.

Documentation

  • Docstrings on new helpers are thorough and accurate. safe_render's docstring is updated to mention EmptyTemplateRenderError. ✓
  • Inline comments explain the non-obvious bits (top-level chain extraction, safe gate derivation, why the sanitizer bypasses this error class).

Suggested follow-ups (non-blocking)

  1. Add from exception to the UndefinedError → EmptyTemplateRenderError re-raise in UserTemplateSandboxEnvironment.safe_render for traceback consistency with the native engine path.
  2. Consider a one-line comment on the sanitize_user_exceptions except-clause noting that ordering is load-bearing.
  3. Add a test case for non-identifier string subscripts in _format_access_chain.
  4. If user reports come in about the multi-culprit case being ambiguous, consider labelling the remediation example ("fix for first culprit shown").

Verdict

Approve with minor nits. The change is well-scoped, well-tested, and genuinely fixes the reported bug class (both the "empty render" and "undefined nested attr" variants) without expanding blast radius. Exception taxonomy and sanitizer bypass are implemented correctly. The nits above are small-cost polish; none block merge.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 11, 2026

Greptile Summary

This PR fixes two distinct failure modes in Jinja template error reporting — both previously collapsed into a generic "template is invalid" message — by introducing a new EmptyTemplateRenderError subclass that carries a row-level diagnostic identifying the offending field chain and suggesting copy-pasteable Jinja conditional / SkipConfig remediation.

  • exceptions.py: Adds EmptyTemplateRenderError(UserTemplateError) so the sanitizer decorator can pass it through unchanged instead of replacing its message.
  • ast.py: New ast_extract_access_chains and resolve_access_chain helpers walk the template AST and record dict to identify which field chain caused the empty render, with proper handling for loop-scoped variables, dynamic subscripts, and the root-absent case.
  • environment.py: Both UserTemplateSandboxEnvironment and NativeJinjaSandboxEnvironment now convert raw UndefinedError to EmptyTemplateRenderError; _assert_rendered_text_not_empty / validate_rendered_text accept optional template + record to build the same diagnostic on empty-string renders; sanitize_user_exceptions is updated to pass the new subclass through.

Confidence Score: 5/5

The change is safe to merge — it replaces a generic error message with a diagnostically richer one and does not touch any data-mutation or auth paths.

Both failure modes are correctly intercepted, converted to the new subclass, and passed through the sanitizer unchanged. The gate-expression fix for the root-absent case is verified by two dedicated regression tests. The culprit-finding path wraps all its work in a broad try/except so failures there fall back gracefully to the generic message. Test coverage spans the full matrix of culprit types, scoping edge cases, and both sandbox environments.

No files require special attention.

Important Files Changed

Filename Overview
packages/data-designer-engine/src/data_designer/engine/processing/ginja/exceptions.py Adds EmptyTemplateRenderError as a UserTemplateError subclass; minimal change, correctly structured for the sanitizer's exception hierarchy ordering.
packages/data-designer-engine/src/data_designer/engine/processing/ginja/ast.py New ast_extract_access_chains and resolve_access_chain helpers; logic for top-level chain extraction, dynamic-subscript skipping, and partial-resolution tracking is correct and well-tested.
packages/data-designer-engine/src/data_designer/engine/processing/ginja/environment.py UndefinedError handling and empty-render diagnostic wired into both sandbox environments; gate_accessors logic correctly handles the root-absent case; sanitizer ordering comment added for future maintainers.
packages/data-designer-engine/tests/engine/processing/ginja/test_environment.py Comprehensive regression tests for both bugs, all culprit-classification branches, sanitizer bypass, loop-variable scoping, empty-collection reporting, and safe gate generation including the root-absent regression case.
packages/data-designer-engine/tests/engine/processing/ginja/test_ast.py Parametrized tests for ast_extract_access_chains and resolve_access_chain cover plain text, nested attrs, subscripts, dynamic subscripts, loop/conditional contexts, and failure/type-mismatch cases.
packages/data-designer-engine/tests/engine/column_generators/generators/test_image.py Updated match string from 'invalid' to 'empty' to reflect the improved error message; correctly retains UserTemplateError as the expected type (EmptyTemplateRenderError is a subclass).

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["safe_render / render_template called"] --> B["template.render(record)"]
    B --> C{Exception?}
    C -- "UndefinedError" --> D["_build_empty_render_message(template, record, parse)"]
    C -- "SecurityError / OverflowError" --> E["UserTemplateError (generic)"]
    C -- No exception --> F["validate_rendered_text(rendered_text, template, record)"]
    F --> G{rendered empty?}
    G -- Yes --> D
    G -- No --> H["return rendered_text ✓"]
    D --> I["ast_extract_access_chains(ast)"]
    I --> J["filter by meta.find_undeclared_variables"]
    J --> K["_collect_culprit_chains(chains, record)"]
    K --> L["resolve_access_chain per chain"]
    L --> M{Culprits found?}
    M -- Yes --> N["Build actionable message\nwith gate expression"]
    M -- No --> O["Generic fallback message"]
    N --> P["raise EmptyTemplateRenderError"]
    O --> P
    P --> Q["sanitize_user_exceptions wrapper"]
    Q --> R{Exception type?}
    R -- "EmptyTemplateRenderError\nor UnsupportedFiltersError" --> S["re-raise as-is ✓"]
    R -- "UserTemplateError / TemplateSyntaxError" --> T["wrap in generic message"]
Loading

Reviews (3): Last reviewed commit: "Merge branch 'main' into fix/template-re..." | Re-trigger Greptile

Comment on lines +736 to +737
gate_accessors = sample_accessors[: len(sample_prefix) + 1]
gate_chain = _format_access_chain(sample_name, gate_accessors)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Gate expression is one step too deep when the root variable is absent from the record

resolve_access_chain returns (False, None, []) both when the root name is missing from record (name not in record) and when the first accessor fails — both yield prefix = []. The gate computation sample_accessors[:len(sample_prefix)+1] therefore produces sample_accessors[:1] in either scenario. For {{ person.address.street }} with record = {}, that gives gate_chain = "person.address", so the suggestion becomes {{ person.address.street if person.address else 'N/A' }}. But in Jinja2, person is Undefined when it is absent from the context, and Undefined.__getattr__ raises UndefinedError immediately — the suggested template itself fails with the same kind of error the user is trying to fix.

The gate should fall back to the root name alone (gate_accessors = []) when sample_name is not present in record.

Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/data-designer-engine/src/data_designer/engine/processing/ginja/environment.py
Line: 736-737

Comment:
**Gate expression is one step too deep when the root variable is absent from the record**

`resolve_access_chain` returns `(False, None, [])` both when the root name is missing from `record` (`name not in record`) and when the first accessor fails — both yield `prefix = []`. The gate computation `sample_accessors[:len(sample_prefix)+1]` therefore produces `sample_accessors[:1]` in either scenario. For `{{ person.address.street }}` with `record = {}`, that gives `gate_chain = "person.address"`, so the suggestion becomes `{{ person.address.street if person.address else 'N/A' }}`. But in Jinja2, `person` is `Undefined` when it is absent from the context, and `Undefined.__getattr__` raises `UndefinedError` immediately — the suggested template itself fails with the same kind of error the user is trying to fix.

The gate should fall back to the root name alone (`gate_accessors = []`) when `sample_name` is not present in `record`.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6d258f1. When sample_name is not in record, the gate now falls back to the root name alone (gate_accessors = []) instead of slicing one accessor past an empty prefix. The resulting suggestion ({{ person.address.street if person else 'N/A' }} / SkipConfig(when="{{ not person }}")) is safe to evaluate because Jinja's default Undefined is falsy and stringifies to "", whereas Undefined.__getattr__ raises.

Added two regression tests covering this exact scenario:

  • test_undefined_root_variable_produces_safe_remediation_template — asserts the suggested template uses the root-name-only gate (and does not include the broken .address accessor).
  • test_undefined_root_variable_remediation_template_is_renderable — actually renders the suggested template against {} and verifies it returns 'N/A' instead of raising, which would have caught the original bug.

# contain its own variable references.
visit(node.arg, in_chain=False)
return
if isinstance(node, j_nodes.Name):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this walker needs to account for Jinja-scoped names. For {% for person in people %}{{ person.name }}{% endfor %} with an empty people list, the new diagnostic says person / person.name are missing from the record and suggests SkipConfig(when="{{ not person }}"), but person is only a loop variable. Maybe filter chains to roots from meta.find_undeclared_variables(ast), or track For target bindings before collecting culprit chains.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in 6d258f1. Rather than hand-rolling scope tracking inside ast_extract_access_chains, I kept the walker pure and filtered the chains in _build_empty_render_message via meta.find_undeclared_variables, which is Jinja's canonical mechanism for distinguishing external references from loop targets (and any future scoping constructs):

chains = ast_extract_access_chains(ast)
undeclared = meta.find_undeclared_variables(ast)
chains = [(name, accessors) for name, accessors in chains if name in undeclared]

For {% for person in people %}{{ person.name }}{% endfor %} with people=[], this leaves only ("people", []) as a candidate culprit. Combined with the empty-collection classifier (see your follow-up comment), the diagnostic now correctly points at people. Locked in by test_loop_variable_is_not_reported_as_missing_culprit.

resolved, value, prefix = resolve_access_chain(record, name, accessors)
if not resolved:
return (_CULPRIT_MISSING, prefix)
if value is None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude caught one follow-up on the culprit classification: after loop-local names are filtered out, an empty loop like {% for item in items %}x{% endfor %} with items=[] still falls back to the no-culprit message because _classify_chain does not treat empty collections as empty values. Could add an empty collection reason here so the message points at items instead of losing the row-level detail.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6d258f1. Added a new _CULPRIT_EMPTY_COLLECTION = "resolved to empty collection" classification in _classify_chain that matches any empty list/dict/tuple/set/frozenset:

if isinstance(value, (list, dict, tuple, set, frozenset)) and len(value) == 0:
    return (_CULPRIT_EMPTY_COLLECTION, prefix)

For {% for item in items %}x{% endfor %} with items=[], the diagnostic now reads:

Likely culprits in this row:
  - items (resolved to empty collection)

Combined with the scope filtering from your other comment, the suggested fix correctly gates on items. Locked in by test_empty_collection_iterable_reported_as_culprit and test_empty_dict_iterable_reported_as_culprit.

Addresses the open review comments on #633:

1. (Greptile P1) Gate expression in the suggested remediation template
   was one accessor too deep when the root variable was entirely absent
   from the record, causing the suggested fix to itself raise
   UndefinedError. Fall back to gating on the root name alone when
   sample_name is not in record.

2. (andreatgretel) The AST walker reported loop-local names as missing
   culprits (e.g. ``person`` in ``{% for person in people %}...{% endfor %}``).
   Filter extracted chains through ``meta.find_undeclared_variables`` to
   defer to Jinja's canonical scope tracking.

3. (andreatgretel follow-up) Empty collections used as loop iterables
   (``items=[]``) fell through to the no-culprit fallback. Add a new
   ``_CULPRIT_EMPTY_COLLECTION`` classification so they're surfaced.

4. Minor: add ``from exception`` to ``safe_render``'s UndefinedError
   re-raise for traceback consistency with the native engine path, and
   add a note on the load-bearing exception ordering in
   ``sanitize_user_exceptions``.
@nabinchha nabinchha requested a review from andreatgretel May 18, 2026 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cryptic "User provided prompt generation template is invalid" error when a referenced template field is empty or missing

2 participants