Skip to content

feat: implement semantic rule validation for DQ rules (#1169)#1203

Open
Vsatyam013 wants to merge 4 commits into
databrickslabs:mainfrom
Vsatyam013:feat/1169-semantic-validation
Open

feat: implement semantic rule validation for DQ rules (#1169)#1203
Vsatyam013 wants to merge 4 commits into
databrickslabs:mainfrom
Vsatyam013:feat/1169-semantic-validation

Conversation

@Vsatyam013

@Vsatyam013 Vsatyam013 commented May 27, 2026

Copy link
Copy Markdown
Contributor

Changes

Implements semantic validation for DQ rule expressions as part of issue #1169.
Adds a SemanticValidator class that checks SQL expressions for forbidden
destructive keywords (DROP, TRUNCATE, DELETE, UPDATE, INSERT, ALTER) before
rules are applied. Validation results are collected into ChecksValidationStatus
rather than raising exceptions, consistent with the existing validation contract.

Linked issues

Resolves #1169

Tests

  • added unit tests

Documentation and Demos

  • added/updated demos
  • added/updated docs
  • added/updated agent skills

@CLAassistant

CLAassistant commented May 27, 2026

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

@github-actions

Copy link
Copy Markdown
Contributor

All commits in PR should be signed ('git commit -S ...'). See https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits

@Vsatyam013 Vsatyam013 force-pushed the feat/1169-semantic-validation branch from d9d5ef7 to 379a5ba Compare May 27, 2026 16:44
@Vsatyam013 Vsatyam013 marked this pull request as ready for review May 28, 2026 03:08
@Vsatyam013 Vsatyam013 requested a review from a team as a code owner May 28, 2026 03:08
@Vsatyam013 Vsatyam013 requested review from grusin-db and removed request for a team May 28, 2026 03:08
@Vsatyam013 Vsatyam013 force-pushed the feat/1169-semantic-validation branch from ff01ddc to f84132b Compare May 29, 2026 05:41

@ghanse ghanse left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Vsatyam013 , the linked issue focuses more on validating the entire ruleset to detect the following issues:

  • Detect if a duplicate rule already exists with the same run config
  • Detect if a similar rule exists with different arguments (e.g. a narrower threshold of is_in_range)

Can you make some changes to address these more specifically? I think it should be simpler because the check function inputs are mostly structured. We can document any limitations when users execute checks with Spark expressions.

Comment thread CHANGELOG.md

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We update CHANGELOG automatically so these changes will need to be reverted.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we want to do this level of validation in DQX. There are a lot of scenarios we would need to cover.

@ghanse ghanse added under-review This PR is currently being reviewed by one of DQX maintainers. needs-changes Changes required after review labels Jun 2, 2026
@mwojtyczka

mwojtyczka commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

@Vsatyam013 do you plan to complete the PR, and address review comments? there was no update since 4 weeks.

@Vsatyam013 Vsatyam013 force-pushed the feat/1169-semantic-validation branch from ea0c092 to da2897a Compare June 25, 2026 03:24
@Vsatyam013

Vsatyam013 commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Hi @mwojtyczka, sorry for the delay. I've reworked the implementation based on @ghanse's feedback. The SQL keyword validation approach has been removed and replaced with ruleset-level semantic checks, duplicate rule detection and conflicting rule detection (same function and column with different arguments). The validation is now integrated into validate_checks, save_checks, and load_checks with a configurable semantic_validation_mode parameter (warn, fail, or None). CHANGELOG changes have also been reverted.Can you review it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-changes Changes required after review under-review This PR is currently being reviewed by one of DQX maintainers.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE]: Add methods for semantic rule validation

4 participants