
Data validation py312 1.17 e #276

Open
pritamdodeja wants to merge 3 commits into tensorflow:master from pritamdodeja:data-validation-py312-1.17-e

@pritamdodeja

As part of porting TFX to Python 3.12.4, this branch relaxes some of the version dependencies. For now it is meant for discussing the approach rather than for merging into master. For example, porting ZetaSQL to GoogleSQL is something we should do; it is not done in this branch because this was a dry run to see what breaks.

vkarampudi and others added 3 commits June 9, 2025 13:38
…r Python 3.12

This commit updates the TFDV build environment for Python 3.12 compatibility by ripping out ZetaSQL-dependent code and updating Python package constraints.

Specific changes include:
* Strip ZetaSQL: Removed `zetasql` and `six` from `WORKSPACE`. Removed the `custom_validation` cc_library, its pybind11 hook (`CustomValidateStatistics`), and related test targets, as ZetaSQL compilation fails on modern toolchains.
* Modernize PyArrow & TF: Updated `setup.py` to allow `pyarrow>=14,<22` for Python >= 3.11 to avoid building legacy Arrow 10 source code. Relaxed the `tensorflow` constraint to `>=2.16,<2.18` and adjusted `tfx-bsl` / `tensorflow-metadata` base versions.
* Fix Test Dependencies: Added `scikit-learn==1.5.1` and `scipy==1.17.0` to `install_requires` so the mutual information generators and tests can execute properly.
* Cleanup Build Macros: Removed legacy Python 2 pybind11 initialization symbols (`init%s`) from `build_macros.bzl`.
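The version ranges in the setup.py bullet can be illustrated with a small stdlib-only sketch. It assumes the pins are written as PEP 508 requirement strings with a `python_version` marker on the pyarrow line; the helper names (`satisfies`, `split_requirement`, `marker_applies`, `active_requirements`) are hypothetical and not part of TFDV's setup.py, and the comparison logic handles only plain numeric releases rather than the full PEP 440 rules.

```python
import operator

# Hypothetical install_requires built from the pins named in the commit message.
# The PEP 508 marker after ";" limits the pyarrow range to Python >= 3.11. The
# commit does not say which pyarrow range applies on older interpreters, so no
# fallback line is shown.
INSTALL_REQUIRES = [
    "pyarrow>=14,<22; python_version >= '3.11'",
    "tensorflow>=2.16,<2.18",
    "scikit-learn==1.5.1",
    "scipy==1.17.0",
]

# Two-character operators come first so ">=" is not mistaken for ">".
_OPS = [
    (">=", operator.ge), ("<=", operator.le), ("==", operator.eq),
    ("!=", operator.ne), (">", operator.gt), ("<", operator.lt),
]


def _release(version):
    """Parse a plain numeric release such as '2.17.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def _pad(a, b):
    """Zero-pad two release tuples to equal length, so 2.18 compares equal to 2.18.0."""
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)), b + (0,) * (n - len(b))


def satisfies(version, specifier):
    """Check a release version against a specifier such as '>=14,<22'.

    Simplified: numeric releases only; no pre-releases, wildcards, or '~='.
    """
    current = _release(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        for symbol, compare in _OPS:
            if clause.startswith(symbol):
                left, right = _pad(current, _release(clause[len(symbol):].strip()))
                if not compare(left, right):
                    return False
                break
        else:
            raise ValueError("unsupported clause: %r" % clause)
    return True


def split_requirement(requirement):
    """Split 'name<spec>; marker' into (name, specifier, marker or None)."""
    body, _, marker = requirement.partition(";")
    body = body.strip()
    cut = next((i for i, ch in enumerate(body) if ch in "<>=!~"), len(body))
    return body[:cut], body[cut:], marker.strip() or None


def marker_applies(marker, python_version):
    """Evaluate a marker of the single form "python_version <op> 'X.Y'"."""
    if marker is None:
        return True
    name, op, value = marker.split(maxsplit=2)
    if name != "python_version":
        raise ValueError("unsupported marker: %r" % marker)
    return satisfies(python_version, op + value.strip("'\""))


def active_requirements(python_version):
    """Return {name: specifier} for the requirements that apply on this interpreter."""
    active = {}
    for requirement in INSTALL_REQUIRES:
        name, specifier, marker = split_requirement(requirement)
        if marker_applies(marker, python_version):
            active[name] = specifier
    return active
```

For example, `active_requirements("3.12")` includes the pyarrow range while `active_requirements("3.10")` omits it, and `satisfies("2.18", ">=2.16,<2.18")` is false because the upper bound is exclusive. A real setup.py leaves this evaluation to pip; the `packaging` library's `SpecifierSet` and `Marker` classes implement the complete rules.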
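Because the scikit-learn and scipy pins exist so that the mutual information generators and tests can run, a quick pre-test check of the environment may help when reproducing this dry run. The sketch below is a hypothetical helper (`pin_mismatches`), not part of the TFDV test suite; it relies only on the standard library's `importlib.metadata`.

```python
from importlib import metadata

# Exact pins named in the commit message for the mutual information generators and tests.
TEST_PINS = {"scikit-learn": "1.5.1", "scipy": "1.17.0"}


def pin_mismatches(pins):
    """Return {distribution: installed version or None} for every pin that is not met.

    None means the distribution is not installed at all. Versions are compared as
    exact strings, so '1.5.1' and '1.5.1.0' would be reported as different.
    """
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            problems[name] = installed
    return problems
```

Calling `pin_mismatches(TEST_PINS)` before the test run returns an empty dict when both exact pins are installed.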
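The build-macro cleanup rests on how CPython finds an extension module's entry point: Python 3 looks up `PyInit_<module>`, while `init<module>` was the Python 2 name. The sketch below assumes, without seeing build_macros.bzl, that the macro derives a GNU ld version script and a macOS exported-symbols list from a list of init symbols, a pattern used by some TensorFlow-style pybind extension macros. The function names are hypothetical, and the code sticks to constructs that Starlark also supports (`%` formatting, list comprehensions).

```python
# Hypothetical helpers; the real build_macros.bzl is not shown in this PR.


def exported_init_symbols(module_name):
    """Init symbols a CPython 3 extension module must export.

    CPython 3 calls PyInit_<module> when importing an extension. The Python 2
    entry point, "init%s" % module_name, is the one the commit removes.
    """
    return ["PyInit_%s" % module_name]


def version_script(module_name):
    """GNU ld version script: keep only the init symbols global, hide everything else."""
    lines = ["    %s;" % symbol for symbol in exported_init_symbols(module_name)]
    return "{\n  global:\n" + "\n".join(lines) + "\n  local: *;\n};\n"


def macos_exported_symbols(module_name):
    """Exported-symbols list for the macOS linker, which expects a leading underscore."""
    return "\n".join(["_%s" % symbol for symbol in exported_init_symbols(module_name)]) + "\n"
```

Dropping the `init%s` entry does not affect imports on Python 3, since the interpreter never looks that symbol up.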