Support Python 3.14 (via bumping or expanding pyarrow dep) #673

@mikeknep

Description

Priority Level

Low

Task Summary

The goal of this task is to support running on Python 3.14. The various pyproject.toml files set only a minimum Python version (>=3.10), but in practice 3.14 is unsupported because of the pyarrow pin, specifically its <20 cap.

PyArrow 22 added support for Python 3.14.

Technical Details & Implementation Plan

Bump the pyarrow dependency so that its allowed range includes version 22. More details in "Agent Plan / Findings" below.

Investigation / Context

See "Agent Plan / Findings" below.

Agent Plan / Findings

Bottom Line

Supporting Python 3.14 looks feasible. The pyarrow jump is mainly a packaging risk rather than an obvious code-compatibility risk.
I tested a temporary Python 3.14 environment with pyarrow==22.0.0:

  • Runtime deps resolved with pyarrow>=22,<23.
  • All unit tests passed: 3498 passed, 1 skipped.
  • E2E tests passed: 6 passed, 2 skipped.
  • Existing pyarrow/parquet-focused tests passed: 116 passed.

PyArrow 22 Risk

Risk is moderate for packaging/distribution, low-to-moderate for repo behavior.

Main risks:

  • pyarrow 22 provides Python 3.14 wheels, but its Linux wheels are manylinux_2_28. That can break users on older glibc platforms that currently get pyarrow 19 manylinux2014/manylinux_2_17 wheels.
  • pyarrow 20-22 include behavior/API changes around Parquet logical types, extension types, schema handling, and removed deprecated APIs.
  • DataDesigner mostly uses stable APIs: pd.read_parquet, DataFrame.to_parquet, pyarrow.parquet.read_metadata, read_schema, read_table, ParquetWriter, pa.unify_schemas, Table.cast, Table.from_pandas, and Arrow dtype inspection.
  • The most likely behavioral differences would be schema/type strings in metadata/profiling, nested/list/struct handling, and Parquet logical/extension type round-trips.
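The glibc concern in the first bullet can be checked on a given machine before upgrading. The following is a minimal sketch, assuming the standard-library heuristic `platform.libc_ver()` reports the C library accurately; the helper name `meets_manylinux_2_28` is hypothetical and not part of this repo or of pyarrow.

```python
import platform


def meets_manylinux_2_28(libc_name: str, libc_version: str) -> bool:
    """Return True if the reported C library is glibc 2.28 or newer.

    Hypothetical helper: manylinux_2_28 wheels require glibc >= 2.28,
    so older glibc platforms cannot install them.
    """
    if libc_name != "glibc":
        # musl-based or non-Linux platforms use different wheel tags.
        return False
    try:
        major, minor = (int(part) for part in libc_version.split(".")[:2])
    except ValueError:
        # Empty or unparseable version string: treat as unsupported.
        return False
    return (major, minor) >= (2, 28)


if __name__ == "__main__":
    # platform.libc_ver() returns ("", "") when it cannot detect the libc.
    name, version = platform.libc_ver()
    ok = meets_manylinux_2_28(name, version)
    print(f"{name or 'unknown libc'} {version}: manylinux_2_28 compatible = {ok}")
```

A result of False on a Linux host suggests that pip or uv would find no compatible pyarrow 22 binary wheel there, which is the scenario the first bullet describes.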

I would avoid simply widening to pyarrow>=19.0.1,<23 for everyone, because resolvers will generally choose 22 for Python 3.10-3.13 too, which increases blast radius.

Preferred lower-risk dependency shape:

"pyarrow>=19.0.1,<20; python_version < '3.14'",
"pyarrow>=22,<23; python_version >= '3.14'",

That keeps existing users on the currently tested pyarrow line and only uses 22 where 3.14 requires it.
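The effect of the two markers can be sketched in plain Python. This is only an illustration of the selection logic; installers evaluate the environment markers themselves, and the function name `pyarrow_requirement` is hypothetical.

```python
import sys


def pyarrow_requirement(python_version: tuple[int, int]) -> str:
    """Mirror the two marker-guarded requirements proposed above."""
    if python_version >= (3, 14):
        # Python 3.14 needs the pyarrow line that supports it.
        return "pyarrow>=22,<23"
    # Python 3.10-3.13 stay on the currently tested line.
    return "pyarrow>=19.0.1,<20"


if __name__ == "__main__":
    current = (sys.version_info.major, sys.version_info.minor)
    print(f"Python {current[0]}.{current[1]} -> {pyarrow_requirement(current)}")
```

Exactly one of the two requirements applies to any interpreter, so the ranges never conflict.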

Required Repo Changes

  • Update packages/data-designer-config/pyproject.toml pyarrow dependency, preferably with Python-version markers.
  • Regenerate uv.lock.
  • Regenerate tests_e2e/uv.lock.
  • Add Python 3.14 classifiers in:
    • packages/data-designer-config/pyproject.toml
    • packages/data-designer-engine/pyproject.toml
    • packages/data-designer/pyproject.toml
  • Add 3.14 to .github/workflows/ci.yml matrices for package tests and e2e tests.
  • Add 3.14 to the test-summary matrix so branch-protection status names exist.
  • Update README badges in:
    • README.md
    • packages/data-designer/README.md
  • Update stale docs/tooling notes that say pyarrow lacks 3.14 wheels:
    • The Makefile comment and default for the docs Python version (around DOCS_PYTHON_VERSION)
    • .agents/skills/datadesigner-docs/SKILL.md
  • Optionally update docs workflows currently pinned to 3.13 if you want docs builds to exercise 3.14 too.

Not Required

  • requires-python = ">=3.10" already includes 3.14.
  • ruff.target-version = "py310" should stay as-is because the project still supports 3.10 syntax.
  • The Python 3.14 test run exposed no required code changes, though Python 3.14 does emit a deprecation warning for asyncio.iscoroutinefunction; that is a cleanup to do before Python 3.16, not a 3.14 blocker.
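For the eventual asyncio.iscoroutinefunction cleanup, the Python deprecation notice points to `inspect.iscoroutinefunction` as the replacement. The sketch below shows that call on two throwaway functions; it does not identify where the warning originates in this repo (it may come from a dependency), so the function names are purely illustrative.

```python
import inspect


async def fetch_value() -> int:
    """Illustrative coroutine function."""
    return 1


def plain_value() -> int:
    """Illustrative regular function."""
    return 1


# inspect.iscoroutinefunction does not trigger the 3.14 deprecation warning.
print(inspect.iscoroutinefunction(fetch_value))  # True
print(inspect.iscoroutinefunction(plain_value))  # False
```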

Dependencies

No response

Metadata

Assignees

No one assigned

    Labels

    task: Internal development task
