Skip to content

Conversation

@mpolson64
Copy link
Contributor

Summary:
NOTE: This is much slower than the implementation which is backed by a dataframe. For clarity, Ive put this naive implementation up as its own diff and the next diff hunts for speedups.

Creates new source of truth for Data: the DataRow. The df is now a cached property which is dynamically generated based on these rows.

In the future, these will become a Base object in SQLAlchemy st. Data will have a SQLAlchemy relationship to a list of DataRows which live in their own table.

RFC:

  1. Im renaming sem -> se here (but keeping sem in the df for now, since this could be an incredibly involved cleanup). Do we have alignment that this is a positive change? If so I can either start of backlog the cleanup across the codebase. cc Balandat who Ive talked about this with a while back.
  2. This removes the ability for Data to contain arbitrary columns, which was added in D83682740 and afaik unused. Arbitrary new columns would not be compatible with the new storage setup (it was easy in the old setup which is why we added it), and I think we should take a careful look at how to store contextual data in the future in a structured way.

Differential Revision: D90605846

@meta-cla meta-cla bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Jan 15, 2026
@meta-codesync
Copy link

meta-codesync bot commented Jan 15, 2026

@mpolson64 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D90605846.

Summary:

TData was necesssary whern we had multiple different Data classes, but recent developments have made this no longer needed

Differential Revision: D90596942
Summary:

Moved these tests into TestData, since Data is the only data-related class in Ax.

Differential Revision: D90605845
Summary:

NOTE: This is much slower than the implementation which is backed by a dataframe. For clarity, Ive put this naive implementation up as its own diff and the next diff hunts for speedups.

Creates new source of truth for Data: the DataRow. The df is now a cached property which is dynamically generated based on these rows.

In the future, these will become a Base object in SQLAlchemy st. Data will have a SQLAlchemy relationship to a list of DataRows which live in their own table.

RFC:

1. Im renaming sem -> se here (but keeping sem in the df for now, since this could be an incredibly involved cleanup). Do we have alignment that this is a positive change? If so I can either start of backlog the cleanup across the codebase. cc Balandat who Ive talked about this with a while back.
2. This removes the ability for Data to contain arbitrary columns, which was added in D83682740 and afaik unused. Arbitrary new columns would not be compatible with the new storage setup (it was easy in the old setup which is why we added it), and I think we should take a careful look at how to store contextual data in the future in a structured way.

Differential Revision: D90605846
@codecov-commenter
Copy link

Codecov Report

❌ Patch coverage is 95.45455% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.73%. Comparing base (7bce65b) to head (ba5c887).

Files with missing lines Patch % Lines
ax/core/data.py 93.75% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4773   +/-   ##
=======================================
  Coverage   96.72%   96.73%           
=======================================
  Files         583      583           
  Lines       60860    60831   -29     
=======================================
- Hits        58866    58844   -22     
+ Misses       1994     1987    -7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

mpolson64 added a commit to mpolson64/Ax that referenced this pull request Jan 15, 2026
Summary:

NOTE: This is much slower than the implementation which is backed by a dataframe. For clarity, Ive put this naive implementation up as its own diff and the next diff hunts for speedups.

Creates new source of truth for Data: the DataRow. The df is now a cached property which is dynamically generated based on these rows.

In the future, these will become a Base object in SQLAlchemy st. Data will have a SQLAlchemy relationship to a list of DataRows which live in their own table.

RFC:

1. Im renaming sem -> se here (but keeping sem in the df for now, since this could be an incredibly involved cleanup). Do we have alignment that this is a positive change? If so I can either start of backlog the cleanup across the codebase. cc Balandat who Ive talked about this with a while back.
2. This removes the ability for Data to contain arbitrary columns, which was added in D83682740 and afaik unused. Arbitrary new columns would not be compatible with the new storage setup (it was easy in the old setup which is why we added it), and I think we should take a careful look at how to store contextual data in the future in a structured way.

Differential Revision: D90605846
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed Do not delete this pull request or issue due to inactivity. fb-exported meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants