Use new DType API with backcompat path and fix result_type() #374
seberg wants to merge 3 commits
|
@hawkinsp just in case you have a quick thought here. I tried to rewrite things to just use the new API but keep it a "legacy" dtype to some degree, so that there should be no real regressions while still working fine all the way back to NumPy 2 (the only regression I noticed is that arrays print a bit less nicely). However, this uses The alternative I can currently think of is to just allow
|
I haven't looked yet, but
It's not the end of the world to have to build ml_dtypes per Python version: it's a small enough package. I had previously abandoned trying to use the limited dtype API for similar reasons (#195). And eventually when the oldest supported NumPy ages off our support matrix, we can switch.
|
OK, cool, then I think I'll pursue this; we need a better way to transition a package like Long term for the stable API: I suspect the right thing will be to have a new DType creation function that creates the full heap-type for you based on the spec.
This uses the new-style API for NumPy 2.0+. NumPy 2.5 ships some backport compatibility hacks that we vendor here so that this also compiles on older NumPy versions. This requires some churn, the biggest piece being setting up a within-DType casting implementation. However, it allows optionally using the new API, the biggest gain being that `result_type()` can now do the right thing. The one downside is that `dtype=` prints less nicely on NumPy <2.5.
|
I made a big pass on this, cleaning things up and hopefully fixing some issues (byte-swapping and an incorrect This should now actually be OK, although it does require NumPy 2.0+. With this being merged in NumPy and working out fine here, this is now actually ready. I'll note again the one little annoyance I have found so far (hopefully the only one), which is that when printing arrays we now print Unlike gh-360 this doesn't make it obvious to just not set a character, for example, but it does ensure that old-style code paths are taken so that actual regressions are unlikely (while it seems it'll be a long whack-a-mole with gh-360).
|
This also now fixes gh-301, which I suspect is the magic thing that might make my CuPy "test everything" attempt feasible (to the point that I am considering whether we should just hack that flag if this PR isn't so easy). (Never mind, older NumPy will still not support it nicely of course. If we want to improve this without the flag, I think the solution might be to check if
gh-360 is a little bit hard to actually pull off. In NumPy 2.5+ (backport vendored to compile on older NumPy versions), it is now allowed to create a "new style DType" that still is a legacy dtype.
That means we inherit a few existing quirks, but it mostly means that `ml_dtypes` can start using the new API conveniently without much concern for backwards compatibility issues.

The current state is:
- `result_type()` now does a better job/the right thing (yes, this adds a bunch of code).
- Arrays now print as `array(..., dtype=dtype('bfloat16'))` rather than just `bfloat16`. (I am restoring that in NumPy for 2.5+, although we need a better solution.)

Next steps, unlocked things:
- `np.finfo()` (given one backport PR from me in NumPy) for NumPy 2.5+.

I agree that the long list of dtypes is a bit annoying for `CommonDType`. After NumPy has a better DType hierarchy, we could allow creating your own baseclass there easily, which would simplify this (although it may not speed things up in practice).

(Heavy use of Claude to spit out code, but of course with absolute design micro-managing in many relevant parts; I'll still need to go through it once more myself.)
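
To illustrate why the `CommonDType` list grows long, here is a toy pure-Python model of pairwise common-dtype resolution. It is only a sketch: the real DType API is implemented in C, and every name here (`ToyDType`, `toy_result_type`, `REGISTRY`) as well as the promotion table are hypothetical, including the assumption that `bfloat16` combined with `float16` promotes to `float32`. The idea shown is that a built-in dtype knows nothing about a third-party dtype, so the third-party dtype has to enumerate every dtype it may be paired with.

```python
# Toy model only. All names and the promotion table are hypothetical and
# do not reflect NumPy's or ml_dtypes' actual implementation.

REGISTRY = {}


class ToyDType:
    """Stand-in for a DType class with a pairwise promotion table."""

    def __init__(self, name, promotes_with=None):
        self.name = name
        # Maps the other dtype's name to the name of the common dtype.
        self.promotes_with = dict(promotes_with or {})
        REGISTRY[name] = self

    def common_dtype(self, other):
        """Return the common dtype, or NotImplemented if `other` is unknown."""
        if other is self:
            return self
        result = self.promotes_with.get(other.name)
        return NotImplemented if result is None else REGISTRY[result]

    def __repr__(self):
        return f"ToyDType({self.name!r})"


def toy_result_type(a, b):
    """Ask each side in turn; the first side that knows the answer wins."""
    for left, right in ((a, b), (b, a)):
        result = left.common_dtype(right)
        if result is not NotImplemented:
            return result
    raise TypeError(f"no common dtype for {a.name} and {b.name}")


# "Built-in" dtypes only know about each other.
float16 = ToyDType("float16", {"float32": "float32"})
float32 = ToyDType("float32", {"float16": "float32"})

# The third-party dtype must list every partner explicitly (assumed table).
bfloat16 = ToyDType("bfloat16", {"float16": "float32", "float32": "float32"})

print(toy_result_type(float16, bfloat16))
```

In this model `float16` returns `NotImplemented` for `bfloat16`, so the resolver falls back to the table on `bfloat16`. A shared base class, as suggested above, would let one rule replace many of those explicit entries.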