Fold contiguous-axis stride in gather/scatter offset chains #212
Merged
…ays. For a TileArray whose ArraySpec encodes `contiguous=true`, Julia's column-major convention means `stride[1] == 1` is statically known. Extend the constant analysis to recognize this `getfield(getfield(arg, :strides), 1)` chain and propagate the literal `1`. Also propagate scalar constants through type-narrowing intrinsics (`trunci`, `exti`, `bitcast`) so a `1::Int64` field stays constant after `Int32(stride)` lowers to `trunci`. Sets up downstream folds on `muli(idx, stride1)` in gather/scatter offset computations.

The matching `muli(x, 1) → x` algebra rule is left disabled for now: tileiras crashes at -O1+ when the simplified IR reaches its auto-vectorizer (separate bug to file upstream).

Adjust the slice and no_wrap codegen tests to use non-contiguous specs where the `muli(start, stride)` survival is what's being checked, so the new fold doesn't collapse them into constants before the test pattern matches.
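To make the propagation concrete, here is a minimal sketch of a constant lattice over a toy expression tree. Everything in it (`ToySpec`, `ToyTileArray`, `Arg`, `GetField`, `Trunci`, `constant_of`) is a hypothetical stand-in invented for illustration; the PR's real analysis operates on compiler IR and is not reproduced here.

```julia
# Toy model only: ToySpec, ToyTileArray, and the Node types are hypothetical.
struct ToySpec
    contiguous::Bool
end

# The spec rides along as a type parameter, so it is visible without a value.
struct ToyTileArray{S}
    strides::NTuple{2,Int64}
end

spec_of(::Type{ToyTileArray{S}}) where {S} = S
spec_of(::Type) = nothing

abstract type Node end
struct Arg <: Node
    T::Type
end
struct GetField <: Node
    base::Node
    field::Union{Symbol,Int}
end
struct Trunci <: Node
    x::Node
    T::Type
end

# Returns the statically known value of a node, or `nothing` if unknown.
constant_of(::Node) = nothing

function constant_of(n::GetField)
    inner = n.base
    # Match getfield(getfield(arg, :strides), 1) on a contiguous array.
    if n.field == 1 && inner isa GetField && inner.field === :strides && inner.base isa Arg
        spec = spec_of(inner.base.T)
        if spec isa ToySpec && spec.contiguous
            return Int64(1)
        end
    end
    return nothing
end

# Type narrowing keeps a known scalar known; `%` truncates like `trunci`.
function constant_of(n::Trunci)
    c = constant_of(n.x)
    return c === nothing ? nothing : c % n.T
end

contig = Arg(ToyTileArray{ToySpec(true)})
stride1 = GetField(GetField(contig, :strides), 1)
narrowed = constant_of(Trunci(stride1, Int32))   # still a known constant
```

In this sketch the fact survives the narrowing step because `constant_of(::Trunci)` forwards the inner constant instead of giving up, which is the role the commit message assigns to the `trunci`/`exti`/`bitcast` pass-through.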
The divisibility, bounds, and constant analyses each duplicated the same two-level `getfield` chain walker for `getfield(arg::TileArray, :ptr)` and `getfield(getfield(arg, :sizes|:strides), i)`. Pull the walk into a single `decode_tilearray_field` helper returning a `TileArrayFieldRef`; each analysis projects to its own lattice via a small pure function (`tilearray_field_divby` / `_bounds` / `_constant`). No behavior change: all codegen and analysis tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
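The shape of that refactor can be sketched as one decoder plus small per-analysis projections. The names below (`FieldRef`, `decode_field`, `Hints`, `field_divby`, `field_constant`) and the toy node types are assumptions made for this illustration; the actual `TileArrayFieldRef` layout and helper signatures are not shown in this conversation.

```julia
# Hypothetical toy IR: an argument and nested getfield nodes.
abstract type Node end
struct Arg <: Node end
struct GetField <: Node
    base::Node
    field::Union{Symbol,Int}
end

# Hypothetical decoded reference: which TileArray field a chain reads.
struct FieldRef
    kind::Symbol   # :ptr, :sizes, or :strides
    index::Int     # 1-based tuple index; 0 for :ptr
end

# The single chain walker shared by all analyses in this sketch.
function decode_field(n::Node)
    n isa GetField || return nothing
    if n.base isa Arg
        return n.field === :ptr ? FieldRef(:ptr, 0) : nothing
    end
    inner = n.base
    if inner isa GetField && inner.base isa Arg &&
       inner.field in (:sizes, :strides) && n.field isa Int
        return FieldRef(inner.field, n.field)
    end
    return nothing
end

# Hypothetical static hints attached to an array argument.
struct Hints
    contiguous::Bool
    size_div_by::NTuple{2,Int}
    stride_div_by::NTuple{2,Int}
end

# Each analysis projects the same FieldRef onto its own lattice.
function field_divby(r::FieldRef, h::Hints)
    r.kind === :sizes && return h.size_div_by[r.index]
    r.kind === :strides && return h.stride_div_by[r.index]
    return 1   # nothing known about the pointer in this toy
end

function field_constant(r::FieldRef, h::Hints)
    return (r.kind === :strides && r.index == 1 && h.contiguous) ? 1 : nothing
end

h = Hints(true, (16, 1), (1, 8))
ref = decode_field(GetField(GetField(Arg(), :strides), 1))
```

Keeping the projections pure means a new analysis only needs another small function over `FieldRef`, not another copy of the chain walk.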
It's a generic SCI traversal helper used by every analysis (divisibility, bounds, constant, the new tilearray decoder). Living in divisibility.jl was an accident of where it was first needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…om field type. `ArraySpec`'s inner constructor now rejects synthetic specs that combine `contiguous=true` with `stride_div_by[1] > 1`. Such a spec is physically impossible: `stride[1] == 1`, and 1 is divisible only by 1. With this enforced at the type's construction boundary, the constant-fold projection drops the defensive consistency check and trusts the type.

`tilearray_field_constant` now derives the result's integer type from `eltype(fieldtype(T, :strides))` rather than hardcoding `Int32(1)`, so any future change to `TileArray.strides`'s element type carries through automatically.

Several existing tests used inconsistent specs (`contiguous=true` with `stride_div_by[1] > 1`) to drive divisibility dataflow with a non-unit stride hint. Those are migrated to either drop the stride hint (when the test only asserted on shape facts) or flip to `contiguous=false` (when the stride hint was load-bearing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
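Both ideas in this commit message, validating at construction and deriving the constant's type from the field type, can be illustrated with a small sketch. `ToySpec`, `ToyTileArray`, and `unit_stride` are hypothetical names used only here; the real `ArraySpec` has fields and a constructor signature that this conversation does not show.

```julia
# Hypothetical spec: the inner constructor is the only way to build one,
# so an inconsistent combination can never exist as a value.
struct ToySpec
    contiguous::Bool
    stride_div_by::NTuple{2,Int}
    function ToySpec(contiguous::Bool, stride_div_by::NTuple{2,Int})
        if contiguous && stride_div_by[1] > 1
            throw(ArgumentError(
                "contiguous=true implies stride[1] == 1, which is not " *
                "divisible by $(stride_div_by[1])"))
        end
        return new(contiguous, stride_div_by)
    end
end

# Hypothetical array type; only the `strides` field matters for the sketch.
struct ToyTileArray
    strides::NTuple{2,Int32}
end

# Derive the literal 1 from the field's element type instead of hardcoding it.
unit_stride(::Type{T}) where {T} = one(eltype(fieldtype(T, :strides)))

ok = ToySpec(true, (1, 8))          # consistent: accepted
one_stride = unit_stride(ToyTileArray)
```

If `strides` were later declared as `NTuple{2,Int64}`, `unit_stride` in this sketch would return an `Int64` without any other change, which is the property the commit message describes.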
Extend `ConstantAnalysis` to recognise `getfield(getfield(arg, :strides), 1)` for TileArrays whose `ArraySpec` encodes `contiguous=true`, plus scalar pass-through on `trunci`/`exti`/`bitcast` so `Int32(stride)` doesn't drop the fact. Combined with the existing `muli(x, 1) → x` algebra rule, this collapses the runtime `muli(idx, stride1)` in gather/scatter offset computations, letting tileiras's auto-vectorizer prove consecutive lanes touch consecutive addresses and emit wide `STG.E.128` stores instead of scalar `STG.E.U16`. Mirrors cuTile Python's `static_stride == 1: offset_delta = ind` shortcut in `_gather_scatter_pointer_and_mask`.

Pull the `getfield(arg::TileArray, ...)` chain walk that divisibility, bounds, and constant analyses previously each duplicated into one `decode_tilearray_field` helper (new `analysis/tilearray.jl`); each analysis now projects a typed `TileArrayFieldRef` to its own lattice via a pure function. No behavior change in this commit.
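As a rough illustration of why the unit-stride fold matters for gather/scatter offsets, the sketch below models the shortcut on plain Julia vectors. `offset_delta` and its `static_stride` keyword are hypothetical, loosely named after the cuTile Python snippet quoted above; this is not the code path the compiler actually takes.

```julia
# Hypothetical helper. `static_stride` is `nothing` when the stride is only
# known at run time, and an integer when the analysis proved its value.
function offset_delta(ind::AbstractVector{<:Integer}, stride::Integer;
                      static_stride=nothing)
    static_stride == 1 && return ind   # multiply by one folded away
    return ind .* stride               # general case keeps the multiply
end

lanes = Int32[0, 1, 2, 3]
offsets = offset_delta(lanes, Int32(1); static_stride=1)

# Adjacent lanes end up one element apart, the pattern a vectorizer
# needs to see before it can merge them into a single wide access.
consecutive = all(==(1), diff(offsets))
```

With the multiply gone, the offsets are the lane indices themselves, so consecutiveness is visible directly rather than hidden behind a run-time stride value.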