Fold contiguous-axis stride in gather/scatter offset chains #212

Merged: maleadt merged 4 commits into main from tb/tilearray_prop on Apr 30, 2026

maleadt commented on Apr 30, 2026

  • Extend `ConstantAnalysis` to recognise `getfield(getfield(arg, :strides), 1)`
    for TileArrays whose `ArraySpec` encodes `contiguous=true`, plus scalar
    pass-through on `trunci` / `exti` / `bitcast` so that `Int32(stride)` doesn't
    drop the fact. Combined with the existing `muli(x, 1) → x` algebra rule,
    this collapses the runtime `muli(idx, stride1)` in gather / scatter
    offset computations. That lets tileiras's auto-vectorizer prove that
    consecutive lanes touch consecutive addresses and emit wide `STG.E.128`
    stores instead of scalar `STG.E.U16`. This mirrors cuTile Python's
    `static_stride == 1: offset_delta = ind` shortcut in
    `_gather_scatter_pointer_and_mask`.
  • Factor the two-level `getfield(arg::TileArray, ...)` chain walk that the
    divisibility, bounds, and constant analyses previously each duplicated
    into one `decode_tilearray_field` helper (new
    `analysis/tilearray.jl`); each analysis now projects a typed
    `TileArrayFieldRef` to its own lattice via a pure function. No
    behavior change in this commit.
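
To make the motivation concrete, here is a minimal plain-Julia model of the offset computation. It is not cuTile.jl code: `gather_offsets` and `is_consecutive` are hypothetical names, and the "lanes" are just an integer vector.

```julia
# Element offsets for a gather/scatter, modelled as idx * stride1 per lane.
# Hypothetical helper, standing in for the muli(idx, stride1) in the IR.
gather_offsets(idx, stride1) = idx .* stride1

lanes = collect(0:7)

# With the stride known to be the literal 1, the offsets are the indices themselves.
contiguous = gather_offsets(lanes, 1)

# With a non-unit stride the multiply has to stay.
strided = gather_offsets(lanes, 3)

# Consecutive lanes touch consecutive elements only when every step is exactly 1.
is_consecutive(offs) = all(diff(offs) .== 1)
```

In this model, eight consecutive 16-bit elements span 128 bits, which fits the `STG.E.128` versus `STG.E.U16` comparison above. Whether the real vectorizer makes that choice depends on facts the sketch does not model, such as alignment.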

maleadt and others added 4 commits April 30, 2026 11:01
…ays.

For a TileArray whose ArraySpec encodes `contiguous=true`, Julia's
column-major convention means `stride[1] == 1` is statically known.
Extend the constant analysis to recognize this `getfield(getfield(arg,
:strides), 1)` chain and propagate the literal `1`.
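
As a rough illustration of the pattern match, the sketch below walks a two-level field access on a toy expression tree. `SketchArg`, `SketchGetField`, and `stride_constant` are hypothetical stand-ins; the actual IR types and analysis interface in cuTile.jl are not shown in this PR text.

```julia
# Toy stand-in for a kernel argument whose spec says whether it is contiguous.
struct SketchArg
    contiguous::Bool
end

# Toy stand-in for a getfield(base, field) node.
struct SketchGetField
    base::Any
    field::Any   # a Symbol for the outer struct field, an Int for a tuple index
end

# Returns the statically known value, or `nothing` when nothing is known.
function stride_constant(ex)
    ex isa SketchGetField || return nothing
    inner = ex.base
    inner isa SketchGetField || return nothing
    inner.base isa SketchArg || return nothing
    # Only strides[1] of a contiguous array is known to be 1.
    if inner.field === :strides && ex.field == 1 && inner.base.contiguous
        return 1
    end
    return nothing
end
```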

Also propagate scalar constants through type-narrowing intrinsics
(`trunci`, `exti`, `bitcast`) so a `1::Int64` field stays constant
after `Int32(stride)` lowers to `trunci`.
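
The pass-through can be pictured as transfer functions over a two-point "known constant or unknown" lattice. This is a sketch under that assumption only: the `fold_*` names are hypothetical, and Julia's `%`, integer constructors, and `reinterpret` stand in for the semantics of the three intrinsics.

```julia
# `nothing` models "not a known constant".
const Known = Union{Nothing, Integer}

# Truncation keeps the low bits; a known input gives a known output.
fold_trunci(x::Known, ::Type{T}) where {T<:Integer} =
    x === nothing ? nothing : x % T

# Extension widens the value; it assumes the value fits in T.
fold_exti(x::Known, ::Type{T}) where {T<:Integer} =
    x === nothing ? nothing : T(x)

# Bitcast reinterprets the bits; it assumes the source and target have the same size.
fold_bitcast(x::Known, ::Type{T}) where {T<:Integer} =
    x === nothing ? nothing : reinterpret(T, x)
```

With this, a `1::Int64` stays the constant `Int32(1)` after truncation, while an unknown input stays unknown.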

Sets up downstream folds on `muli(idx, stride1)` in gather/scatter
offset computations. The matching `muli(x, 1) → x` algebra rule is
left disabled for now: tileiras crashes at -O1+ when the simplified
IR reaches its auto-vectorizer (separate bug to file upstream).
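
For reference, the shape of the `muli(x, 1) → x` rewrite can be sketched on ordinary Julia `Expr` values. `simplify_muli` is a hypothetical name, and the real rule operates on the compiler's own IR rather than on `Expr`.

```julia
# Rewrite muli(x, 1) to x; leave every other expression untouched.
function simplify_muli(ex)
    if ex isa Expr && ex.head === :call &&
       length(ex.args) == 3 && ex.args[1] === :muli
        lhs, rhs = ex.args[2], ex.args[3]
        # Only a literal integer 1 on the right-hand side triggers the fold.
        if rhs isa Integer && rhs == 1
            return lhs
        end
    end
    return ex
end
```

Once the constant analysis has replaced `stride1` with the literal `1`, `simplify_muli(:(muli(idx, 1)))` returns `:idx`. A multiply by an unknown or non-unit stride is returned unchanged.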

Adjust the slice and no_wrap codegen tests to use non-contiguous specs
where the `muli(start, stride)` survival is what's being checked, so
the new fold doesn't collapse them into constants before the test pattern
matches.

The divisibility, bounds, and constant analyses each duplicated the same
two-level `getfield` chain walker for `getfield(arg::TileArray, :ptr)`
and `getfield(getfield(arg, :sizes|:strides), i)`. Pull the walk into a
single `decode_tilearray_field` helper returning a `TileArrayFieldRef`;
each analysis projects to its own lattice via a small pure function
(`tilearray_field_divby` / `_bounds` / `_constant`).
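
The decode-then-project split might look roughly like the following. Everything here is a simplified stand-in: `Param`, `FieldAccess`, `FieldRef`, `decode_field`, and the two projections are hypothetical, and the facts the projections return are illustrative rather than the lattices cuTile.jl actually uses.

```julia
struct Param                 # toy kernel argument
    contiguous::Bool
end

struct FieldAccess           # toy getfield(base, field) node
    base::Any
    field::Any
end

# Typed result of the shared walk: which field, and which tuple index if any.
struct FieldRef
    arg::Param
    field::Symbol                 # :ptr, :sizes, or :strides
    index::Union{Nothing, Int}    # nothing for :ptr
end

# One walker for getfield(arg, :ptr) and getfield(getfield(arg, :sizes|:strides), i).
function decode_field(ex)
    ex isa FieldAccess || return nothing
    if ex.base isa Param
        return ex.field === :ptr ? FieldRef(ex.base, :ptr, nothing) : nothing
    end
    inner = ex.base
    if inner isa FieldAccess && inner.base isa Param &&
       inner.field in (:sizes, :strides) && ex.field isa Int
        return FieldRef(inner.base, inner.field, ex.field)
    end
    return nothing
end

# Each analysis is a small pure projection of the decoded reference.
field_constant(ref::FieldRef) =
    (ref.field === :strides && ref.index == 1 && ref.arg.contiguous) ? 1 : nothing

# Illustrative bounds fact only: this sketch treats sizes as non-negative.
field_lower_bound(ref::FieldRef) = ref.field === :sizes ? 0 : nothing
```

The point of the design is that the pattern matching lives in one place, so adding an analysis means writing one more projection instead of one more walker.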

No behavior change — all codegen and analysis tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

It's a generic SCI traversal helper used by every analysis (divisibility,
bounds, constant, the new tilearray decoder). Living in divisibility.jl
was an accident of where it was first needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…om field type.

`ArraySpec`'s inner constructor now rejects synthetic specs that combine
`contiguous=true` with `stride_div_by[1] > 1` — physically impossible
(stride[1]=1, and 1 is divisible only by 1). With this enforced at the
type's construction boundary, the constant-fold projection drops the
defensive consistency check and trusts the type.
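
A minimal sketch of enforcing such an invariant in an inner constructor follows. `SketchSpec` is hypothetical and stores the properties as plain fields; how the real `ArraySpec` represents them is not shown here.

```julia
struct SketchSpec
    contiguous::Bool
    stride_div_by::Tuple{Vararg{Int}}

    # The inner constructor is the only way to build a value, so the check
    # holds for every instance that exists.
    function SketchSpec(contiguous::Bool, stride_div_by::Tuple{Vararg{Int}})
        if contiguous && !isempty(stride_div_by) && stride_div_by[1] > 1
            throw(ArgumentError(
                "contiguous=true implies stride[1] == 1, " *
                "which is not divisible by $(stride_div_by[1])"))
        end
        return new(contiguous, stride_div_by)
    end
end
```

Because invalid combinations cannot be constructed, code that receives a `SketchSpec` can rely on the invariant without re-checking it, which is the same trade the commit describes.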

`tilearray_field_constant` now derives the result's integer type from
`eltype(fieldtype(T, :strides))` rather than hardcoding `Int32(1)`, so
any future change to `TileArray.strides`'s element type carries through
automatically.
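
The type derivation uses only standard reflection functions (`fieldtype`, `eltype`, `one`). The sketch below applies them to two toy structs; `NarrowTileArray`, `WideTileArray`, and `stride_one` are hypothetical names, not cuTile.jl definitions.

```julia
# Toy array descriptors that differ only in the stride element type.
struct NarrowTileArray{N}
    sizes::NTuple{N, Int32}
    strides::NTuple{N, Int32}
end

struct WideTileArray{N}
    sizes::NTuple{N, Int64}
    strides::NTuple{N, Int64}
end

# Build the constant 1 in whatever integer type the strides tuple holds.
stride_one(::Type{T}) where {T} = one(eltype(fieldtype(T, :strides)))
```

Here `stride_one(NarrowTileArray{2})` yields an `Int32` and `stride_one(WideTileArray{2})` yields an `Int64`, with no change to `stride_one` itself.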

Several existing tests used inconsistent specs (`contiguous=true` with
`stride_div_by[1]>1`) to drive divisibility dataflow with a non-unit
stride hint. Those are migrated to either drop the stride hint (when
the test only asserted on shape facts) or flip to `contiguous=false`
(when the stride hint was load-bearing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
maleadt merged commit 34f3c47 into main on Apr 30, 2026
13 checks passed
maleadt deleted the tb/tilearray_prop branch on April 30, 2026 10:37