-
Notifications
You must be signed in to change notification settings - Fork 266
[CK_BUILDER] Testing-specific descriptor initialization #3563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
[CK_BUILDER] Testing-specific descriptor initialization #3563
Conversation
Remove old CK host descriptor helper dependency from CK Builder testing framework and implement testing-owned descriptor computation. Core changes (ALMIOPEN-522): - Remove ck/library/utility/convolution_* includes from conv_fwd.hpp - Add ConvFwdProblem struct (testing-owned conv parameter description) - Add Args::make_conv_problem() (computes output spatial, no old CK dependency) - Rewrite make_input/weight/output_descriptor() with testing-specific stride mapping for all supported layouts (GNHWC, NHWGC, etc.) - Add optional explicit tensor-memory stride API (std::optional fields) - Update conv_fwd_ck.hpp and conv_fwd_reference.hpp to use make_conv_problem() - Remove to_ck_conv_param() method Additional improvements: - Add CK Tile forward EndToEnd support (conv_fwd_ck_tile.hpp + test) - Proves shared Args/descriptor design works across backends - Merge Create + EndToEnd tests into single file (matches old CK pattern) - Fix unit_validation.cpp BF16 initialization for reliable testing Results: - Old CK warnings removed (no more RowMajor/ColumnMajor spam) - All smoke tests pass (5/5) - Old CK EndToEnd passes (XDL vs GPU reference) - CK Tile EndToEnd passes (Tile vs GPU reference) Note: This PR conflicts with #3518 (tile-builder-testing). Both touch conv_fwd.hpp but with different approaches. This implementation directly addresses ALMIOPEN-522 requirements by removing old CK dependency.
df4523d to
ae55803
Compare
| template <int SPATIAL_DIM> | ||
| struct ConvFwdProblem | ||
| { | ||
| int G = 1; | ||
| int N = 1; | ||
| int C = 1; | ||
| int K = 1; | ||
|
|
||
| std::array<long_index_t, SPATIAL_DIM> input_spatial = {}; | ||
| std::array<long_index_t, SPATIAL_DIM> filter_spatial = {}; | ||
| std::array<long_index_t, SPATIAL_DIM> output_spatial = {}; | ||
|
|
||
| std::array<long_index_t, SPATIAL_DIM> conv_strides = {}; | ||
| std::array<long_index_t, SPATIAL_DIM> conv_dilations = {}; | ||
| std::array<long_index_t, SPATIAL_DIM> left_pads = {}; | ||
| std::array<long_index_t, SPATIAL_DIM> right_pads = {}; | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm not totally convinced that this structure is required since it basically encodes whats in the Args but in a slightly different way. In the ckt::run implementations, couldn't we just get these values straight from the Args struct? If the FilterExtent type is annoying, we could consider adding something like to_vector<index_type>() and/or to_array<index_type>() as members. For output_spatial, you could turn that into a member function of Args. That would then return a FilterExtent<SPATIAL_DIM> so that the type is the same a Args::lengths::image and Args::lengths::filter. The implementation of ckt::run would then have something like this:
auto ck_args = conv.MakeArgument(
...,
args.filter_strides.to_array<ck::index_t>(),
args.filter_dilations.to_array<ck::index_t>(),
args.input_left_pad.to_array<ck::index_t>(),
args.input_right_pad.to_array<ck::index_t>(),
...
)What do you think?
| template <int SPATIAL_DIM> | ||
| std::array<long_index_t, SPATIAL_DIM> to_spatial_array(const FilterExtent<SPATIAL_DIM>& extent) | ||
| { | ||
| if constexpr(SPATIAL_DIM == 1) | ||
| { | ||
| return {static_cast<long_index_t>(extent.width)}; | ||
| } | ||
| else if constexpr(SPATIAL_DIM == 2) | ||
| { | ||
| // CK Builder uses spatial ordering {H, W} for 2D. | ||
| return {static_cast<long_index_t>(extent.height), static_cast<long_index_t>(extent.width)}; | ||
| } | ||
| else | ||
| { | ||
| // CK Builder uses spatial ordering {D, H, W} for 3D. | ||
| return {static_cast<long_index_t>(extent.depth), | ||
| static_cast<long_index_t>(extent.height), | ||
| static_cast<long_index_t>(extent.width)}; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could replace this then by the hypothetical FilterExtent::to_array<index_type>()
| if constexpr(SPATIAL_DIM == 1) | ||
| { | ||
| if constexpr(layout == TensorLayout::GNCW) | ||
| return detail::make_packed_strides_for_order<4>(lens, {0, 1, 2, 3}); | ||
| else if constexpr(layout == TensorLayout::GNWC || | ||
| layout == TensorLayout::G_NW_C_strided) | ||
| return detail::make_packed_strides_for_order<4>(lens, {0, 1, 3, 2}); | ||
| else if constexpr(layout == TensorLayout::NWGC) | ||
| return detail::make_packed_strides_for_order<4>(lens, {1, 3, 0, 2}); | ||
| else if constexpr(layout == TensorLayout::NGCW) | ||
| return detail::make_packed_strides_for_order<4>(lens, {1, 0, 2, 3}); | ||
| else | ||
| static_assert(sizeof(UnsupportedEnumValue<layout>) == 0, | ||
| "Unsupported 1D input layout for descriptor initialization."); | ||
| } | ||
| else if constexpr(SPATIAL_DIM == 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like the amount of chained conditions going on in this function. The TensorLayout already kind of "encodes" the spatial dimension, so the first step I would do is turning this into a giant switch instead:
switch (layout) {
// 1D layouts
case TensorLayout::GNCW:
...
case TensorLayout::GNWC:
...
// 2D layouts
case TensorLayout::GNCHW:
...
/// 3D layouts
case TensorLayout::GNCDHW:
...
}Note that this switch can in theory contain the logic for both input, weight, and output tensors, since all the layouts are in the same enum.
I also don't really like the way that the condition is tied to make_packed_strides_for_order. There is already functionality for constructing packed TensorDescriptors (see PackedRightLayout), and putting all this code here means a hard-to-test function. On top of that, I think a more generic way to query the layout order from a layout would be nice for downstream users, and this would also be much easier to test. I would try to extract this code into a separate function, maybe something like this:
// Get the order from dims=[G, N/K, K/C, D/Z, H/Y, W/X] to LAYOUT.
// Returns an array which has on index i a mapping to the index in
// the `dims` array.
template <tensorLayout LAYOUT>
constexpr auto layout_order() {
// avoid manually writing the size of the order every time
const auto order = [](auto... args) {
return std::array<size_t, sizeof...(args)>{args...};
};
switch (LAYOUT)
{
case GNWC:
return order(0, 1, 2, 3);
case GNWC:
return order(0, 1, 3, 2);
... etc, for 1D, 2D, 3D, and input/weight/output in the same function
}
}Then your implementation of make_input/weight/output_descriptor looks like this:
InputDescriptor make_input_descriptor() const {
Extent lens = // compute as before
if (input_strides.has_value()) {
return InputDescriptor(order, input_strides.value());
}
const auto order = layout_order<SIGNATURE.input.config.layout>();
const auto ordered_lens = reorder(lens, order); // You can put this in tensor_descriptor.hpp
return InputDescriptor(ordered_lens, PackedRightLayout{});
}You cna now separately test layout_order and reorder, and there is no need for ALSO testing whether make_packed_strides_for_order actually packs the order or not, since constructing tensors using PackedRightLayout{} is already tested separately.
[CK_BUILDER] Testing-specific descriptor initialization
Problem
The CK Builder testing framework (
conv_fwd.hpp) currently depends on old CK host descriptor helpers to generate tensor descriptors. This causes several issues:Noisy warnings during tests:
Unwanted coupling: The backend-agnostic testing layer depends on old CK utility code:
Packed-only limitation: Old CK helpers only support packed (contiguous) tensors; custom strides are not supported
Extra indirection: Must construct
ck::utils::conv::ConvParamjust to compute descriptorsImplementation
Implement testing-specific descriptor initialization in
conv_fwd.hpp:Core changes:
Remove old CK dependency
ck/library/utility/convolution_*ck::utils::conv::make_*_host_tensor_descriptor_*_packed()Args::to_ck_conv_param()Add testing-owned conv parameter struct (
ConvFwdProblem)Add
Args::make_conv_problem()Rewrite descriptor methods with testing-specific stride mapping
make_input_descriptor()/make_weight_descriptor()/make_output_descriptor()*_stridedlayout variantsAdd explicit tensor-memory stride API
input_strides,weight_strides,output_stridesUpdate launchers to use
make_conv_problem()conv_fwd_ck.hpp(old CK runner)conv_fwd_reference.hpp(GPU reference runner)Additional improvements:
Add CK Tile forward EndToEnd support
conv_fwd_ck_tile.hpp(CK Tile runner)test_ckb_conv_fwd_2d_fp16_v3.cpp(Create + EndToEnd)Args/descriptor design works across backendsFix smoke test (
unit_validation.cpp)Note on PR #3518
This PR conflicts with #3518 (
barkocot/tile-builder-testing). Both modifyconv_fwd.hppbut with different approaches:This PR needs to wait for it to be merged and then fix the conflict again.
Checklist
Discussion
Fix smoke test (
unit_validation.cpp)Bug: Converting integer → BF16 was treating the integer as raw bits instead of a number, making the test unreliable.
EX)
ck::type_convert<ck::bhalf_t>(100) // integer → BF16Fix: Convert integer → float → BF16 so we get numeric values (5.0, 100.0, etc.) instead of raw bit patterns.
ck::type_convert<ck::bhalf_t>(static_cast<float>(100)) // integer → float → BF16Test code and Result
Result