Execute convolution in NCHW if the suggested mem format is NHWC but the actual mem layout is NCHW #303
Open
DenisVieriu97 wants to merge 30 commits into
Owner
@DenisVieriu97, needs extensive testing... I think we should merge this post PyT 2.0.
086e3a5 to 4e74a86
Fix the TEST_WITH_MPS macro.
* Test MPS CI runners * Cherry pick remaining files * Enable lintrunner: * Change lint runner * Retrigger checks * Retrigger checks #2 * Retrigger checks #3 * Retrigger checks #4 * Retrigger checks #5 * Retrigger checks #5 * Retrigger checks #7 * Retrigger checks #8 * Retrigger checks #9 * Retrigger checks #9 (change arch to arm) * Retrigger checks #10 * Retrigger checks #11 * Retrigger checks #12 * Retrigger checks #13 * Retrigger checks #14 * Retrigger checks #14 * Retrigger checks #15 * Retrigger checks #16 * Retrigger checks #16 * Retrigger checks #17 * Retrigger checks #19 * Retrigger checks #20 * Retrigger checks #21 * Fix lintrunner * Fix lintrunner * Remove lint.json
* Use DISTRIBUTED=1 for MPS CI runners * Disable openmp
* Remove unnecessary CI files * Additional files * Update lint
* Fix bilinear backward pass * Remove comment
* Update macOS 12 blocklist - move sum, masked.var, mul to low precision list - unblock them from running * mark __rdiv__ failures as accumulated error exceeding atol/rtol
- Backward pass has to give explicit bias tensor of zeros if none is passed to the op or the bias gradient will not be calculated. - Fixed bias tensor mistakenly getting overwritten to zeros - Fixes crash when lstm op called with has_biases set to false. Change takes into account the changed shape of the input params TensorList depending on the bias flag. Co-authored-by: Kulin Seth <kulin_seth@apple.com>
- add _mps_convolution_impl that takes optional shape - for conv_transpose2d grad, use the shape from input directly - remove nn.functional.conv_transpose2d grad from blocklist Co-authored-by: Ronian526 <11454459+Ronian526@users.noreply.github.com>
Fixes a crash that occurred when the inputTensor became null.
- cast the input tensor to float32 and cast the output tensor back - unblock the test
4e74a86 to b0c9dbb
- unblock rdiv float16
* Fix upsample for NHWC output * Add testcase
- give warnings of converting int64 for reduction ops - use cast tensor for reduction sum on trace - unblock trace from running
* move nn.functional.feature_alpha_dropoutwith_train, normalnumber_mean, new_empty_strided to expected failures * update new_empty_strided Co-authored-by: Kulin Seth <kulin_seth@apple.com>
…ntiguous calls (#341) * Fix convolution crash; remove unnecessary contiguous calls * Fix lintrunner
This should fix the failure with GPT2 when use_cache=True
b0c9dbb to 6caefbc
Co-authored-by: Ramin Azarmehr <razarmehr@apple.com>
* Handle broadcasting by expanding src tensor in Copy.mm * Unblock linalg_matrix_power * Improved formatting
…he actual mem layout is NCHW
4d2ba19 to 9a5b002
298b2d6 to 606362d
Since NHWC is represented as a view operation in PyTorch (the tensor keeps its logical NCHW shape and only the strides change), we can execute the convolution ops directly in NCHW if the suggested memory format is NHWC but the actual memory layout is still NCHW.
This skips unnecessary steps in the graph: a gather to materialize the memory from NCHW to NHWC, followed by a transpose of the convolution output back to NCHW.
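The layout check described above can be sketched in plain Python, without PyTorch or MPS. This is only an illustration under stated assumptions, not the code in this PR: `contiguous_strides`, `channels_last_strides`, and `pick_conv_layout` are hypothetical names, and comparing strides for exact equality ignores the ambiguous case where a dimension has size 1.

```python
from typing import Sequence, Tuple


def contiguous_strides(shape: Sequence[int]) -> Tuple[int, int, int, int]:
    """Element strides of a 4-D tensor stored densely in NCHW order."""
    _, c, h, w = shape
    return (c * h * w, h * w, w, 1)


def channels_last_strides(shape: Sequence[int]) -> Tuple[int, int, int, int]:
    """Element strides of the same logical NCHW shape stored in NHWC order."""
    _, c, h, w = shape
    return (h * w * c, 1, w * c, c)


def pick_conv_layout(suggested: str, shape: Sequence[int], strides: Sequence[int]) -> str:
    """Hypothetical helper: choose the layout to run the convolution in.

    If NHWC is suggested but the data is physically laid out as NCHW,
    stay in NCHW so no gather/transpose round trip is needed.
    """
    if suggested == "NHWC" and tuple(strides) == contiguous_strides(shape):
        return "NCHW"
    return suggested


shape = (2, 3, 4, 5)  # N, C, H, W
# Suggested NHWC, but the memory is still plain NCHW: run in NCHW.
layout_for_nchw_data = pick_conv_layout("NHWC", shape, contiguous_strides(shape))
# Suggested NHWC and the memory really is NHWC: keep NHWC.
layout_for_nhwc_data = pick_conv_layout("NHWC", shape, channels_last_strides(shape))
```

The point of the sketch is that the decision depends on the actual strides of the tensor rather than on the suggested format alone.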