
Execute convolution in NCHW if the suggested mem format is NHWC but the actual mem layout is NCHW #303

Open
DenisVieriu97 wants to merge 30 commits into master from dev/denis/conv_nchw_optimization


DenisVieriu97 commented Feb 9, 2023

Collaborator

Since PyTorch represents NHWC as a view operation, we can execute the convolution ops directly in NCHW when the suggested memory format is NHWC but the actual memory layout is still NCHW.
This skips unnecessary steps: the gather that materializes the memory from NCHW to NHWC, and the transpose of the convolution output back to NCHW in the graph.
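
To illustrate the layout distinction, here is a minimal sketch in pure Python, assuming a dense 4-D tensor whose logical index order stays (N, C, H, W) while only the strides differ between the two layouts. The helper names (`nchw_strides`, `nhwc_strides`, `can_run_conv_in_nchw`) are hypothetical, and the check is an assumed simplification rather than the condition implemented in this PR.

```python
# Illustrative sketch only: pure Python, no PyTorch calls.
# The helper names below are hypothetical and do not come from the PR.


def nchw_strides(shape):
    """Element strides, indexed as (N, C, H, W), for a dense NCHW buffer."""
    n, c, h, w = shape
    return (c * h * w, h * w, w, 1)


def nhwc_strides(shape):
    """Element strides, still indexed as (N, C, H, W), for a dense NHWC buffer."""
    n, c, h, w = shape
    return (h * w * c, 1, w * c, c)


def can_run_conv_in_nchw(suggested_format, shape, strides):
    """Assumed condition: NHWC is suggested, yet the data is laid out as NCHW."""
    return suggested_format == "NHWC" and tuple(strides) == nchw_strides(shape)


shape = (2, 3, 4, 5)
print(nchw_strides(shape))  # (60, 20, 5, 1)
print(nhwc_strides(shape))  # (60, 1, 15, 3)
print(can_run_conv_in_nchw("NHWC", shape, nchw_strides(shape)))  # True
print(can_run_conv_in_nchw("NHWC", shape, nhwc_strides(shape)))  # False
```

When the check holds, the buffer can be consumed as NCHW without first gathering it into NHWC order, which is the saving described above.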

@kulinseth

Owner

@DenisVieriu97, this needs extensive testing. I think we should merge it after PyTorch 2.0.

DenisVieriu97 force-pushed the dev/denis/conv_nchw_optimization branch from 086e3a5 to 4e74a86 on February 10, 2023 06:44
DenisVieriu97 and others added 4 commits February 14, 2023 07:15
* Test MPS CI runners

* Cherry pick remaining files

* Enable lintrunner:

* Change lint runner

* Retrigger checks

* Retrigger checks #2

* Retrigger checks #3

* Retrigger checks #4

* Retrigger checks #5

* Retrigger checks #5

* Retrigger checks #7

* Retrigger checks #8

* Retrigger checks #9

* Retrigger checks #9 (change arch to arm)

* Retrigger checks #10

* Retrigger checks #11

* Retrigger checks #12

* Retrigger checks #13

* Retrigger checks #14

* Retrigger checks #14

* Retrigger checks #15

* Retrigger checks #16

* Retrigger checks #16

* Retrigger checks #17

* Retrigger checks #19

* Retrigger checks #20

* Retrigger checks #21

* Fix lintrunner

* Fix lintrunner

* Remove lint.json
* Use DISTRIBUTED=1 for MPS CI runners

* Disable openmp
razarmehr and others added 12 commits February 14, 2023 15:26
* Remove unnecessary CI files

* Additional files

* Update lint
* Enable test modules on MPS and CI runners

* Update lint.yml

* Update comments

* Retrigger CI

* Retrigger CI #2

* Remove comment
… 12. (#313) (#328)

* Block uint8 data type for unary and binary ops on macOS 12. (#313)

* fixes after cherry-pick

---------

Co-authored-by: Ronian526 <11454459+Ronian526@users.noreply.github.com>
* Fix bilinear backward pass

* Remove comment
* Update macOS 12 blocklist
- move sum, masked.var, mul to low precision list
- unblock them from running

* - mark __rdiv__ failures as accumulated error exceeding atol/rtol
- Backward pass has to give explicit bias tensor of zeros if none is passed to the op or the bias gradient will not be calculated.
- Fixed bias tensor mistakenly getting overwritten to zeros
- Fixes a crash when the lstm op is called with has_biases set to false. The change takes into account that the shape of the input params TensorList depends on the bias flag.

Co-authored-by: Kulin Seth <kulin_seth@apple.com>
- add _mps_convolution_impl that takes optional shape
- for conv_transpose2d grad, use the shape from input directly
- remove nn.functional.conv_transpose2d grad from blocklist

Co-authored-by: Ronian526 <11454459+Ronian526@users.noreply.github.com>
Fixes a crash caused by the inputTensor becoming null.
- cast the input tensor to float32 and cast the output tensor back
- unblock the test
DenisVieriu97 force-pushed the dev/denis/conv_nchw_optimization branch from 4e74a86 to b0c9dbb on February 15, 2023 21:17
DenisVieriu97 and others added 5 commits February 15, 2023 21:27
- warn when converting int64 for reduction ops
- use cast tensor for reduction sum on trace
- unblock trace from running
* - move nn.functional.feature_alpha_dropoutwith_train, normalnumber_mean, new_empty_strided to expected failures

* - update new_empty_strided

---------

Co-authored-by: Kulin Seth <kulin_seth@apple.com>
…ntiguous calls (#341)

* Fix convolution crash; remove unnecessary contiguous calls

* Fix lintrunner
This should fix the failure with GPT2 when use_cache=True
DenisVieriu97 force-pushed the dev/denis/conv_nchw_optimization branch from b0c9dbb to 6caefbc on February 18, 2023 02:34
DenisVieriu97 and others added 5 commits February 18, 2023 10:27
Co-authored-by: Ramin Azarmehr <razarmehr@apple.com>
* Handle broadcasting by expanding src tensor in Copy.mm

* Unblock linalg_matrix_power

* Improved formatting
DenisVieriu97 force-pushed the dev/denis/conv_nchw_optimization branch from 4d2ba19 to 9a5b002 on February 21, 2023 23:28
kulinseth force-pushed the master branch 2 times, most recently from 298b2d6 to 606362d on February 28, 2023 06:45
5 participants