Hi, thanks for the paper.
While your paper shows once again that any kind of spatial mixing can work in CV, from a practical point of view there is a significant issue with using AvgPool2d. At inference it is no faster than a depthwise convolution, yet it uses a fixed filter instead of a learned one, which leads to much lower network capacity. Have you tried using a depthwise (DW) 3x3 convolution instead of AvgPool?
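To make the capacity point concrete, here is a minimal pure-Python sketch (not the paper's code) of why a 3x3 average pool can be seen as a depthwise 3x3 convolution with its kernel frozen at 1/9. It assumes stride 1 and zero padding of 1, with the padded zeros counted in the mean; under a different border convention the match would only hold away from the edges. The helper names `depthwise_conv3x3`, `avg_pool3x3`, and `max_abs_diff` are made up for illustration.

```python
# Sketch under stated assumptions: stride 1, zero padding 1, padded zeros
# counted in the mean. All helper names here are hypothetical.

def depthwise_conv3x3(x, kernels):
    """x: [C][H][W] nested lists; kernels: [C][3][3], one kernel per channel."""
    out = []
    for channel, k in zip(x, kernels):
        h, w = len(channel), len(channel[0])
        res = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        # Out-of-range positions act as zero padding.
                        if 0 <= ii < h and 0 <= jj < w:
                            acc += k[di + 1][dj + 1] * channel[ii][jj]
                res[i][j] = acc
        out.append(res)
    return out


def avg_pool3x3(x):
    """3x3 mean per channel; always divides by 9, so padded zeros count."""
    out = []
    for channel in x:
        h, w = len(channel), len(channel[0])
        res = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                total = 0.0
                for ii in range(max(0, i - 1), min(h, i + 2)):
                    for jj in range(max(0, j - 1), min(w, j + 2)):
                        total += channel[ii][jj]
                res[i][j] = total / 9.0
        out.append(res)
    return out


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two [C][H][W] lists."""
    return max(
        abs(va - vb)
        for ca, cb in zip(a, b)
        for ra, rb in zip(ca, cb)
        for va, vb in zip(ra, rb)
    )


C, H, W = 2, 4, 4
x = [[[float(c * H * W + i * W + j) for j in range(W)] for i in range(H)]
     for c in range(C)]

# Average pooling corresponds to this single, non-trainable kernel choice.
fixed = [[[1.0 / 9.0] * 3 for _ in range(3)] for _ in range(C)]
diff = max_abs_diff(avg_pool3x3(x), depthwise_conv3x3(x, fixed))

# A depthwise 3x3 layer has 9 weights per channel (ignoring bias);
# the pooling layer has none.
dw_weights = C * 3 * 3
print(diff < 1e-9, dw_weights)
```

The pooling output matches the depthwise convolution up to floating-point rounding, so AvgPool covers one point in the space of kernels that a learned DW 3x3 layer can reach. Whether the extra 9 weights per channel actually help accuracy is the empirical question above.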