Why not use DW conv?

Hi, thanks for the paper. 
While your paper again shows that any kind of mixing in the spatial domain can work in CV, from a practical point of view there is a large issue with using AvgPool2d. At inference it is not faster than a depthwise convolution, yet it uses a fixed filter instead of a learned one, which leads to much lower network capacity. Have you tried using a 3x3 DW conv instead of AvgPool?
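
To make the point concrete, here is a minimal framework-free sketch (not taken from the paper's code) showing that 3x3 average pooling is just a 3x3 depthwise convolution whose weights are all fixed at 1/9. The helper names `depthwise_conv3x3` and `avg_pool3x3` and the 4x4 input are made up for illustration, and the sketch assumes stride 1 with no padding, so border handling is ignored.

```python
# Framework-free sketch: 3x3 average pooling vs. a 3x3 depthwise convolution
# on one channel, stride 1, no padding (border handling is ignored here).

def depthwise_conv3x3(channel, kernel):
    """Slide a 3x3 kernel over one 2D channel (valid region only)."""
    h, w = len(channel), len(channel[0])
    out = []
    for i in range(h - 2):
        row = []
        for j in range(w - 2):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    acc += kernel[di][dj] * channel[i + di][j + dj]
            row.append(acc)
        out.append(row)
    return out


def avg_pool3x3(channel):
    """3x3 mean over each window (valid region only)."""
    h, w = len(channel), len(channel[0])
    return [
        [
            sum(channel[i + di][j + dj] for di in range(3) for dj in range(3)) / 9.0
            for j in range(w - 2)
        ]
        for i in range(h - 2)
    ]


# Hypothetical 4x4 input channel, for illustration only.
x = [[float(4 * r + c) for c in range(4)] for r in range(4)]

# Average pooling is the special case where every kernel weight is fixed at 1/9.
uniform_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]

pooled = avg_pool3x3(x)
conv = depthwise_conv3x3(x, uniform_kernel)
max_abs_diff = max(
    abs(p - c) for prow, crow in zip(pooled, conv) for p, c in zip(prow, crow)
)

# Per channel: a learned 3x3 depthwise kernel has 9 weights; pooling has none.
learned_weights_per_channel = 3 * 3
pooling_weights_per_channel = 0
```

Both operations read the same nine inputs per output element, so the pooling layer does not obviously save work on the forward pass. What it gives up is the nine learnable weights per channel.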