Hi, I found your paper very interesting and I have a quick question. The paper proposes CPE for vision tasks, where the input is 2D and the locality assumption would therefore be valid. Have you also designed a 1-D CPE and run similar experiments on NLP tasks? If so, how should the kernel size be chosen?