Causality of the BERT encoder in the PAST streamable model

Hi @Or-Tal @nadav366,
Thank you for providing the no-transformer version of the PAST model.

While trying to train the PAST streamable model, I noticed that no causal mask is applied in the BERT transformer encoder. Has training with a causal mask already been tried? If so, does it reduce performance?
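
To be clear about what I mean by a causal mask, here is a minimal plain-Python sketch. It is not taken from the PAST code, and the function names `causal_mask` and `masked_softmax` are hypothetical, used for illustration only. The idea is that position i may attend only to positions j <= i, so the encoder never sees future frames:

```python
import math

def causal_mask(seq_len):
    """Boolean mask: entry [i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

def masked_softmax(scores, mask):
    """Row-wise softmax over attention scores; masked-out positions get zero weight."""
    weights = []
    for row, allowed in zip(scores, mask):
        # Subtract the largest allowed score for numerical stability.
        peak = max(s for s, a in zip(row, allowed) if a)
        exps = [math.exp(s - peak) if a else 0.0 for s, a in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Uniform scores over 4 positions: each row spreads its weight over past and current positions only.
scores = [[0.0] * 4 for _ in range(4)]
attn = masked_softmax(scores, causal_mask(4))
```

Without such a mask, every position attends to the full sequence, which is what I believe happens in the current encoder.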


Thanks