- Currently, the code is set up to decompose the domain into y-z slabs (a 1D split along x). This severely limits scalability when `N_cpu > Nx`, because there are then more processes than y-z planes to distribute.
- Consider porting the MPI version of the code to a 2D pencil decomposition for better scalability.
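
To make the scaling limit concrete, here is a minimal sketch in plain Python (no MPI required) that counts how many ranks actually own grid points under each scheme. It is not taken from this code base: the helper names `split_extent`, `process_grid`, `slab_active_ranks`, and `pencil_active_ranks` are hypothetical, the block-splitting rule is one common convention, and the near-square process grid is an assumption about how the 2D layout might be chosen.

```python
import math


def split_extent(n, parts, index):
    """Return (start, stop) of block `index` when `n` points are split
    into `parts` nearly equal contiguous blocks (hypothetical helper)."""
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def process_grid(n_cpu):
    """Return the most nearly square factorization (p_rows, p_cols) with
    p_rows * p_cols == n_cpu. Assumed layout, not the code's actual choice."""
    p_rows = math.isqrt(n_cpu)
    while n_cpu % p_rows != 0:
        p_rows -= 1
    return p_rows, n_cpu // p_rows


def slab_active_ranks(nx, n_cpu):
    """Count ranks that own at least one y-z plane in a 1D split along x."""
    active = 0
    for rank in range(n_cpu):
        start, stop = split_extent(nx, n_cpu, rank)
        if stop > start:
            active += 1
    return active


def pencil_active_ranks(nx, ny, n_cpu):
    """Count ranks that own at least one pencil when x and y are both split
    over a p_rows x p_cols process grid (z stays local to each rank)."""
    p_rows, p_cols = process_grid(n_cpu)
    active = 0
    for row in range(p_rows):
        x0, x1 = split_extent(nx, p_rows, row)
        for col in range(p_cols):
            y0, y1 = split_extent(ny, p_cols, col)
            if x1 > x0 and y1 > y0:
                active += 1
    return active


if __name__ == "__main__":
    nx = ny = 64
    n_cpu = 256
    print("slab active ranks:  ", slab_active_ranks(nx, n_cpu))
    print("pencil active ranks:", pencil_active_ranks(nx, ny, n_cpu))
```

With `Nx = Ny = 64` and `N_cpu = 256`, the slab split leaves 192 ranks without any plane, while the 16 x 16 pencil grid gives every rank a 4 x 4 bundle of pencils. The trade-off is that a pencil layout needs transposes (all-to-all exchanges) within rows and columns of the process grid instead of a single global transpose.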