Thank you for sharing this code!
I am testing your code for multitask video with BART on 24GB GPUs.
To fit within 24GB of memory, I used the command below to enable DDP and reduced the batch size from 50 to 25:
bash scripts/video/single_adapter.sh 2
However, the results were worse than those from a single 48GB GPU.
As I increased the number of GPUs, the performance got progressively worse.
Since the model has no BatchNorm layers, I expected the performance to be similar.
Have you tried DDP? Or do you have any intuition about the problem?
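
For reference, here is the batch-size arithmetic I had in mind, as a minimal sketch. It assumes that the batch size of 25 is applied per GPU and that each DDP process sees its own batch; I have not verified either against the scripts. The function names are hypothetical and are not taken from the repository.

```python
def effective_batch_size(per_gpu_batch: int, num_gpus: int, grad_accum_steps: int = 1) -> int:
    """Samples contributing to one optimizer step under DDP.

    Assumes each process draws its own batch of `per_gpu_batch` samples
    and gradients are averaged across `num_gpus` processes.
    """
    return per_gpu_batch * num_gpus * grad_accum_steps


def per_gpu_batch_for(target_global_batch: int, num_gpus: int) -> int:
    """Per-GPU batch size that keeps the global batch size fixed."""
    if target_global_batch % num_gpus != 0:
        raise ValueError("target_global_batch must be divisible by num_gpus")
    return target_global_batch // num_gpus


if __name__ == "__main__":
    # Single 48GB GPU baseline: batch size 50.
    print("1 GPU :", effective_batch_size(50, 1))
    # Two 24GB GPUs with 25 each (assumed per-GPU): same global batch.
    print("2 GPUs:", effective_batch_size(25, 2))
    # Four GPUs with 25 each would double the global batch.
    print("4 GPUs:", effective_batch_size(25, 4))
    # Per-GPU size needed to keep a global batch of 50 on 2 GPUs.
    print("per-GPU for 50 on 2 GPUs:", per_gpu_batch_for(50, 2))
```

If the per-GPU batch size stays at 25 while more GPUs are added, the global batch size grows with the GPU count under these assumptions. With an unchanged learning rate and fewer optimizer steps per epoch, that might explain part of the gap, but I am not sure it applies here.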