To get the best performance, multiple seeds / training runs should be launched in parallel, with the best-performing agents "surviving" and being fine-tuned further in a roughly evolutionary fashion. The ExperimentGrid is a convenient way to do this on one machine with manually picked hyperparameters. However, more automated techniques such as population-based training could most likely achieve even better performance.
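To make the "survive and fine-tune" idea concrete, here is a minimal sketch of a population-based training loop in plain Python. It is not the ExperimentGrid API and not this project's training code: `toy_train_step`, `exploit_and_explore`, `run_pbt`, the `lr` hyperparameter, and the toy scoring rule are all hypothetical stand-ins chosen for illustration. In a real setup the toy step would be replaced by a few epochs of actual agent training and the score by an evaluation return.

```python
import copy
import random


def toy_train_step(member):
    """Hypothetical stand-in for a few epochs of training.

    The score grows faster the closer `lr` is to 0.01 (an arbitrary toy optimum).
    """
    lr = member["hparams"]["lr"]
    member["score"] += 1.0 / (1.0 + abs(lr - 0.01) * 100.0)


def exploit_and_explore(population, rng, frac=0.25, perturb=(0.8, 1.2)):
    """Overwrite the worst `frac` of members with perturbed copies of the best."""
    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    n = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:n], ranked[-n:]
    for loser in bottom:
        winner = rng.choice(top)
        # Exploit: inherit the winner's weights and score.
        loser["weights"] = copy.deepcopy(winner["weights"])
        loser["score"] = winner["score"]
        # Explore: randomly scale each inherited hyperparameter.
        loser["hparams"] = {
            k: v * rng.choice(perturb) for k, v in winner["hparams"].items()
        }


def run_pbt(pop_size=8, rounds=20, seed=0):
    """Train a population in lockstep and return the best surviving member."""
    rng = random.Random(seed)
    population = [
        {"hparams": {"lr": 10 ** rng.uniform(-4, -1)}, "weights": [0.0], "score": 0.0}
        for _ in range(pop_size)
    ]
    for _ in range(rounds):
        for member in population:  # these would run in parallel in practice
            toy_train_step(member)
        exploit_and_explore(population, rng)
    return max(population, key=lambda m: m["score"])


if __name__ == "__main__":
    best = run_pbt()
    print(best["hparams"], best["score"])
```

The key design choice is that weak members are not restarted from scratch: they copy the weights of a strong member and continue training with slightly different hyperparameters, which is what distinguishes this from a fixed grid of independent runs.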