🚀 The feature, motivation and pitch
Properly handle errors that occur during training:
- Stop issuing new RPC requests once a previous one has failed
- Terminate training as soon as an error occurs
- Properly clean up the shared graph data
Both the mp mode and the collocated mode should be covered.
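To make the requested behaviour concrete, here is a minimal sketch using only the Python standard library. It is not the project's API: `RequestIssuer`, `run_training`, `send_fn`, and `cleanup_fn` are hypothetical names, a thread pool stands in for the real RPC layer, and `cleanup_fn` stands in for releasing the shared graph data. It does not model the differences between the mp mode and the collocated mode.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class RequestIssuer:
    """Issues requests until one fails, then refuses any further request."""

    def __init__(self, send_fn, cleanup_fn):
        self._send_fn = send_fn        # hypothetical stand-in for an RPC call
        self._cleanup_fn = cleanup_fn  # hypothetical stand-in for freeing shared graph data
        self._failed = threading.Event()
        self._first_error = None
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=2)

    def issue(self, payload):
        # Refuse new work once any earlier request has failed.
        if self._failed.is_set():
            raise RuntimeError("a previous request failed") from self._first_error
        future = self._pool.submit(self._send_fn, payload)
        future.add_done_callback(self._on_done)
        return future

    def _on_done(self, future):
        if future.cancelled():
            return
        err = future.exception()
        if err is not None:
            with self._lock:
                if self._first_error is None:
                    self._first_error = err  # remember only the first failure
            self._failed.set()

    def close(self):
        # Wait for in-flight requests, then release shared resources.
        self._pool.shutdown(wait=True)
        self._cleanup_fn()


def run_training(issuer, batches):
    """Stops at the first failed request and always runs the cleanup."""
    try:
        for batch in batches:
            # result() re-raises the request's exception, ending the loop.
            issuer.issue(batch).result()
    finally:
        issuer.close()
```

In this sketch the first failing request propagates out of `run_training`, no later batch is sent, and `close()` runs in a `finally` block so the cleanup happens on both the success and the failure path.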
Alternatives
No response
Additional context
No response