References

YBC edited this page Oct 30, 2020 · 20 revisions

Algorithm-related papers

AD-PSGD: Asynchronous Decentralized Parallel Stochastic Gradient Descent (arxiv)
SGP: Stochastic Gradient Push for Distributed Deep Learning (arxiv)
MATCHA: Speeding Up Decentralized SGD via Matching Decomposition Sampling (arxiv)
EASGD: Deep learning with Elastic Averaging SGD (arxiv)
Prague: High-Performance Heterogeneity-Aware Asynchronous Decentralized Training (http://alchem.usc.edu/portal/static/download/prague.pdf)
Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs (arxiv)
Distributed Learning in the Non-Convex World: From Batch to Streaming Data, and Beyond (arxiv)
Communication Efficient Distributed Machine Learning with the Parameter Server (pdf: http://www.cs.cmu.edu/~muli/file/parameter_server_nips14.pdf)
Consensus and Cooperation in Networked Multi-Agent Systems (pdf)
A Unified Theory of Decentralized SGD with Changing Topology and Local Updates (arxiv)

System-related papers

Efficient Processing of Deep Neural Networks (link)
Demystifying Parallel and Distributed Deep Learning (github)
Parallel Algorithm (pdf)
Technologies behind Distributed Deep Learning: AllReduce (https://tech.preferred.jp/en/blog/technologies-behind-distributed-deep-learning-allreduce/)
Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect (https://arxiv.org/pdf/1903.04611.pdf)

Implementation/code-related

Horovod (github)
BytePS (github)
Rabit (github)
Stochastic Gradient Push (github)
PyTorch DDP (paper, code)

GPU-Aware MPI

OpenMPI FAQ (link)
MVAPICH (link)

GitHub Awesome Series

Awesome distributed systems (github)
Awesome distributed deep learning (github)