References

YBC edited this page Oct 30, 2020 · 20 revisions

Algorithm-related papers

  1. AD-PSGD: Asynchronous Decentralized Parallel Stochastic Gradient Descent (arXiv)
  2. SGP: Stochastic Gradient Push for Distributed Deep Learning (arXiv)
  3. MATCHA: Speeding Up Decentralized SGD via Matching Decomposition Sampling (arXiv)
  4. EASGD: Deep learning with Elastic Averaging SGD (arXiv)
  5. Prague: High-Performance Heterogeneity-Aware Asynchronous Decentralized Training (PDF: http://alchem.usc.edu/portal/static/download/prague.pdf)
  6. Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs (arXiv)
  7. Distributed Learning in the Non-Convex World: From Batch to Streaming Data, and Beyond (arXiv)
  8. Communication Efficient Distributed Machine Learning with the Parameter Server (PDF: http://www.cs.cmu.edu/~muli/file/parameter_server_nips14.pdf)
  9. Consensus and Cooperation in Networked Multi-Agent Systems (PDF)
  10. A Unified Theory of Decentralized SGD with Changing Topology and Local Updates (arXiv)

System-related papers

  1. Efficient Processing of Deep Neural Networks (link)
  2. Demystifying Parallel and Distributed Deep Learning (GitHub)
  3. Parallel Algorithm (PDF)
  4. Technologies behind Distributed Deep Learning: AllReduce (blog post: https://tech.preferred.jp/en/blog/technologies-behind-distributed-deep-learning-allreduce/)
  5. Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect (arXiv: https://arxiv.org/pdf/1903.04611.pdf)

Implementation/code-related

  1. Horovod (GitHub)
  2. BytePS (GitHub)
  3. Rabit (GitHub)
  4. Stochastic Gradient Push (GitHub)
  5. PyTorch DDP (paper, code)

GPU-Aware MPI

  1. Open MPI FAQ (link)
  2. MVAPICH (link)

GitHub Awesome Series

  1. Awesome distributed systems (GitHub)
  2. Awesome distributed deep learning (GitHub)
