Considerations for Multi-GPU Deep Learning Model Training
As the amount of available data grows, deep learning products and applications benefit from ever-larger models. Training these models in a reasonable time requires parallelizing the work across multiple GPUs and multiple machines. In this talk, we explore what is needed to achieve this kind of scaling efficiently, both from a hardware and library perspective and from the perspective of the end user. Methods of multi-GPU parallelization are described and discussed, and best practices are presented. Furthermore, the choice of deep learning framework and its impact on multi-GPU training is discussed, along with the resulting trade-off between flexibility and engineering effort. Finally, examples and benchmarking results are presented, demonstrating near-linear scaling of training throughput across both multiple GPUs and multiple machines.
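The most common multi-GPU parallelization method referred to above is data parallelism: each worker holds a full model replica, computes gradients on its own shard of the data, and gradients are averaged across workers (an all-reduce step) before an identical update is applied everywhere. The talk itself does not prescribe code; the following is a minimal conceptual sketch of that idea using plain Python and a toy 1-D linear model, with `all_reduce_mean` standing in for a real collective such as NCCL's all-reduce.

```python
# Hypothetical sketch of synchronous data-parallel SGD (not from the talk).
# Each of N workers computes a gradient on its own data shard; gradients are
# averaged via an all-reduce step, and every replica applies the same update,
# keeping all model copies in sync.

def local_gradient(w, shard):
    # Gradient of mean squared error for a 1-D linear model y = w * x
    # over this worker's shard of (x, y) pairs.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(grads):
    # Stand-in for a real collective (e.g. NCCL/MPI all-reduce):
    # average the gradients across all workers.
    return sum(grads) / len(grads)

def data_parallel_step(w, shards, lr=0.05):
    grads = [local_gradient(w, s) for s in shards]  # runs in parallel in practice
    g = all_reduce_mean(grads)                      # synchronization point
    return w - lr * g                               # identical update on every replica

# Toy data generated from y = 3x, split across two simulated "GPUs".
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards)
print(round(w, 3))  # converges toward 3.0
```

Because every replica applies the same averaged gradient, the model copies never diverge; the cost of this synchronization relative to the local compute is what determines how close to linear the scaling can get.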
Jonas Lööf is a Deep Learning Solution Architect at NVIDIA, where he helps guide customers' hardware and software decisions in their deep learning projects. Before joining NVIDIA, Jonas worked in research and development, applying deep learning to speech recognition and natural language processing, both in a startup environment and in the corporate world. He holds a doctoral degree in computer science from RWTH Aachen University, where he worked on acoustic model adaptation for speech recognition.