The Next Generation of AI Chips
Deep learning has fuelled significant progress in computer vision, speech recognition, and natural language processing. We have seen a single deep learning algorithm learn to recognize speech in two vastly different languages, English and Mandarin, begin to synthesize realistic human speech, recognize visual data, and even begin to understand human language. At Baidu, we think that this is just the beginning, and high performance computing is poised to help. It turns out that deep learning is compute-limited, even on the fastest machines that we can build. This talk will focus on a new discovery that significantly accelerates deep learning training by using mixed 16-bit and 32-bit IEEE standard floating point arithmetic. Unlike previous work in this area, the first generation of commodity hardware realizing up to an 8x speedup from this approach is already shipping in volume. We demonstrate the success of this approach across 15 state-of-the-art deep learning training applications drawn from a diverse set of problem domains, and detail the small changes to deep learning frameworks needed to support this technology.
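The mixed 16-bit/32-bit arithmetic the abstract describes is commonly implemented by keeping a 32-bit master copy of the weights, running the forward and backward passes in 16-bit, and scaling the loss so that small FP16 gradients do not underflow to zero. The sketch below illustrates that recipe on a toy one-parameter model; the function names, model, and hyperparameters are illustrative assumptions, not the talk's actual system.

```python
import numpy as np

def mixed_precision_step(master_w, x, y, lr=0.1, loss_scale=1024.0):
    """One training step for the toy model y = w * x with squared-error loss,
    using FP16 compute, an FP32 master weight, and loss scaling (illustrative)."""
    # 1. Cast the FP32 master weight and data to FP16 for the forward pass.
    w16 = np.float16(master_w)
    x16, y16 = np.float16(x), np.float16(y)

    # 2. Forward pass in FP16.
    err = w16 * x16 - y16

    # 3. Backward pass in FP16, with the gradient scaled up so that
    #    very small values survive FP16's limited dynamic range.
    grad16 = np.float16(2.0) * err * x16 * np.float16(loss_scale)

    # 4. Unscale in FP32 and update the FP32 master weight.
    grad32 = np.float32(grad16) / np.float32(loss_scale)
    return master_w - np.float32(lr) * grad32

# Fit w toward the target y = 3.0 * x; w converges to ~3.0 within FP16 precision.
w = np.float32(0.0)
for _ in range(50):
    w = mixed_precision_step(w, x=1.0, y=3.0)
```

Keeping the master weight in FP32 matters because per-step updates can be smaller than FP16's representable spacing near the weight's value; accumulating them in 32 bits prevents the update from being rounded away.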
Greg Diamos leads computer systems research at Baidu’s Silicon Valley AI Lab (SVAIL), where he helped develop the Deep Speech and Deep Voice systems. Before joining Baidu, Greg contributed to the design of compiler and microarchitecture technologies used in NVIDIA's Volta GPU. Greg holds a PhD from the Georgia Institute of Technology, where he led the development of the GPU-Ocelot dynamic compiler, which targeted CPUs and GPUs from the same program representation.