Amog Kamsetty

Simplifying Distributed Deep Learning with Ray Train

Deep learning (DL) has had an enormous impact across a variety of domains, but with model and data sizes growing rapidly, scaling out training is essential for practical use. In this talk, we will first cover common patterns used to scale out DL training and the challenges of distributed DL in practice:

- Developer iteration speed
- Modeling behavior at scale
- Managing large-scale training data
- Cloud compute costs

We will then introduce Ray Train, an open source library built on the Ray distributed execution framework, and show how it integrates with other open source libraries to alleviate these pain points. We will conclude with a live demo of large-scale training using these open source tools.
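For context, below is a minimal sketch of what distributed training with Ray Train can look like. It assumes the Ray 2.x Train API with the PyTorch integration (TorchTrainer, ScalingConfig, prepare_model); the exact interface and the training loop shown here are illustrative and vary by Ray version.

```python
import torch
import torch.nn as nn
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer, prepare_model

def train_loop_per_worker(config):
    # Illustrative loop: runs on every worker; Ray Train sets up the
    # process group and device placement behind the scenes.
    model = prepare_model(nn.Linear(10, 1))  # wraps the model for data-parallel training
    optimizer = torch.optim.SGD(model.parameters(), lr=config["lr"])
    loss_fn = nn.MSELoss()
    for _ in range(config["epochs"]):
        # Dummy data for the sketch; real workloads would stream sharded datasets.
        x = torch.randn(64, 10)
        y = torch.randn(64, 1)
        loss = loss_fn(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Scale the same training function across multiple workers by changing num_workers.
trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"lr": 1e-3, "epochs": 5},
    scaling_config=ScalingConfig(num_workers=4, use_gpu=False),
)
result = trainer.fit()
```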

Amog Kamsetty is a software engineer at Anyscale, where he works on building distributed training libraries and integrations on top of Ray. He is one of the lead developers of the Ray Train library. Amog previously completed his MS at UC Berkeley, working with Ion Stoica on machine learning for database systems.
