Building the Ideal Infrastructure for AI Workloads
What’s the “right” way to build your infrastructure stack for deep learning? Today, GPU infrastructure for deep learning is mostly built on bare metal, and resources are allocated to researchers statically. Hear from Omri Geller, CEO and co-founder of Run:AI, about three steps you can take to optimize and share resources, and ultimately free data scientists from the hassles of AI infrastructure management. Learn about:
- Scheduling – tips for scheduling workloads with Kubernetes
- Orchestration – bringing concepts from the world of HPC to AI for better management of expensive resources
- Scale – from fractional GPUs to multiple nodes of GPUs, for distributed training using batch workloads
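As a minimal illustration of the kind of Kubernetes scheduling the first bullet refers to (a generic sketch, not Run:AI's own configuration), a pod can request GPUs through the NVIDIA device plugin's `nvidia.com/gpu` resource. Note that vanilla Kubernetes schedules only whole GPUs, which is why fractional GPU sharing requires an additional layer such as the one the talk describes. The pod name, image, and command below are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: train-job                 # hypothetical job name
spec:
  restartPolicy: Never            # batch-style: run to completion, don't restart
  containers:
  - name: trainer
    image: nvcr.io/nvidia/pytorch:23.10-py3   # example training image
    command: ["python", "train.py"]           # hypothetical entrypoint
    resources:
      limits:
        nvidia.com/gpu: 1         # whole GPUs only in vanilla Kubernetes
```

The scheduler will place this pod only on a node whose device plugin advertises a free GPU; requesting more than one GPU, or spanning multiple nodes, is where the batch-workload and distributed-training concerns in the third bullet come in.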
Omri Geller co-founded Run:AI in 2018 to build a virtualization layer for AI workloads – essentially abstracting jobs from compute power in order to pool and dynamically share expensive compute resources. Omri leads all of Run:AI’s day-to-day activities as well as its strategic direction. Prior to Run:AI, Omri served in an elite technological unit of the Israeli Prime Minister’s Office as part of the Israeli Defense Forces’ “Academic Atuda” program. By training, Omri is an algorithm engineer focused on high-performance computing algorithms. In 2015, he received an award for outstanding contribution to Israel’s defense. He is also the recipient of the Tel Aviv University award for excellence in M.Sc. studies (2015) and the Israel Defense Forces award for excellence in academic studies (2010).