Learning Lip Sync from Audio
Modeling and understanding human beings is pivotal to numerous applications, including 3D modeling for telepresence in virtual reality, films, summarizing and visualizing big photo collections, autonomous driving, and recognizing and searching for missing people, to name a few. While this is typically done in a laboratory setting with lots of manual interaction, we’ve been pioneering human modeling "in the wild" by leveraging casual photos and videos that were already captured or are easy to capture with commodity cameras. I will take you on a journey of attempting to achieve that, showing our latest technical progress as well as applications we've developed in the process, focusing mostly on our recent work on synthesizing realistic video of a person talking from audio.
Ira Kemelmacher-Shlizerman is an Assistant Professor at the Allen School of Computer Science and a Research Scientist at Facebook. She received her Ph.D. in computer science and applied mathematics from the Weizmann Institute of Science. Ira works in computer vision, graphics, and learning, focusing particularly on modeling people and on virtual and augmented reality. She received the Google faculty award, and her work “Moving Portraits” was selected for the cover of the Communications of the ACM, Research Highlights, and was tech-transferred to Google. Her work “Illumination aware age progression” and its application to missing children search was featured in interviews on national TV, e.g., CBS, NBC, and many others. Ira's 3D face reconstruction from Internet photos received the Madrona prize and the 2016 "Innovation of the Year" award from GeekWire. She founded the startup Dreambit, which was acquired by Facebook.