
REGISTRATION & LIGHT BREAKFAST
Lisha Li - Amplify Partners
Lisha is a principal at Amplify Partners. She focuses on companies that leverage machine learning and data to solve problems, and she is excited to be investing at a time when algorithmic and data-driven methods have such incredible potential for impact. Lisha completed her PhD at UC Berkeley, focusing on deep learning and probability applied to the problem of clustering in graphs. Supported by the NSERC CGS fellowship, she worked with Prof. David Aldous and Prof. Joan Bruna. While at Berkeley she also did statistical consulting, advising on methods and analysis for experimentation and interpretation, and interned as a data scientist at Pinterest and Stitch Fix. She was the lecturer for discrete mathematics, as well as the graduate instructor for probability and statistics and introductory CS theory.


VISUAL REASONING


Aaron Courville - Assistant Professor - University of Montreal
Visual Reasoning via Feature-wise Linear Modulation
Visual Reasoning - answering image-related questions that require a multi-step process to answer - is a task that explores how well models can learn about the complex organizational structure of objects in the world. In this talk, I introduce a widely applicable form of Conditional Normalization we call FiLM: Feature-wise Linear Modulation. FiLM is a straightforward way of locally modifying the elements of a computational pipeline. In our application to Visual Reasoning, FiLM adapts the layers of a convolutional neural network to the specific question at hand. I will also show how FiLM-based models can generalize to challenging, new data from few examples or even, to an extent, to the zero-shot setting.
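For readers who want to see the mechanism, here is a minimal sketch of a FiLM layer, assuming PyTorch; the dimensions and the single-layer conditioning network are illustrative choices, not the model from the talk:
```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: scale and shift each feature map
    of a conv block using parameters predicted from a conditioning input
    (here, a question embedding)."""
    def __init__(self, cond_dim, num_channels):
        super().__init__()
        # A single linear layer predicts per-channel gamma and beta.
        self.film = nn.Linear(cond_dim, 2 * num_channels)

    def forward(self, feature_maps, condition):
        # feature_maps: (batch, channels, H, W); condition: (batch, cond_dim)
        gamma, beta = self.film(condition).chunk(2, dim=-1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)   # broadcast over H and W
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return gamma * feature_maps + beta

# Illustrative usage: modulate conv features with a question embedding.
film = FiLM(cond_dim=128, num_channels=64)
feats = torch.randn(8, 64, 14, 14)    # CNN feature maps
question = torch.randn(8, 128)        # e.g., an RNN-encoded question
modulated = film(feats, question)     # same shape, now question-specific
```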
Aaron is an Assistant Professor in the Department of Computer Science and Operations Research (DIRO) at the University of Montreal, and a member of the LISA lab. His current research interests focus on the development of deep learning models and methods. He is particularly interested in developing probabilistic models and novel inference methods. While he has mainly focused on applications to computer vision, he is also interested in other domains such as natural language processing, audio signal processing, speech understanding and just about any other artificial-intelligence-related task.


COMPUTER VISION


Raquel Urtasun - Head - Uber ATG/University of Toronto
Deep Learning for Self-Driving Cars
Raquel Urtasun is the Head of Uber ATG Toronto. She is also an Associate Professor in the Department of Computer Science at the University of Toronto, a Canada Research Chair in Machine Learning and Computer Vision and a co-founder of the Vector Institute for AI. Prior to this, she was an Assistant Professor at TTI Chicago. She received her Ph.D. degree from the Ecole Polytechnique Fédérale de Lausanne (EPFL) in 2006 and did her postdoc at MIT and UC Berkeley. She is a world-leading expert in machine perception for self-driving cars. Her research interests include machine learning, computer vision, robotics and remote sensing. Her lab was selected as an NVIDIA NVAIL lab. She is a recipient of an NSERC EWR Steacie Award, an NVIDIA Pioneers of AI Award, a Ministry of Education and Innovation Early Researcher Award, three Google Faculty Research Awards, an Amazon Faculty Research Award, a Connaught New Researcher Award and a Best Paper Runner-Up Prize awarded at the Conference on Computer Vision and Pattern Recognition (CVPR). She is also Program Chair of CVPR 2018, an Editor of the International Journal of Computer Vision (IJCV) and has served as Area Chair of multiple machine learning and vision conferences.




Ira Kemelmacher-Shlizerman - Assistant Professor/Researcher - Allen School of Computer Science/Facebook
Learning Lip Sync from Audio
Modeling and understanding human beings is pivotal to numerous applications, ranging from 3D modeling for telepresence in virtual reality and film, to summarizing and visualizing big photo collections, autonomous driving, and recognizing and searching for missing people, to name a few. While such modeling is typically done in a laboratory setting with lots of manual interaction, we’ve been pioneering human modeling "in the wild", by leveraging casual photos and videos that were already captured or are easy to capture with commodity cameras. I will take you on a journey of attempting to achieve that, showing our latest technical progress as well as applications we've developed in the process, mostly focusing on our recent work of synthesizing realistic video of a person talking from audio.
Ira Kemelmacher-Shlizerman is an Assistant Professor at the Allen School of Computer Science and a Research Scientist at Facebook. She received her Ph.D. in computer science and applied mathematics at the Weizmann Institute of Science. Ira works in computer vision, graphics, and learning, particularly focusing on modeling people, and on virtual and augmented reality. She received a Google Faculty Award; her work "Moving Portraits" was selected for the cover of the Communications of the ACM Research Highlights, and the technology was transferred to Google. Her work "Illumination-Aware Age Progression" and its application to missing-children search was featured in interviews on national TV, e.g., CBS, NBC, and many others. Ira's 3D face reconstruction from Internet photos received the Madrona Prize and the "Innovation of the Year" 2016 award by GeekWire. She founded the startup Dreambit, which was acquired by Facebook.




Roland Memisevic - Chief Scientist - Twenty Billion Neurons
Common Sense Video Understanding at TwentyBN
Is solving video the key next breakthrough in computer vision? We’ll discuss the key challenges in applying deep learning techniques to video understanding, including approaches to building high-quality datasets, since annotating data for video is quite different from image understanding. What are the key use cases for video today and tomorrow? How do we address concerns around privacy and fears about “big brother”? Last but not least, how does video advance the field of AI towards general intelligence and a common-sense understanding of the physical world in machine learning models?
Roland Memisevic received his PhD in Computer Science from the University of Toronto in 2008. He subsequently held positions as research scientist at PNYLab, Princeton, as post-doctoral fellow at the University of Toronto and ETH Zurich, and as junior professor at the University of Frankfurt. In 2012 he joined the MILA deep learning group at the University of Montreal as assistant professor. He has been on leave from his academic position since 2016 to lead the research efforts at Twenty Billion Neurons, a German-Canadian AI startup he co-founded. Roland is Fellow of the Canadian Institute for Advanced Research (CIFAR).



COFFEE


Andrea Lodi - Professor - École Polytechnique de Montréal
Learning (Discrete) Optimization
The interaction between Machine Learning and Mathematical Optimization is currently one of the most popular topics at the intersection of Computer Science and Applied Mathematics. The role of Continuous Optimization within Machine Learning is well known, and, on the applied side, it is rather easy to name areas in which data-driven Optimization boosted by / paired with Machine Learning algorithms can have a game-changing impact. By contrast, the relationship and the interaction between Machine Learning and Discrete Optimization is largely unexplored, and this talk concerns one aspect of it, namely the use of modern Machine Learning techniques within / for Discrete Optimization.
Andrea Lodi received his PhD in System Engineering from the University of Bologna in 2000 and was a Herman Goldstine Fellow at the IBM T.J. Watson Research Center, NY, in 2005–2006. He was a full professor of Operations Research at DEI, University of Bologna, between 2007 and 2015. Since 2015, he has been Canada Excellence Research Chair in “Data Science for Real-time Decision Making” at the École Polytechnique de Montréal. His main research interests are in Mixed-Integer Linear and Nonlinear Programming and Data Science, and his work has received several recognitions including the IBM and Google faculty awards. He is the co-principal investigator (together with Yoshua Bengio) of the project "Data Serving Canadians: Deep Learning and Optimization for the Knowledge Revolution", recently generously funded by the Canadian Federal Government under the CFREF Programme.

FEW-SHOT LEARNING


Richard Zemel - Co-Founder & Director of Research - Vector Institute
Learning with Little Data
The current successes of deep neural networks have largely come on classification problems, based on datasets containing hundreds of examples from each category. Humans can easily learn new words or classes of visual objects from very few examples. A fundamental question is how to adapt learning systems to accommodate new classes not seen in training, given only a few examples of each of these classes. I will discuss recent advances in this area, and present ongoing work by my group on various aspects of this problem.
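One representative approach from this line of work is the prototypical network, where each new class is represented by the mean embedding of its few labeled examples. A minimal sketch, assuming PyTorch and embeddings already produced by some encoder (all shapes illustrative):
```python
import torch
import torch.nn.functional as F

def prototypical_loss(support, support_labels, query, query_labels, n_classes):
    """One few-shot episode: each class is represented by the mean
    embedding (prototype) of its support examples; queries are
    classified by distance to the prototypes."""
    prototypes = torch.stack(
        [support[support_labels == c].mean(dim=0) for c in range(n_classes)])
    dists = torch.cdist(query, prototypes) ** 2     # squared Euclidean
    log_p = F.log_softmax(-dists, dim=1)            # closer = more likely
    return F.nll_loss(log_p, query_labels)

# A 5-way, 1-shot episode with 16-dim embeddings from any encoder:
support, support_labels = torch.randn(5, 16), torch.arange(5)
query, query_labels = torch.randn(20, 16), torch.randint(0, 5, (20,))
loss = prototypical_loss(support, support_labels, query, query_labels, 5)
```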
Richard Zemel is a Professor of Computer Science at the University of Toronto, and the Research Director at the new Vector Institute for Artificial Intelligence. Prior to that he was on the faculty at the University of Arizona, and a Postdoctoral Fellow at the Salk Institute and at CMU. He received his B.Sc. in History & Science from Harvard, and a Ph.D. in Computer Science from the University of Toronto. His awards and honors include a Young Investigator Award from the ONR and a US Presidential Scholar award. He is a Senior Fellow of the Canadian Institute for Advanced Research, an NVIDIA Pioneer of AI, and a member of the NIPS Advisory Board. His recent research interests include learning with weak labels, models of images and text, and fairness.


REINFORCEMENT LEARNING


Herke van Hoof - Post-Doctoral Fellow - McGill University
Stable Reinforcement Learning from Sensor Data
Reinforcement learning studies how to optimize sequential decisions. Such decisions are encountered in many physical systems, for example in robotics. Real systems can usually produce only relatively small datasets composed of redundant sensor data that might be hard to interpret. We developed a learning algorithm that yields stable policy updates, even with small datasets, without the need for manually tuned features. Deep learning techniques allow efficient learning in the presence of noise or distractions. Our experiments show that our techniques can learn robotic tasks with visual or tactile input from a small amount of experience.
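To make the notion of a stable policy update concrete, here is a minimal, generic sketch of a policy-gradient step penalized by the KL divergence from the previous policy, so a small or noisy batch cannot move the policy too far. This only illustrates the principle, assuming PyTorch; it is not the specific algorithm presented in the talk:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical setup: a small policy over 4-dim sensor observations
# with 2 discrete actions, plus a batch of previously collected data.
policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
old_policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
old_policy.load_state_dict(policy.state_dict())
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

states = torch.randn(64, 4)             # small batch of observations
actions = torch.randint(0, 2, (64,))    # actions that were taken
advantages = torch.randn(64)            # estimated advantages
beta = 1.0                              # strength of the KL penalty

logp = F.log_softmax(policy(states), dim=1)
with torch.no_grad():
    logp_old = F.log_softmax(old_policy(states), dim=1)

# Importance-weighted objective plus a penalty on KL(old || new),
# keeping the updated policy close to the data-collecting policy.
ratio = (logp.gather(1, actions[:, None]).squeeze(1)
         - logp_old.gather(1, actions[:, None]).squeeze(1)).exp()
kl = F.kl_div(logp, logp_old.exp(), reduction='batchmean')
loss = -(ratio * advantages).mean() + beta * kl
opt.zero_grad(); loss.backward(); opt.step()
```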
Herke van Hoof is currently a postdoctoral fellow at McGill University in Montreal, Canada. At McGill, Herke works with Joelle Pineau at the Reasoning and Learning Lab as well as with David Meger and Gregory Dudek at the Mobile Robotics Lab. Before that, he obtained his PhD at TU Darmstadt, Germany under the supervision of Jan Peters. His research interest is in reinforcement learning for autonomous robots in perceptually challenging environments.



LUNCH
PLENARY SESSION


Yoshua Bengio - Full Professor - Université de Montréal
Deep Learning and Cognition
Neural networks and deep learning have been inspired by brains, neuroscience and cognition from the very beginning, starting with distributed representations, neural computation, and the hierarchy of learned features. More recent examples include rectifying non-linearities (ReLU), which enable training deeper networks, and soft content-based attention, which allows neural nets to go beyond vectors and process a variety of data structures, and which led to a breakthrough in machine translation. Ongoing research now suggests that brains may use a process similar to backpropagation for estimating gradients, and new inspiration from cognition suggests how to learn deep representations that disentangle the underlying factors of variation, by allowing agents to intervene in their environment and explore how to control some of its elements.
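As a concrete illustration of the soft content-based attention mentioned above, here is a minimal sketch (assuming PyTorch; dimensions arbitrary). Instead of hard-selecting one element, the network forms a differentiable weighted average over all of them:
```python
import torch

def soft_attention(query, keys, values):
    """Soft content-based attention: instead of hard-selecting one
    element, return a differentiable weighted average of all values,
    weighted by how well each key matches the query."""
    scores = keys @ query                    # content-based match scores
    weights = torch.softmax(scores, dim=0)   # a differentiable "soft" choice
    return weights @ values                  # convex combination of values

# E.g., attend over 10 encoder states of dimension 16:
keys = values = torch.randn(10, 16)
context = soft_attention(torch.randn(16), keys, values)  # shape (16,)
```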
Yoshua Bengio is recognized as one of the world’s leading experts in artificial intelligence (AI) and a pioneer in deep learning. Since 1993, he has been a professor in the Department of Computer Science and Operational Research at the Université de Montréal. Holder of the Canada Research Chair in Statistical Learning Algorithms, he is also the founder and scientific director of Mila, the Quebec Institute of Artificial Intelligence, which is the world’s largest university-based research group in deep learning. His research contributions have been undeniable. In 2018, Yoshua Bengio collected the largest number of new citations in the world for a computer scientist thanks to his many publications. The following year, he earned the prestigious Killam Prize in computer science from the Canada Council for the Arts and was a co-winner of the A.M. Turing Award, which he received jointly with Geoffrey Hinton and Yann LeCun. Concerned about the social impact of AI, he actively contributed to the development of the Montreal Declaration for the Responsible Development of Artificial Intelligence.


Geoffrey Hinton - University of Toronto
Geoffrey Hinton designs machine learning algorithms. His aim is to discover a learning procedure that is efficient at finding complex structure in large, high-dimensional datasets and to show that this is how the brain learns to see. He was one of the researchers who introduced the back-propagation algorithm and the first to use backpropagation for learning word embeddings. His other contributions to neural network research include Boltzmann machines, distributed representations, time-delay neural nets, mixtures of experts, variational learning, products of experts and deep belief nets. His research group in Toronto made major breakthroughs in deep learning that have revolutionized speech recognition and object classification.



Yann LeCun - Director of AI Research - Facebook
How Could Machines Learn as Efficiently as Animals and Humans?
Deep learning has caused revolutions in computer perception and natural language understanding, but almost all of these successes rely on supervised learning, which requires human-annotated data. For game AI, most systems use reinforcement learning, which requires too many trials to be practical in the real world. But animals and humans seem to learn vast amounts of knowledge about how the world works through mere observation and occasional actions. Good predictive world models are an essential component of intelligent behavior: with them, one can predict outcomes and plan courses of action. One could argue that good predictive models are the basis of "common sense", allowing us to fill in missing information: predict the future from the past and present, the past from the present, or the state of the world from noisy percepts. I will review some principles and methods for predictive learning, and discuss how they can learn hierarchical representations of the world and deal with uncertainty.
Yann has been the Director of AI Research at Facebook since December 2013, and is Silver Professor at New York University on a part-time basis, mainly affiliated with the NYU Center for Data Science and the Courant Institute of Mathematical Sciences. He received the EE Diploma from Ecole Supérieure d’Ingénieurs en Electrotechnique et Electronique (ESIEE Paris), and a PhD in CS from Université Pierre et Marie Curie (Paris). After a postdoc at the University of Toronto, he joined AT&T Bell Laboratories in Holmdel, NJ. He became head of the Image Processing Research Department at AT&T Labs-Research in 1996, and joined NYU as a professor in 2003, after a brief period as a Fellow of the NEC Research Institute in Princeton. He is the co-director of the Neural Computation and Adaptive Perception Program of CIFAR, and co-lead of the Moore-Sloan Data Science Environments for NYU. He received the 2014 IEEE Neural Network Pioneer Award.



COFFEE

Panel of Pioneers
Yoshua Bengio - Université de Montréal


Geoffrey Hinton - University of Toronto

Yann LeCun - Facebook


Joelle Pineau - McGill University
Joelle Pineau is an Associate Professor and William Dawson Scholar at McGill University where she co-directs the Reasoning and Learning Lab. Dr. Pineau’s research focuses on developing new models and algorithms for planning and learning in complex partially-observable domains. She also works on applying these algorithms to complex problems in robotics, health care, games and conversational agents. She serves on the editorial board of the Journal of Artificial Intelligence Research and the Journal of Machine Learning Research and is currently President-Elect of the International Machine Learning Society. She is a Senior Fellow of the Canadian Institute for Advanced Research and in 2016 was named a member of the College of New Scholars, Artists and Scientists by the Royal Society of Canada.



Thales Group Special Announcement

Conversation & Drinks

LIGHT BREAKFAST
Lisha Li - Amplify Partners


CONVOLUTIONAL NEURAL NETWORKS


Jean-François Lalonde - Assistant Professor - Université Laval
Deep Learning for Computer Graphics: Learning to Estimate Lighting From Photographs
We propose an automatic method to infer high dynamic range illumination from a single, limited field-of-view, low dynamic range photograph of a scene. In contrast to previous work that relies on specialized image capture, user input, and/or simple scene models, we train an end-to-end deep neural network that directly regresses a limited field-of-view photo to HDR illumination, without strong assumptions on scene geometry, material properties, or lighting. This allows us to automatically recover high-quality HDR illumination estimates that significantly outperform previous state-of-the-art methods. Consequently, using our illumination estimates for applications like 3D object insertion, we can achieve compelling, photorealistic results.
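The core idea of an end-to-end regression from a limited field-of-view LDR photo to HDR illumination can be sketched as follows. This is a toy stand-in, assuming PyTorch; the architecture, environment-map resolution and log-space loss are illustrative assumptions, not the paper's exact design:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightingNet(nn.Module):
    """Toy end-to-end regressor: LDR photo in, HDR environment map out."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 16, 512), nn.ReLU())
        # Predict a low-resolution 32x64 RGB environment map.
        self.decoder = nn.Linear(512, 3 * 32 * 64)

    def forward(self, ldr_photo):
        return self.decoder(self.encoder(ldr_photo)).view(-1, 3, 32, 64)

net = LightingNet()
photo = torch.randn(2, 3, 128, 128)           # limited-FOV LDR crops
pred = net(photo)
target_hdr = torch.rand(2, 3, 32, 64) * 100   # HDR radiance, large range
# Regress in log space so bright light sources don't dominate the loss.
loss = F.mse_loss(pred, torch.log1p(target_hdr))
```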
Jean-François Lalonde has been an Assistant Professor in Electrical and Computer Engineering at Laval University, Quebec City, since 2013. Previously, he was a Post-Doctoral Associate at Disney Research, Pittsburgh. He received a B.Eng. degree in Computer Engineering with honors from Laval University, Canada, in 2004. He earned his M.S. at the Robotics Institute at Carnegie Mellon University in 2006 under Prof. Martial Hebert and received his Ph.D., also from Carnegie Mellon, in 2011 under the supervision of Profs. Alexei A. Efros and Srinivasa G. Narasimhan. His Ph.D. thesis won the 2010-11 CMU School of Computer Science Distinguished Dissertation Award. After graduation, he became a Computer Vision Scientist at Tandent, Inc., where he helped develop LightBrush™, the first commercial intrinsic imaging application. He also introduced intrinsic videos at SIGGRAPH 2012 while at Tandent. His research focuses on lighting-aware image understanding and synthesis by leveraging deep learning and large amounts of data.


Jasper Snoek - Google Brain
Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift
Modern machine learning methods including deep learning have achieved great success in predictive accuracy for supervised learning tasks, but may still fall short in giving useful estimates of their predictive uncertainty. Quantifying uncertainty is especially critical in real-world settings, which often involve input distributions that are shifted from the training distribution due to a variety of factors including sample bias and non-stationarity. In such settings, well-calibrated uncertainty estimates convey information about when a model's output should (or should not) be trusted. Many probabilistic deep learning methods, including Bayesian and non-Bayesian methods, have been proposed in the literature for quantifying predictive uncertainty, but to our knowledge there has not previously been a rigorous large-scale empirical comparison of these methods under dataset shift. We present a large-scale benchmark of existing state-of-the-art methods on classification problems and investigate the effect of dataset shift on accuracy and calibration. We find that traditional post-hoc calibration does indeed fall short, as do several other previous methods. However, some methods that marginalize over models give surprisingly strong results across a broad spectrum of tasks.
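Two of the ingredients discussed here, marginalizing over models and measuring calibration, can be sketched compactly. A minimal NumPy sketch, assuming each model exposes a hypothetical `predict_proba(x)` method returning class probabilities:
```python
import numpy as np

def ensemble_predict(models, x):
    """Marginalize over models: average the predictive distributions of
    independently trained networks (a deep ensemble). `predict_proba`
    is a hypothetical method returning (n_examples, n_classes) probs."""
    return np.mean([m.predict_proba(x) for m in models], axis=0)

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: weighted average of |accuracy - confidence| over bins of
    predicted confidence; 0 means perfectly calibrated."""
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    ece = 0.0
    for lo in np.linspace(0.0, 1.0, n_bins, endpoint=False):
        in_bin = (conf > lo) & (conf <= lo + 1.0 / n_bins)
        if in_bin.any():
            acc = (pred[in_bin] == labels[in_bin]).mean()
            ece += in_bin.mean() * abs(acc - conf[in_bin].mean())
    return ece
```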
Jasper Snoek completed his PhD in machine learning at the University of Toronto in 2013. He subsequently held postdoctoral fellowships at the University of Toronto, under Geoffrey Hinton and Ruslan Salakhutdinov, and at the Harvard Center for Research on Computation and Society, under Ryan Adams. Jasper co-founded the machine learning startup Whetlab, which was acquired by Twitter in 2015. Currently, he is a research scientist at Google Brain in Cambridge, MA.



Eric Humphrey - Research Scientist - Spotify
Advances in Deep Architectures and Methods for Separating Vocals in Recorded Music
Source separation of audio mixtures, with an emphasis on the human voice, remains one of the enticing unsolved challenges in audio signal processing. This challenge is amplified in the context of recorded music, where often many sound sources are intentionally correlated in both time and frequency. In this talk, we present recent advances in the state of the art for separating singing voice and accompaniment in popular music audio recordings, leveraging semi-supervised datasets mined from a large commercial music catalog. In addition, we explore the effects of combining deep convolutional U-Net architectures with multi-task learning for vocal separation.
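The standard formulation behind such systems is spectrogram masking: a network predicts a soft mask over the mixture's magnitude spectrogram, and the masked spectrogram estimates the vocal source. A minimal sketch, assuming PyTorch; a real U-Net stacks many levels and adds skip connections:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMaskNet(nn.Module):
    """One down/up level of a U-Net-shaped encoder/decoder predicting a
    soft vocal mask; a real system is much deeper, with skip
    connections between matching levels."""
    def __init__(self):
        super().__init__()
        self.down = nn.Conv2d(1, 16, 4, stride=2, padding=1)
        self.up = nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1)

    def forward(self, mix_mag):
        h = torch.relu(self.down(mix_mag))
        return torch.sigmoid(self.up(h))    # soft mask in [0, 1]

net = TinyMaskNet()
mix = torch.rand(1, 1, 512, 128)            # |STFT| of the music mixture
vocals_target = torch.rand_like(mix)        # |STFT| of isolated vocals
vocals_est = net(mix) * mix                 # element-wise masking
loss = F.l1_loss(vocals_est, vocals_target) # train mask toward the target
```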
Eric J. Humphrey is a research scientist at Spotify, and acting Secretary on the board of the International Society for Music Information Retrieval (ISMIR). Previously, he has worked or consulted in a research capacity for various companies, notably THX and MuseAmi, and is a contributing organizer of a monthly Music Hackathon series in NYC. He earned his Ph.D. at New York University in Steinhardt's Music Technology Department under the direction of Juan Pablo Bello, Yann LeCun, and Panayotis Mavromatis, exploring the application of deep learning to the domains of audio signal processing and music informatics. When not trying to help machines understand music, you can find him running the streets of Brooklyn or hiding out in his music studio.



COFFEE
DEEP LEARNING FRAMEWORKS


Hugo Larochelle - Director - Google Brain Montreal
Few-Shot Learning: Thoughts On Where We Should Be Going
Few-shot learning is the problem of learning new tasks from small amounts of labeled data. This is achieved by performing a form of transfer learning from the data of many other existing tasks. This topic has gained tremendous interest in the past few years, with several new methods being proposed each month. In this talk, I suggest we take a step back, look at what we have achieved and, most importantly, consider where this research should be going next.
Hugo Larochelle is a Research Scientist at Google and an Assistant Professor at the Université de Sherbrooke (UdeS). Before that, he worked at Twitter, and he spent two years in the machine learning group at the University of Toronto as a postdoctoral fellow under the supervision of Geoffrey Hinton. He obtained his Ph.D. at Université de Montréal, under the supervision of Yoshua Bengio. He is the recipient of two Google Faculty Awards. His professional involvement includes associate editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), member of the editorial board of the Journal of Artificial Intelligence Research (JAIR) and program chair for the International Conference on Learning Representations (ICLR) in 2015 and 2016.




Kyunghyun Cho - Assistant Professor of Computer Science and Data Science - New York University
Deep Learning, Where Are You Going?
There are three axes along which advances in machine learning and deep learning happen: (1) network architectures, (2) learning algorithms and (3) spatio-temporal abstraction. In this talk, I will describe a set of research topics I've pursued along each of these axes. For network architectures, I will describe how recurrent neural networks, which were largely forgotten during the 90s and early 2000s, have evolved over time and have finally become a de facto standard in machine translation. I then discuss various learning paradigms, how they relate to each other, and how they can be combined to build a strong learning system. Along this line, I briefly discuss my latest research on designing a query-efficient imitation learning algorithm for autonomous driving. Lastly, I present my view on what it means to be a higher-level learning system. Under this view, each and every end-to-end trainable neural network serves as a module, regardless of how it was trained, and the modules interact with each other in order to solve a higher-level task. I will describe my latest research on trainable decoding algorithms as a first step toward building such a framework.
Kyunghyun Cho is an assistant professor of computer science and data science at New York University. He was a postdoctoral fellow at the University of Montreal until summer 2015, and received PhD and MSc degrees from Aalto University in early 2014. He tries his best to find a balance among machine learning, natural language processing and life, but often fails to do so.


SPEECH RECOGNITION
Alex Acero - Apple
Deep Learning in Speech Recognition
While neural networks had been used in speech recognition in the early 1990s, they did not outperform the traditional machine learning approaches until 2010, when Alex’s team members at Microsoft Research demonstrated the superiority of Deep Neural Networks (DNNs) for large-vocabulary speech recognition systems. The speech community rapidly adopted deep learning, followed by image processing and many other disciplines. In this talk I will explain the transition to deep learning, what the speech recognition field has accomplished, and the remaining challenges.
Alex Acero (PhD, Carnegie Mellon, 1990) is Sr. Director in the Siri team in charge of speech recognition, speech synthesis, and machine translation. Prior to joining Apple, he spent 20 years at Microsoft Research managing teams in speech, audio, multimedia, computer vision, natural language processing, machine translation, machine learning, and information retrieval. Dr. Acero is an IEEE Fellow and ISCA Fellow. Alex has served as President of the IEEE Signal Processing Society and is currently a member of the IEEE Board of Directors. He is the author of the textbook “Spoken Language Processing”. Dr. Acero has published over 250 technical papers and has over 150 US patents.


Greg Diamos - Baidu
The Next Generation of AI Chips
Deep learning has fuelled significant progress in computer vision, speech recognition, and natural language processing. We have seen a single deep learning algorithm learn to recognize two vastly different languages, English and Mandarin, begin to synthesize realistic human speech, recognize visual data, and even understand human language. At Baidu, we think that this is just the beginning, and high performance computing is poised to help. It turns out that deep learning is compute-limited, even on the fastest machines that we can build. This talk will focus on a new discovery that significantly accelerates deep learning training by using mixed 16-bit and 32-bit IEEE standard floating point arithmetic. Unlike previous work in this area, the first generation of commodity hardware realizing an up to 8x speedup from this approach is already shipping in volume. We demonstrate the success of this approach across 15 state-of-the-art deep learning training applications drawn from a diverse set of problem domains, and detail the small changes to deep learning frameworks needed to support this technology.
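The mechanics of mixed-precision training can be sketched in a few lines: compute in fp16, keep an fp32 "master" copy of the weights, and scale the loss so small gradients survive fp16's limited range. A minimal sketch, assuming PyTorch and a CUDA GPU; modern frameworks automate these steps:
```python
import torch
import torch.nn.functional as F

# Hypothetical setup: an fp16 model with an fp32 master weight copy.
model = torch.nn.Linear(512, 10).cuda().half()
master = [p.detach().float().clone().requires_grad_() for p in model.parameters()]
opt = torch.optim.SGD(master, lr=0.1)
loss_scale = 1024.0        # keeps tiny gradients above fp16's minimum

x = torch.randn(32, 512, device='cuda').half()
y = torch.randint(0, 10, (32,), device='cuda')

loss = F.cross_entropy(model(x).float(), y)
(loss * loss_scale).backward()            # backward pass runs in fp16

for p, m in zip(model.parameters(), master):
    m.grad = p.grad.float() / loss_scale  # unscale gradients in fp32
opt.step()                                # fp32 weight update
with torch.no_grad():
    for p, m in zip(model.parameters(), master):
        p.copy_(m.half())                 # refresh the fp16 copy
model.zero_grad()
```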
Greg Diamos leads computer systems research at Baidu’s Silicon Valley AI Lab (SVAIL), where he helped develop the Deep Speech and Deep Voice systems. Before Baidu, Greg contributed to the design of compiler and microarchitecture technologies used in the Volta GPU at NVIDIA. Greg holds a PhD from the Georgia Institute of Technology, where he led the development of the GPU-Ocelot dynamic compiler, which targeted CPUs and GPUs from the same program representation.


VISUAL CLASSIFICATION


Sanja Fidler - Assistant Professor - University of Toronto
Towards Perceptual Machines That See, Converse, and Reason
A successful autonomous system needs to not only understand the visual world but also communicate its understanding with humans. To make this possible, language can serve as a natural link between high level semantic concepts and low level visual perception. In this talk, I'll discuss recent work in the domain of vision and language, covering topics such as image/video captioning and retrieval, and question-answering. I’ll also talk about our recent work on task execution via language instructions.
Sanja Fidler is an Assistant Professor at the Department of Computer Science, University of Toronto. Previously she was a Research Assistant Professor at TTI-Chicago, a philanthropically endowed academic institute located on the campus of the University of Chicago. She completed her PhD in computer science at the University of Ljubljana in 2010, and was a postdoctoral fellow at the University of Toronto during 2011-2012. She has served on the program committees of numerous international conferences, and has received three outstanding reviewer awards. Together with Rich Zemel and Raquel Urtasun, she received the NVIDIA Pioneer of AI award. Her main research interests are object detection, 3D scene understanding, and the intersection of language and vision.



LUNCH


Bryan Russell - Research Scientist - Adobe Research
Learning from Video: Recognizing Actions and Localizing Moments with Natural Language
This talk will describe two works in video understanding. In the first part, I will describe ActionVLAD, a new video representation for action classification that aggregates local convolutional features across the entire spatio-temporal extent of a video. In the second part, I will describe an approach that retrieves a specific temporal segment (moment) from a video given a natural language text description. We address the lack of video datasets for this task by collecting the Distinct Describable Moments (DiDeMo) dataset, which consists of over 10,000 unedited, personal videos in diverse visual settings with pairs of localized video segments and referring expressions.
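The aggregation step behind VLAD-style pooling can be sketched as follows: softly assign every local spatio-temporal feature to a set of learned anchors and accumulate the residuals into one fixed-size video descriptor. A minimal sketch, assuming PyTorch (all dimensions illustrative):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VLADPool(nn.Module):
    """Softly assign each local feature to learned anchors and sum the
    residuals, giving one fixed-size descriptor for the whole video."""
    def __init__(self, dim, n_clusters):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_clusters, dim))
        self.assign = nn.Linear(dim, n_clusters)

    def forward(self, feats):
        # feats: (n_locations, dim) local features from all frames
        a = torch.softmax(self.assign(feats), dim=1)    # (n, k) soft assign
        resid = feats.unsqueeze(1) - self.centers       # (n, k, dim)
        vlad = (a.unsqueeze(2) * resid).sum(dim=0)      # (k, dim)
        return F.normalize(vlad.flatten(), dim=0)       # (k * dim,)

pool = VLADPool(dim=256, n_clusters=32)
video_feats = torch.randn(14 * 14 * 20, 256)  # conv features, 20 frames
descriptor = pool(video_feats)                # input to an action classifier
```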
Bryan Russell is currently a Research Scientist at Adobe Research in San Francisco, CA. He received his Ph.D. from MIT in the Computer Science and Artificial Intelligence Laboratory and was a post-doctoral fellow in the INRIA Willow team in Paris, France. He was a Research Scientist with Intel Labs as part of the Intel Science and Technology Center for Visual Computing (ISTC-VC) and was Affiliate Faculty at the University of Washington.



Jianchao Yang - Lead Research Scientist - Snap Inc
Unsupervised Domain Adaptation with Adversarial Network
This talk presents our recent work on using adversarial learning to improve recognition in the presence of domain shift or bias between the source (training) and target (testing) domains. Our new approach factors the learned feature space of the source and target domains into a discriminative space and a reconstructive space, where the discriminative space captures class-specific information while the reconstructive space captures domain-specific information. A GAN loss is used to minimize the domain shift in the discriminative space between the two domains for better generalization performance. Our preliminary results show the promise of this approach, achieving new state-of-the-art results on standard cross-domain digit classification tasks.
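To make the adversarial alignment concrete, here is a minimal sketch of the general recipe using a gradient reversal layer, a common alternative formulation of the GAN-style domain loss. It illustrates only the alignment step, not the talk's discriminative/reconstructive factorization, and assumes PyTorch:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient's sign on the
    backward pass, so the feature extractor is pushed to *confuse* the
    domain classifier while the classifier tries to succeed."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -grad

features = nn.Sequential(nn.Linear(784, 128), nn.ReLU())
domain_clf = nn.Linear(128, 2)   # 0 = source domain, 1 = target domain
opt = torch.optim.Adam(list(features.parameters()) + list(domain_clf.parameters()))

src = torch.randn(32, 784)       # labeled source images (flattened)
tgt = torch.randn(32, 784)       # unlabeled target images
x = torch.cat([src, tgt])
domain = torch.cat([torch.zeros(32), torch.ones(32)]).long()

logits = domain_clf(GradReverse.apply(features(x)))
loss = F.cross_entropy(logits, domain)   # adversarial alignment loss
opt.zero_grad(); loss.backward(); opt.step()
```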
Jianchao Yang is currently a Lead Research Scientist at Snap Inc. Before joining Snap, he was a Research Scientist at Adobe Research. He received his M.S. and Ph.D. degrees, both from the ECE Department of the University of Illinois at Urbana-Champaign, under the supervision of Prof. Thomas Huang. His research focuses on computer vision, deep learning, and image and video processing. He has published more than 80 technical papers in top-tier conferences and journals, with more than 13,000 Google Scholar citations. He received the Best Student Paper award at ICCV 2010, the winner's prize for the classification task in PASCAL VOC 2009, first place for object localization using external data in ILSVRC (ImageNet) 2014, and third place in the WebVision Challenge 2017. He serves as workshop chair for ACM MM 2017.


NEURAL NETWORK MODELS & ARCHITECTURES


David Duvenaud - Assistant Professor - University of Toronto
Composing Graphical Models With Neural Networks for Structured Representations and Fast Inference
How can we build structured, but flexible models? We propose a general modeling and inference framework that combines the complementary strengths of probabilistic graphical models and deep learning methods. Our model family combines latent graphical models with neural network observation models. All components are trained simultaneously with a single scalable stochastic variational objective. We illustrate this framework with several example models, and by showing how to automatically segment and categorize mouse behavior from raw video.
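A heavily simplified sketch of the idea, assuming PyTorch: a graphical-model prior (here a Gaussian mixture standing in for a richer latent structure) is combined with a neural-network observation model and trained with a single stochastic variational objective, the ELBO. The paper's structured message-passing inference is replaced here by a plain amortized encoder:
```python
import torch
import torch.nn as nn
import torch.distributions as D

K, Z, X = 5, 2, 20                     # mixture size, latent dim, data dim
enc = nn.Linear(X, 2 * Z)              # amortized recognition net q(z|x)
dec = nn.Linear(Z, X)                  # neural-net observation model p(x|z)
logits = nn.Parameter(torch.zeros(K))  # graphical-model prior: a GMM
means = nn.Parameter(torch.randn(K, Z))
opt = torch.optim.Adam([logits, means, *enc.parameters(), *dec.parameters()], lr=1e-2)

x = torch.randn(64, X)                 # e.g., per-frame video features
mu, logvar = enc(x).chunk(2, dim=1)
q = D.Normal(mu, (0.5 * logvar).exp())
z = q.rsample()                        # reparameterized latent sample
prior = D.MixtureSameFamily(
    D.Categorical(logits=logits),
    D.Independent(D.Normal(means, torch.ones(K, Z)), 1))
log_px = D.Normal(dec(z), 1.0).log_prob(x).sum(1)
elbo = log_px + prior.log_prob(z) - q.log_prob(z).sum(1)
opt.zero_grad(); (-elbo.mean()).backward(); opt.step()
```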
David Duvenaud is an assistant professor in computer science and statistics at the University of Toronto. His postdoc was at Harvard University, where he worked on hyperparameter optimization, variational inference, deep learning, and automatic chemical design. He did his Ph.D. at the University of Cambridge, studying Bayesian nonparametrics with Zoubin Ghahramani and Carl Rasmussen. David spent two summers in the machine vision team at Google Research, and also co-founded Invenia, an energy forecasting and trading company.


Joan Bruna - New York University
Divide and Conquer Networks
Many engineering and scientific problems require solving algorithmic tasks at scale under computational constraints. In general, these constraints are hard to optimize in closed form, motivating the use of data-driven algorithmic learning. Rather than studying this as a black-box discrete regression problem with no assumption whatsoever on the input-output mapping, we concentrate on tasks that are amenable to the principle of divide and conquer, and study its implications in terms of learning. This principle creates a powerful inductive bias that can be leveraged with neural architectures that are defined recursively and dynamically, by learning two scale-invariant atomic operations: how to split a given input into smaller sets, and how to merge two partially solved tasks into a larger partial solution. Thanks to the dynamic aspect of such an architecture, computational complexity can be incorporated as a regularization term that can be optimized by backpropagation. In this talk, we will illustrate the flexibility and efficiency of the Divide-and-Conquer Network on combinatorial and geometric tasks, such as sorting, clustering and convex hulls.
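The recursive structure can be sketched as follows: one learned module partitions the input set, another merges two partial solutions, and both are applied recursively. A minimal structural sketch, assuming PyTorch; the hard split below is a non-differentiable placeholder for the model's probabilistic split, and the complexity regularizer is omitted:
```python
import torch
import torch.nn as nn

class DiCoNetSketch(nn.Module):
    """Recursive split/merge: a learned scorer partitions the input set,
    the recursion solves each half, and a learned merge combines the
    partial solutions."""
    def __init__(self, dim):
        super().__init__()
        self.split_scorer = nn.Linear(dim, 1)     # scores a 2-way partition
        self.merge = nn.GRU(dim, dim, batch_first=True)

    def forward(self, items):
        # items: (n, dim) set of inputs
        if items.size(0) <= 2:                    # base case: solve directly
            return items
        side = self.split_scorer(items).squeeze(1) > 0
        if side.all() or (~side).all():           # avoid empty partitions
            side = torch.arange(items.size(0)) % 2 == 0
        left, right = self.forward(items[side]), self.forward(items[~side])
        merged, _ = self.merge(torch.cat([left, right]).unsqueeze(0))
        return merged.squeeze(0)

net = DiCoNetSketch(dim=8)
out = net(torch.randn(16, 8))   # (16, 8): recursively recombined outputs
```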
Joan Bruna graduated from Universitat Politecnica de Catalunya (Barcelona, Spain) in both Mathematics and Electrical Engineering. He obtained an M.Sc. in applied mathematics from ENS Cachan (France). He then became a research engineer in an image processing startup, developing real-time video processing algorithms. He obtained his PhD in Applied Mathematics at Ecole Polytechnique (France), under the supervision of Prof. Stephane Mallat. He was a postdoctoral researcher at the Courant Institute, NYU, and a postdoctoral fellow at Facebook AI Research. In 2015, he became an Assistant Professor in the Statistics Department at UC Berkeley, and in Fall 2016 he joined the Courant Institute (NYU, New York) as an Assistant Professor in Computer Science, Data Science and Mathematics (affiliated). His research interests include invariant signal representations, high-dimensional statistics and stochastic processes, deep learning and its applications to signal processing. He is co-chair of the IPAM Workshop on New Deep Learning Techniques (2018).



END OF SUMMIT

WOMEN IN MACHINE INTELLIGENCE DINNER - Le Richmond Marche Italien
Great Food & Networking - 6.30-10pm (Pre-registration needed)
Join us for an evening of discussions & networking around the progress and application of machine intelligence from leading females working in the space following Day 2 of the Deep Learning Summit. The dinner is open to all and provides a unique opportunity to network with peers and make new contacts with fellow diners including Directors, CEOs, Data Scientists, Founders, Researchers and Engineers.
Enjoy a Champagne reception, followed by 3 courses of the finest cuisine with wine matched to complement the evening, and hear short talks from experts following each course.
To register for the event, visit the website here or contact Katie at [email protected] for further enquiries.

Shyam Thyagaraj - Accenture
Stepping Stones in an AI Strategy for the Workplace

Adel el-Hallak - IBM
Journey to the Cognitive Era with IBM - How to leverage Deep Learning to win the AI Arms Race

Michael Gschwind - IBM
Journey to the Cognitive Era with IBM - How to leverage Deep Learning to win the AI Arms Race