Schedule

08:30

WELCOME

08:50

REGISTRATION

09:10

Roberto Pieraccini

Roberto Pieraccini, Jibo

Spoken Language Technology: Are Computers Finally Learning to Understand Us?

The making of Jibo, the first social robot for the home

Jibo is a robot that understands speech and has a moving body that helps him communicate more effectively and express emotions. Jibo has cameras and microphones to make sense of the world around him, including detecting where sounds come from and recognizing and tracking people. He has a display to show images, an eye that can morph into shapes at will, and a touch interface as a complementary input modality. Jibo encompasses the ultimate human-machine interface, with the potential of becoming a new interaction paradigm. In this talk I will take the audience through the making of such a complex device and the challenges posed by the sheer complexity of integrating a large number of technologies in a character robot device for the home.

Roberto Pieraccini, a scientist, technologist, and the author of “The Voice in the Machine” (MIT Press, 2012), has been at the forefront of speech, language, and machine learning innovation for more than 30 years. He is widely known as a pioneer in the fields of statistical natural language understanding and machine learning for automatic dialog systems, and their practical application to industrial solutions. As a researcher he worked at CSELT (Italy), Bell Laboratories, AT&T Labs, and IBM T.J. Watson. He led the dialog technology team at SpeechWorks International, was the CTO of SpeechCycle, and was the CEO of the International Computer Science Institute (ICSI) in Berkeley. He now leads the Advanced Conversational Technologies team at Jibo. http://robertopieraccini.com


09:30

Pierre Garrigues

Pierre Garrigues, Flickr

How Deep Learning Powers Flickr

Deep Learning With Flickr Tags

The members of the Flickr community manually tag photos with the goal of making them searchable and discoverable. With the advent of mobile phone cameras and auto-uploaders, photo uploads have become more numerous and asynchronous, and manual tagging is cumbersome for most users. However, using recent advances in deep learning we can now accurately and automatically identify the content of photos. Progress has been largely driven by training deep neural networks (DNNs) on datasets such as ImageNet that were built using manual annotators. In this talk, we show how it is possible to train DNNs using the user-generated tags directly. Although these tags are not always accurate, they have the advantage of being plentiful and allow us to train DNNs using an order of magnitude more data than previously possible. Furthermore, they capture how the Flickr community tags its photos, which is what an automated system should emulate. Training DNNs using hundreds of millions of user tags requires new tools. We also describe Caffe-On-Spark, our infrastructure for large-scale distributed deep learning on Hadoop clusters.
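To make this concrete, here is a minimal multi-label training sketch, assuming PyTorch; it is not Flickr's actual pipeline (and Caffe-On-Spark is not shown): each photo gets one sigmoid output per tag, trained with binary cross-entropy, so incomplete or noisy tags act as label noise rather than corrupting a single softmax target. The vocabulary size, backbone, and data below are illustrative placeholders.

```python
# Minimal multi-label tagging sketch (PyTorch assumed; not Flickr's
# actual pipeline). One sigmoid output per tag, trained with binary
# cross-entropy, lets noisy or missing user tags degrade gracefully.
import torch
import torch.nn as nn
import torchvision.models as models

NUM_TAGS = 10000  # hypothetical tag-vocabulary size

backbone = models.resnet18(weights=None)             # any image backbone works
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_TAGS)

criterion = nn.BCEWithLogitsLoss()                   # independent sigmoid per tag
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.01, momentum=0.9)

def train_step(images, tag_targets):
    """images: (B, 3, H, W); tag_targets: (B, NUM_TAGS) multi-hot vectors
    built from whatever tags users happened to supply (noisy labels)."""
    optimizer.zero_grad()
    loss = criterion(backbone(images), tag_targets)
    loss.backward()
    optimizer.step()
    return loss.item()

# One step on random placeholders standing in for a photo minibatch.
imgs = torch.randn(8, 3, 224, 224)
targets = (torch.rand(8, NUM_TAGS) < 0.001).float()  # sparse tag vectors
print(train_step(imgs, targets))
```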

Pierre Garrigues is a researcher in machine perception and learning. As a graduate student in the Redwood Center for Theoretical Neuroscience at UC Berkeley, he developed computational models of human visual processing. He then applied the technology from his research to practical applications at IQ Engines, a Berkeley startup providing an image recognition platform that powered mobile visual search as well as the organization of large photo collections. He is currently a research engineer at Flickr. He holds a PhD from the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley, and an undergraduate degree from the École Polytechnique in France.


09:50

Simon Osindero

Simon Osindero, Flickr

Pixels, Parks, Birds, and Beauty

Pixels, Parks, Birds, and Beauty

Photography is now a ubiquitous means of expression and communication: with the rise of cell-phone cameras, everyone is a “photographer”.

The vision and learning group at Flickr is developing a wide range of machine intelligence techniques to allow our users to effortlessly explore the world of beautiful images, and to manage their visual communications and memories.

In this talk, I’ll discuss some of the ways we’re using deep learning to power image search and discovery at Flickr, focusing on how we help you search for images that haven’t been human-labelled and how we can automatically infer image aesthetics.

Dr Simon Osindero is a pioneer in the field of machine learning and was the co-inventor of deep belief networks whilst researching as a post-doctoral fellow in the Hinton Group at the University of Toronto.

In his current role as an A.I. Architect at Yahoo, he leads computer vision and machine learning R&D at Flickr. He joined Yahoo in 2013 after it acquired LookFlow, a company he cofounded in 2009 to productize cutting edge research from the fields of machine learning and human-computer interaction. Prior to starting LookFlow he worked with a Montreal-based start-up, Idilia, designing machine-learning algorithms for natural language processing.

He holds an MSc in physics and a BA/MA in natural sciences (physics, molecular biology, mathematics) from Cambridge University (1st Class), and a PhD in Computational Neuroscience from the Gatsby Unit, University College London. He has also worked as a visual and new-media artist, and holds degrees in Photography and Digital Design from Concordia University.


10:10

Vivienne Ming

Vivienne Ming, Socos

Unleashing Human Potential

The elusive quest to identify and place skilled professionals has become an obsession in the talent wars of the tech industry (not to mention in schools from K through postdoc). We will discuss the concept of continuous passive predictive (formative) assessment, applied to both learners and professionals, from kindergartners to (future) CEOs. Building cognitive models using unstructured data and ubiquitous sensors allows the assessment not only of concept mastery, but of meta-learning development as well (e.g., "grit" and "social-emotional intelligence"). Such models can then be used to predict which content will be an effective learning experience for a given learner. In massive courses, from large college lectures to MOOCs, the models can identify ad hoc cohorts for collaborative learning.

Dr. Vivienne Ming is a theoretical neuroscientist, technologist and entrepreneur. She is the co-founder and Managing Partner of Socos, a cutting-edge EdTech company which applies cognitive modeling to align education with life outcomes. Previously, Dr. Ming was Chief Scientist at Gild, an innovative startup that builds better companies by unleashing human potential in their workforce using machine learning. She is a visiting scholar at UC Berkeley's Redwood Center for Theoretical Neuroscience pursuing her research in cognitive prosthetics. Dr. Ming also explores augmented cognition using technology like Google Glass and has been developing predictive models of diabetes and bipolar disorder. Her work and research have received extensive media attention, including the New York Times, NPR, Nature, O Magazine, Forbes, and The Atlantic.


10:30

COFFEE

10:50

Charlie Tang

Charlie Tang, Apple

Deep Learning with Structure: How Neural Nets can Leverage Domain-Specific Knowledge in Computer Vision

Deep Reinforcement Learning Advancements and Applications

Recent advances in Deep Reinforcement Learning have captured the imagination of both AI researchers and the general public. Combining the latest Deep Learning technology with Reinforcement Learning techniques has led to stunning breakthroughs, surpassing human-level performance at Atari games and the game of Go. Furthermore, Deep RL is being successfully adopted in a variety of fields such as robotics, control systems, translation, dialogue systems, and others. This talk will explore the intuitions, algorithms, and theories that have led to the recent success of Deep RL. A survey of exciting Deep RL applications and the tough challenges ahead will also be discussed.

Charlie obtained his PhD in Machine Learning in 2015 from the University of Toronto, advised by Geoffrey Hinton and Ruslan Salakhutdinov. His thesis focused on various aspects of Deep Learning technology. Charlie also holds a Bachelor's in Mechatronics Engineering and a Master's in Computer Science from the University of Waterloo. After his PhD, along with Ruslan Salakhutdinov and Nitish Srivastava, Charlie co-founded a startup focused on the application of Deep Learning based vision algorithms. Currently, Charlie is a research scientist at Apple Inc. Charlie's research interests include Deep Learning, Vision, Neuroscience and Robotics. He is one of the few competitors to have reached the #1 ranking on Kaggle.com, a popular machine learning competition platform. Charlie is also a Canadian national chess master.


11:10

Marni Bartlett

Marni Bartlett, Emotient

Learning Natural Facial Expressions

Learning Natural Facial Expressions

Natural facial behavior can reveal information about our internal states and intentions. I will describe two recent studies of machine learning on facial expression dynamics, in which multi-stage learning models outperformed human observers. The first task was to distinguish genuine from faked pain. The second was to predict when a financial offer would be rejected in an economic game. My colleagues and I founded a start-up company, Emotient, in 2012 to make the facial expression software commercially available. The potential for this technology is far-reaching, across fields of healthcare, education, advertising, and retail. I will wrap up the talk by describing applications in these areas.

Marian Bartlett, Ph.D., is co-founder and Lead Scientist at Emotient, a San Diego-based start-up company for automatic facial expression analysis, and a Full Research Professor at the University of California, San Diego. Marian is a pioneer in the field of machine learning and computer vision for face analysis. She and her colleagues developed software that automatically detects facial expressions of the seven primary emotions, as well as individual facial muscle movements, in collaboration with Paul Ekman, a founder of the science of facial behavior. The potential for this technology is far-reaching, across fields of healthcare, education, advertising, and retail. The technology was awarded best new product by CONNECT, San Diego, in 2013, and Marian was a winner of the 2014 Women Who Mean Business Award from the San Diego Business Journal. Marian received her Ph.D. in Cognitive Science and Psychology from the University of California, San Diego, and her B.A. in Mathematics and Computer Science from Middlebury College. She has authored over 80 papers in scientific journals and conference proceedings, as well as a book, Face Image Analysis by Unsupervised Learning, published by Kluwer in 2001.


11:30

Josh Susskind

Josh Susskind, Emotient

Accurate, Fast & Robust Expression Recognition using Deep Learning

Accurate, Fast & Robust Expression Recognition using Deep Learning

Previous state-of-the-art approaches to facial expression recognition, including our own, relied on handcrafted feature extraction and computer vision pipelines optimized for runtime speed and accuracy on relatively small datasets. Using specialized deep learning architectures trained on much larger datasets, we have significantly improved accuracy over our previous academic and commercial efforts, even when both types of systems are trained on the same data. These efforts have led to marked improvements in robustness to head pose and expression variation, without incurring speed penalties and without requiring GPU acceleration.

Dr. Joshua Susskind is a senior data scientist at Emotient, a company focused on real-time perception of facial expressions from images and videos, where he develops algorithms and visualization techniques for understanding human behavior. In graduate school he developed the first deep nets that could recognize and generate facial expressions. He holds a PhD in Psychology and Machine Learning from the University of Toronto, where he was co-advised by Dr. Geoffrey Hinton and Dr. Adam Anderson. His academic work has been featured in high-impact journals, including Nature Neuroscience and Science, and in top computer vision and machine learning conferences.


11:50

Quoc Le

Quoc Le, Google

Deep Learning for Language Understanding

I will talk about our recent progress in applying Deep Learning to traditionally hard problems such as Machine Vision and Speech Understanding. This has been achieved using one simple algorithm, without hand-crafted features. I will explain why such an algorithm can be directly applied to improve automatic learning and adaptation for smart devices.

Quoc Le is a research scientist at Google Brain. At Google, Quoc works on large scale deep learning. He led the team that simulated a neural network which learned the concept of "cat" by watching YouTube videos. His work has made breakthroughs in object recognition, speech recognition and language understanding. Quoc obtained his PhD at Stanford and his undergraduate degree, with First Class Honours as a Distinguished Scholar, at the Australian National University, and was a researcher at National ICT Australia, Microsoft Research and the Max Planck Institute for Biological Cybernetics.


12:10

Greg Corrado

Greg Corrado, Google

Google's Large Scale Deep Neural Networks Project

Greg Corrado is a senior research scientist at Google working in artificial intelligence, computational neuroscience, and scalable machine learning. He has published in fields ranging across behavioral economics, particle physics, systems neuroscience, and deep learning. At Google he has worked for some time on brain-inspired computing, and most recently has served as a founding member and co-technical lead of Google's large scale deep neural networks project. Before coming to Google, he worked at IBM Research on the SyNAPSE neuromorphic silicon chip. He did his graduate work in Neuroscience and Computer Science at Stanford University, and his undergraduate degree in Physics at Princeton University.


12:30

LUNCH

12:50

Jascha Sohl-Dickstein

Jascha Sohl-Dickstein, Stanford University

Fast Large-Scale Optimization by Unifying Stochastic Gradient & Quasi-Newton Methods

Fast Large-Scale Optimization by Unifying Stochastic Gradient & Quasi-Newton Methods

We present an algorithm for performing minibatch optimization that combines the computational efficiency of stochastic gradient descent (SGD) with the second-order curvature information leveraged by quasi-Newton methods. These approaches are unified by maintaining an independent Hessian approximation for each minibatch. Each update step requires only a single minibatch evaluation (as in SGD), each step is scaled using an approximate inverse Hessian, and little to no adjustment of hyperparameters is required (as is typical for quasi-Newton methods). The algorithm is made tractable in memory and computational cost, even for high-dimensional optimization problems, by storing and manipulating the quadratic approximations for each minibatch in a shared, time-evolving, low-dimensional subspace. Source code is released at http://git.io/SFO .
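The toy numpy sketch below illustrates only the core idea of the abstract: keep one quadratic model per minibatch, refresh a single model each step, and jump to the minimizer of the sum of all models. It is not the released SFO code, which maintains BFGS Hessian estimates in a shared, evolving low-dimensional subspace; the diagonal, fixed curvature estimates and least-squares objective here are simplifying assumptions with no convergence guarantees.

```python
# Toy illustration of the sum-of-quadratics idea (NOT the released SFO
# code). Diagonal Hessians make the minimizer of the summed quadratic
# models available in closed form.
import numpy as np

def sum_of_quadratics_step(theta, i, f_df, anchors, grads, hessians):
    """Refresh minibatch i's quadratic at the current theta, then minimize
    sum_j [g_j.(t - t_j) + 0.5 (t - t_j)' H_j (t - t_j)] in closed form."""
    _, g = f_df(theta, i)
    anchors[i], grads[i] = theta.copy(), g
    h_sum = hessians.sum(axis=0)                    # diagonal of sum_j H_j
    rhs = (hessians * anchors - grads).sum(axis=0)  # sum_j (H_j t_j - g_j)
    return rhs / h_sum

# Example: least squares split into 10 minibatches.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(100, 5)), rng.normal(size=100)
batches = np.array_split(np.arange(100), 10)

def f_df(theta, i):
    Ai, bi = A[batches[i]], b[batches[i]]
    r = Ai @ theta - bi
    return 0.5 * r @ r, Ai.T @ r

n = len(batches)
anchors, grads = np.zeros((n, 5)), np.zeros((n, 5))
# Fixed diagonal curvature estimates; SFO instead learns curvature online.
hessians = np.array([np.diag(A[idx].T @ A[idx]) for idx in batches])
theta = np.zeros(5)
for step in range(200):
    theta = sum_of_quadratics_step(theta, step % n, f_df, anchors, grads, hessians)
print(theta)  # approaches the least-squares solution of A theta ~ b
```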

Jascha is a postdoctoral scientist at Stanford University in Surya Ganguli's group. He received his PhD from UC Berkeley in 2012, working with Bruno Olshausen. Jascha's research interests include machine learning, neuroscience, statistical physics, and dynamical systems. Past projects include developing new methods to fit large scale probabilistic models to data, using large scale probabilistic models to capture functional connectivity in the brain, analyzing multispectral imagery from Mars, using Lie groups to capture transformations in natural video, and developing Hamiltonian Monte Carlo sampling algorithms. You can find more information at http://sohldickstein.com/.


13:10

Ian Goodfellow

Ian Goodfellow, OpenAI

Maxout All the Things

Generative Adversarial Networks

Generative adversarial networks (GANs) use deep learning to imagine new, previously unseen data, such as images. GANs are based on a game between two players: a generator network that creates images, and a discriminator network that guesses whether images came from the training data or from the generator network. This game resembles the conflict between counterfeiters and the police, with counterfeiters forced to learn to produce realistic fakes. At equilibrium, the generator produces images that come from the same probability distribution as the training data, and the discriminator is unable to tell whether images are real or fake.
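A minimal sketch of this two-player game on toy 1-D data, assuming PyTorch; the architectures, learning rates, and target distribution are illustrative choices, not details from the talk. The discriminator is trained to label real samples 1 and generated samples 0, while the generator is trained to make the discriminator output 1 on its samples.

```python
# Minimal GAN on toy 1-D data (PyTorch assumed; hyperparameters are
# illustrative). G maps noise to samples; D outputs a realness logit.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))  # generator
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))  # discriminator

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

real_data = lambda n: torch.randn(n, 1) * 0.5 + 2.0  # "training set": N(2, 0.5)

for step in range(2000):
    # Discriminator step: label real data 1 and generated data 0.
    x_real, z = real_data(64), torch.randn(64, 8)
    loss_d = (bce(D(x_real), torch.ones(64, 1))
              + bce(D(G(z).detach()), torch.zeros(64, 1)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: fool the discriminator into outputting 1 on fakes.
    z = torch.randn(64, 8)
    loss_g = bce(D(G(z)), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# At equilibrium the generator's samples should match the data distribution.
print(G(torch.randn(1000, 8)).mean().item())  # should drift toward 2.0
```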

Ian Goodfellow is a research scientist at OpenAI. He is the lead author of the MIT Press textbook Deep Learning. In addition to generative models, he also studies security and privacy for machine learning. He has contributed to open source libraries including TensorFlow, Theano, and Pylearn2. He obtained a PhD from University of Montreal in Yoshua Bengio's lab, and an MSc from Stanford University, where he studied deep learning and computer vision with Andrew Ng. He is generally interested in all things deep learning.


13:30

COFFEE

13:50

PANEL SESSION: Challenges, Limitations & New Solutions

14:10

Babak Hodjat

Babak Hodjat, Sentient

Challenges of processing Big-Data with Big-Compute

When the compute requirements for a Big-Data problem are large enough, they will have to be provisioned from different sources, which makes the task of getting the data to the processing nodes challenging. We present a massively distributed evolutionary algorithm that runs on a compute grid of hundreds of thousands of CPUs, and is capable of utilizing sub-samples of the data in order to discover patterns and classifications in a variety of big-data problems. We demonstrate some of the powerful fault-tolerant, asynchronous features of the system's federated hub-and-spoke architecture and present a case study on a biotech application in time-series prediction. We finish with a discussion of how and where Deep Learning can augment this approach.
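As a purely illustrative, single-process toy (not Sentient's system), the sketch below mimics the hub-and-spoke pattern: a hub holds the population, worker "spokes" (a thread pool standing in for a grid of CPUs) score candidates on random sub-samples of the data, and fitness values flow back to drive selection and mutation. All names and parameters are hypothetical.

```python
# Toy hub-and-spoke evolutionary loop with sub-sampled fitness
# evaluation (illustrative only; the real system is massively
# distributed, asynchronous, and fault tolerant).
import random
from concurrent.futures import ThreadPoolExecutor

DATA = [(x, 3 * x + 1) for x in range(100)]        # toy labelled data

def fitness_on_subsample(candidate, k=20):
    """Score a candidate (a, b) for y ~ a*x + b on k random points."""
    a, b = candidate
    sample = random.sample(DATA, k)                # sub-sample, not full data
    return -sum((a * x + b - y) ** 2 for x, y in sample)

def evolve(generations=50, pop_size=32):
    pop = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(pop_size)]
    with ThreadPoolExecutor(max_workers=8) as spokes:      # stand-in for a grid
        for _ in range(generations):
            scores = list(spokes.map(fitness_on_subsample, pop))   # fan out
            ranked = [c for _, c in sorted(zip(scores, pop), reverse=True)]
            parents = ranked[: pop_size // 4]                      # selection
            pop = [(a + random.gauss(0, 0.2), b + random.gauss(0, 0.2))
                   for a, b in random.choices(parents, k=pop_size)]  # mutation
    return ranked[0]

print(evolve())  # best candidate should approach (3, 1)
```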

Babak Hodjat is co-founder and chief scientist of Sentient Technologies, responsible for the core technology behind the world’s largest distributed artificial intelligence system. Babak is a serial entrepreneur, having started a number of Silicon Valley companies as main inventor and technologist. Prior to co-founding Sentient Technologies, Babak was senior director of engineering at Sybase iAnywhere, where he led mobile solutions engineering. Prior to Sybase, Babak was co-founder, CTO and board member of Dejima Inc., acquired by Sybase in April 2004. Babak is the primary inventor of Dejima's patented, agent-oriented technology applied to intelligent interfaces for mobile and enterprise computing – the technology behind Apple’s Siri. Babak is a published scholar in the fields of Artificial Life, Agent-Oriented Software Engineering, and Distributed Artificial Intelligence, and has 25 granted or pending patents to his name. Babak holds a PhD in Machine Intelligence from Kyushu University, in Fukuoka, Japan.


14:30

Conversation & Drinks

14:50

Brian Cheung

Brian Cheung, UC Berkeley

Exploring Deep Space: Discovering Factors of Variation Learned in Deep Networks

The Fovea as an Emergent Property of Visual Attention

Neural attention has been applied successfully to a variety of different applications including natural language processing, vision, and memory. An attractive aspect of these neural models is their ability to extract relevant features from data with minimal feature engineering. We further extend this ability to learning interpretable structural features of the attention window itself. We describe a learnable retinal sampling lattice similar to the retinal ganglion cells present in the primate retina. We explore the emergent properties of this lattice after training and find connections to features found in the physiology. Furthermore, we find conditions where these emergent properties are amplified or eliminated providing clues to their function.
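A simplified sketch of what a learnable retinal sampling lattice can look like, assuming PyTorch: each "cell" is a Gaussian kernel whose center and width are trainable parameters, so gradient descent on a downstream task can reshape the lattice, for example into a dense, fovea-like center. This is an illustration of the idea, not the authors' exact model.

```python
# Simplified learnable retinal lattice (illustration only). Each cell
# is a Gaussian kernel with a trainable center and log-width.
import torch
import torch.nn as nn

class RetinalLattice(nn.Module):
    def __init__(self, n_kernels=64, img_size=32):
        super().__init__()
        # Kernel centers in [-1, 1]^2 and log-widths, all learnable.
        self.centers = nn.Parameter(torch.rand(n_kernels, 2) * 2 - 1)
        self.log_sigma = nn.Parameter(torch.zeros(n_kernels))
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, img_size),
                                torch.linspace(-1, 1, img_size), indexing="ij")
        self.register_buffer("grid", torch.stack([xs, ys], dim=-1))  # (H, W, 2)

    def forward(self, images):                     # images: (B, H, W)
        d2 = ((self.grid[None] - self.centers[:, None, None]) ** 2).sum(-1)
        sigma = torch.exp(self.log_sigma)[:, None, None]
        w = torch.exp(-0.5 * d2 / sigma ** 2)      # (K, H, W) kernel weights
        w = w / w.sum(dim=(1, 2), keepdim=True)    # normalize each kernel
        return torch.einsum("bhw,khw->bk", images, w)  # one sample per cell

lattice = RetinalLattice()
glimpse = lattice(torch.randn(4, 32, 32))          # (4, 64) retinal samples
# Backpropagating any task loss through `glimpse` also reshapes the lattice.
```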

Brian Cheung is a PhD Student at UC Berkeley working with Professor Bruno Olshausen at the Redwood Center for Theoretical Neuroscience. His research interests lie at the intersection between machine learning and neuroscience. Drawing inspiration from these fields, he hopes to create systems which can solve complex vision tasks using attention and memory.


Patrick Ehlen

Patrick Ehlen, Loop AI Labs

Panel Session

Patrick Ehlen, PhD, is a cognitive scientist and Head of Deep Learning at Loop AI Labs. He specializes in representation learning for semantics, pragmatics, and concept acquisition. He developed natural language and context resolution technologies at AT&T Labs, and worked on methods to extract concepts and topics from ordinary spontaneous conversations among people as part of the DARPA CALO project at CSLI/Stanford. He has produced 45 research publications in the areas of computational semantics, cognitive linguistics, psycholinguistics, word sense disambiguation, and human concept learning. He joined Loop AI Labs to help usher in a new era of cognitive computing services.


Roberto Pieraccini

Roberto Pieraccini, Jibo

Panel Session

The making of Jibo, the first social robot for the home

Jibo is a robot that understands speech and has a moving body that helps him communicate more effectively and express emotions. Jibo has cameras and microphones to make sense of the world around him, including detecting where sounds come from and recognizing and tracking people. He has a display to show images, an eye that can morph into shapes at will, and a touch interface as a complementary input modality. Jibo encompasses the ultimate human-machine interface, with the potential of becoming a new interaction paradigm. In this talk I will take the audience through the making of such a complex device and the challenges posed by the sheer complexity of integrating a large number of technologies in a character robot device for the home.

Roberto Pieraccini, a scientist, technologist, and the author of “The Voice in the Machine” (MIT Press, 2012), has been at the forefront of speech, language, and machine learning innovation for more than 30 years. He is widely known as a pioneer in the fields of statistical natural language understanding and machine learning for automatic dialog systems, and their practical application to industrial solutions. As a researcher he worked at CSELT (Italy), Bell Laboratories, AT&T Labs, and IBM T.J. Watson. He led the dialog technology team at SpeechWorks International, was the CTO of SpeechCycle, and was the CEO of the International Computer Science Institute (ICSI) in Berkeley. He now leads the Advanced Conversational Technologies team at Jibo. http://robertopieraccini.com


15:50

Nicola Montecchio

Nicola Montecchio, Spotify

Deep Listening to Music

Music recommendation systems can be implemented using off-the-shelf collaborative filtering approaches, but such an approach is sub-optimal in that it does not take into account sources of information that are specific to the music domain. In this talk we show how deep learning is being used at Spotify for extracting meaningful information from the audio content in order to provide better recommendations. Use cases include: learning a measure of music similarity based on purely acoustic properties; classifying songs into genres and correcting erroneous metadata through audio-based artist disambiguation.

Nicola Montecchio is a Music Information Retrieval Scientist at Spotify. He received his Ph.D. in Computer Science from the University of Padova, Italy, working on real-time alignment of music and gesture streams for interactive applications. After spending a year as an invited researcher at the IRCAM institute in Paris, focusing on the interaction between musicians and computers, he joined The Echo Nest / Spotify in 2012 to work on large-scale content-based classification, ranking and similarity algorithms applied to acoustic aspects of music.


16:10

Fireside Chat with Andrew Ng

Andrew Ng

Andrew Ng, Baidu

Fireside Chat

Dr. Andrew Ng is Chief Scientist at Baidu. He leads Baidu Research, which comprises three interrelated labs: the Silicon Valley AI Lab, the Institute of Deep Learning and the Big Data Lab. The organization brings together global research talent to work on fundamental technologies in areas such as image recognition and image-based search, speech recognition, natural language processing and semantic intelligence. In addition to his role at Baidu, Dr. Ng is a faculty member in Stanford University's Computer Science department, and Chairman of Coursera, an online education platform that he co-founded. Dr. Ng is the author or co-author of over 100 published papers in machine learning, robotics and related fields. He holds degrees from Carnegie Mellon University, MIT and the University of California, Berkeley.


Derrick Harris

Derrick Harris, GigaOM

Moderator

Derrick has been a technology journalist since 2003 and has been covering cloud computing, big data and other emerging IT trends for Gigaom since 2009. He has written the words “cloud” and “Hadoop” possibly more than any other person on the planet. Derrick lives in Las Vegas and has a law degree from the University of Nevada, Las Vegas. Away from the office, Derrick trains in muay thai and is active in animal welfare issues.


Vivienne Ming

Vivienne Ming, Socos

Summit Compère

The elusive quest to identify and place skilled professionals has become an obsession in the talent wars of the tech industry (not to mention in schools from K through postdoc). We will discuss the concept of continuous passive predictive (formative) assessment, applied to both learners and professionals, from kindergartners to (future) CEOs. Building cognitive models using unstructured data and ubiquitous sensors allows the assessment not only of concept mastery, but of meta-learning development as well (e.g., "grit" and "social-emotional intelligence"). Such models can then be used to predict which content will be an effective learning experience for a given learner. In massive courses, from large college lectures to MOOCs, the models can identify ad hoc cohorts for collaborative learning.

Dr. Vivienne Ming is a theoretical neuroscientist, technologist and entrepreneur. She is the co-founder and Managing Partner of Socos, a cutting-edge EdTech company which applies cognitive modeling to align education with life outcomes. Previously, Dr. Ming was Chief Scientist at Gild, an innovative startup that builds better companies by unleashing human potential in their workforce using machine learning. She is a visiting scholar at UC Berkeley's Redwood Center for Theoretical Neuroscience pursuing her research in cognitive prosthetics. Dr. Ming also explores augmented cognition using technology like Google Glass and has been developing predictive models of diabetes and bipolar disorder. Her work and research have received extensive media attention, including the New York Times, NPR, Nature, O Magazine, Forbes, and The Atlantic.


17:30

Yi Li

Yi Li, Orbeus

A Personal AI System of the People, by the People, for the People

Yi is passionate about using technology to create products and services that simplify life and that people enjoy using. Prior to joining Orbeus, Yi worked at IBM Global Technology Services in China, where her primary responsibility was the Smart Workspace@Mobile solution jointly developed by IBM and Huawei. Yi holds a Master’s degree in Information Management from the McCallum Graduate School of Business at Bentley University.


Alex Tellez

Alex Tellez, 0xdata

Panel Session

Deep Autoencoders & Bordeaux Wine - Using H2O's autoencoder capabilities, we explore the world of wine from Bordeaux, France, trying to predict high-quality vintage years and building a wine recommendation engine using tasting notes from professional critics.
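A hedged sketch of this kind of workflow using H2O's deep-learning autoencoder from its Python API; the file name, column name, and layer sizes are hypothetical, and the talk's actual experiments are not reproduced. Rows with unusually high reconstruction error stand apart from the bulk of the data, which is one way to flag exceptional vintages.

```python
# Hedged H2O autoencoder sketch; dataset and columns are hypothetical.
import h2o
from h2o.estimators.deeplearning import H2OAutoEncoderEstimator

h2o.init()
wine = h2o.import_file("bordeaux_vintages.csv")   # hypothetical dataset
features = [c for c in wine.columns if c != "vintage_year"]

model = H2OAutoEncoderEstimator(
    activation="Tanh",
    hidden=[32, 8, 32],     # bottleneck forces a compressed representation
    epochs=50)
model.train(x=features, training_frame=wine)

# Per-row reconstruction error: rows that reconstruct poorly are the
# ones that stand apart from the rest of the data.
errors = model.anomaly(wine)
print(errors.head())
```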

Alex is an applications and community hacker for 0xdata. Alex's research interest is in large-scale deep learning approaches utilizing stacked autoencoders and deep belief networks in the food and wine industry. When not neck deep in code, Alex is nose deep in a glass of wine or riding one of his bicycles.


Nigel Duffy

Nigel Duffy, Sentient Technologies

Challenges of processing Big-Data with Big-Compute

Visual Intent - A New Way to Understand Consumers

Nigel will discuss the significant progress being made in the areas of visual intelligence and deep learning. Purchase decisions are often most strongly influenced by images. Understanding users' visual intent is essential to reducing friction and increasing conversion for those users. This talk will discuss visual learning technologies that aid decision-making by observing, interpreting and evaluating users' interactions with visual content. Nigel will focus on the technology’s ability to extract meaning from images and to make precise predictions about users' preferences. He will discuss applications of this technology and how it will transform the world of retail and e-commerce.

Nigel Duffy leads the research, development, and commercialization of Sentient’s Artificial Intelligence technologies. A recognized expert in machine learning, Nigel was previously the co-founder and CTO at Numerate Inc., where he led technology development and managed the application of Numerate’s platform. Nigel invented Numerate’s core technologies, which were used to design novel drug candidates. Additionally, Nigel was VP of Engineering at Pharmix and worked as a research scientist at AiLive (developer of the Wii Motion Plus), where he applied machine learning to computer games. Nigel also spent time at Amazon A9 working on tools for large-scale analytics in product search.


Simon Osindero

Simon Osindero, Flickr

Panel Session

Pixels, Parks, Birds, and Beauty

Photography is now a ubiquitous means of expression and communication: with the rise of cell-phone cameras, everyone is a “photographer”.

The vision and learning group at Flickr is developing a wide range of machine intelligence techniques to allow our users to effortlessly explore the world of beautiful images, and to manage their visual communications and memories.

In this talk, I’ll discuss some of the ways we’re using deep learning to power image search and discovery at Flickr, focusing on how we help you search for images that haven’t been human-labelled and how we can automatically infer image aesthetics.

Dr Simon Osindero is a pioneer in the field of machine learning and was the co-inventor of deep belief networks whilst researching as a post-doctoral fellow in the Hinton Group at the University of Toronto.

In his current role as an A.I. Architect at Yahoo, he leads computer vision and machine learning R&D at Flickr. He joined Yahoo in 2013 after it acquired LookFlow, a company he cofounded in 2009 to productize cutting edge research from the fields of machine learning and human-computer interaction. Prior to starting LookFlow he worked with a Montreal-based start-up, Idilia, designing machine-learning algorithms for natural language processing.

He holds an MSc in physics and a BA/MA in natural sciences (physics, molecular biology, mathematics) from Cambridge University (1st Class), and a PhD in Computational Neuroscience from the Gatsby Unit, University College London. He has also worked as a visual and new-media artist, and holds degrees in Photography and Digital Design from Concordia University.


18:50

COFFEE

08:30

COFFEE

08:50

WELCOME

Visual Object Recognition

09:30

Charles Cadieu

Charles Cadieu, BayLabs

Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition

Increasing Quality, Value and Access to Medical Imaging

How can we bring the life-saving benefits of medical imaging to more people? At BayLabs we are pursuing this mission by combining deep learning and ultrasound, a pairing that has the potential to transform medicine at the point of care. In this talk, I will present our work at MIT’s DiCarlo Lab showing that this potential future may now be within reach. I’ll conclude with one of BayLabs’ efforts to bring medical imaging, powered by our deep learning algorithms, to those most in need.

Charles Cadieu is Co-Founder and CEO of BayLabs. BayLabs’ mission is to increase quality, value and access to medical imaging with deep learning and ultrasound. Charles is an entrepreneur and neuroscientist who brings cutting-edge vision algorithms to market and seeks to increase our knowledge of the human visual system. His neuroscience work covers the full spectrum of visual processing, from low-level image representation through high-level object recognition. As an entrepreneur, he has started multiple companies and was an early member at IQ Engines (acquired by Yahoo! and now powering visual search at Flickr). Recently, he co-founded BayLabs to address high-impact problems in healthcare. He holds a BS/MEng from MIT in EECS, a PhD from UC Berkeley in Neuroscience, and is a Research Affiliate at MIT.


09:50

Ivan Laptev

Ivan Laptev, INRIA Paris

Weakly Supervised Object Recognition with Convolutional Neural Network

Weakly supervised object recognition with convolutional neural network

Successful methods for visual object recognition typically rely on large image datasets with rich annotation. Detailed image annotation in terms of object bounding boxes or object parts is both expensive and subjective. In this talk I will present a weakly supervised convolutional neural network (ConvNet) that achieves state-of-the-art results without using detailed annotation. In particular, I will show results for object and action recognition in still images where the network learns to recognize and localize objects and human actions without using location supervision at training time. We show that our weakly supervised method achieves performance comparable to its strongly supervised counterpart.
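A condensed sketch of a common form of this weak-supervision recipe, assuming PyTorch (the trunk and class count are illustrative, not the talk's exact architecture): a fully convolutional network emits per-class score maps, and global max pooling turns them into image-level scores trainable from image-level labels alone; at test time the location of a class map's maximum gives a coarse localization.

```python
# Weakly supervised recognition sketch: train from image-level labels,
# read localization off the spatial score maps afterwards.
import torch
import torch.nn as nn

NUM_CLASSES = 20  # e.g. the Pascal VOC categories

net = nn.Sequential(                          # toy convolutional trunk
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, NUM_CLASSES, 1))            # 1x1 conv -> class score maps

def image_level_scores(images):
    maps = net(images)                        # (B, C, h, w) spatial scores
    return maps.amax(dim=(2, 3)), maps        # max pool over all locations

images = torch.randn(2, 3, 224, 224)
labels = torch.zeros(2, NUM_CLASSES)
labels[:, 3] = 1                              # image-level labels only
scores, maps = image_level_scores(images)
loss = nn.functional.binary_cross_entropy_with_logits(scores, labels)
loss.backward()
# At test time, maps[b, c].argmax() gives a coarse location for class c,
# even though no bounding box was ever seen during training.
```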

Ivan Laptev is a research director at INRIA Paris, France. He received his Habilitation degree from the École Normale Supérieure in 2013 and his PhD in Computer Science from the Royal Institute of Technology in 2004. Ivan's main research interests include visual recognition of human actions, objects and interactions. He has published over 50 papers at international computer vision conferences and journals. He serves as an associate editor for the IJCV, TPAMI and IVC journals; he was/is an area chair for CVPR'10, '13 and '15, ICCV'11, ECCV'12 and '14, and ACCV'14; and he has co-organized several tutorials, workshops and challenges. He received an ERC Junior Grant in 2012.

10:10

COFFEE

10:30

Matthew Zeiler

Matthew Zeiler, Clarifai Inc

Leveraging Multiple Dimensions

Forevery: Deep Learning for Everyone!

Forevery is a free photo discovery app that takes you on a personalized journey through every memory saved on your camera roll. Using deep learning, our app automatically applies relevant tags for 11,000+ objects, ideas, themes, locations, and feelings to each picture so searching for every photo and rediscovering every memory is a snap! In addition to tagging, we build in the ability to teach the app custom concepts like your friends and family or your favorite sports team. Forevery learns what you care about most, auto-generating photo stories that make it easy to share with the people you care about most.

Clarifai was founded by Matt Zeiler, a University of Toronto and NYU alumnus who worked with several pioneers in neural networks, and Adam Berenzweig, who left Google after 10+ years, where he worked on Goggles and visual search.

Matthew Zeiler, PhD, Founder and CEO of Clarifai Inc., studied machine learning and image recognition with several pioneers in the field of deep learning at the University of Toronto and New York University. His insights into neural networks produced the top 5 results in the 2013 ImageNet classification competition. He founded Clarifai to push the limits of practical machine learning, which will power the next generation of intelligent applications and devices.


Bi-Modal Image-Text Data Analysis

11:10

Richard Socher

Richard Socher, Salesforce

Recursive Deep Learning for Modeling Compositional and Grounded Meaning

Multimodal Question Answering for Language and Vision

Deep Learning has made tremendous breakthroughs possible in visual understanding and speech recognition. Ostensibly, this is not the case in natural language processing (NLP) and higher level reasoning. However, it only appears that way because there are so many different tasks in NLP and no single one of them, by itself, captures the complexity of language. I will talk about dynamic memory networks for question answering. This model architecture and task combination can solve a wide variety of visual and NLP problems, including those that require reasoning.

Richard Socher is Chief Scientist at Salesforce. He was previously the CEO and founder of MetaMind, a startup that sought to improve artificial intelligence and make it widely accessible. He obtained his PhD from Stanford, working on deep learning with Chris Manning and Andrew Ng, and won the best Stanford CS PhD thesis award. He is interested in developing new AI models that perform well across multiple different tasks in natural language processing and computer vision.

He was awarded the Distinguished Application Paper Award at the International Conference on Machine Learning (ICML) 2011, the 2011 Yahoo! Key Scientific Challenges Award, a Microsoft Research PhD Fellowship in 2012, a 2013 "Magic Grant" from the Brown Institute for Media Innovation, and the 2014 GigaOM Structure Award.


11:30

Nitish Srivastava

Nitish Srivastava, University of Toronto

Multimodal Learning with Deep Boltzmann Machines

Multimodal Learning with Deep Boltzmann Machines

Real-world data often consists of multiple modalities, for example, images are often accompanied by captions and tags; videos contain both visual and auditory information; robots receive data from visual, auditory and touch sensors. I will talk about a deep learning model that can extract a unified representation which fuses the multiple modalities together. The model is robust to missing data and can fill in missing modalities based on what is available. Our experiments on bi-modal image-text data show this model can be used to generate words given an image as well as retrieve images given some text.

Nitish Srivastava is a PhD student in the Machine Learning group at the University of Toronto, working with Geoffrey Hinton and Russ Salakhutdinov. He is interested in using machine learning to create representations for images and videos that can help solve computer vision problems. He is working on object detection and action recognition. He is also interested in combining multiple data modalities into joint representations that can be used for cross-modal information retrieval. He has also worked on developing a new regularization technique that makes it possible to train very large and deep neural networks without overfitting.


11:50

LUNCH

12:10

Peter Sadowski

Peter Sadowski, University of California Irvine

Deep Learning in High-Energy Physics

Deep Learning in High-Energy Physics

The Higgs Boson was observed for the first time in 2011-2012, and ongoing experiments will answer fundamental questions about the universe by characterizing its properties. Machine learning plays a major role in analyzing the petabytes of data produced by these high-energy physics experiments. In this work, we demonstrate that deep learning is particularly well-suited for this application: deep neural networks improve performance compared to shallow learning algorithms, and from raw data they can automatically learn high-level features that usually need to be derived by physicists.
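As a small illustration of the deep-versus-shallow comparison, the sketch below builds both kinds of network for HEP-style tabular features, assuming PyTorch; the 28-feature layout of the public HIGGS benchmark is assumed, and the random arrays are placeholders for real events.

```python
# Deep-vs-shallow sketch on HEP-style tabular features (placeholder
# data; on the real benchmark the deeper net learns high-level features
# from low-level inputs).
import torch
import torch.nn as nn

def make_net(n_hidden, width=300, n_features=28):
    layers, d = [], n_features
    for _ in range(n_hidden):
        layers += [nn.Linear(d, width), nn.Tanh()]
        d = width
    layers.append(nn.Linear(d, 1))       # signal-vs-background logit
    return nn.Sequential(*layers)

shallow, deep = make_net(1), make_net(5)   # same recipe, different depth

X = torch.rand(1000, 28)                           # placeholder events
y = torch.randint(0, 2, (1000, 1)).float()         # placeholder labels
loss_fn = nn.BCEWithLogitsLoss()
for model in (shallow, deep):                      # train both identically
    opt = torch.optim.SGD(model.parameters(), lr=0.05)
    for epoch in range(5):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
```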

Peter Sadowski is a PhD student at the University of California, Irvine, where he studies deep learning and artificial neural networks. He has published work on stochastic algorithms for training neural networks, along with work on deep learning applications in diverse areas such as bioinformatics and high-energy physics. More generally, Peter is interested in data-driven solutions to problems of learning, inference, and optimization.


12:30

END OF SUMMIT

12:50

Venkatesh Ramanathan

Venkatesh Ramanathan, PayPal

PayPal & Deep Learning

Fraud Detection Using Deep Learning

Deep Learning has shown superior performance in the areas of image processing, object recognition and text processing. In this talk, I will present how Deep Learning can help with payment fraud detection. I will present results from experiments conducted on a very large dataset containing over 10 million examples and thousands of features. I will also explore several advanced techniques, such as adaptive learning rates and dropout regularization, and their impact on runtime and predictive performance.
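A hedged sketch of the two techniques named above applied to a fraud-style binary classifier, assuming PyTorch: dropout layers for regularization, and Adam as a stand-in adaptive-learning-rate optimizer (the talk does not specify which variant is used). The feature count and minibatch are hypothetical.

```python
# Dropout + adaptive learning rate on a fraud-style classifier
# (illustrative; feature count and data are placeholders).
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1000, 512), nn.ReLU(), nn.Dropout(p=0.5),  # dropout regularization
    nn.Linear(512, 128), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(128, 1))                                   # fraud / not-fraud logit

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # adaptive step sizes
criterion = nn.BCEWithLogitsLoss()

# One training step on a placeholder minibatch of transactions.
x = torch.randn(256, 1000)
y = (torch.rand(256, 1) < 0.01).float()    # fraud is rare: heavily imbalanced
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()

model.eval()   # disables dropout when scoring live transactions
```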

Venkatesh is a senior data scientist at PayPal, where he is building state-of-the-art tools for payment fraud detection. He has over 20 years' experience in designing, developing and leading teams to build scalable server-side software. In addition to being an expert in big-data technologies, Venkatesh holds a Ph.D. in Computer Science with a specialization in Machine Learning and Natural Language Processing (NLP) and has worked on various problems in the areas of anti-spam, phishing detection, and face recognition.


13:10

Jürgen Sturm

Jürgen Sturm, Metaio

Deep Learning for Virtual Shopping

Deep Learning for Virtual Shopping

Metaio is the world's leading augmented reality provider. Our technology lets customers virtually try on glasses, earrings, and even new hair colors, which creates a completely new and immersive shopping experience: customers can directly check out how a new product would look on them without needing the real physical product. In my talk, I will give an overview of recent deep learning techniques developed at Metaio for face detection and face tracking. We recorded large datasets for face tracking and alignment, both for normal cameras and for depth cameras such as the Kinect. We use both convolutional networks and random forests for classification and shape regression. Special care was given to memory and compute optimization, so that our software runs in real time on mobile devices such as smartphones and tablets. During my talk, I will give several live demos of our technology and show how we use it to create a value chain together with our customers.

Dr. Jürgen Sturm heads the machine learning efforts at Metaio GmbH, the world-leading augmented reality technology provider. He and his team research machine learning techniques, such as deep networks and random forests, to track and augment the human body in camera images. The goal of Metaio's machine learning efforts is to create immersive virtual shopping experiences, for example, to try on sunglasses or earrings. Before he joined Metaio, he was a postdoctoral researcher in the Computer Vision group of Prof. Daniel Cremers at the Technical University of Munich, where he developed several novel methods for real-time camera tracking and 3D person scanning. In 2011, he obtained his PhD from the Autonomous Intelligent Systems lab headed by Prof. Wolfram Burgard at the University of Freiburg. He has won several awards for his scientific work, including the best dissertation award of the European Coordinating Committee for Artificial Intelligence (ECCAI) in 2011 and the TUM TeachInf best lecture award in 2012 and 2013 for his course "Visual Navigation for Flying Robots".


13:30

Appu Shaji

Appu Shaji, EyeEm

Deep Learning: Revolutionizing the Search for Amazing Photography!

Recording The Visual Mind: Understanding Aesthetics with Deep Learning

With the rise of mobile cameras, the process of capturing good photos has been democratized, and this overload of content has created a challenge in search. One of the important aspects of photography is that every image communicates with a different audience in a different form. This talk will address how we use computer vision techniques at EyeEm to measure visual aesthetics in photography and, beyond that, to personalize the image search experience to find the photos you personally find beautiful.

Appu is the Head of Research & Development at EyeEm. His first company, sight.io, was acquired by EyeEm in 2014. He previously held post-doctoral positions at EPFL, working alongside Prof. Sabine Süsstrunk and Prof. Pascal Fua. In 2009, Appu obtained his Ph.D. from IIT Bombay, where he was awarded the best thesis prize of the Computer Science Department. He was also selected as one of the 20 most promising entrepreneurs of Switzerland in 2013. His research has appeared in top computer vision journals and conferences such as TPAMI, CVPR, and ACM Multimedia.


13:50

Eugenio Culurciello

Eugenio Culurciello, TeraDeep

TeraDeep

Eugenio Culurciello (S'97-M'99) received his Ph.D. in Electrical and Computer Engineering in 2004 from Johns Hopkins University, Baltimore, MD. Dr. Culurciello is TeraDeep's founder and leader: http://teradeep.com/. He is also an associate professor in the Department of Electrical and Computer Engineering, Mechanical Engineering, the Weldon School of Biomedical Engineering, and Psychological Sciences in the College of Health & Human Sciences at Purdue University, where he directs the ‘e-Lab’ laboratory. Eugenio Culurciello was the recipient of the Presidential Early Career Award for Scientists and Engineers (PECASE), a Distinguished Lecturer of the IEEE (CASS), and is the author of the books "Silicon-on-Sapphire Circuits and Systems, Sensor and Biosensor Interfaces" (McGraw Hill, 2009) and "Biomedical Circuits and Systems, Integrated Instrumentation" (Lulu, 2013). http://teradeep.com/blog/euge-cv.html


Paul Murphy

Paul Murphy, Clarify

Fireside Chat: The Founder & CEO of Clarify

Deep Learning & Speech: Adaptation, the Next Frontier

The speech community is finally excited about deep learning, but we’re proceeding with caution. Adaptation is critical to understanding real-world speech data. We need to adapt to acoustics and language of course, but also to context. To date, DNNs have shown great promise, but their ability to adapt to the unexpected is still in question. This talk will look at where we are today, as well as the challenges still in front of us.

Paul Murphy is one of Clarify's founders and its CEO. Paul's career in the software industry has spanned twenty years and three continents. Ten years were dedicated to understanding and building large systems on Wall Street for clients like J.P. Morgan and Salomon Brothers. Paul's work in this area allowed him to explore a broad range of computing solutions, from mainframes to web services, and the gamut of space-time tradeoffs required by dissimilar front- and back-office systems. Thirteen years ago, Paul moved to London to work at Adeptra, a pioneer in the use of automated outbound calling for credit card fraud detection and prevention. As Adeptra's CTO, he developed the software that enabled Adeptra to place intelligent, interactive outbound calls on behalf of clients. These systems made extensive use of text-to-speech and voice recognition technology. Since then Paul has dedicated his time to developing technologies that leverage emerging voice processing techniques.


14:30

Tim Tuttle

Tim Tuttle, MindMeld

The Voice Revolution has Arrived

Voice Assistant Adoption & Attitudes

Intelligent voice interfaces have long been a favorite of science fiction, and recent AI advances are finally making them a reality. Currently most common in smartphone assistants, voice interfaces have also begun appearing in cars, wearables, smart televisions, and connected home devices such as Amazon’s Echo. The performance of such voice interfaces is finally acceptable, and users are starting to embrace them.

MindMeld’s presentation will focus both on what is driving the technical capabilities that are finally making intelligent voice interfaces a reality, and on how demand for such voice capabilities is changing, based on findings of research conducted by the company.

These combined trends of a fast-changing market have implications for many different devices, applications and industries. The presentation will share the data behind these trends and conclusions about what this means for the industry.

Tim is the CEO and Founder of MindMeld. Tim started his career at the MIT Artificial Intelligence Lab, where he received his PhD. Tim has also served on the research faculty at MIT as well as at Bell Laboratories. His first company built the Internet’s first large-scale CDN for real-time data. His second company, Truveo, built the web’s second-largest video search platform, reaching over 70M monthly visitors, and was acquired by AOL. Tim served as Senior Vice President at AOL, responsible for the Truveo business unit. Tim is the author of eighteen technical publications and was selected as one of the 100 Top Young Innovators by MIT Technology Review Magazine.


Jordan Novet

Jordan Novet, VentureBeat

Moderator

Jordan Novet is a VentureBeat staff writer based in San Francisco. He writes about big data, cloud computing, and other technology for business. He previously covered those things at Gigaom.


Vivienne Ming

Vivienne Ming, Socos

Summit Compère

The elusive quest to identify and place skilled professionals has become an obsession in the talent wars of the tech industry (not to mention in schools from K through postdoc). We will discuss the concept of continuous passive predictive (formative) assessment, applied to both learners and professionals, from kindergartners to (future) CEOs. Building cognitive models using unstructured data and ubiquitous sensors allows the assessment not only of concept mastery, but of meta-learning development as well (e.g., "grit" and "social-emotional intelligence"). Such models can then be used to predict which content will be an effective learning experience for a given learner. In massive courses, from large college lectures to MOOCs, the models can identify ad hoc cohorts for collaborative learning.

Dr. Vivienne Ming is a theoretical neuroscientist, technologist and entrepreneur. She is the co-founder and Managing Partner of Socos, a cutting-edge EdTech company which applies cognitive modeling to align education with life outcomes. Previously, Dr. Ming was Chief Scientist at Gild, an innovative startup that builds better companies by unleashing human potential in their workforce using machine learning. She is a visiting scholar at UC Berkeley's Redwood Center for Theoretical Neuroscience pursuing her research in cognitive prosthetics. Dr. Ming also explores augmented cognition using technology like Google Glass and has been developing predictive models of diabetes and bipolar disorder. Her work and research have received extensive media attention, including the New York Times, NPR, Nature, O Magazine, Forbes, and The Atlantic.


15:30

Fireside Chat: The Founder & CEO of Clarify

15:50

Alejandro Jaimes

Alejandro Jaimes, Acesio

Learning Creativity

Alejandro (Alex) Jaimes is CTO & Chief Scientist at Acesio. Acesio focuses on Big Data for predictive analytics in healthcare, tackling disease at worldwide scale and impacting individuals and entire populations. We use artificial intelligence to collect and analyze vast quantities of data to track and predict disease in ways that have never been done before, leveraging environmental variables, population movements, sensor data, and the web. Prior to joining Acesio, Alex was CTO at AiCure, and before that he was Director of Research/Video Product at Yahoo, where he led research and contributions to Yahoo's video products, managing teams of scientists and engineers in New York City, Sunnyvale, Bangalore, and Barcelona. His work focuses on machine learning, mixing qualitative and quantitative methods to gain insights on user behavior for product innovation. He has published widely in top-tier conferences (KDD, WWW, RecSys, CVPR, ACM Multimedia, etc.), has been a visiting professor (KAIST), and is a frequent speaker at international academic and industry events. He is a scientist and innovator with 15+ years of international experience in research leading to product impact (Yahoo, KAIST, Telefonica, IDIAP-EPFL, Fuji Xerox, IBM, Siemens, and AT&T Bell Labs). He has worked in the USA, Japan, Chile, Switzerland, Spain, and South Korea, and holds a Ph.D. from Columbia University.


16:10

Wojciech Zaremba

Wojciech Zaremba, Facebook AI Research

Learning to Manipulate Symbols

Blind Spots in Neural Networks

Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions.

We find that deep neural networks learn input-output mappings that can be significantly discontinuous. We can cause the network to misclassify an image by applying a hardly perceptible perturbation, found by maximizing the network’s prediction error. In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, trained on a different subset of the dataset, to misclassify the same input.
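A minimal sketch of finding such a perturbation by gradient ascent on the prediction error, assuming PyTorch; the original work used a box-constrained optimizer, so the small signed-gradient steps here are a simplification, and the model and input are placeholders for a trained network and a real image.

```python
# Gradient-ascent sketch for a hardly perceptible, error-maximizing
# perturbation (simplified; placeholders stand in for a trained net).
import torch
import torch.nn as nn

def adversarial_perturbation(model, x, true_label, eps=0.01, steps=10):
    """Return a small delta that pushes model(x + delta) away from true_label."""
    for p in model.parameters():
        p.requires_grad_(False)            # only the perturbation gets gradients
    delta = torch.zeros_like(x, requires_grad=True)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        loss = loss_fn(model(x + delta), true_label)    # the error to maximize
        loss.backward()
        with torch.no_grad():
            delta += (eps / steps) * delta.grad.sign()  # ascend the error
            delta.grad.zero_()
    return delta.detach()

# Placeholders standing in for a trained network and a real image.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.randn(1, 1, 28, 28)
label = torch.tensor([3])
delta = adversarial_perturbation(model, x, label)
print(model(x).argmax(1), model(x + delta).argmax(1))   # labels often differ
```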

Wojciech Zaremba is a PhD student at New York University and a scientist at Facebook AI Research. His expertise lies in deep learning, with experience in computer vision and natural language processing problems. He is interested in solving symbolic manipulation tasks, which include reasoning about mathematical formulas or computer program properties.

Wojciech has worked as a member of Google Brain under the supervision of Prof. Geoffrey Hinton and Ilya Sutskever. He holds a Master's degree summa cum laude from the École Polytechnique in Paris. He also received a silver medal at the International Mathematical Olympiad in 2007.

