Visual Object Recognition
Charles Cadieu, BayLabs
Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition
Increasing Quality, Value and Access to Medical Imaging
How can we bring the life-saving benefits of medical imaging to more people? At BayLabs we are pursuing this mission by combining deep learning and ultrasound, a combination with the potential to transform medicine at the point of care. In this talk, I will present our work at MIT’s DiCarlo Lab, which showed that this potential future may now be within grasp. I’ll conclude with one of BayLabs’ efforts to bring medical imaging, powered by our deep learning algorithms, to those most in need.
Charles Cadieu is Co-Founder and CEO of BayLabs. BayLabs’ mission is to increase quality, value and access to medical imaging with deep learning and ultrasound. Charles is an entrepreneur and neuroscientist who brings cutting-edge vision algorithms to market and seeks to increase our knowledge of the human visual system. His neuroscience work covers the full spectrum of visual processing, from low-level image representation through high-level object recognition. As an entrepreneur, he has started multiple companies and was an early member at IQ Engines (acquired by Yahoo! and now powering visual search at Flickr). Recently, he co-founded BayLabs to address high-impact problems in healthcare. He holds a BS/MEng from MIT in EECS, a PhD from UC Berkeley in Neuroscience, and is a Research Affiliate at MIT.
Ivan Laptev, INRIA Paris
Weakly Supervised Object Recognition with Convolutional Neural Network
Successful methods for visual object recognition typically rely on large image datasets with rich annotation. Detailed image annotation, in terms of object bounding boxes or object parts, is both expensive and subjective. In this talk I will present a weakly supervised convolutional neural network (ConvNet) that achieves state-of-the-art results without using detailed annotation. In particular, I will show results for object and action recognition in still images, where the network learns to recognize and localize objects and human actions without location supervision at training time. We show that our weakly supervised method achieves performance comparable to its strongly supervised counterpart.
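The key mechanism behind this kind of weak supervision (sketched here in illustrative NumPy, not the speaker's code) is global max pooling over a per-class spatial score map: only the pooled image-level scores receive supervision during training, yet the argmax positions end up localizing the objects.

```python
import numpy as np

def weakly_supervised_predict(score_map):
    """Image-level class scores and a rough localization from a per-class
    spatial score map (H x W x C), as produced by a fully convolutional
    network. Training needs only image-level labels: the max-pooled scores
    feed the classification loss, so the network is never told *where*
    objects are, yet the argmax positions localize them."""
    H, W, C = score_map.shape
    flat = score_map.reshape(-1, C)       # (H*W, C)
    class_scores = flat.max(axis=0)       # global max pooling
    positions = flat.argmax(axis=0)       # best location per class
    locations = [(p // W, p % W) for p in positions]
    return class_scores, locations

# Toy 4x4 map with 2 classes; class 0 peaks at (1, 2), class 1 at (3, 0).
score_map = np.zeros((4, 4, 2))
score_map[1, 2, 0] = 5.0
score_map[3, 0, 1] = 3.0
scores, locs = weakly_supervised_predict(score_map)
print(scores)   # [5. 3.]
print(locs)     # [(1, 2), (3, 0)]
```

Because the max is taken over locations, increasing the image-level score during training forces the network to raise the score somewhere on the object, which is what yields localization for free.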
Ivan Laptev is a research director at INRIA Paris, France. He received his Habilitation degree from École Normale Supérieure in 2013 and his PhD in Computer Science from the Royal Institute of Technology in 2004. Ivan's main research interests include visual recognition of human actions, objects and interactions. He has published over 50 papers at international computer vision conferences and journals. He serves as an associate editor for the IJCV, TPAMI and IVC journals; has served as an area chair for CVPR'10, '13 and '15, ICCV'11, ECCV'12 and '14, and ACCV'14; and has co-organized several tutorials, workshops and challenges. He received an ERC Junior Grant in 2012.
Matthew Zeiler, Clarifai Inc
Leveraging Multiple Dimensions
Forevery: Deep Learning for Everyone!
Forevery is a free photo discovery app that takes you on a personalized journey through every memory saved on your camera roll. Using deep learning, our app automatically applies relevant tags for 11,000+ objects, ideas, themes, locations, and feelings to each picture, so searching for every photo and rediscovering every memory is a snap! In addition to tagging, we build in the ability to teach the app custom concepts, like your friends and family or your favorite sports team. Forevery learns what you care about most, auto-generating photo stories that make it easy to share with the people who matter most to you.
Clarifai was founded by Matt Zeiler, a University of Toronto and NYU alumnus who worked with several pioneers in neural networks, and Adam Berenzweig, who left Google after 10+ years, where he worked on Goggles and visual search.
Matthew Zeiler, PhD, Founder and CEO of Clarifai Inc. studied machine learning and image recognition with several pioneers in the field of deep learning at University of Toronto and New York University. His insights into neural networks produced the top 5 results in the 2013 ImageNet classification competition. He founded Clarifai to push the limits of practical machine learning, which will power the next generation of intelligent applications and devices.
Bi-Modal Image-Text Data Analysis
Richard Socher, Salesforce
Recursive Deep Learning for Modeling Compositional and Grounded Meaning
Multimodal Question Answering for Language and Vision
Deep Learning has made tremendous breakthroughs possible in visual understanding and speech recognition. At first glance, this is not the case in natural language processing (NLP) and higher-level reasoning. However, it only appears that way because NLP spans so many different tasks, and no single one of them, by itself, captures the complexity of language. I will talk about dynamic memory networks for question answering. This combination of model architecture and task can solve a wide variety of visual and NLP problems, including those that require reasoning.
Richard Socher is Chief Scientist at Salesforce. He was previously the CEO and founder of MetaMind, a startup that sought to improve artificial intelligence and make it widely accessible. He obtained his PhD from Stanford, working on deep learning with Chris Manning and Andrew Ng, and won the best Stanford CS PhD thesis award. He is interested in developing new AI models that perform well across multiple different tasks in natural language processing and computer vision.
He was awarded the Distinguished Application Paper Award at the International Conference on Machine Learning (ICML) 2011, the 2011 Yahoo! Key Scientific Challenges Award, a Microsoft Research PhD Fellowship in 2012, a 2013 "Magic Grant" from the Brown Institute for Media Innovation, and the 2014 GigaOM Structure Award.
Nitish Srivastava, University of Toronto
Multimodal Learning with Deep Boltzmann Machines
Real-world data often consists of multiple modalities, for example, images are often accompanied by captions and tags; videos contain both visual and auditory information; robots receive data from visual, auditory and touch sensors. I will talk about a deep learning model that can extract a unified representation which fuses the multiple modalities together. The model is robust to missing data and can fill in missing modalities based on what is available. Our experiments on bi-modal image-text data show this model can be used to generate words given an image as well as retrieve images given some text.
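The Boltzmann machine itself is too involved for a short example, but the core fusion idea described above (a shared code inferred from whichever modality is observed, then used to fill in the missing one) can be sketched with a linear joint model; the data and dimensions below are synthetic assumptions, not the speaker's model.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=(500, 3))                 # shared latent "content"
A_img = rng.normal(size=(3, 8))
A_txt = rng.normal(size=(3, 5))
img, txt = z @ A_img, z @ A_txt               # two correlated modalities
X = np.hstack([img, txt])                     # concatenated bi-modal data

# Joint code = top principal directions of the concatenated data.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
V = Vt[:3].T                                  # (13, 3) joint code directions
V_img, V_txt = V[:8], V[8:]

# Observe only the image features of sample 0; infer the joint code by
# least squares, then "fill in" the missing text modality from it.
x_img = img[0] - mean[:8]
code, *_ = np.linalg.lstsq(V_img, x_img, rcond=None)
txt_hat = V_txt @ code + mean[8:]

print(np.max(np.abs(txt_hat - txt[0])) < 1e-6)   # True: text recovered
```

The recovery is exact here only because the toy data is noiseless and low-rank; the point is the mechanism, in which the shared code acts as the unified representation that either modality can index into.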
Nitish Srivastava is a PhD student in the Machine Learning group at the University of Toronto, working with Geoffrey Hinton and Russ Salakhutdinov. He is interested in using machine learning to create representations for images and videos that can help solve computer vision problems. He is working on object detection and action recognition. He is also interested in combining multiple data modalities into joint representations that can be used for cross-modal information retrieval, and he has worked on developing a new regularization technique that makes it possible to train very large and deep neural networks without overfitting.
Peter Sadowski, University of California Irvine
Deep Learning in High-Energy Physics
The Higgs boson was observed for the first time in 2011-2012, and ongoing experiments will answer fundamental questions about the universe by characterizing its properties. Machine learning plays a major role in analyzing the petabytes of data produced by these high-energy physics experiments. In this work, we demonstrate that deep learning is particularly well suited to this application: deep neural networks improve performance compared to shallow learning algorithms, and from raw data they can automatically learn the high-level features that usually need to be derived by physicists.
Peter Sadowski is a PhD student at the University of California Irvine, where he studies deep learning and artificial neural networks. He has published work on stochastic algorithms for training neural networks, along with work on deep learning applications in diverse areas such as bioinformatics and high-energy physics. More generally, Peter is interested in data-driven solutions to problems of learning, inference, and optimization.
Venkatesh Ramanathan, Paypal
Paypal & Deep Learning
Fraud Detection Using Deep Learning
Deep Learning has shown superior performance in the areas of image processing, object recognition and text processing. In this talk, I will present how Deep Learning can help with payment fraud detection. I will present results from experiments conducted on a very large dataset containing over 10 million examples and thousands of features. I will also explore several advanced techniques, such as adaptive learning rates and dropout regularization, and their impact on runtime and predictive performance.
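As an illustration of those two techniques (on synthetic data, not PayPal's system), the following sketch trains a logistic model with inverted dropout and an AdaGrad-style per-weight adaptive learning rate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fraud-style data: 200 examples, 20 features, binary labels.
X = rng.normal(size=(200, 20))
true_w = rng.normal(size=20)
y = (X @ true_w + 0.1 * rng.normal(size=200) > 0).astype(float)

w = np.zeros(20)
g2 = np.zeros(20)                    # running sum of squared gradients
lr, eps, p_drop = 0.5, 1e-8, 0.5

for epoch in range(200):
    # Inverted dropout: randomly zero half the weights, rescale the rest.
    mask = (rng.random(20) > p_drop) / (1 - p_drop)
    pred = 1 / (1 + np.exp(-(X @ (w * mask))))
    grad = X.T @ (pred - y) / len(y) * mask
    g2 += grad ** 2
    w -= lr * grad / (np.sqrt(g2) + eps)   # AdaGrad: per-weight step size

accuracy = ((1 / (1 + np.exp(-(X @ w))) > 0.5) == y).mean()
print(accuracy)    # well above chance on this nearly separable data
```

AdaGrad gives rarely active features (common in fraud data) larger effective learning rates, while dropout acts as a regularizer against the overfitting that thousands of features invite.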
Venkatesh is a senior data scientist at PayPal, where he works on building state-of-the-art tools for payment fraud detection. He has over 20 years of experience designing, developing and leading teams to build scalable server-side software. In addition to being an expert in big-data technologies, Venkatesh holds a Ph.D. in Computer Science with a specialization in Machine Learning and Natural Language Processing (NLP) and has worked on various problems in the areas of anti-spam, phishing detection, and face recognition.
Jürgen Sturm, Metaio
Deep Learning for Virtual Shopping
Metaio is the world's leading augmented reality provider. We enable customers to virtually try on glasses, earrings, and even new hair colors, which creates a completely new and immersive shopping experience: customers can directly see how a new product would look on them without needing the real physical product. In my talk, I will give an overview of recent deep learning techniques developed at Metaio for face detection and face tracking. We recorded large datasets for face tracking and alignment, both for normal cameras and for depth cameras such as the Kinect. We use both convolutional networks and random forests for classification and shape regression. Special care was taken with memory and compute optimization, so that our software runs in real time on mobile devices such as smartphones and tablets. During my talk, I will give several live demos of our technology and show how we use it to create a value chain together with our customers.
Dr. Jürgen Sturm heads the machine learning efforts at Metaio GmbH, the world-leading Augmented Reality technology provider. He and his team research machine learning techniques, from deep networks to random forests, to track and augment the human body in camera images. The goal of Metaio’s machine learning efforts is to create immersive virtual shopping experiences, for example, to try on sunglasses or earrings. Before he joined Metaio, he was a postdoctoral researcher in the Computer Vision group of Prof. Daniel Cremers at the Technical University of Munich, where he developed several novel methods for real-time camera tracking and 3D person scanning. In 2011, he obtained his PhD from the Autonomous Intelligent Systems lab headed by Prof. Wolfram Burgard at the University of Freiburg. He has won several awards for his scientific work, including the best dissertation award of the European Coordinating Committee for Artificial Intelligence (ECCAI) in 2011 and the TUM TeachInf best lecture award in 2012 and 2013 for his course "Visual Navigation for Flying Robots".
Appu Shaji, EyeEm
Deep Learning: Revolutionizing the Search for Amazing Photography!
Recording The Visual Mind: Understanding Aesthetics with Deep Learning
With the rise of mobile cameras, the process of capturing good photos has been democratized, and this overload of content has created a challenge for search. One important aspect of photography is that every image communicates with a different audience in a different form. This talk will address how we use computer vision techniques at EyeEm to measure visual aesthetics in photography and, beyond that, to personalize the image search experience to find the photos you personally find beautiful.
Appu is the Head of Research & Development at EyeEm. His first company, sight.io, was acquired by EyeEm in 2014. He previously held post-doctoral positions at EPFL, working alongside Prof. Sabine Süsstrunk and Prof. Pascal Fua. In 2009, Appu obtained his Ph.D. from IIT Bombay, where he received the best thesis award from the Computer Science Department. He was also selected as one of the 20 most promising entrepreneurs in Switzerland in 2013. His research has appeared in top computer vision journals and conferences such as TPAMI, CVPR, and ACM Multimedia.
Eugenio Culurciello, TeraDeep
Eugenio Culurciello (S'97-M'99) received the Ph.D. degree in Electrical and Computer Engineering in 2004 from Johns Hopkins University, Baltimore, MD. Dr. Culurciello is the founder and leader of TeraDeep: http://teradeep.com/. He is also an associate professor in the Department of Electrical and Computer Engineering, Mechanical Engineering, the Weldon School of Biomedical Engineering, and Psychological Sciences in the College of Health & Human Sciences at Purdue University, where he directs the ‘e-Lab’ laboratory. Eugenio Culurciello is a recipient of the Presidential Early Career Award for Scientists and Engineers (PECASE), a Distinguished Lecturer of the IEEE Circuits and Systems Society (CASS), and the author of the books "Silicon-on-Sapphire Circuits and Systems, Sensor and Biosensor Interfaces" (McGraw Hill, 2009) and "Biomedical Circuits and Systems, Integrated Instrumentation" (Lulu, 2013). http://teradeep.com/blog/euge-cv.html
Paul Murphy, Clarify
Fireside Chat: The Founder & CEO of Clarify
Deep Learning & Speech: Adaptation, the Next Frontier
The speech community is finally excited about deep learning, but we’re proceeding with caution. Adaptation is critical to understanding real-world speech data: we need to adapt to acoustics and language, of course, but also to context. To date, DNNs have shown great promise, but their ability to adapt to the unexpected is still in question. This talk will look at where we are today, as well as the challenges still in front of us.
Paul Murphy is one of Clarify's founders and its CEO. Paul's career in the software industry has spanned twenty years and three continents. Ten years were dedicated to understanding and building large systems on Wall Street for clients like J.P. Morgan and Salomon Brothers. Paul's work in this area allowed him to explore a broad range of computing solutions, from mainframes to web services, and the gamut of space-time tradeoffs required by dissimilar front- and back-office systems. Thirteen years ago, Paul moved to London to work at Adeptra, a pioneer in the use of automated outbound calling for credit card fraud detection and prevention. As Adeptra's CTO, he developed the software that enabled Adeptra to place intelligent, interactive outbound calls on behalf of clients. These systems made extensive use of text-to-speech and voice recognition technology. Since then, Paul has dedicated his time to developing technologies that leverage emerging voice processing techniques.
Tim Tuttle, MindMeld
The Voice Revolution has Arrived
Voice Assistant Adoption & Attitudes
Intelligent voice interfaces have long been a favorite of science fiction, and recent AI advances are finally making them a reality. Currently most common in smartphone assistants, voice interfaces have also begun appearing in cars, wearables, smart televisions, and connected home devices such as Amazon’s Echo. Now that the performance of such voice interfaces is finally acceptable, users are starting to embrace them.
MindMeld’s presentation will focus both on the technical advances that are finally making intelligent voice interfaces a reality and on how demand for such voice capabilities is changing, based on findings of research conducted by the company.
These combined trends in a fast-changing market have implications for many different devices, applications and industries. The presentation will share data behind these trends and conclusions about what this means for the industry.
Tim is the CEO and Founder of MindMeld. Tim started his career at the MIT Artificial Intelligence Lab, where he received his PhD. He has also served on the research faculty at MIT, as well as at Bell Laboratories. His first company built the Internet’s first large-scale CDN for real-time data. His second company, Truveo, built the web’s second-largest video search platform, reaching over 70M monthly visitors, and was acquired by AOL, where Tim served as Senior Vice President responsible for the Truveo business unit. Tim is the author of eighteen technical publications and was selected as one of the 100 Top Young Innovators by MIT Technology Review magazine.
Vivienne Ming, Socos
The elusive quest to identify and place skilled professionals has become an obsession in the talent wars of the tech industry (not to mention in schools from K through postdoc). We will discuss the concept of continuous passive predictive (formative) assessment, applied to both learners and professionals, from kindergartners to (future) CEOs. Building cognitive models using unstructured data and ubiquitous sensors allows the assessment not only of concept mastery, but of meta-learning development as well (e.g., "Grit" and "Social-Emotional Intelligence"). Such models can then be used to predict which content will be an effective learning experience for a given learner. In massive courses, from large college lectures to MOOCs, the models can identify ad hoc cohorts for collaborative learning.
Dr. Vivienne Ming is a theoretical neuroscientist, technologist and entrepreneur. She is the co-founder and Managing Partner of Socos, a cutting-edge EdTech company that applies cognitive modeling to align education with life outcomes. Previously, Dr. Ming was Chief Scientist at Gild, an innovative startup that builds better companies by unleashing human potential in their workforce using machine learning. She is a visiting scholar at UC Berkeley's Redwood Center for Theoretical Neuroscience, pursuing her research in cognitive prosthetics. Dr. Ming also explores augmented cognition using technology like Google Glass and has been developing predictive models of diabetes and bipolar disorder. Her work and research have received extensive media attention, including the New York Times, NPR, Nature, O Magazine, Forbes, and The Atlantic.
Alejandro Jaimes, Acesio
Alejandro (Alex) Jaimes is CTO & Chief Scientist at Acesio. Acesio focuses on big data for predictive analytics in healthcare, tackling disease at worldwide scale and impacting individuals and entire populations. We use artificial intelligence to collect and analyze vast quantities of data to track and predict disease in ways that have never been done before, leveraging environmental variables, population movements, sensor data, and the web. Prior to joining Acesio, Alex was CTO at AiCure, and before that he was Director of Research/Video Product at Yahoo, where he led research and contributions to Yahoo's video products, managing teams of scientists and engineers in New York City, Sunnyvale, Bangalore, and Barcelona. His work focuses on machine learning, mixing qualitative and quantitative methods to gain insights into user behavior for product innovation. He has published widely in top-tier conferences (KDD, WWW, RecSys, CVPR, ACM Multimedia, etc.), has been a visiting professor (KAIST), and is a frequent speaker at international academic and industry events. He is a scientist and innovator with 15+ years of international experience in research leading to product impact (Yahoo, KAIST, Telefonica, IDIAP-EPFL, Fuji Xerox, IBM, Siemens, and AT&T Bell Labs). He has worked in the USA, Japan, Chile, Switzerland, Spain, and South Korea, and holds a Ph.D. from Columbia University.
Wojciech Zaremba, Facebook AI Research
Learning to Manipulate Symbols
Blind Spots in Neural Networks
Deep neural networks are highly expressive models that have recently achieved state-of-the-art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions.
We find that deep neural networks learn input-output mappings that are discontinuous to a significant extent. We can cause a network to misclassify an image by applying a barely perceptible perturbation, found by maximizing the network’s prediction error. Moreover, these perturbations are not random artifacts of learning: the same perturbation can cause a different network, trained on a different subset of the dataset, to misclassify the same input.
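The perturbation construction can be illustrated on a tiny linear classifier standing in for a deep network (synthetic 2-D data; in high-dimensional image space the equivalent shift is spread across many pixels and becomes hardly perceptible):

```python
import numpy as np

# Train a logistic classifier to separate two Gaussian blobs, then craft
# a perturbation along the sign of the loss gradient w.r.t. the input
# (the fast-gradient-sign idea) that flips its prediction.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

w, b = np.zeros(2), 0.0
for _ in range(500):                       # plain gradient-descent training
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= 0.1 * X.T @ (p - y) / len(y)
    b -= 0.1 * np.mean(p - y)

x = np.array([2.0, 2.0])                   # clearly class 1
clean_pred = int(x @ w + b > 0)

# For true label y=1 the input-gradient of the loss is (p - 1) * w;
# stepping along its sign increases the loss until the output flips.
eps = 3.0
p = 1 / (1 + np.exp(-(x @ w + b)))
x_adv = x + eps * np.sign((p - 1) * w)
adv_pred = int(x_adv @ w + b > 0)
print(clean_pred, adv_pred)   # prints: 1 0
```

The epsilon here is large because the toy space has only two dimensions; with thousands of input dimensions, a tiny per-coordinate step accumulates the same effect on the logit, which is why image perturbations can stay imperceptible.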
Wojciech Zaremba is a PhD student at New York University and a scientist at Facebook AI Research. His expertise lies in deep learning, with experience in computer vision and natural language processing. He is interested in solving symbolic manipulation tasks, including reasoning about mathematical formulas and computer program properties.
Wojciech has worked as a member of Google Brain under the supervision of Prof. Geoffrey Hinton and Ilya Sutskever. He holds a Master's degree summa cum laude from École Polytechnique in Paris. He also received a silver medal at the International Mathematical Olympiad in 2007.