Learning Where and When to Look
Deep learning models not only achieve superior performance in image recognition tasks, but also in predicting where and when users focus their attention. This talk will provide an overview of how convolutional neural networks have been trained to predict saliency maps that describe the probability of fixating the gaze on each image location. Different solutions have been proposed for this task, and our recent work has added a temporal dimension by predicting the gaze scanpath over 360-degree images for VR/AR. These techniques allow simulating eye-tracker data without collecting data from users.
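As a minimal sketch of the core idea (not the models presented in the talk), the snippet below shows how a saliency map can be framed as a probability distribution over image locations: convolutional features are passed through a softmax over all spatial positions, so each output value is the probability of fixating that location and the whole map sums to 1. The single hand-crafted Laplacian kernel is a hypothetical stand-in for learned CNN features.

```python
import numpy as np

def conv2d(image, kernel):
    # Naive "valid" 2-D convolution (cross-correlation); enough for a sketch.
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def saliency_map(image, kernel):
    # A saliency head typically ends with a softmax over spatial locations,
    # turning feature responses into fixation probabilities that sum to 1.
    features = conv2d(image, kernel)
    logits = features.ravel()
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return probs.reshape(features.shape)

# Toy example: a random 6x6 grayscale image and a 3x3 Laplacian kernel
# (a crude proxy for the contrast-sensitive features a CNN would learn).
rng = np.random.default_rng(0)
image = rng.random((6, 6))
kernel = np.array([[0., 1., 0.],
                   [1., -4., 1.],
                   [0., 1., 0.]])
smap = saliency_map(image, kernel)
print(smap.shape)  # (4, 4) — one fixation probability per valid location
```

In a real model the kernel bank is learned end-to-end from recorded fixations, but the output contract is the same: a non-negative map that normalizes to a probability distribution over gaze locations.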
Xavier Giro-i-Nieto is a learning enthusiast working as an associate professor at the Universitat Politecnica de Catalunya (UPC) in Barcelona, and a certified instructor at the NVIDIA Deep Learning Institute. He has been a visiting scholar at Columbia University and collaborates regularly with Dublin City University, the Barcelona Supercomputing Center and Vilynx. His research interests focus on deep learning for computer vision and natural language processing, applied to large-scale image retrieval, affective computing, lifelogging from wearables and visual saliency prediction. His current service includes serving as an associate editor of the IEEE Transactions on Multimedia and the ACM SIGMM Records.