Deep Learning Based Visual Scene and Object Recognition in Machine and Human Visual Systems
In this talk, we present deep learning solutions for three visual scene perception and object recognition problems: (1) classifying scenes based on their global properties, (2) applying a multi-resolution technique to object recognition, and (3) evaluating the influence of the high-level context of scene grammar on object and scene recognition. The goal is to investigate to what extent deep convolutional neural networks resemble the human visual system in scene perception and object recognition. For the first problem, we propose deriving the global properties of a scene, as high-level scene descriptions, from the deep features of convolutional neural networks in scene classification tasks. For the second problem, we show that fine-tuning Faster R-CNN on multi-resolution data, inspired by the multi-resolution human visual system, improves the network's performance and robustness over a range of spatial frequencies. Finally, for the third problem, we study the effects of violating high-level syntactic and semantic scene rules on human eye-movement behavior and on deep neural networks for scene and object recognition.
Akram Bayat is a Research Assistant at the University of Massachusetts Boston, where she also received her Ph.D. in Computer Science at the Visual Attention Laboratory, advised by Professor Marc Pomplun. Akram received master's degrees in both Electrical Engineering and Computer Science prior to joining the Ph.D. program. She is currently working on applying human attentional mechanisms to deep neural networks for scene and object recognition. Akram has conducted several projects on human activity recognition and eye-movement-based user classification. She is also interested in computer vision, machine learning, data mining, and human-user interface design.