Image Annotation using Deep Learning and Fisher Vectors
We present a system that addresses the holy grail of computer vision: matching images and text, and describing an image with automatically generated text. Our system combines deep learning tools for images and text, namely Convolutional Neural Networks, word2vec, and Recurrent Neural Networks, with a classical computer vision tool, the Fisher Vector. The Fisher Vector is modified to support hybrid distributions that are a better fit for natural language processing. Our method proves to be extremely potent, and we outperform all concurrent methods by a significant margin.
Prof. Lior Wolf is a faculty member at the School of Computer Science at Tel-Aviv University. Previously, he was a post-doctoral associate in Prof. Poggio's lab at MIT. He graduated from the Hebrew University, Jerusalem, where he worked under the supervision of Prof. Shashua. Lior Wolf was awarded the 2008 Sackler Career Development Chair, the Colton Excellence Fellowship for new faculty (2006-2008), the Max Shlumiuk Award for 2004, and the Rothschild Fellowship for 2004. His joint work with Prof. Shashua at ECCV 2000 received the best paper award, and their work at ICCV 2001 received the Marr Prize honorable mention. He also received best paper awards at the post-ICCV 2009 workshop on eHeritage and the pre-CVPR 2013 workshop on action recognition. Prof. Wolf's research focuses on computer vision and applications of machine learning, and includes topics such as face identification, document analysis, digital paleography, and video action recognition.