Carl Vondrick

PhD Student
MIT

SoundNet

We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild. We leverage the natural synchronization between vision and sound to learn an acoustic representation using two-million unlabeled videos. Unlabeled video has the advantage that it can be economically acquired at massive scales, yet contains useful signals about natural sound. We propose a student-teacher training procedure which transfers discriminative visual knowledge from well established visual recognition models into the sound modality using unlabeled video as a bridge. Our sound representation yields significant performance improvements over the state-of-the-art results on standard benchmarks for acoustic scene/object classification. Visualizations suggest some high-level semantics automatically emerge in the sound network, even though it is trained without ground truth labels.

Carl Vondrick is a doctoral candidate at MIT where his research studies computer vision and machine learning. He is particularly interested in leveraging large-scale data with minimal annotation and its applications to predictive vision and scene understanding. His research was awarded the Google Fellowship, the NSF Graduate Fellowship, and has appeared in popular media, such as NPR, CNN, the Associated Press, and the Late Show with Stephen Colbert.

Buttontwitter Buttonlinkedin

As Featured In

Original
Original
Original
Original
Original
Original

Partners & Attendees

Intel.001
Nvidia.001
Graphcoreai.001
Ibm watson health 3.001
Facebook.001
Acc1.001
Rbc research.001
Twentybn.001
Forbes.001
Maluuba 2017.001
Mit tech review.001
Kd nuggets.001