NLP for Music and Audio: Challenges and Opportunities
A vast amount of music information on social media, web pages, online forums, and digital libraries is expressed in natural language. Making sense of this information is challenging because the data is unstructured. Meanwhile, the AI landscape is becoming increasingly multimodal. In this talk, I will bring together recent developments at the intersection of NLP, Music Information Retrieval (MIR), and audio AI. We will walk through the challenges of applying NLP in MIR to enable machines to make sense of the world through multimodal music and sound data, including building a music voice assistant. I will also report on recent developments in the audio AI community and discuss the role NLP plays in them.
Shuo Zhang is a Senior ML Research Engineer at Bose Corp., where he works on machine learning and deep learning for audio signal processing and NLP applications. Before joining Bose, he worked at the Music Technology Group of Universitat Pompeu Fabra in Barcelona. Shuo received his PhD from Georgetown University with a focus on computational linguistics. In 2016, he co-taught a tutorial on applying NLP in Music Information Retrieval at the ISMIR conference. This year, Shuo serves as Co-Chair of Industry Liaisons for DCASE 2021, a leading conference in audio AI.