Advances in Deep Architectures and Methods for Separating Vocals in Recorded Music
Source separation of audio mixtures, with an emphasis on the human voice, remains one of the most enticing unsolved challenges in audio signal processing. This challenge is amplified in the context of recorded music, where many sound sources are intentionally correlated in both time and frequency. In this talk, we present recent advances in the state of the art for separating singing voice and accompaniment in popular music audio recordings, leveraging semi-supervised datasets mined from a large commercial music catalog. In addition, we explore the effects of combining deep convolutional U-Net architectures with multi-task learning for vocal separation.
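As a rough illustration of the masking idea behind U-Net-based vocal separation, the toy sketch below runs a one-level encoder-decoder with a skip connection over a magnitude spectrogram and applies the predicted soft mask to estimate the vocal source. The kernel sizes, pooling, random weights, and sigmoid output here are assumptions for demonstration only, not the architecture presented in the talk.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d(x, k):
    """Naive 'same' 2-D convolution with a 3x3 kernel (zero padding)."""
    h, w = x.shape
    out = np.zeros_like(x)
    xp = np.pad(x, 1)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k)
    return out

def toy_unet_mask(spec, rng):
    """One encoder level (conv + ReLU + 2x2 mean-pool), one decoder level
    (nearest-neighbor upsample + conv), joined by an additive skip
    connection. Returns a soft mask in (0, 1)."""
    k_enc = rng.standard_normal((3, 3)) * 0.1   # hypothetical random weights
    k_dec = rng.standard_normal((3, 3)) * 0.1
    enc = np.maximum(conv2d(spec, k_enc), 0.0)  # encoder feature map (ReLU)
    pooled = enc.reshape(enc.shape[0] // 2, 2,
                         enc.shape[1] // 2, 2).mean(axis=(1, 3))
    up = pooled.repeat(2, axis=0).repeat(2, axis=1)  # upsample back to input size
    merged = up + enc                                 # skip connection
    return sigmoid(conv2d(merged, k_dec))

rng = np.random.default_rng(0)
spec = np.abs(rng.standard_normal((8, 8)))  # stand-in magnitude spectrogram
mask = toy_unet_mask(spec, rng)
vocals = mask * spec                        # masked estimate of the vocal source
```

In a real system the mask would be learned by minimizing a reconstruction loss against isolated vocal stems; here the untrained weights only demonstrate the shapes and data flow.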
Eric J. Humphrey is a research scientist at Spotify and acting Secretary on the board of the International Society for Music Information Retrieval (ISMIR). Previously, he worked or consulted in a research capacity for various companies, notably THX and MuseAmi, and is a contributing organizer of a monthly Music Hackathon series in NYC. He earned his Ph.D. at New York University in Steinhardt's Music Technology Department under the direction of Juan Pablo Bello, Yann LeCun, and Panayotis Mavromatis, exploring the application of deep learning to the domains of audio signal processing and music informatics. When not trying to help machines understand music, you can find him running the streets of Brooklyn or hiding out in his music studio.