EM Grais, H Wierstorf, D Ward, MD Plumbley, "Multi-Resolution Fully Convolutional Neural Networks for Monaural Audio Source Separation," in Latent Variable Analysis and Signal Separation (LVA/ICA), p. 340-350 (2018).
BibTeX
@inproceedings{Grais2018a,
title = {Multi-Resolution Fully Convolutional Neural Networks for
Monaural Audio Source Separation},
author = {Grais, Emad M. and Wierstorf, Hagen and Ward, Dominic and
Plumbley, Mark D.},
booktitle = {Latent Variable Analysis and Signal Separation (LVA/ICA)},
address = {Guildford, UK},
month = {June},
year = {2018},
doi = {10.1007/978-3-319-93764-9_32},
url = {https://doi.org/10.1007/978-3-319-93764-9_32}
}
Abstract
In deep neural networks with convolutional layers, all the neurons in each layer typically have receptive fields (RFs) of the same size and the same resolution. Convolutional layers whose neurons have large RFs capture global information from the input features, while layers whose neurons have small RFs capture local details at high resolution. In this work, we introduce novel deep multi-resolution fully convolutional neural networks (MR-FCN), where each layer has a range of neurons with different RF sizes to extract multi-resolution features that capture both the global and the local information from its input features. The proposed MR-FCN is applied to separate the singing voice from mixtures of music sources. Experimental results show that MR-FCN improves performance on the audio source separation problem compared to feedforward deep neural networks (DNNs) and single-resolution deep fully convolutional neural networks (FCNs).
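
The idea of one layer holding filters with several receptive-field sizes can be illustrated with a small sketch. This is not the authors' implementation: it assumes a 1-D input instead of a spectrogram, fixed averaging filters instead of learned ones, and hypothetical names (`conv1d_same`, `multi_resolution_layer`, `kernel_sizes`). It only shows that small filters keep an impulse localized while large filters spread it over a wider context, and that the layer output stacks one feature map per filter size.

```python
def conv1d_same(signal, kernel):
    """Zero-padded 1-D correlation with output length equal to the input.

    Assumes an odd kernel length so the filter is centred on each sample.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def multi_resolution_layer(signal, kernel_sizes=(3, 5, 9)):
    """Apply one filter per receptive-field size and stack the feature maps.

    Illustrative assumption: each filter is a fixed moving average; in a
    trained network the filter weights would be learned.
    """
    feature_maps = []
    for size in kernel_sizes:
        kernel = [1.0 / size] * size
        feature_maps.append(conv1d_same(signal, kernel))
    return feature_maps


# A single impulse in the middle of a short signal.
signal = [0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0]
maps = multi_resolution_layer(signal)

for size, fmap in zip((3, 5, 9), maps):
    # Count how many output samples "see" the impulse at this resolution.
    support = sum(1 for v in fmap if abs(v) > 1e-12)
    print(f"kernel size {size}: impulse spread over {support} samples")
```

The number of output samples that respond to the impulse grows with the filter size, which is the local-versus-global trade-off the abstract describes; a multi-resolution layer keeps all of these responses side by side rather than choosing one.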