D Ward, H Wierstorf, R Mason, EM Grais, MD Plumbley, "BSS Eval or PEASS? Predicting the perception of singing-voice separation," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 596-600 (2018).

BibTeX

@inproceedings{Ward2018a,
    title     = {BSS Eval or PEASS? Predicting the perception of singing-voice
                 separation},
    author    = {Ward, Dominic and Wierstorf, Hagen and Mason, Russell
                 and Grais, Emad M. and Plumbley, Mark D.},
    booktitle = {IEEE International Conference on Acoustics, Speech and Signal
                 Processing (ICASSP)},
    address   = {Calgary, Canada},
    pages     = {596--600},
    month     = {April},
    year      = {2018},
    doi       = {10.1109/ICASSP.2018.8462194}
}

Abstract

There is some uncertainty as to whether objective metrics for predicting the perceived quality of audio source separation are sufficiently accurate. This issue was investigated by employing a revised experimental methodology to collect subjective ratings of sound quality and interference of singing-voice recordings that have been extracted from musical mixtures using state-of-the-art audio source separation. A correlation analysis between the experimental data and the measures of two objective evaluation toolkits, BSS Eval and PEASS, was performed to assess their performance. The artifacts-related perceptual score of the PEASS toolkit had the strongest correlation with the perception of artifacts and distortions caused by singing-voice separation. Both the source-to-interference ratio of BSS Eval and the interference-related perceptual score of PEASS showed comparable correlations with the human ratings of interference.
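The correlation analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not say which correlation coefficient was used, so both Pearson and Spearman are shown as assumptions, and the function names (`pearson`, `ranks`, `spearman`) and the example numbers are hypothetical placeholders, not data from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical placeholder values, NOT results from the paper:
    # one objective score and one mean listener rating per separated clip.
    objective_scores = [4.1, 7.8, 2.3, 9.5, 6.0]
    listener_ratings = [35.0, 62.0, 20.0, 80.0, 58.0]
    print("Pearson: ", round(pearson(objective_scores, listener_ratings), 3))
    print("Spearman:", round(spearman(objective_scores, listener_ratings), 3))
```

In this sketch, each objective measure (for example, a source-to-interference ratio or a perceptual score) would be compared in turn against the corresponding listener ratings, and the measure with the higher correlation would be taken as the better predictor for that attribute.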