A movie or series soundtrack can contain a music track mixed with the actors' voices or dubbed speech and other audio effects. However, movie or series studios may have obtained the music distribution rights only for a given territory, a given medium (DVD, Blu-Ray, VOD) or a given duration. It is thus impossible to distribute audiovisual content whose soundtrack includes music for which the studio or other distributor does not hold the rights within a given territory, for a particular medium, or beyond the expiry of the licensed duration, unless high fees are paid to the owners of the music rights.
Thus, there is a need for a process enabling the extraction of a specific acoustical component, such as a musical component, from an acoustical signal mixture, such as the original soundtrack, in order to keep only a residual contribution, such as the voices of the actors and/or the sound effects and other acoustical components for which the distributor of the audiovisual content holds the rights.
Such a process will afford the possibility of reworking the residual contribution to, for example, incorporate other music.
In order to perform such an extraction, one approach is to treat as known the musical recording corresponding to the contribution to be removed from the mixture. More specifically, a reference acoustical signal is considered that corresponds to a specific recording of the music contribution in the mixture.
For example, Goto, US Pat. Pub. No. 20070021959 (hereinafter “Goto”), discloses a music removal process capable of subtracting the reference signal from the acoustical signal mixture, through the application of transformations, to obtain a residual signal corresponding to the residual contribution in the initial mixture.
To take into account the differences in volume, temporal position, equalization, etc. between the reference signal and the musical contribution in the mixture, Goto discloses the possibility of correcting the reference signal before subtracting it from the mixture. Goto proposes to perform the correction manually, with the help of a graphical user interface. As long as the residual acoustical component is not satisfactory, the operator performs a further iteration consisting of correcting the reference signal and then subtracting it from the mixture. Given the large number of parameters by which the reference signal can be modified, this known process is not efficient.
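As a purely illustrative sketch of the kind of corrected subtraction described above, and not Goto's actual implementation, the following Python code applies only two corrections to the reference signal (a gain and an integer sample delay) before subtracting it from the mixture. The function names, the toy signals and the restriction to these two parameters are assumptions made for illustration; a practical correction would also involve equalization and other parameters, which is what makes manual adjustment laborious.

```python
def correct_reference(reference, gain, delay, length):
    """Scale the reference by `gain` and shift it by `delay` samples.

    The result is zero-padded or truncated to `length` samples.
    """
    corrected = [0.0] * length
    for n, sample in enumerate(reference):
        m = n + delay
        if 0 <= m < length:
            corrected[m] = gain * sample
    return corrected


def subtract_reference(mixture, reference, gain, delay):
    """Subtract the corrected reference from the mixture, sample by sample."""
    corrected = correct_reference(reference, gain, delay, len(mixture))
    return [x - c for x, c in zip(mixture, corrected)]


def energy(signal):
    """Sum of squared samples, used here as a crude quality indicator."""
    return sum(s * s for s in signal)


# Hypothetical toy signals: a short "music" reference and a "voice" residual.
music = [1.0, -2.0, 3.0, -1.0]
voice = [0.0, 0.5, 0.0, 0.0, -0.5, 0.0, 0.25]

# Build a mixture in which the music appears attenuated and delayed.
music_in_mix = correct_reference(music, gain=0.5, delay=2, length=len(voice))
mixture = [v + c for v, c in zip(voice, music_in_mix)]

# An uncorrected subtraction leaves a large error; the matching
# gain and delay recover the voice exactly in this idealized case.
residual_wrong = subtract_reference(mixture, music, gain=1.0, delay=0)
residual_right = subtract_reference(mixture, music, gain=0.5, delay=2)
```

In this idealized example only one pair of parameter values yields a clean residual; an operator searching for it by trial and error would have to repeat the correct-then-subtract iteration for each candidate setting.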
The publication by Jaureguiberry et al., “Adaptation of source-specific dictionaries in Non-Negative Matrix Factorization for source separation”, Int. Conf. on Acoustics, Speech and Signal Processing 2011, discloses a process of acoustical contribution removal in which the modeling of the contribution to be removed involves learning time-independent spectral shapes (or power spectral densities) on a reference signal, and adapting these spectral shapes with a vector of frequency-dependent factors to model the discrepancies between the reference source and the contribution. The results of this method are not satisfactory because the temporal structure of the reference acoustical component is lost, and also because the adaptation may not compensate for the differences between the recordings of the reference and of the contribution, which may have very different characteristics (e.g. not the same sound sources, not the same acoustical conditions, not the same notes played, etc.).
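One possible reading of the model described above can be sketched as follows; it is an assumed illustration, not the formulation of the cited publication. The spectral shapes learned on the reference are held in a matrix W (frequency bins by components), the adaptation is a vector a with one factor per frequency bin, and the activations H (components by time frames) are assumed to be estimated on the mixture rather than taken from the reference. The names `adapted_model`, `W`, `a` and `H` and all numerical values are hypothetical.

```python
def adapted_model(shapes, freq_factors, activations):
    """Modelled power spectrogram: V_hat[f][t] = a[f] * sum_k W[f][k] * H[k][t]."""
    n_freq = len(shapes)
    n_comp = len(activations)
    n_frames = len(activations[0])
    return [
        [
            freq_factors[f]
            * sum(shapes[f][k] * activations[k][t] for k in range(n_comp))
            for t in range(n_frames)
        ]
        for f in range(n_freq)
    ]


# Hypothetical toy values: 3 frequency bins, 2 spectral shapes, 2 time frames.
W = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 2.0]]      # time-independent spectral shapes learned on the reference
a = [2.0, 1.0, 0.5]   # one adaptation factor per frequency bin
H = [[1.0, 0.0],
     [0.0, 3.0]]      # activations over time (assumed fitted on the mixture)

V_hat = adapted_model(W, a, H)
```

Under this reading, nothing in W or a records when each spectral shape occurs in the reference, which is consistent with the loss of temporal structure noted above.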