In a real-world situation, audio signals such as speech are perceived against a background of other audio signals with different characteristics. Humans are able to listen to and isolate individual speech in a complex acoustic mixture in order to follow one of several simultaneous discussions. This is known as the “cocktail party problem”, in which a number of people are talking simultaneously in a room, as at a cocktail party. For machine implementation, however, audio source separation remains a challenging topic. Audio source separation, which aims to estimate individual sources in a target comprising a plurality of sources, is an emerging research topic because of its potential applications in audio signal processing, e.g., automatic music transcription and speech recognition. A practical usage scenario is the separation of speech from a mixture of background music and effects, such as in a film or TV soundtrack.
According to prior art, such separation is guided by a ‘guide sound’ that is produced, for example, by a user humming a target sound marked for separation. Another prior-art method proposes the use of a musical score to guide source separation of music in an audio mixture. According to the latter method, the musical score is synthesized, and the resulting audio signal is then used as a guide source that relates to a source in the mixture.
However, it would be desirable to be able to take into account other sources of information for generating the guide audio source, such as textual information about a speech source that appears in the mixture.
The present disclosure tries to alleviate some of the inconveniences of prior-art solutions.