This section is intended to introduce the reader to various aspects of the art that may be related to the present embodiments described below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it should be understood that these statements are to be read in this light.
Audio processing remains an important part of media content generation and conversion in both home and professional settings. Several types of audio processing that are often used, particularly in professional media content generation and conversion, include, but are not limited to, audio restoration, audio remastering, audio upmixing (e.g., stereo audio to 5.1 audio conversion), audio downmixing (e.g., 5.1 audio to stereo audio conversion), audio source separation (e.g., extracting individual sound sources such as lead vocals), and reconstruction of a missing audio channel (e.g., sound scene capture by a particular microphone). All of these processing mechanisms are important to a wide range of professional studio applications as well as home audio applications. Furthermore, fully automatic and efficient methods for these processing mechanisms are highly desirable.
Some automatic processing solutions exist for the various types of audio processing used in media content generation and conversion. For example, audio restoration may consist of audio denoising and/or bandwidth extension. In some systems, denoising may also be accompanied by some frequency equalization. Further, solutions exist for separating audio sources automatically. For audio upmixing, some fully automatic solutions have been proposed by Dolby (e.g., Pro Logic II) and Digital Theater Systems (DTS) (e.g., Neural Surround™ UpMix). However, these solutions are only satisfactory to a certain extent. Automatic source separation, while possible, often leads to results that are far from satisfactory, and user-guided methods may lead to much better results. As for audio restoration, remastering, upmixing, and downmixing, even the final result of such audio processing is not always uniquely specified and may be a product of many subjective decisions. For example, during audio upmixing, one sound engineer may decide to put the drums in the center while mixing a song, and another sound engineer may decide to put them slightly to the left. As for the above-mentioned existing automatic stereo audio to 5.1 audio upmixing solutions by Dolby and DTS, these solutions often consist of a simple spreading of the stereo content over the six audio channels in 5.1 audio without analyzing each particular sound (e.g., lead vocals, drums, etc.).
The existing solutions for the above-described problems are still far from a good compromise between a solution that is fully automatic (i.e., does not need any human intervention) and a solution that is only semi-automatic or more user-interactive but produces high-quality results. Therefore, there is a need for an improved mechanism for automatic processing of audio content during media content generation or conversion, such as audio restoration, audio remastering, audio upmixing, audio downmixing, audio source separation, or reconstruction of a missing audio channel.