At present, with the development of multimedia communications, multimedia conferencing has become a key technology. Audio interaction processing is the most essential and critical part of multimedia conferencing technology, and it is subject to strict real-time requirements. In practice, when multiple terminal devices located at different sites are used for real-time audio interaction among those sites, the multiple audio streams need to be mixed before being output. This process is referred to as audio mixing.