In current video monitoring systems, a single-channel analog video capture point normally corresponds to only one channel of audio. An embedded device captures, encodes, and packages the audio and video signals into a combined data stream, which is then used by audio and video applications such as local storage and remote requests.
However, as video monitoring requirements have grown, a monitoring scene has emerged in which the monitoring area covered by one IP camera is divided into a plurality of different functional zones (e.g., several counters). In such a scene, the managing center for video monitoring is required not only to remotely capture and play the real-time video, but also to play any one of several audio channels, selected arbitrarily. Existing monitoring methods, in which a single-channel analog video capture point corresponds to one channel of audio, cannot satisfy this requirement of a single channel of video coordinated with multiple channels of audio.
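The requirement described above can be illustrated with a minimal sketch. It is not taken from any existing monitoring method or from a particular device; the `Packet` type, its fields, and the `select_playback` function are hypothetical names introduced only for illustration. The sketch assumes a combined stream that carries one video channel and several audio channels, and shows the managing center picking one audio channel to play alongside the video.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet in a combined audio/video stream."""
    kind: str          # "video" or "audio"
    channel: int       # audio channel index; video packets use 0 here
    timestamp_ms: int  # capture time in milliseconds
    payload: bytes


def select_playback(stream: Iterable[Packet], audio_channel: int) -> Iterator[Packet]:
    """Yield all video packets plus only the packets of the chosen audio channel."""
    for packet in stream:
        is_video = packet.kind == "video"
        is_chosen_audio = packet.kind == "audio" and packet.channel == audio_channel
        if is_video or is_chosen_audio:
            yield packet


# One video channel combined with three audio channels (e.g., three counters).
combined = [
    Packet("video", 0, 0, b"v0"),
    Packet("audio", 0, 0, b"a0-0"),
    Packet("audio", 1, 0, b"a1-0"),
    Packet("audio", 2, 0, b"a2-0"),
    Packet("video", 0, 40, b"v1"),
    Packet("audio", 0, 40, b"a0-1"),
    Packet("audio", 1, 40, b"a1-1"),
    Packet("audio", 2, 40, b"a2-1"),
]

# The managing center plays the video together with audio channel 1 only.
played = list(select_playback(combined, audio_channel=1))
```

With a one-to-one correspondence between video and audio, the `audio_channel` choice does not exist, which is the limitation the paragraph above points out.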