In recent years, the availability and scalability of multimedia resources on the Internet have grown tremendously. Thus, the development and advancement of audio and video information retrieval technologies are becoming increasingly significant and popular. However, existing methods often concentrate on either visual content or audio content individually, and often lack proper ways to combine visual and audio information.
Further, according to the present disclosure, in practice, a system that handles intensive multitasking may require that the information used to retrieve contents be minimized as much as possible. Audio retrieval may require a lower bitrate, but its response time may be longer because of the nature of sound, as compared with the more informative two-dimensional video frames. It may be desirable to optimize the bit flow such that the bitrate is minimized while optimized retrieval performance is maintained.
The disclosed methods and systems are directed to solving one or more of the problems set forth above and other problems.