Content-based multimedia search has gained considerable attention with the rapid increase in the quantity and quality of multimedia. As the ability to broadcast video content (including games) has extended beyond television to the Internet and mobile phones, video advertising is becoming an attractive and plausible source of revenue. While video advertising today accounts for only a minuscule proportion of media budgets, it presents a significant opportunity for advertisers to extend the reach of their campaigns with compelling content. This calls for selecting advertisements that are relevant to the video content in order to target viewers. There is a definite need to determine the deeper semantics of a video and to select relevant advertisements based on those semantics. Presently, most video search techniques analyze textual information such as the file name, the web page address, and surrounding text, and use this information to provide a metadata description for a video. Such metadata is too abstract and does not describe the whole video in a semantically consistent manner. Moreover, annotating a video using such metadata alone is inadequate and incomplete, because it does not take into account the semantics of the audio, the important objects in the video (including prominent faces, scene text, and text captions), and other related information about the video. To achieve this kind of exhaustive and comprehensive semantics-based annotation, it is necessary to deploy multiple multimedia analysis techniques and to combine them in the most appropriate manner so as to arrive at a maximally consistent annotation of the video. The present invention addresses the issue of combining the results of multiple multimedia analysis techniques in such a manner that the overall combined description of a multimedia item is consistent and comprehensive.