With the rapid development of Web 2.0 technology, a huge number of images are produced, so that quickly browsing and retrieving the needed images has become time-consuming and laborious. In order to browse these images quickly and effectively, image tagging is becoming increasingly important and indispensable.
Conventional image tagging methods often consider a single modality. However, a single modality cannot provide sufficient information for characterizing an image, and a growing number of studies show that it is beneficial to consider multiple modalities at the same time. Therefore, image tagging technology in which multiple modalities of an image are fused is becoming increasingly important.
A search-based image tagging method is a recently proposed image tagging method for fusing multiple modalities. It first normalizes each of the modalities, then directly concatenates all the normalized modalities to obtain a single modality, and finally finds neighboring images by using the concatenated single modality and counts the tags of all the neighboring images to obtain a final tagging result.
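For illustration only, the three steps of such a search-based method (normalize, concatenate, vote over neighbors) can be sketched as follows. This is a minimal sketch and not the implementation of any cited work: the use of L2 normalization, Euclidean distance, the neighbor count `k`, the function names, and the small example database are all assumptions introduced here.

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a feature vector to unit L2 norm (assumed normalization)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def concat_modalities(modalities):
    """Normalize each modality separately, then concatenate into one vector."""
    fused = []
    for vec in modalities:
        fused.extend(l2_normalize(vec))
    return fused


def tag_by_neighbors(query_modalities, database, k=2, top_n=1):
    """Find the k nearest images on the concatenated vector and count their tags."""
    query = concat_modalities(query_modalities)
    scored = []
    for modalities, tags in database:
        distance = math.dist(query, concat_modalities(modalities))
        scored.append((distance, tags))
    scored.sort(key=lambda item: item[0])

    votes = Counter()
    for _, tags in scored[:k]:
        votes.update(tags)
    return [tag for tag, _ in votes.most_common(top_n)]


# Hypothetical data: each image has two modalities (e.g. color, texture)
# and a list of user-supplied tags.
database = [
    ([[1.0, 0.0], [1.0, 0.0]], ["sky", "sea"]),
    ([[0.9, 0.1], [0.8, 0.2]], ["sky", "cloud"]),
    ([[0.0, 1.0], [0.0, 1.0]], ["forest"]),
]
query = [[1.0, 0.05], [1.0, 0.1]]

predicted = tag_by_neighbors(query, database, k=2, top_n=1)
```

In this sketch every modality contributes to the distance with the same implicit weight once it has been normalized, since the concatenated vector treats all dimensions alike.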
However, the inventors found that the method simply concatenates all the normalized modalities directly. Because the modalities have different measures, it is difficult to unify the measures of all the modalities through normalization, so the method is unable to fuse multiple modalities effectively.
It should be noted that the above description of the background is merely provided for a clear and complete explanation of the present invention and for easy understanding by those skilled in the art. It should not be understood that the above technical solution is known to those skilled in the art merely because it is described in the background of the present invention.
The following documents are listed for easy understanding of the present invention and conventional technologies, and are incorporated herein by reference as if fully set forth in this text.
1. P. Gehler and S. Nowozin. On feature combination for multiclass object classification. In Proceedings of International Conference on Computer Vision, 2009; and
2. X. Li, C. Snoek, and M. Worring. Learning social tag relevance by neighbor voting. IEEE Transactions on Multimedia, 1310-1322, 2009.