The retrieval of items over a computer network, such as the “web,” depends heavily on the text content of the items. Often the items require additional content (e.g., metadata) or description for efficient retrieval, categorization, and/or information management. “Tagging,” or assigning keywords, is one of the most popular approaches for creating useful content or semantic descriptions for such items. For example, tagging text documents may be used for categorization (e.g., clustering) of documents for information management. Similarly, tagging multimedia items such as images, songs, and videos using online tools has led to the development of specialized search engines that are able to search non-text items using text input.
Images, videos, songs, and similar multimedia items are commonly tagged either manually or in a semi-automatic manner. In general, the content structure of such items is easy to interpret, and hence the tagging process is simplified. However, the tagging process becomes more complicated when an item has little suitable content. This problem affects items such as scientific research datasets that have little descriptive text content, documents with very little text content, and other items. Furthermore, the content of some items may be protected and not accessible for summarization. For example, in some cases the full content of a research paper or other document may not be publicly available to users. Additionally, the available content for an item, such as the title of a document, may not convey the essence of the content in the item. While manual tagging of items by experts is desirable, it is not a feasible solution on a large scale.