Internet content items, such as news stories, blog posts, web pages, journal articles, images, slideshows, videos, and “tweets”, can be collected and published in many ways. For example, personalized web portals generate personalized lists of Internet content items. A personalized web portal is a web portal that learns the preferences of each of its users and provides each user with content items that are likely to be of interest to that user, based on what is known about the user's preferences. As a further example, web portals can also provide lists of content items that pertain to particular topics.
Accurately categorizing Internet content items is key to creating personalized or topical lists of content items. Categorization of a content item involves assigning, to the content item, one or more content categories that relate to the information in the content item. Examples of content categories include sports, news, fashion, religion, politics, and weather.
The more textual information is known about a content item, the easier it is to determine the topic(s) to which the content item relates. However, many Internet content items, referred to herein as “sparse-info items”, are difficult to categorize because of the sparseness of the information given in connection with the content item. Examples of sparse-info items include short sentences (e.g., “tweets”, comments, status updates), images, and videos that have little or no accompanying text. Sparse-info items frequently do not include the information that traditional categorization methods require to categorize content items accurately. A categorization method that categorizes sparse-info items more accurately would be beneficial because it would allow sparse-info items to be included in applications that require categorized content items, such as personalized or topical content item lists.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.