1. Field of the Invention
The present embodiments relate to methods for personalizing news, and more particularly to methods, systems, and computer programs for categorizing news articles and determining the scope of geographical interest for the news articles.
2. Description of the Related Art
The Internet has witnessed explosive growth in online news. According to a recent report, more than 123 million people visited news websites such as Yahoo!™ News in May 2010, representing 57 percent of the total U.S. internet audience, with each visitor reading 43 pages on average. These numbers have increased steadily in recent years and show the growing appeal of reading news online.
Recommending interesting news articles to users has become extremely important for internet providers looking to maintain users' interest. While existing Web services, such as Yahoo!, attract users' initial clicks, ways to engage users after their initial visit remain largely underexplored.
Personalized news delivers a news stream to a user according to the preferences and usage trends of the user. However, customizing the news stream is a complex problem because the number of news sources continues to grow rapidly. By one estimate, there are between 600,000 and 2,000,000 different news categories or topics for filtering news. This wide variety of topics makes it hard to filter news for users.
Tens of millions of news items are created each day. Automatic categorization of news articles is therefore critical to delivering a personalized news stream.
Some existing classifiers analyze the content of a news article in order to determine the topic of the article. However, content analysis is sometimes incomplete. For example, if a news article contains a football game score, the news article may be categorized under the topic of “Sports.” However, if the football game is the Super Bowl, the news article may be categorized as “General News.” Thus, an article titled “The Raiders beat the Niners by three points” is likely a Sports article, while an article titled “The Raiders won the Super Bowl” could be General News (and/or Sports).
In most cases, it is virtually impossible to determine the region of the world in which a news article is of interest just by looking at the content of the article. For example, a kidnapping may be news of interest only for the county or state where the kidnapping took place. But in some cases, the kidnapping may have national or worldwide appeal.
In some solutions today, the topic and the geographic scope of news articles are determined by editors who analyze each of the articles in a corpus of news documents. This process is expensive and cumbersome, and may also be limited by the editors' familiarity with the news topics.
It is in this context that embodiments arise.