1. Field of the Invention
The present invention relates to improved searching on the Internet or similar networks and especially Meta News and/or improved automatically generated newspapers, and more specifically to a system and method for improved automatic collection and displaying of news items on the Internet.
2. Background
The Internet makes it possible for users to access vast amounts of information, thus becoming effectively the world's largest library and the world's largest database. This opens up fascinating new possibilities, such as automatically accessing a huge number of news sources in order to present to the user an automatically edited “newspaper”, which automatically selects the most important events or news items according to various criteria. However, one of the biggest problems is efficiently integrating and analyzing such vast amounts of information.
Google has recently made available at news.google.com an automated “newspaper”, which continuously searches about 4,500 news sources, and lets users view automatically generated headlines in one of a few general areas (which are currently: Top Stories, World, US, Business, Sci/Tech, Sports, Entertainment and Health), or as one newspaper divided into the above sections, or lets users search for news by keywords. In addition, users can choose between a number of possible countries (which are currently: Australia, Canada, France, Deutschland, India, Italia, New Zealand, U.K., US), and thus news items can change according to the chosen country. The automatic determination of which news items or news stories are most important is done by 3 main criteria: in how many sources the news item appeared, how important the news sources in which it appeared are, and how close it is to the top in each of these news sources.
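The three criteria above can be illustrated with a minimal sketch. The formula, the weights, and all names below (`Appearance`, `source_weight`, `story_score`, the sample sources) are assumptions made only for illustration; the text does not state how the criteria are actually combined.

```python
from dataclasses import dataclass


@dataclass
class Appearance:
    """One appearance of a news item in one source (hypothetical structure)."""
    source: str           # name of the news source
    source_weight: float  # assumed importance of the source, between 0 and 1
    rank: int             # 1-based position of the item within that source


def story_score(appearances):
    """Combine the three criteria into one number (illustrative formula only).

    Summing over appearances rewards items found in many sources, each term
    is scaled by the importance of its source, and dividing by the rank
    rewards items that are close to the top of that source.
    """
    return sum(a.source_weight / a.rank for a in appearances)


# Made-up sample data: story A appears in two sources, story B in one.
stories = {
    "story A": [Appearance("Source 1", 1.0, 1), Appearance("Source 2", 0.5, 2)],
    "story B": [Appearance("Source 3", 0.5, 5)],
}

# Order the stories from most to least important under this assumed score.
ranked = sorted(stories, key=lambda s: story_score(stories[s]), reverse=True)
print(ranked)
```

Under these assumed weights, story A scores higher than story B because it appears in more sources, in a more important source, and closer to the top.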
However, many problems still remain, such as for example:
    1. The current system chooses for each headline just one of the possible sources (including the first sentence in that news item) and also a photo from one of the possible sources (typically from another source). Below the headline it typically shows, in smaller print, a few additional related headline links, then a few additional names of news sources, which also link to related items, and then a final link to typically a few hundred additional related links. This leads to the following problems:
        a. The choice of a single main news source and a single image for each item seems arbitrary to the user and leads him to prefer this source for reading the full news item, since he has much less information about the other links.
        b. Similarly, the choice of the additional smaller links below also seems arbitrary to the user.
        c. Due to space limitations the clustering possibilities in the first page are limited, so if for example there is room for only 2-4 main news items in each category, then very broad, loosely related items might be presented as a single news item.
        d. If the user clicks on the final “related items” link, he typically gets hundreds or even more than a thousand links to related news items (with the headline, source, time, and the first 2 lines), sorted either by relevance or by time. However, the new list is now without any images and without any clustering, so that many times news stories that are about the same event or even identical (for example due to two or more news sources using exactly the same item from a news agency) may appear at different positions in the list of related links, and various other news items which are more different might appear between them and can also be dispersed in various places. This makes it very hard for the user to take advantage efficiently of the list of related items. (Although clicking on the next 30 links each time may eventually show for example only 25-30% of the actual links due to removing some very similar entries, like Google does also with normal web page results, this still leaves the shown items un-clustered, as explained above.)
    2. Allowing the user to choose between a few top categories is very limited by nature and does not even come close to the true potential of such systems. On the other hand, when searching by keywords, the user immediately reaches a list of results that is similar to the list that he reaches when clicking on the final list of “related items”, as explained below, and thus is subject to the same limitations. Although many times this first list shows for some of the items, especially in the beginning, a few additional sub-items and a link that says “and more”, clicking on the “and more” links apparently always generates only a completely linear and non-clustered list again, like in the case of clicking on the “related items” links in the automatic newspaper front page, as explained above. For example, searching for the word “Israel” in Google news shows that there are 12,600 items, and the 2nd result has the headline Israel Wants to Exile Arafat—But Not Yet, with a few additional smaller links and the “and more” link. But clicking on the “and more” link brings up a linear list that says that there are 1,010 items, and now there is no clustering at all (except for deleting entries as explained above). Also, sorting by date always seems to create only a linear list with no clustering at all, even when it is the first list generated by searching for the keywords. In addition, if the user chooses one of the few top level subject categories, he/she gets each time only 20 basic clusters and that's it, which can be quite frustrating, since there can be many other issues within that category that might be interesting for the user but he/she misses them because they are not within the top 20.
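The lack of clustering in the linear lists described above can be made concrete with a small sketch that groups a flat list of headlines so that identical or nearly identical stories end up next to each other. This is only an illustration of the kind of grouping that is missing, not a method taken from the text: the word-overlap (Jaccard) measure, the 0.5 threshold, the function names, and the sample headlines other than the one quoted above are all assumptions.

```python
import re


def words(headline):
    """Return the set of lowercase alphanumeric words in a headline."""
    return set(re.findall(r"[a-z0-9]+", headline.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two word sets, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_headlines(headlines, threshold=0.5):
    """Greedily group headlines whose words overlap with a cluster's first item.

    The threshold is an assumed value chosen for this example.
    """
    clusters = []  # each cluster is a list of headlines, first item is its seed
    for h in headlines:
        w = words(h)
        for c in clusters:
            if jaccard(w, words(c[0])) >= threshold:
                c.append(h)
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append([h])
    return clusters


# A flat result list in which similar stories are dispersed (made-up data,
# except for the headline quoted in the text above).
results = [
    "Israel Wants to Exile Arafat, But Not Yet",
    "Oil prices rise on supply fears",
    "Israel wants to exile Arafat but not yet",
    "Oil prices rise again on supply fears",
]
clusters = cluster_headlines(results)
for c in clusters:
    print(c)
```

With this sample input the four dispersed entries collapse into two groups, so that the two versions of each story are shown together instead of being separated by unrelated items.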
Thus, it would be highly desirable to have an improved News MetaSearch or improved automatically generated “Newspaper” which solves the above problems and preferably adds also many additional useful features. Other problems with other types of searches are also explained and solved below.