Finding relevant material in a large collection of documents (i.e., search or document retrieval) is a well-known and long-studied problem. For a given topic query, one often wants to know which Web document or author initiated the topic or was the first to discuss it. For example, suppose someone starts a rumor about a product on the Web, and the rumor generates many discussions. The company that makes the product would like to know who started the rumor. Currently, there is no system that supports this technique or service.
Generally, search engines only return the documents or web pages that are most relevant to the query. Some specialized search engines support searching by query and then sorting the search results by date. Take the topic query “vegemite ban” as an example. One search engine returned no results. Another search engine returned only one result, titled “Duck hunting,” which was not relevant to the query topic. This webpage was returned because the word “ban” appears in the article on the webpage and the word “vegemite” appears in an advertisement called “Vegemite Sandwich” on the same page.
A third search engine returned many more results than the first two. However, the third search engine only supports searching for a query and then simply sorting the results by date. A fourth search engine provides a service that automatically clusters news articles into groups, each of which is intended to contain articles on the same topic, and provides sorting based on relevance or date. The clustering results are not always correct, and, in some cases, articles in the same group are not about the same topic. In addition, the fourth search engine only supports news articles and does not cover the whole Internet.
Another drawback of existing search engine systems is that they only support webpage-level analysis. When a user wants to find which Web document is the initiator, none of the major search engines works to this level of detail.
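The “vegemite ban” false match described earlier can be illustrated with a minimal sketch. It assumes, purely for illustration, that a page is matched when every query term appears anywhere on it, and it contrasts that with a check made separately on each content block. The function names, the block names, and the sample page text are hypothetical; none of them is taken from an actual search engine.

```python
import re

def tokens(text):
    """Lowercase word set of a text (hypothetical, simplistic tokenizer)."""
    return set(re.findall(r"[a-z]+", text.lower()))

def page_level_match(query, blocks):
    """True if every query term appears somewhere on the page, in any block."""
    page_terms = set()
    for text in blocks.values():
        page_terms |= tokens(text)
    return tokens(query) <= page_terms

def block_level_match(query, blocks):
    """Names of the blocks that contain every query term on their own."""
    query_terms = tokens(query)
    return [name for name, text in blocks.items()
            if query_terms <= tokens(text)]

# Invented sample page: the article mentions "ban", the advertisement "Vegemite".
page = {
    "article": "Duck hunting season opens despite calls for a ban.",
    "advertisement": "Try our Vegemite Sandwich today.",
}

print(page_level_match("vegemite ban", page))   # True: terms come from different blocks
print(block_level_match("vegemite ban", page))  # []: no single block is about the topic
```

Under these assumptions, the page-level check accepts the page even though neither block is about the query topic, while the block-level check rejects it.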