The amount of content being created on the Internet is constantly increasing. Sources such as news articles, blog postings, Facebook postings, and Twitter postings contribute to the increasing amount of content. For example, Twitter alone currently produces more than 300 million pieces of content per day.
Sometimes, users want to be notified when a particular word or phrase is mentioned or discussed on the Internet. For example, a user may want to receive a notification when new content is generated that mentions a specific sports team that the user is following, or that mentions a specific company that the user is interested in (e.g., discusses a specific new product that the company is selling). Due to the huge amount of content that is created on the Internet on a daily basis, it is a challenge to monitor the content and deliver such notifications.
Traditional search systems can provide notifications when new content matches specific queries. For example, a traditional search system could index a day's worth of content (e.g., 300 million pieces of content) at the end of the day into a search engine. The traditional search system could then query the indexed content using users' queries to obtain matches. This solution using a traditional search system suffers from a number of limitations. For example, there may be a long lag time (e.g., a day) between creation of the content and when it is indexed and available for searching. In addition, maintaining an up-to-date index can be difficult due to the frequency of new content being created (e.g., continuously indexing, re-indexing, or updating the index may not be practical). Furthermore, indexing a huge amount of content on a regular basis can consume substantial time and computing resources.
Because indexing documents can take a substantial amount of time (especially if the number of documents to index is large), it can be difficult to maintain an index that represents frequently changing activity. Furthermore, if an index is not maintained on a real-time (or near-real-time) basis, then results from the index will not reflect the current state of the content.
Furthermore, there are often many users, each of whom may have a number of queries, and it may not be practical to check all the queries for all the users against an updated index of documents for new matches. For example, there could be thousands or millions (or more) of queries, and to obtain up-to-date results, each query would need to be sent to a search engine for processing each time the index is updated with new documents.
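The cost of the traditional approach described above can be illustrated with a minimal sketch. It assumes a toy in-memory inverted index and simple all-terms matching; the function names (`build_index`, `run_query`, `batch_notify`) and the sample data are hypothetical and are not taken from any particular search system.

```python
from collections import defaultdict


def build_index(documents):
    """Build a toy inverted index mapping each term to a set of document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def run_query(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


def batch_notify(batches, user_queries):
    """Re-index after each batch, then re-run every stored query (hypothetical)."""
    documents = {}
    seen = defaultdict(set)  # (user, query) -> document ids already reported
    notifications = []
    query_executions = 0
    for batch in batches:
        documents.update(batch)
        index = build_index(documents)  # full re-index on every update
        for user, queries in user_queries.items():
            for query in queries:
                query_executions += 1
                hits = run_query(index, query)
                for doc_id in sorted(hits - seen[(user, query)]):
                    notifications.append((user, query, doc_id))
                seen[(user, query)] |= hits
    return notifications, query_executions


# Illustrative data only: two index updates and two users with one query each.
batches = [
    {1: "Acme launches new phone", 2: "local team wins"},
    {3: "Acme phone review"},
]
user_queries = {"alice": ["acme phone"], "bob": ["team"]}
notifications, query_executions = batch_notify(batches, user_queries)
```

In this sketch the number of query executions equals the number of stored queries multiplied by the number of index updates, and no notification can be produced until the batch containing the matching content has been indexed.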
Therefore, there exists ample opportunity for improvement in technologies related to matching documents against queries.