Internet search engines have evolved alongside the growth of Internet content, and modern search engines generally rely on indexed web content to perform search operations. This indexed web content is typically gathered using web crawling techniques.
Today, content is created at a dizzying pace across the web, and a single search can return thousands of documents. For many types of queries (e.g., reviews, opinions, prices, polls), a user would like a high-level view of the aggregate content without having to read each document, and would like the latest, most up-to-date information. For example, with reviews, the user may want to know whether most reviews are positive or negative. With prices, the user may want to know the current distribution of prices. In these cases, the user may want to perform simple operations on the content of the returned documents to understand the results of a search without reading each document. For web documents in general, this is a difficult task because the indexed web content may not be current. This may be due in part to the crawl-based nature of indexed content and to search algorithms that account for historical data. Indexed web content covers the full corpus of web content and is not directed toward the timeliness of newly generated content.
By contrast, for small-scale documents, referred to herein as microdocuments, this may be a tractable problem. Microdocuments are rapidly generated units of new information content, such as short messages or small content messages found on social networking websites. Another example of a microdocument is an instant message or another type of truncated-content communication. The real-time nature of these microdocuments limits the effectiveness of current search engine technologies, which are limited in their ability to electronically process microdocument data and coordinate it with indexed web content. Additionally, because these disparate content sources (microdocuments versus indexed web content) provide differing degrees of information, existing systems have difficulty harnessing microdocument information together with indexed web content.
Existing search engine technology also fails to account for and utilize this microdocument data in any reasonable manner. Display formats for such data are typically limited to social media displays, such as social preference filters, and there are no techniques for developing applications on top of search engine technologies that utilize this microdocument data.
Thus, there exists a need for a system and method for using search to collect a set of relevant microdocuments and then understand them as a group. There also exists a need for a framework operating in conjunction with search engine technology to utilize the microdocument data.