In recent years, many enterprises, whether businesses, government agencies, or other organized undertakings, have come to require large amounts of information to be analyzed and made available for use in their daily activities. This information often includes documents used every day alongside other material that may not have been accessed in weeks, months, or years and may exist only in an archive.
The growth of “paperless” offices has dramatically increased the amount of data that is stored only in digital form. With so many data files accessible on a company's servers, conventional Information Retrieval (IR) technologies are increasingly unable to locate relevant information effectively. A keyword-based search of a company's file storage system commonly returns hundreds or even thousands of hits, often overwhelming the user. There is therefore a growing need for technologies that help users sift through large volumes of information and quickly identify the most relevant data files.
Traditional search engines accept a search query from a user, search every data file, and generate a list of search results. The user typically views one or two of the results and then discards the list. Some search engines also return a summary with each result, which can greatly facilitate the task of finding the desired information in a data file. Typically, however, these “summaries” are merely the 10 or 20 words surrounding the search keyword and do not reflect the content of the data file as a whole. In addition, a query-based summarization system can place heavy demands on transmission bandwidth, data storage, and processor utilization, and may be slow to return a result.
Hence, a need exists for a way to expedite searching for data files and to provide summaries that better reflect the data files from which they are taken.