Users have grown accustomed to searching a network, such as the Internet, for material relevant to a query. Conventionally, search systems receive from a user a query having query words associated with web pages, and process the query to provide results to the user. The results include a listing of web pages that match the query. For each web page in the listing, a summary of the web page is provided. The summary allows the user to better understand the content included in the web page. In the conventional search system, the summary of each web page is generated by processing each web page matching the query to detect a structure of the web page. The structure is based on tags included in the web page. Based on the detected structure of each web page, specific portions of the web page, such as paragraph and header sections, are searched to find all locations of the query words. In turn, sentences that are co-located with the query words are utilized to generate the summary.
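The conventional summary generation process described above can be illustrated with a minimal sketch. The sketch is not taken from any particular search system. It assumes that the detected structure consists of paragraph and header tags, that sentences end at a period, question mark, or exclamation point, and that a sentence is co-located with a query word when it contains that word. The names `SectionExtractor`, `summarize`, and `max_sentences` are hypothetical and are introduced only for illustration.

```python
import re
from html.parser import HTMLParser

# Assumption: the "structure" of a page is its paragraph and header tags.
STRUCTURAL_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6"}


class SectionExtractor(HTMLParser):
    """Collects the text found inside paragraph and header tags."""

    def __init__(self):
        super().__init__()
        self.sections = []      # text of each completed section
        self._open_tag = None   # structural tag currently being read
        self._buffer = []       # text fragments of the open section

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURAL_TAGS and self._open_tag is None:
            self._open_tag = tag
            self._buffer = []

    def handle_data(self, data):
        # Text outside a structural tag is ignored.
        if self._open_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.sections.append(text)
            self._open_tag = None


def summarize(html, query, max_sentences=2):
    """Return up to max_sentences sentences that contain a query word."""
    query_words = {word.lower() for word in query.split()}
    parser = SectionExtractor()
    parser.feed(html)
    parser.close()

    hits = []
    for section in parser.sections:
        # Assumption: a sentence ends at ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", section):
            words = set(re.findall(r"\w+", sentence.lower()))
            if words & query_words:
                hits.append(sentence)
    return " ".join(hits[:max_sentences])


page = (
    "<html><body><h1>Whale Facts</h1>"
    "<p>Whales are mammals. They live in the ocean.</p>"
    "<div>whales on sale</div>"
    "<p>Fish are not mammals.</p></body></html>"
)
summary = summarize(page, "whales ocean")
```

In this sketch every structural section of the document is visited and every sentence is compared against the query words, so the work performed grows with the length of the document.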
It would be beneficial if users could search on-line book content to find books relevant to a particular search query, in a way similar to other on-line searches. But applying the summary generation process utilized by conventional search systems to books introduces numerous performance issues. The conventional summary generation process is designed to process web pages. Typically, web pages are short structured documents that contain fewer than five pages. On the other hand, a book may be unstructured and may contain several hundred pages. Because a book is orders of magnitude larger than a web page, the conventional summary generation process is unable to provide, in an acceptable period of time, a summary of an unstructured book that includes the query words. The traversal of each page of the unstructured book to locate the query words creates a processing bottleneck that drastically reduces the time efficiency of the conventional summary generation process. In other words, when applied to unstructured books, the conventional summary generation process is too slow.