1. Field
This disclosure relates to extracting knowledge from web-based documents at distributed processing nodes and preparing electronic documents from the extracted knowledge.
2. Information
For an Internet search engine to operate properly, for example, web-based documents may generally require some level of organization and cataloging before the search engine processes one or more search queries. Such prior organization may help ensure that relevant content is available in a timely manner in response to a submitted search query. If organization and cataloging of potentially millions or even billions of web-based documents were to occur only after receipt of a search query, search engine users would experience an unacceptable delay between submitting the search query and receiving results. Further, such real-time searching of a huge number of web-based documents in response to individually submitted queries would represent an extraordinary burden on search engine resources.
In addition, with many thousands of new or revised web-based documents added to the Internet each day, search engine providers may continuously catalog contents and locations of new documents in a background process so that, if a search query is received from a user, the search engine may immediately provide accurate, timely, and comprehensive results. To accommodate such a large and ever-expanding corpus of documents, search engine providers, for example, may employ standard workflow tools for analyzing and cataloging web-based documents. However, many cataloging and analyzing tools operate at an unacceptably slow pace. In one illustrative example, analyzing contents of web-based documents representing 60.0 terabytes, which may, for example, correspond to 60.0 million documents having an average size of 1.0 megabyte each, may require a time period ranging from several hours up to two days. Accordingly, evaluating and cataloging a constantly increasing corpus of web-based documents may consume enormous processing resources and may require considerable expenditure of time.
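To make the scale of the illustrative example above concrete, the following minimal sketch works through the arithmetic. It assumes decimal units (1 megabyte = 10^6 bytes, 1 terabyte = 10^12 bytes), which the text does not specify, and it takes 3 hours as a hypothetical stand-in for "several hours". The function names are illustrative only and do not correspond to any particular workflow tool.

```python
# Back-of-the-envelope arithmetic for the illustrative corpus.
# Assumption: decimal (SI) units; the text does not state a convention.
MB = 10**6
TB = 10**12


def corpus_bytes(num_docs: int, avg_doc_bytes: int) -> int:
    """Total corpus size in bytes for a given document count and average size."""
    return num_docs * avg_doc_bytes


def required_throughput(total_bytes: int, hours: float) -> float:
    """Average bytes per second needed to process total_bytes within `hours`."""
    return total_bytes / (hours * 3600)


NUM_DOCS = 60_000_000
total = corpus_bytes(NUM_DOCS, 1 * MB)
print(f"corpus size: {total / TB} TB")

# 3 hours is an assumed value for "several hours"; 48 hours is "two days".
for hours in (3, 48):
    rate = required_throughput(total, hours)
    docs_per_sec = NUM_DOCS / (hours * 3600)
    print(f"{hours} h: about {rate / MB:.0f} MB/s, {docs_per_sec:.0f} documents/s")
```

Under these assumptions, even the two-day figure implies a sustained rate of several hundred megabytes per second, which may suggest why distributing the analysis across processing nodes is of interest.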
Reference is made in the following detailed description to accompanying drawings, which form a part hereof, wherein like numerals may designate like parts throughout to indicate corresponding and/or analogous components. It will be appreciated that components illustrated in the figures have not necessarily been drawn to scale, such as for simplicity and/or clarity of illustration. For example, dimensions of some components may be exaggerated relative to other components. Further, it is to be understood that other embodiments may be utilized. Furthermore, structural and/or other changes may be made without departing from claimed subject matter. It should also be noted that directions and/or references, for example, up, down, top, bottom, and so on, may be used to facilitate discussion of drawings and/or are not intended to restrict application of claimed subject matter. Therefore, the following detailed description is not to be taken to limit claimed subject matter and/or equivalents.