The present application relates generally to an improved data processing apparatus and method, and more specifically, to mechanisms for performing electronic content curation.
Electronic storage of content as structured and unstructured electronic documents is proliferating across today's computing networks, and many applications have been developed to operate on such content to achieve various business purposes. Various types of applications operate on these large volumes of electronic documents or content, including search engines, question and answer systems such as Watson™ available from International Business Machines (IBM) Corporation of Armonk, N.Y., and the like. Watson™ is a supercomputer that processes structured and unstructured electronic content using natural language processing to extract information for answering questions posed to the Watson™ system. More information about the Watson™ system may be obtained from the IBM developerWorks website, such as the document "Watson and Healthcare," by Michael J. Yuan, Apr. 12, 2011.
In an unstructured information management system, such as a question and answer system, massive numbers of electronic documents must be evaluated to perform the desired operation and return the desired results. Ideally, all of these electronic documents would be loaded into memory for processing by the unstructured information management system. In reality, however, memory is limited, and the volume of electronic documents in the corpus of content that the unstructured information management system may evaluate is vastly larger than the available memory capacity.