Due to proprietary nature of formats in which documents containing text (in any natural language) are formatted by a common word-processing program (such as WORD sold by MICROSOFT CORPORATION or WORDPERFECT sold by COREL CORPORATION), if any analysis is to be done, it becomes necessary for such documents to be reviewed manually. For example, a human typically reviews a group of documents by individually reviewing each document one at a time, specifically by opening each document in a word-processing program, visually reading text displayed therein (using the human's eyes), until a specific text of interest is found. The human then performs some analysis of the specific text of interest, e.g. manually counts the number of rows in a specific table in the document. However, manual review of a large number of documents becomes laborious, time consuming and inefficient for a human.
Several solutions to automatically extract contents of word processing documents have been developed in the past. However, such prior art solutions appear to work on documents containing specific format and content e.g. at specific positions. In other words, the prior art solutions that are known to the inventors cannot be utilized as a generic solution across documents containing different format and content by the users without rewriting the code. Most of such solutions also simply extract the content from word processing documents, and lack capabilities to analyze document content to collect actionable intelligence.
The current inventors believe that a solution to automatically analyze documents of different types and containing different content would dramatically improve efficiencies and accuracy of people and companies using word processing documents. The current inventors further believe that there is a need to automatically perform analysis of multiple documents for one or more user-specified subset(s) of content within each document, e.g. to limit a search to count a number of rows in a specific table or to search only a specific subsection in a specific section, by use of an invention of the type described below.