The present invention relates in general to retrieving selectable data, from a visual display program source document, in response to one or more queries, for insertion into a different format destination document.
Most manufacturers or suppliers of goods and products provide specification sheets containing detailed information about the product. In many instances, the specification sheets are generated in a visual display file format. One popular format is a PDF (portable document format) file. Such a format is often used for display on computer terminals and for easy transmission of data from point to point via an email attachment or a similar computer network transmission mode. In the case of electronic components, the material contained in a single set of specification sheets can amount to more than 100 pages of text, charts, tables, diagrams, and graphs.
When the amount of information is voluminous, finding the specific information desired can be very time consuming. As an example, a given set of specification sheets may cover a plurality of chips A, B and C that perform similar, but not identical, functions and have the same pin configurations. Chip A may further be designed to military environmental standards, while chips B and C have lesser requirements. Further, some pins on chips A and B may provide different signals and have different pin labels than similarly positioned pins on chip C. While a large majority of the information for these three chips may be identical, a query pertaining to operating temperature would be likely to retrieve a minimum of three sets of information for maximum operating temperature, one for each of the three chips. It is further likely that there would be data in the sheets on minimum operating temperature. It is also possible that different types of heat sinks and/or air movement conditions would be mentioned in conjunction with operating temperature data. Thus, finding the appropriate data to be retrieved for a given chip may involve a considerable amount of time spent perusing the set of specification sheets.
In the past, the extraction of text from specification sheets has typically been accomplished by manual retyping from an original or a copy of the specification sheet or sheets. Another method has been to display the material on a computer screen and select, copy and paste material from a source document to a destination document. While the last mentioned approach has, in some cases, been more accurate than retyping, the pasted material in the destination document requires considerable modification and reorganization, and the approach is often slower than retyping in the first place. Further, the correct material must still be found in the voluminous material of the source document. Thus, the manual search, select, copy, and paste method is so labor intensive that it is seldom used.
To visually display information on a computer screen, some programs insert text, data and graphics characters and symbols into non-visible receptacles or containers. These containers or receptacles may then be axially oriented and positioned on the screen to display the information to be presented. An example of one such visual display program file is typically referred to as a PDF (portable document format) file. The process of retrieving data, and in particular decomposed data, from a PDF file is shown in more detail in a co-pending patent application, serial number 09/594052, filed on the same day as this application, assigned to the same assignee as the present invention, and entitled DATA MERGE AND EXTRACTION METHOD AND APPARATUS.
To accomplish the retrieval of data from a PDF format specification sheet, it would thus be desirable to have a program or process in which a user may select a visual display source file, display the contents visually in a first window, query the source file for all occurrences of a given type of data, and have displayed in a second window a listing of the data found that corresponds to the query. A determination as to which one of the sets of retrieved data in the list is appropriate would be facilitated if the listing further included text in close proximity to the specific word or phrase data matching the query. The ability to quickly find and view the portion of the source document that corresponds to a given set of data in the list would also minimize the time necessary to determine the correct set of data. Once a determination is made that the correct data has been found, it would be desirable that the material, selected as appropriate, be transferable from either the source document or the retrieved list for placement in a destination document of a different format from that of the source without further typing. It would further be desirable to be able to edit the transferred material in the destination document, either in a third window or in additional windows.
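By way of illustration only, the querying step described above, in which each occurrence is listed together with text in close proximity to it, may be sketched as follows. The sketch assumes the source document has already been reduced to a plain text string. The names `Hit`, `find_with_context` and `radius` are hypothetical and are not taken from the invention itself.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    start: int    # character offset of the match in the source text
    match: str    # the matched word or phrase as it appears in the source
    context: str  # the match plus nearby text on either side


def find_with_context(source_text: str, query: str, radius: int = 40) -> List[Hit]:
    """Return every case-insensitive occurrence of `query` with surrounding text.

    `radius` (an assumed parameter) is the number of characters of
    neighboring text kept on each side of the match.
    """
    hits = []
    for m in re.finditer(re.escape(query), source_text, flags=re.IGNORECASE):
        lo = max(0, m.start() - radius)
        hi = min(len(source_text), m.end() + radius)
        hits.append(Hit(m.start(), m.group(0), source_text[lo:hi]))
    return hits


if __name__ == "__main__":
    # Made-up sample text standing in for an extracted specification sheet.
    sample = (
        "Chip A: maximum operating temperature 125 C. "
        "Chip B: maximum operating temperature 85 C. "
        "Chip C: maximum operating temperature 70 C."
    )
    for hit in find_with_context(sample, "operating temperature", radius=12):
        print(hit.start, repr(hit.context))
```

The stored offset is what would allow a viewer to jump from an entry in the listing to the corresponding portion of the source document.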
It would also be desirable to be able to formulate a list of standardized queries that, when acted upon by an extraction program, would generate a destination document in accordance with a predetermined format with data, retrieved from a selected source document, already inserted on a “best programmed guess” basis. In such a situation, it would then be desirable to be able to obtain, from the destination document, a displayed view of the portion of the source document from which data was retrieved for any specific category of data inserted. It would also be desirable to be able to review, from the destination document, a listing of the query results (pre-mined data) from which the inserted data was selected.
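By way of illustration only, a list of standardized queries applied in a batch may be sketched as follows. The field names, the regular expressions, and the rule of taking the first candidate found as the best guess are all assumptions made for this sketch and are not taken from the invention. All candidates are retained alongside the guess so that the pre-mined results remain available for review.

```python
import re
from typing import Dict, List, Optional, TypedDict


class FieldResult(TypedDict):
    best_guess: Optional[str]  # value pre-inserted into the destination document
    candidates: List[str]      # every value found, kept for later review


# Hypothetical standardized queries: destination field -> pattern with one group.
STANDARD_QUERIES: Dict[str, str] = {
    "max_operating_temperature": r"maximum operating temperature\s+(-?\d+)\s*C",
    "supply_voltage": r"supply voltage\s+(\d+(?:\.\d+)?)\s*V",
}


def run_batch(source_text: str, queries: Dict[str, str]) -> Dict[str, FieldResult]:
    """Apply every query to the source text and collect results per field."""
    results: Dict[str, FieldResult] = {}
    for field, pattern in queries.items():
        candidates = re.findall(pattern, source_text, flags=re.IGNORECASE)
        # Assumed rule: the first candidate found serves as the best guess.
        best = candidates[0] if candidates else None
        results[field] = {"best_guess": best, "candidates": candidates}
    return results


if __name__ == "__main__":
    # Made-up sample text standing in for an extracted specification sheet.
    sample = (
        "Chip A maximum operating temperature 125 C. "
        "Chip B maximum operating temperature 85 C. "
        "Supply voltage 3.3 V."
    )
    for field, result in run_batch(sample, STANDARD_QUERIES).items():
        print(field, result)
```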
Since the specification sheets cannot always be obtained in a visually displayable format, whether PDF or some other format, they may have to be reproduced in a computer using OCR (optical character recognition) scanning techniques. The resulting displayed source document, depending upon the specific OCR technology used, may be straight text or may use containerized text techniques similar to those of the PDF files mentioned above. Thus, it would be desirable to be able to accomplish the retrieval of text from source documents of various formats in the manner described above.
The present invention comprises an apparatus for and a method of retrieving data from a displayed source document and displaying a listing of segments of text accompanying data located in and batch queried from the source document to facilitate the action of inserting pertinent data in a differently formatted destination document.