Many software development and computer companies today deliver documentation to their customers in a number of different formats, for example paper, Adobe Acrobat Portable Document Format (PDF) files, Windows Help files, Hypertext Markup Language (HTML) files, and HTML Help files.
The documentation provided to recipients, such as customers, is distributed on, for example, paper, CD-ROMs, and Web servers.
Naturally, it is desirable for a recipient or user to be able to perform a full-text search of the received documents. Users cannot perform full-text searches on paper documents, except by long, laborious reading of the documents. There is, however, software known as “search engines” that can search digital files, such as files that users download from the Web.
However, these search engines are limited in their ability to search documentation or received Web files that include multiple file types. Most existing search engines are designed to search files of only one particular format.
In this situation, it would be necessary to convert all of the files in the Web documents or Web-received files into a common format, namely one that is compatible with the particular search engine available.
However, when files are converted into a format different from the one in which they were originally created, much of the functionality of the original file is lost, including the ability to navigate through the file and to find certain special graphics or other content in it.
Other search engines can, to a limited extent, search multiple file types in Web-received documentation. However, these search engines cannot open every file type at the locations where the search terms appear and then move from one such location to the next within the document.
Thus, these search engines require the user first to search with one preferred engine and then to refine the search using another search engine designed for the particular file type.
One example of a standard (not full-text) search is the Find command in a word-processing program such as Microsoft Word. The operator tells Word to find a text string. Word then reads the text of the document one word at a time, beginning at a specified location, and compares it against the string that was entered. When Word finds a “hit” (match), it highlights the text and stops searching. If the operator chooses the “Find Next” option, Word repeats the process, resuming the search just past the current hit. This is essentially a brute-force and very slow process.
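The brute-force Find / Find Next behaviour described above can be illustrated with a minimal Python sketch. It is not Word's actual implementation: the function names `find_next` and `find_all` are hypothetical, and for simplicity the comparison is made at every character position rather than word by word.

```python
def find_next(text: str, pattern: str, start: int = 0) -> int:
    """Return the index of the next match at or after `start`, or -1.

    Brute force: every candidate position is tried in turn and the
    pattern is compared against the text at that position.
    """
    if not pattern:
        return -1
    last = len(text) - len(pattern)
    for pos in range(start, last + 1):
        if text[pos:pos + len(pattern)] == pattern:
            return pos  # a "hit": the caller would highlight it and stop
    return -1


def find_all(text: str, pattern: str) -> list[int]:
    """Simulate pressing "Find Next" repeatedly, resuming just past each hit."""
    hits = []
    pos = find_next(text, pattern, 0)
    while pos != -1:
        hits.append(pos)
        pos = find_next(text, pattern, pos + len(pattern))
    return hits


doc = "Install the server, then restart the server."
print(find_all(doc, "server"))  # → [12, 37]
```

Every query rescans the text itself, so searching 450 files this way means reading all 450 files again for each search term.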
A “full-text” search, by contrast, searches a collection of files at one time. It does this by using an auxiliary collection of files that was created ahead of time and distributed with the files to be searched. If, for example, the operator wished to search 450 files for the word “server,” the software would read the auxiliary files, which already record all occurrences and locations of the word “server.” From the information in the auxiliary files, the software would build a “hit list” of all files that contain the word and present it to the operator. If the operator elects to open any of these files, the software opens the file, moves to the first location of the word in the file (which it already knows from the auxiliary files), and highlights the word. Note that none of the files are directly searched or scanned. Auxiliary files of this kind also let the operator use advanced features such as wildcards (“install*”) and Boolean operators (“installation and not printers”).
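A minimal Python sketch of the auxiliary-file approach follows, assuming the auxiliary files can be modelled as an in-memory inverted index that maps each word to the files and word positions where it occurs. The tokenizer, the query syntax (a trailing `*` for wildcards, `and` / `not` for Boolean operators), and all function names are hypothetical; a production engine would store the index on disk and support a richer query language.

```python
import re

# word -> {file name -> [word positions]}
Index = dict[str, dict[str, list[int]]]


def tokenize(text: str) -> list[str]:
    """Lower-case words; punctuation is discarded."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(files: dict[str, str]) -> Index:
    """Built ahead of time, before any search is run."""
    index: Index = {}
    for name, text in files.items():
        for pos, word in enumerate(tokenize(text)):
            index.setdefault(word, {}).setdefault(name, []).append(pos)
    return index


def files_for_term(index: Index, term: str) -> set[str]:
    """Files containing `term`; a trailing '*' matches any word with that prefix."""
    if term.endswith("*"):
        prefix = term[:-1]
        return {f for word, postings in index.items()
                if word.startswith(prefix) for f in postings}
    return set(index.get(term, {}))


def search(index: Index, query: str, all_files) -> list[str]:
    """Evaluate 'a and b and not c' left to right using only the index."""
    result = set(all_files)
    negate = False
    for tok in query.lower().split():
        if tok == "and":
            continue
        if tok == "not":
            negate = True
            continue
        matches = files_for_term(index, tok)
        result = result - matches if negate else result & matches
        negate = False
    return sorted(result)  # the "hit list"


def first_location(index: Index, term: str, name: str) -> int:
    """Word position of the first hit, used to jump there when the file is opened."""
    return index[term][name][0]


files = {
    "a.htm": "Installation of the server",
    "b.htm": "Installing printers on the server",
    "c.htm": "Server troubleshooting",
}
idx = build_index(files)
print(search(idx, "server", files))                         # → ['a.htm', 'b.htm', 'c.htm']
print(search(idx, "install*", files))                       # → ['a.htm', 'b.htm']
print(search(idx, "installation and not printers", files))  # → ['a.htm']
print(first_location(idx, "server", "b.htm"))               # → 4
```

Only `build_index` reads the file contents; every query afterwards is answered from the index alone, which is why none of the files need to be scanned at search time.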
There are a number of ways to create these auxiliary files. For most releases distributed on CD-ROM, the process may take several hours. The success of a search engine can be measured by how efficiently the desired files are generated and accessed.
The present invention uses an existing search engine that is designed to support searching of one particular file format (PDF, or Adobe® Acrobat® files) and extends it to allow searching of virtually any other file format, such as HTML, HTML Help, or Windows Help. The method and system accomplish this by creating a PDF “duplicate” containing the text of the file that the operator wants to search, so that the search engine can find the text in the duplicate. Each page of the PDF duplicate is then linked to the corresponding location in the file of the other format, so that the user-operator has, in effect, performed a full-text search of that file.
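The duplicate-and-link idea can be sketched in Python under several loudly stated assumptions. The original file is taken to be simple HTML (no scripts or styles), and each element carrying an `id` attribute is treated as a location the duplicate can link back to. Each "page" of the duplicate is modelled as plain text plus a link such as `guide.htm#install`. Writing these pages into a real PDF with link annotations is omitted, and `search_duplicate` is only a stand-in for the existing PDF search engine, not Acrobat's API. All names here are hypothetical.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class DuplicatePage:
    text: str  # plain text copied from the original file
    link: str  # location in the original file, e.g. "guide.htm#install"


class _SectionSplitter(HTMLParser):
    """Starts a new section at every element that carries an id attribute."""

    def __init__(self):
        super().__init__()
        self.sections: list[tuple[str, list[str]]] = [("", [])]

    def handle_starttag(self, tag, attrs):
        anchor = dict(attrs).get("id")
        if anchor:
            self.sections.append((anchor, []))

    def handle_data(self, data):
        self.sections[-1][1].append(data)


def build_duplicate(filename: str, html: str) -> list[DuplicatePage]:
    """One duplicate page per section, each linked to its original location."""
    splitter = _SectionSplitter()
    splitter.feed(html)
    splitter.close()
    pages = []
    for anchor, chunks in splitter.sections:
        text = " ".join(" ".join(chunks).split())  # normalise whitespace
        if text:
            link = f"{filename}#{anchor}" if anchor else filename
            pages.append(DuplicatePage(text, link))
    return pages


def search_duplicate(pages: list[DuplicatePage], word: str) -> list[str]:
    """Stand-in for the PDF engine: links of the pages whose text contains `word`."""
    word = word.lower()
    return [p.link for p in pages
            if word in re.findall(r"[a-z0-9]+", p.text.lower())]


html = """<html><body>
<h1 id="intro">Introduction</h1><p>This guide covers the server.</p>
<h2 id="install">Installing</h2><p>Run setup on the server.</p>
<h2 id="print">Printers</h2><p>Add a printer.</p>
</body></html>"""

pages = build_duplicate("guide.htm", html)
print(search_duplicate(pages, "server"))  # → ['guide.htm#intro', 'guide.htm#install']
```

A hit in the duplicate therefore resolves to a location in the original HTML file, which is where the user would be taken.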