1. Field of the Invention
The invention relates to the field of processing electronic documents (e.g., data files). More specifically, the invention relates to the field of searching and optionally linking objects of electronic documents (e.g., to create hyperlinks in HTML-based Web documents, an electronic manual, etc.).
2. Background Information
In some applications, it may be useful to provide links within and/or between documents. For example, in the field of computer networking, "hyperlink navigation" (HLN) may be utilized to establish links within and/or between hypertext electronic documents, such as "Web pages." The links are typically established by creating "hot spots" that, when activated (e.g., with the click of a computer mouse), link a source and a target within and/or between one or more electronic documents (e.g., one or more Web pages, electronic books/manuals, etc.).
One limitation of HLN is the relatively substantial manual labor that is typically involved in creating links, especially in legacy documents. In general, the following steps may be required to create links: (1) scanning one or more documents (which may be paper, microfiche, etc.) using optical character recognition (OCR) to create an electronic file(s) of the one or more documents; (2) editing/formatting the OCR documents; (3) converting the scanned electronic text/image file(s) into a desired format, such as Hypertext Markup Language (HTML); and (4) searching the formatted file(s) to determine sources and targets to create desired links (e.g., hyperlinks) within and/or between the one or more electronic documents. When using certain formats, such as HTML, the errors introduced by the OCR process should be eliminated. However, eliminating such errors may often require a relatively substantial investment of time and labor.
For relatively large documents, manual searching, such as for a source and/or a target of a desired link, may be impractical and/or time-consuming. One search technique that may be performed to provide limited search efficiency in an electronic document is character searching. As an example, word processing applications typically provide a character search feature, which allows a user to search a data file (e.g., a text file) for a specified set of characters, such as a word or phrase. When a character search is performed and a match for the specified set of characters (sometimes referred to as a "target" or "hit" pattern) is detected, a "hit" occurs. As a result of a hit, a user may be provided with an indication of the hit (e.g., the set of characters in the document that match the specified set of characters of the search may be highlighted on a display).
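As a minimal illustration of the character search described above, the following Python sketch reports the position of each literal hit and marks the hits in the text, loosely mimicking highlighting on a display. The function names `find_hits` and `highlight` and the sample sentence are hypothetical and are not taken from any particular word processing application.

```python
def find_hits(text: str, pattern: str) -> list[int]:
    """Return the start offset of every literal occurrence of pattern in text."""
    if not pattern:
        return []
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        # Resume one character later so overlapping occurrences are also found.
        start = text.find(pattern, start + 1)
    return hits


def highlight(text: str, pattern: str) -> str:
    """Wrap each non-overlapping hit in square brackets (a stand-in for highlighting)."""
    return text.replace(pattern, f"[{pattern}]") if pattern else text


# Hypothetical sample text, for illustration only.
sample = "See section 2 for setup. Section 2 also covers wiring; section 2 is short."
print(find_hits(sample, "section 2"))
print(highlight(sample, "section 2"))
```

Note that the comparison is purely literal: the capitalized "Section 2" in the sample is not reported as a hit, because its characters differ from those of the specified pattern.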
Unfortunately, character search techniques may not be useful in some applications, such as link creation within and/or between electronic documents. One reason is that although two sets of characters may be identical (resulting in a search "hit"), the set of characters may have different meanings, thereby resulting in an undesirable match or "hit." In other instances, two literally different sets of characters may have the same meaning (e.g., "page 8, section 2" and "section II, p. 8" and "2-8"), but would fail to be detected as a hit. As a result, past search techniques may fail to detect desired matches, or they may detect invalid matches.
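Both failure modes can be reproduced with a few lines of Python. The sketch below is illustrative only: the helper `literal_hit` and the two sample sentences are hypothetical, while the three reference spellings are the ones quoted above.

```python
def literal_hit(text: str, pattern: str) -> bool:
    """A character search hit: pattern occurs verbatim somewhere in text."""
    return pattern in text


# Failure mode 1: different characters, same meaning.
# The three spellings quoted above all denote the same location.
references = ["page 8, section 2", "section II, p. 8", "2-8"]
query = "page 8, section 2"
missed = [ref for ref in references if not literal_hit(ref, query)]
print(missed)  # spellings that a literal search for `query` fails to detect

# Failure mode 2: identical characters, different meanings.
# Hypothetical sentences: only the first uses "2-8" as a section/page reference.
sentences = [
    "Torque the bolt as shown in 2-8.",
    "Allow 2-8 hours for curing.",
]
false_hits = [s for s in sentences if literal_hit(s, "2-8")]
print(false_hits)  # both sentences hit, although only one is a valid link source
```

In the first case two of the three equivalent references go undetected; in the second, a range of hours is reported as if it were a reference to a section and page.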
Thus, utilizing past search techniques, a relatively extensive amount of manual searching and/or editing may still need to be performed to search and/or establish desired links within and/or between one or more electronic documents.