A necessary part of virtually every computer is a file system, used for storing computer programs and associated data. Recent advances provide for the searching of file systems, enabling users to easily locate any file. In the case of data files, not only can users search based on the name of a file, but they can further search based on attributes of the file (e.g., author, date of creation) and even on the textual contents within the file (e.g., words in a document, words in an email).
To some extent, the ability to search has been enhanced by advances in indexing, which is the process of cataloging the contents of one or more file systems in such a way as to reduce the time it takes to perform a search. Such indexes are commonly associated with Internet search engines (e.g., MSN Search, Google), which catalog huge swaths of World Wide Web content. But indexing has also come to the desktop computer, enhancing the searching of local file systems.
While cataloging the contents of a computer, a search indexing program may encounter file containers. Types of file containers may include compressed and/or archived files (e.g., file formats such as zip, cabinet (CAB), and tape archive (TAR)) and other collections of associated file references. File references may act as placeholder files which merely point to a file somewhere on a local or remote file system. File references may also point to items other than files, such as a specific email within a file containing multiple emails.
When an indexing service indexes a particular file and also indexes a file reference having the particular file as its target, the indexing service may create multiple index entries for essentially the same file. As a result, the same file may appear multiple times in a particular set of search results, which is likely to confuse the user. Furthermore, if a referenced file is located on a remote file system, an indexing program may not know to catalog its contents, improperly preventing its inclusion in search results.
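The two failure modes described above can be illustrated with a small sketch in Python. It is only a toy model under stated assumptions: the `Item` type, the `naive_index` function, and all paths are hypothetical names invented for this illustration, and the sketch does not represent any particular indexing service.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Item:
    """Hypothetical file-system item; `target` is set only for file references."""
    path: str
    text: str = ""
    target: Optional[str] = None


def naive_index(items):
    """Index every item independently, following references to local targets."""
    by_path = {item.path: item for item in items}
    index = {}
    for item in items:
        # A reference is indexed using the text of the file it points to.
        source = by_path.get(item.target) if item.target else item
        if source is None:
            # The target is not on the local file system, so it is skipped.
            continue
        for word in source.text.lower().split():
            index.setdefault(word, []).append(item.path)
    return index


items = [
    Item(path="/docs/report.txt", text="quarterly budget report"),
    Item(path="/desktop/report.lnk", target="/docs/report.txt"),
    Item(path="/desktop/plan.lnk", target="//server/share/plan.txt"),
]
index = naive_index(items)
hits = index["budget"]
```

In this sketch, `hits` contains both the file and the reference to it, so a search for "budget" would list essentially the same file twice, while the remote target of `/desktop/plan.lnk` contributes no entries to the index at all.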
Other problems may arise when an indexing program encounters a file container, such as a zip file. An indexing program may not be able to access the contents of the file container, excluding potentially relevant results from a search. But even if a file container is accessible, an indexing program may be unable to properly index files and file references stored within the file container, possibly leading to confusing or incomplete search results.
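As a rough illustration of what accessing the contents of a file container involves, the following Python sketch builds a small zip file in memory and indexes the text of its members using the standard `zipfile` module. The function names, the sample file names, and the `container!/member` naming scheme for entries are assumptions made for this example only.

```python
import io
import zipfile


def build_sample_zip() -> bytes:
    """Create a small in-memory zip file to stand in for a file container."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("notes/meeting.txt", "budget review on friday")
        archive.writestr("todo.txt", "ship the indexer")
    return buffer.getvalue()


def index_container(container_name: str, data: bytes) -> dict:
    """Map each word to the container members whose text contains it."""
    index = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for member in archive.namelist():
            if member.endswith("/"):
                continue  # skip directory entries
            text = archive.read(member).decode("utf-8", errors="replace")
            for word in text.lower().split():
                # Hypothetical naming scheme: "<container>!/<member path>".
                index.setdefault(word, set()).add(f"{container_name}!/{member}")
    return index


zip_index = index_container("archive.zip", build_sample_zip())
```

An indexing program that treats `archive.zip` as an opaque file would record none of these words, which is the exclusion of potentially relevant results described above. The sketch handles only plain-text members and does not address file references stored inside a container.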