Modern data processing systems, such as general purpose computer systems, allow the users of such systems to create a variety of different types of data files. For example, a typical user of a data processing system may create text files with a word processing program such as Microsoft Word or may create an image file with an image processing program such as Adobe's PhotoShop. Numerous other types of files are capable of being created or modified, edited, and otherwise used by one or more users for a typical data processing system. The large number of the different types of files that can be created or modified can present a challenge to a typical user who is seeking to find a particular file which has been created.
Modern data processing systems often include a file management system which allows a user to place files in various directories or subdirectories (e.g. folders) and allows a user to give the file a name. Further, these file management systems often allow a user to find a file by searching not only the content of a file, but also by searching for the file's name, or the date of creation, or the date of modification, or the type of file. An example of such a file management system is the Finder program which operates on Macintosh computers from Apple Computer, Inc. of Cupertino, Calif. Another example of a file management system program is the Windows Explorer program which operates on the Windows operating system from Microsoft Corporation of Redmond, Wash. Both the Finder program and the Windows Explorer program include a find command which allows a user to search for files by various criteria including a file name or a date of creation or a date of modification or the type of file. This search capability searches through information which is the same for each file, regardless of the type of file. Thus, for example, the searchable data for a Microsoft Word file is the same as the searchable data for an Adobe PhotoShop file, and this data typically includes the file name, the type of file, the date of creation, the date of last modification, the size of the file and certain other parameters which may be maintained for the file by the file management system.
Certain presently existing application programs allow a user to maintain data about a particular file. This data about a particular file may be considered metadata because it is data about other data. This metadata for a particular file may include information about the author of a file, a summary of the document, and various other types of information. Some file management systems, such as the Finder program, allow users to find a file by searching through the metadata.
In a typical system, the various content, file, and metadata are indexed for later retrieval using a program such as the Finder program, in what is commonly referred to as an inverted index. For example, an inverted index might contain a list of references to documents in which a particular word appears. Given the large numbers of words and documents in which the words may appear, an inverted index can be extremely large.
The size of an index presents many challenges in processing and storing the index, such as updating the index or using the index to perform a search. The larger an index is, the slower it is to update or search it. However, it is typically faster to search one larger index than two smaller ones. Thus, it is generally beneficial to have as few indexes as possible, such that you rarely have to search more than one index.
The result set when searching a larger index is generally larger than when searching a smaller index. Even though it may be faster to search one large index resulting in a large result set, there may also be a performance benefit to limiting the size of the result set when “scoping” is involved. Scoping refers to searching in a particular place on your hard drive, rather than searching the entire hard drive. It requires a “reverse lookup” from file id to file path, which is a relatively time consuming operation. For example, searching a hard drive for documents containing the term “apple” takes 0.075 seconds; when limited to the user's home directory, it takes much longer, from 0.75 seconds up to 10 seconds!