Hierarchically organized data, such as in advanced computer file systems including Microsoft's Object File System (OFS) and Microsoft's Windows NT™ File System (NTFS), may be associated with a search engine (query engine). The search engine allows a user to query the data and/or the file system in order to locate documents (i.e., files or objects) that match the user's query specification. For example, the above file systems separately index the contents and the properties of documents stored thereby, so that even though the data in the documents is not structured like database data, the query engine can quickly respond to such queries. To respond to a query, the query engine accesses the index and returns information about the located documents in a result set. Other search engines work similarly with other hierarchically organized data.
The query specification includes a restriction, which is a set of criteria (content and/or properties) that matching documents will possess. A typical query specification also includes a scope, which is the set of folders or directories that are to be examined, and a return set, which identifies which properties are to be returned for each matching document that is returned in the result set. For example, a query may consist of a restriction specifying that matching documents will contain the text "computer software," a scope of c:\folder1, and a return set that supplies the file name and file size of matching files. The scope can be specified as shallow, whereby only matching documents in the specified folder are returned, or deep, whereby matching documents in the specified folder and any sub-folders thereof are returned.
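The three parts of a query specification can be pictured as a small record. The following is only an illustrative sketch: the names QuerySpec, Scope, restriction, scopes, return_set, and deep are hypothetical and are not taken from OFS or NTFS.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scope:
    """One folder to examine (hypothetical structure)."""
    folder: str        # e.g. r"c:\folder1"
    deep: bool = True  # deep: folder plus sub-folders; shallow: folder only


@dataclass
class QuerySpec:
    """Restriction, scope set, and return set of a query (hypothetical)."""
    restriction: str       # text that matching documents must contain
    scopes: List[Scope]    # the set of folders to be examined
    return_set: List[str]  # properties returned for each matching document


# The example query from the text above.
query = QuerySpec(
    restriction="computer software",
    scopes=[Scope(r"c:\folder1", deep=True)],
    return_set=["file_name", "file_size"],
)
```

A shallow version of the same query would differ only in passing deep=False for the scope.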
The indexes are inverted text indexes, that is, organized and keyed by textual words, and not by any hierarchical relationship between folders and documents. Consequently, when a query is being processed, the search engine searches the index to obtain the documents that match the specified restriction without respect to scope. To scope test, the search engine performs a string comparison, known as prefix matching, on the path of each document as it is retrieved to determine which, if any, of those documents are within the specified scope. Located documents whose paths have prefixes corresponding to those in the query specification are said to be "in scope." Properties of those matching documents which are in scope are then returned in the result set.
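The two-stage process described above (index lookup without respect to scope, followed by prefix matching on each retrieved document) may be sketched as follows. This is a simplified illustration under stated assumptions, not the OFS or NTFS implementation: the function names and sample documents are hypothetical, the restriction is treated as "contains all of the given words" rather than an exact phrase, and case is handled by simple lowercasing.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the set of document paths containing it."""
    index = defaultdict(set)
    for path, text in docs.items():
        for word in text.lower().split():
            index[word].add(path)
    return index


def in_scope(path, folder, deep=True):
    """Prefix-match a document path against one scope folder."""
    # The trailing backslash keeps c:\folder10 from matching c:\folder1.
    prefix = folder.rstrip("\\").lower() + "\\"
    lowered = path.lower()
    if not lowered.startswith(prefix):
        return False
    remainder = lowered[len(prefix):]
    # Shallow scope: the document must sit directly in the folder.
    return deep or "\\" not in remainder


def run_query(index, words, folder, deep=True):
    """Find documents containing every word, then scope test each one."""
    sets = [index.get(w.lower(), set()) for w in words]
    candidates = set.intersection(*sets) if sets else set()
    return sorted(p for p in candidates if in_scope(p, folder, deep))


# Hypothetical sample documents.
DOCS = {
    r"c:\folder1\a.txt": "computer software manual",
    r"c:\folder1\sub\b.txt": "software for a computer",
    r"c:\folder2\c.txt": "computer software news",
    r"c:\folder10\d.txt": "computer software",
}
INDEX = build_inverted_index(DOCS)
deep_hits = run_query(INDEX, ["computer", "software"], r"c:\folder1", deep=True)
shallow_hits = run_query(INDEX, ["computer", "software"], r"c:\folder1", deep=False)
```

Note that all four sample documents satisfy the restriction, so all four are retrieved from the index and each must be string-compared against the scope before the out-of-scope ones can be discarded.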
However, string comparisons, and thus prefix matching, are relatively slow and costly processes. Prefix matching is further complicated by the use of both long and short filenames, uppercase and lowercase distinctions in filenames, and by the use of international Unicode file names where one string may have several unique but equivalent representations. In addition, for each located document, the full path of the document's folder is created in memory, and the space for it is heap allocated because the path is a string of arbitrary and unknown size with no definite upper limit. Lastly, since more than one folder may be named in a specified scope set, and since the located documents are disjoint, prefix matching will have to be done, one document at a time, for each named folder until a match is found or the document is determined to be not in scope. This means that all specified folders in a given set are prefix matched for documents that are ultimately determined to be not in scope, and, on average, half of the specified folders will be tested for documents that are in scope before a match can be found.
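The cost of testing one document against a scope set with several folders can be made concrete by counting comparisons. The sketch below is illustrative only: scope_test and the sample folders are hypothetical, and the long/short filename and Unicode equivalence issues mentioned above are ignored.

```python
def scope_test(path, folders):
    """Prefix-match one path against each folder in turn.

    Returns (in_scope, comparisons), where comparisons is the number of
    string comparisons performed before the outcome was known.
    """
    comparisons = 0
    lowered = path.lower()
    for folder in folders:
        comparisons += 1
        if lowered.startswith(folder.rstrip("\\").lower() + "\\"):
            return True, comparisons
    return False, comparisons


# Hypothetical scope set of four folders.
FOLDERS = [r"c:\a", r"c:\b", r"c:\c", r"c:\d"]

out_of_scope = scope_test(r"c:\z\f.txt", FOLDERS)  # every folder is tested
found_second = scope_test(r"c:\b\f.txt", FOLDERS)  # stops at the second folder
```

An out-of-scope document always costs as many comparisons as there are folders in the scope set, while an in-scope document stops at its matching folder; this is the behavior the paragraph above describes.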
In short, although the above-described query-resolving technique functions adequately when only a small number of files are involved, the prefix matching process consumes substantial resources when a relatively large number of documents are scope tested. At the same time, OFS, NTFS and other systems of hierarchically-organized data are designed to support large result sets containing hundreds of thousands of results. Since OFS and NTFS and the like are typically used in networked client-server environments, it is commonplace to have such large queries, making scope testing costly.