1. Field of the Invention
In general, the present invention relates to file management. Specifically, the present invention provides a system for indexing files and for tracking symbolic names associated with files.
2. Related Art
Many computer applications persist information by storing data in files. For example, a software tool that allows a user to create different kinds of data objects may store a representation of each object in a separate file. However, an application trying to find specific information that has been stored in one or more files faces two problems: (1) searching for data in files becomes time-consuming as the number and size of files increase; and (2) only files with well-known formats can be searched. The first problem can be addressed for the most part by indexing the contents of files: each file is parsed once to store its relevant contents in an index, and afterwards the index (which presumably provides efficient search characteristics) is searched instead of each file individually. However, this technique, too, can be used only for files with well-known formats. Without a well-known format, the contents of a file cannot be parsed. A significant hurdle thus faces applications that wish to search or index arbitrary files with arbitrary formats. A trend in software is to support pluggable extensions, which allow clients to provide their own specialized behavior to the software. This trend leads to the use of file formats understood only by a specific extension, or to the introduction of extension-specific data into extensible file formats. Unfortunately, no existing system is capable of treating extension-specific data on par with the file data that the system is hard-coded to understand.
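The indexing approach described above, and its limitation, can be illustrated with a minimal sketch. Everything in it is assumed for illustration only: the `PARSERS` registry, the `build_index` and `search` functions, and the file names are hypothetical, and an in-memory dictionary stands in for a real file system. It is not a description of any particular existing system.

```python
from collections import defaultdict

# Hypothetical registry of hard-coded parsers: maps a file extension to a
# function that extracts searchable terms from the file's text.
PARSERS = {
    ".txt": lambda text: text.lower().split(),
    ".csv": lambda text: text.lower().replace(",", " ").split(),
}


def build_index(files):
    """Parse each file once and return (index, skipped).

    files: dict mapping file name -> contents (a stand-in for a file system).
    index: dict mapping term -> set of file names containing that term.
    skipped: names of files whose format no parser understands.
    """
    index = defaultdict(set)
    skipped = []
    for name, text in files.items():
        ext = name[name.rfind("."):] if "." in name else ""
        parser = PARSERS.get(ext)
        if parser is None:
            # Unknown format: the contents cannot be parsed, so not indexed.
            skipped.append(name)
            continue
        for term in parser(text):
            index[term].add(name)
    return index, skipped


def search(index, term):
    """Look the term up in the index instead of scanning every file."""
    return sorted(index.get(term.lower(), set()))


files = {
    "notes.txt": "Quarterly report draft",
    "data.csv": "report,2004,final",
    "model.xyz": "report hidden in an extension-specific format",
}
index, skipped = build_index(files)
print(search(index, "report"))  # ['data.csv', 'notes.txt']
print(skipped)                  # ['model.xyz']
```

In this sketch the file `model.xyz` contains the search term, yet it never appears in the results because no hard-coded parser recognizes its format, which is the second problem noted above.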
Additional problems exist in the current art with respect to symbolic names for files. Specifically, consider an interdependent environment in which references are symbolic rather than direct, an XML-based system being one example. In such an environment, references between entities are described through symbolic names: one entity references a symbolic name, while another entity associates itself with the same symbolic name. At runtime, the referenced symbolic name is resolved to the respective entity. The advantages of such an environment are clear and documented, ranging from flexibility to pluggability. Nonetheless, one common problem encountered in such a system is that of dangling references, where a reference is unresolvable due to a missing dependency. This may occur when an entity being referenced symbolically is deleted or when its associated symbolic name is modified. When the entity being referenced is deleted, its association with the symbolic name is deleted with it. Even if the change is simply a modification of the associated symbolic name, the association with the original symbolic name is still deleted. Accordingly, the referencing entity is no longer capable of associating its dangling reference with the specific entity that was deleted or renamed. This means that there is no way to identify to a user of such an environment the cause of the dangling reference (e.g., the name of the specific file that was deleted). Looking at the same issue from a different perspective, it also means that when an entity's association with a symbolic name is deleted (i.e., the entity is either deleted or simply associated with a different symbolic name), there is no way to find, after the fact, all referencing entities that are affected (since they reference only the deleted symbolic name, and not the entity itself).
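The dangling-reference problem can be made concrete with a small sketch of symbolic name resolution. The `Registry` class, its methods, and the names `customerSchema`, `customer.xsd`, and `order.xml` are all hypothetical and chosen only for illustration; the sketch models the problem as described above and does not represent any particular system.

```python
class Registry:
    """Toy model of an environment that resolves references by symbolic name."""

    def __init__(self):
        self.bindings = {}    # symbolic name -> entity associated with it
        self.references = {}  # referencing entity -> symbolic name it uses

    def bind(self, symbolic_name, entity):
        self.bindings[symbolic_name] = entity

    def add_reference(self, referrer, symbolic_name):
        self.references[referrer] = symbolic_name

    def delete_entity(self, entity):
        # Deleting an entity also deletes its symbolic name association.
        for name in [n for n, e in self.bindings.items() if e == entity]:
            del self.bindings[name]

    def resolve(self, referrer):
        # Returns the referenced entity, or None if the reference dangles.
        return self.bindings.get(self.references[referrer])

    def referrers_of(self, entity):
        # Only works while the entity's symbolic name association exists.
        names = {n for n, e in self.bindings.items() if e == entity}
        return sorted(r for r, n in self.references.items() if n in names)


reg = Registry()
reg.bind("customerSchema", "customer.xsd")
reg.add_reference("order.xml", "customerSchema")

print(reg.resolve("order.xml"))          # customer.xsd
print(reg.referrers_of("customer.xsd"))  # ['order.xml']

reg.delete_entity("customer.xsd")

print(reg.resolve("order.xml"))          # None
print(reg.referrers_of("customer.xsd"))  # []
```

After the deletion, `order.xml` still holds the name `customerSchema`, but nothing records that `customer.xsd` was once associated with that name. The reference cannot be traced back to the deleted file, and the deleted file cannot be traced forward to the entities that depended on it.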