1. Field of the Invention
The method and apparatus of the present invention relate to the organization of databases. More specifically, the method and apparatus of the present invention relate to the organization and identification of the database files which form the database and which are derived from textual source files, and of the information contained within those database files, for optimum retrieval and storage efficiency of textual files.
2. Related Applications
This application is related to U.S. patent application Ser. No. 07/500,141, filed Mar. 27, 1990, entitled "Method and Apparatus for Searching Database Component Files to Retrieve Information from Modified Files", U.S. patent application Ser. No. 07/500,138, filed Mar. 27, 1990, entitled "User Extensible, Language Sensitive Database System", and U.S. patent application Ser. No. 07/500,140, filed Mar. 27, 1990, entitled "Locking Mechanism for the Prevention of Race Conditions", all of which are herein incorporated by reference.
3. Art Background
A database is a collection of information which is organized and stored in a predetermined manner for subsequent search and retrieval. Typically, the data is indexed according to certain parameters and can be retrieved according to those parameters. The data contained in a database varies according to the application. For example, a database may contain information which indexes the words in a text file such that words or strings of words in the text file may be retrieved quickly.
The data contained in the database may be organized in a single file or a multiplicity of files for access and retrieval. The potential for duplicate files sometimes arises because of the nature of the source information from which the database is derived. Thus, if the source information contains duplicate information, the database may similarly contain duplicate information. One application where this occurs is in the environment of computer program compilers and processes which assist in the indexing and retrieval of source file information in text form according to certain compiler information generated during the compilation of the source file.
For example, software developers frequently need to review specific lines or sections of a source code program in textual format that contain a certain variable or symbol (hereinafter referred to collectively as "symbols") in order to determine where in the program the symbol occurs and how the value of the symbol changes throughout the execution of the program. One method of providing this search and retrieval capability is to form a database which contains an index of all the symbols in the source program and the corresponding line numbers in the source files where these symbols appear. However, a source program may be quite large and span not one but a multiplicity of separate files, which are combined during the compilation process by linking or by include statements (such as the "#include" statement in the C programming language) located in the main program. Thus, files which are frequently used will be included in the database multiple times even though the information contained therein is the same.
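By way of illustration only, a symbol index of the kind described above may be sketched as follows. The function name `build_symbol_index` and the regular expression used to recognize symbols are assumptions made for this example; an actual compiler would supply symbols from its own parse of the source file rather than from a pattern match.

```python
import re
from collections import defaultdict

# Assumption: a "symbol" is any identifier-like token. A real compiler
# would report symbols from its own analysis of the source file.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_symbol_index(source_text):
    """Map each symbol to the sorted 1-based line numbers where it appears."""
    index = defaultdict(set)
    for line_number, line in enumerate(source_text.splitlines(), start=1):
        for symbol in IDENTIFIER.findall(line):
            index[symbol].add(line_number)
    # Convert the sets to sorted lists so lookups return stable results.
    return {symbol: sorted(lines) for symbol, lines in index.items()}
```

If such an index were built separately for every program that includes the same shared file, the entries for that shared file would be stored once per including program, which is the duplication noted above.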
There is also a need to ensure that the database component files which comprise the database match the current versions of the source files from which the database component files are derived. The user may inadvertently modify the textual source files from which the database is derived without updating the corresponding database component files. Thus, the database may provide incorrect information for the retrieval of text from the source file.
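One conceivable way of detecting such a mismatch, offered here only as a sketch and not as the technique of the present invention, is to record a fingerprint of the source file when the database component file is generated and to compare it against the source file at retrieval time. The names `fingerprint` and `is_stale`, and the choice of SHA-256, are assumptions made for this example.

```python
import hashlib


def fingerprint(source_bytes):
    """Return a hex digest summarizing the contents of a source file."""
    return hashlib.sha256(source_bytes).hexdigest()


def is_stale(recorded_fingerprint, current_source_bytes):
    """Return True if the source has changed since the fingerprint was recorded."""
    return fingerprint(current_source_bytes) != recorded_fingerprint
```

A fingerprint recorded for an unmodified source file compares equal on a later check, while any edit to the source file, however small, causes `is_stale` to return True.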
In a multitasking environment, multiple processes or devices may access or attempt to access files simultaneously. A race condition occurs when one process or device attempts to read information from or write information to a file while another process or device is attempting to write information to the same file. This can result in corrupt data being written into the file and/or corrupt data being read out of the file.
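A common general-purpose guard against such race conditions is a lock file that only one process can create at a time. The following sketch illustrates that general idea under stated assumptions; it is not the locking mechanism of the related application referenced above, and the names `try_acquire_lock` and `release_lock` are hypothetical.

```python
import os


def try_acquire_lock(lock_path):
    """Return True if this caller created the lock file, False if it exists."""
    try:
        # O_CREAT | O_EXCL makes creation fail if the file already exists,
        # so at most one caller can succeed for a given path.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release_lock(lock_path):
    """Remove the lock file so that another process may acquire the lock."""
    os.remove(lock_path)
```

A process would write to the database file only after `try_acquire_lock` returns True and would call `release_lock` when finished. This sketch does not address stale locks left behind by a process that terminates abnormally.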