The physical size of many databases containing technical documentation such as graphics and text is becoming increasingly unwieldy to contain on printed pages. Technical descriptions of many electronic systems, for example, may fill a multitude of binders of printed information. Aside from the physical bulk of bound printed pages, it is difficult to search for terms or phrases contained within a binder's printed pages. Recently, it has been found to be more practical to store databases on an electronic storage medium rather than on paper. Typically, readers coupled to video display terminals provide access to the information contained within electronic storage devices.
One common type of storage device currently used for the storage of bulk media such as text is the compact disc read only memory (CD-ROM). Other forms of electronic storage media include hard disk drives, magnetic tape drives and floppy disk drives. CD-ROM discs are often the chosen form of data storage medium as they are convenient, holding hundreds of megabytes of information on a nearly indestructible, inexpensive disc. Unfortunately, the speed at which information may be retrieved from a CD-ROM disc using standard off-the-shelf drives is a major limitation; typically, such drives are much slower than hard drives. On average, a unit of information can be retrieved from a CD-ROM disc in approximately 1.5 seconds, and a sequential read operation to retrieve contiguous, sequentially stored information takes approximately 0.1 seconds. If information is to be retrieved and the location of the information on the disc is unknown, the entire disc may have to be searched. Searching all of a 650-megabyte CD-ROM disc typically takes longer than 60 minutes.
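The access times quoted above can be combined into a rough cost estimate. The following is a minimal sketch only: it assumes every non-contiguous read costs the full 1.5 seconds and every contiguous follow-on read costs 0.1 seconds, and the names `SEEK_SECONDS`, `SEQUENTIAL_SECONDS` and `estimated_seconds` are hypothetical, chosen for illustration.

```python
# Approximate figures taken from the text above; real drives will vary.
SEEK_SECONDS = 1.5        # retrieving one unit at an arbitrary location
SEQUENTIAL_SECONDS = 0.1  # retrieving the next contiguous unit


def estimated_seconds(random_reads, sequential_reads):
    """Estimate total retrieval time under the simple two-cost model."""
    return random_reads * SEEK_SECONDS + sequential_reads * SEQUENTIAL_SECONDS


# 100 units scattered across the disc: every read pays the full access cost.
scattered = estimated_seconds(random_reads=100, sequential_reads=0)

# The same 100 units stored contiguously: one access, then 99 sequential reads.
clustered = estimated_seconds(random_reads=1, sequential_reads=99)

print(f"scattered: {scattered:.1f} s, clustered: {clustered:.1f} s")
```

Under these assumptions the scattered layout takes roughly 150 seconds and the contiguous layout roughly 11 seconds, which suggests why the physical placement of related information matters on such a medium.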
Schemes are known which attempt to lessen the time to retrieve information from a CD-ROM and other large storage media. Such schemes often provide alphabetized indexes of keywords within a document in the form of a dictionary; pointers are provided to locations in a document where keywords may be found. Some schemes for searching data within a database are specific to the type of data that will be searched. For example, patent databases often have indexing schemes that relate to particular fields within a database. These fields may include assignee, patentee, inventor, and others. Organizing data in such a manner may produce favourable and timely search results; however, the searching index is application specific, and information about the type of data being searched must be known ahead of time. It would be preferable to have a more generic method of organizing data wherein the index fields could be used on any textual database data being stored. One scheme for information storage and retrieval is exemplified in U.S. Pat. No. 4,276,597 in the name of Dissly et al. Dissly describes a method and apparatus for identifying particular desired information bearing records having desired predetermined identifiable characteristics from a set of such records in a base data file. A special retrieval file including arrays of binary coded elements is produced and maintained from the information content of the base data file. While some schemes are better than others, and some are best suited to particular media, most indexing schemes are costly in overhead. The dictionary of keywords and indexing tables often take up as much storage space on the CD-ROM as the document itself, or more. Having a large dictionary and database index also tends to slow the search process, as the dictionary and database also have to be scanned. Therefore, the index must be kept as small as possible, and related information should be kept as close together as possible.
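The dictionary-style index described above, in which alphabetized keywords carry pointers to their locations in a document, can be illustrated as follows. This is a generic sketch of a keyword index and not the scheme of Dissly or of any particular product; the function names `build_index` and `lookup`, the use of character offsets as pointers, and the sample text are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def build_index(text):
    """Map each lowercase keyword to the character offsets where it occurs."""
    index = defaultdict(list)
    for match in re.finditer(r"[A-Za-z0-9]+", text):
        index[match.group().lower()].append(match.start())
    # Order the keywords alphabetically, as in a printed dictionary.
    return dict(sorted(index.items()))


def lookup(index, keyword):
    """Return the stored pointers for a keyword, or an empty list if absent."""
    return index.get(keyword.lower(), [])


document = "The disc stores text. A disc index points into the text."
index = build_index(document)

for word in ("disc", "text", "tape"):
    print(word, lookup(index, word))
```

Note that this sketch stores a pointer for every occurrence of every word, so the index can approach or exceed the size of the document it describes, which is the overhead problem noted above.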
It is an object of the invention to reduce the time required to access data from a data storage medium.
It is another object to provide an improved indexing scheme for the data stored on a data storage medium.