The advent of network-based computer systems has increased dramatically the amount of data transmitted daily from user to user. Data which previously was transmitted by paper or voice systems can now be transmitted more efficiently by digital systems. A wide variety of data is capable of being transmitted in digital form, including text-based items such as electronic mail messages or electronic funds, graphics, sound and video.
The volume of digital data transmissions is growing annually and will continue to increase in the future as network and telecommunications systems become more popular in business and personal use. An increase in the installed base of computers will result in digital data transmissions becoming a major conduit for both business and personal transactions. The volume of transactions involving digital data transmissions will equal or exceed that of analog and other traditional data transmission methods such as voice or writing systems.
The increase in volume of digital data transmissions has created a concomitant increase in the volume of digital data that must be archived for both storage and retrieval. Yet despite this increase in digital data transmissions, no viable method currently exists to efficiently store and retrieve the data. The explosive growth in the volume of data required to be archived has created problems that current storage and retrieval methods cannot address, such as the need to capture large amounts of data, store and index that data in files, and organize the indexes in a manner which permits the high speed retrieval of the data from the files.
The concept of storing and retrieving digital data is well known. First, data is captured by an appropriate method and filed on media used for storing information. Second, the data is indexed by cross-referencing various attributes of the data with the storage location of the data. The index is normally collected in a database with appropriate fields for the data attributes and storage location. The database enables the fields to be manipulated to facilitate searching, sorting, recombination and similar activities.
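The cross-referencing step described above can be illustrated with a minimal sketch. The storage locations, attribute names and the `build_index` helper below are hypothetical and chosen only for illustration; they are not drawn from any particular prior system.

```python
from collections import defaultdict

def build_index(records):
    """Map each (field, value) attribute pair to the storage locations holding it."""
    index = defaultdict(list)
    for location, attributes in records.items():
        for field, value in attributes.items():
            index[(field, value)].append(location)
    return index

# Hypothetical stored items: storage location -> data attributes.
records = {
    "vol1/file0001": {"sender": "alice", "type": "email"},
    "vol1/file0002": {"sender": "bob", "type": "email"},
    "vol2/file0001": {"sender": "alice", "type": "funds"},
}

index = build_index(records)

# Looking up an attribute yields the storage locations of the matching data.
alice_locations = index[("sender", "alice")]
```

In this sketch the index is a single in-memory mapping; a database would hold the same attribute and location fields in a form that can also be sorted and recombined.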
Previous methods of storing and retrieving digital data transmissions were based on the use of paper, celluloid or computer media. None of these traditional storage and retrieval systems provide a rapid, efficient and reliable method of capturing the data, storing the data on appropriate media and accessing the data for future operations.
A popular method of storing and retrieving digital data transmissions prints the data on paper and stores it. The paper is sorted or collated, typically in paper folders. Next the paper is indexed manually into a database. Finally, the paper is stored in a filing cabinet.
Paper-based storage and retrieval systems do not offer a solution to the problem of creating a fast, efficient and reliable method of storing and retrieving digital data transmissions. Paper-based systems are material and labor intensive. The costs associated with paper, folders and filing cabinets increase at an uneconomical rate as the volume of digital data transmissions increases. The costs associated with providing personnel to convert the data, maintain the storage system and operate the index system are also high, with more personnel required to maintain both the storage of paper documents and the retrieval of those documents for future reference. Finally, any indexing scheme associated with a paper-based system does not permit rapid access to the data, as the index must be physically separated from the data. Both computer and written database indices in a paper-based system require a user to access the index, retrieve the appropriate record, and, by a separate process, match the record with the corresponding data storage location to physically retrieve the data.
In a celluloid-based storage and retrieval system, users make celluloid copies of digital data transmissions. A celluloid-based system is similar to a paper-based system, merely substituting celluloid media such as microfilm or microfiche for paper as the hard copy document capable of being stored and retrieved. In fact, most celluloid-based systems use paper as an intermediate transfer form: the digital data is first converted to paper form, then the paper is imaged onto the celluloid.
Celluloid-based storage and retrieval systems do not offer a solution to the problem of creating a fast, efficient, and reliable method of storing and retrieving digital data transmissions. Celluloid-based systems offer lower storage costs than paper-based systems, as less physical space is required to store the data. However, these cost savings are neutralized by the higher material cost per unit for celluloid such as microfilm and microfiche. In addition, imaging equipment required to convert the data from either digital or paper form to celluloid form is overly complex and expensive. The indexing scheme for a celluloid-based system presents the same inadequacies as the paper-based system, incurring labor and material costs and suffering the disadvantage of being physically separate from the stored data. Finally, retrieval of data stored in celluloid form is cumbersome and expensive, requiring sophisticated readers capable of both lighting and magnifying the data to a user-readable format.
The most advanced method of storing and retrieving digital data transmissions involves the use of a traditional computer system. Computer-based systems receive the data for storage and retrieval either directly from a computer platform, or indirectly by transferring the data to an imaging system and then transferring that image directly into the computer-based system.
Direct transfer of the data to a computer-based storage and retrieval system is a common method of manipulating digital data transmissions. The data is merely copied or moved from its storage location resident on one computer system to a storage location resident on another computer system.
Indirect transfer of data through the use of a document imaging system is another common method of manipulating digital data transmissions. A document imaging system converts hard copy documents to digital images that can be accessed and viewed on a computer workstation, stored on media, transmitted across computer networks, incorporated into software applications or printed. These systems typically utilize a scanner to digitize the images, an imaging file server computer to manage access to the images, a display device to view the images, a storage system such as magnetic disks, tapes or optical drives to store the images, and a printer to reproduce the images.
Traditional computer-based storage and retrieval systems do not offer a solution to the problem of creating a fast, efficient, and reliable method of storing and retrieving digital data transmissions. The transfer of data, either directly from one computer platform to another, or indirectly by transferring the data to a hard copy document and then converting the document to digital form, is a time-consuming and expensive process. Also, direct storage is inadequate as a system of archiving large volumes of digital data transmissions because the stored data cannot be readily manipulated. A direct transfer of digital data from one computer to another merely deposits the data in some storage location on the new computer system. No specialized index is automatically created to relate this data to its storage location other than the rudimentary filing system used by the computer.
In addition to the individual inadequacies of the traditional paper, celluloid, or computer-based digital data storage systems, all of the current systems share common problems with their indexing schemes. If the index is manually written and operated, retrieval times are excruciatingly slow, as an operator must examine the index, locate all records relevant to a retrieval request, retrieve all of the relevant stored data and prepare the retrieved data for re-transmission to the user.
Similarly, if the index is created using a computer, existing database architectures such as a flat file architecture, a hierarchical architecture, a relational architecture and an object-oriented architecture require large amounts of data to be searched for each retrieval request.
The simplest index system uses a flat file database architecture. This architecture represents a simple file which associates the data to corresponding information on index keys. A flat file database stores, organizes, and retrieves information from one file at a time. All data or records within these files must be accessed sequentially. Thus, to read or store the last record, all previously stored records must be read or accessed first.
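The sequential access behavior described above can be sketched as follows. The comma-separated record layout and the `find_record` helper are assumptions made for illustration only; actual flat file formats vary.

```python
import io

def find_record(flat_file, key):
    """Scan a flat file record by record; return (value, records_read)."""
    records_read = 0
    for line in flat_file:
        records_read += 1
        record_key, _, value = line.rstrip("\n").partition(",")
        if record_key == key:
            return value, records_read
    return None, records_read

# Hypothetical flat file with one "key,value" record per line.
flat_file = io.StringIO("k1,alpha\nk2,beta\nk3,gamma\nk4,delta\n")

# Reaching the last record requires reading every record before it.
value, records_read = find_record(flat_file, "k4")
```

The number of records read grows in step with the position of the requested record, which is why this arrangement becomes slower as the file grows.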
A more sophisticated index system uses a hierarchical database architecture. A hierarchical architecture groups records in an interrelated, tree-like structure. A hierarchical architecture descends from a main, or root, data field. Each successively lower-ordered data field is a subsidiary that branches from the higher-ordered data field. Every data field except the root can contain either higher or lower-ordered data fields. Records in a hierarchical database can be stored with a variety of index keys to enable easier reference and faster access to desired data by focusing on relevant structures within the database.
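A tree-like structure of this kind might be sketched as below. The `Node` class, the year/type/sender ordering of the levels and the storage locations are hypothetical; they stand in for whatever fields a given hierarchical database defines.

```python
class Node:
    """One data field in the hierarchy, with lower-ordered fields as children."""

    def __init__(self, name, location=None):
        self.name = name
        self.location = location  # storage location, set on the lowest level
        self.children = {}

    def add(self, path, location):
        """Descend from this node along path, creating branches as needed."""
        node = self
        for part in path:
            node = node.children.setdefault(part, Node(part))
        node.location = location

    def find(self, path):
        """Follow path downward; return the stored location or None."""
        node = self
        for part in path:
            node = node.children.get(part)
            if node is None:
                return None
        return node.location

root = Node("root")
root.add(["1997", "email", "alice"], "vol1/file0001")
root.add(["1997", "funds", "alice"], "vol2/file0001")

found = root.find(["1997", "funds", "alice"])
missing = root.find(["1998", "email", "alice"])
```

A lookup visits only the branch that matches the request, which works well when the data breaks down along the levels chosen for the tree.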
Another more sophisticated index system uses a relational database architecture. A relational database stores information in tables, or rows and columns of data. In a relational database, the rows of a table represent records, or collections of information about separate items, and the columns represent fields or particular attributes of a record. In conducting searches, a relational database matches information from a field in one table with information in a corresponding field of another table to produce a third table that combines requested data from both tables.
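The matching of a field in one table against the corresponding field in another can be sketched with Python's built-in `sqlite3` module. The `messages` and `locations` tables, their columns and their contents are hypothetical examples, not part of any system described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical tables that share the msg_id field.
conn.executescript("""
    CREATE TABLE messages (msg_id INTEGER PRIMARY KEY, sender TEXT);
    CREATE TABLE locations (msg_id INTEGER, path TEXT);
    INSERT INTO messages VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO locations VALUES (1, 'vol1/file0001'), (2, 'vol1/file0002');
""")

# Matching msg_id across both tables produces a combined result table.
rows = conn.execute(
    "SELECT m.sender, l.path FROM messages AS m "
    "JOIN locations AS l ON m.msg_id = l.msg_id "
    "ORDER BY m.msg_id"
).fetchall()
conn.close()
```

Each row of the result combines a sender from the first table with a storage path from the second, which is the "third table" the paragraph above refers to.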
Finally, recent computer index systems have used an object-oriented database architecture. An object-oriented architecture allows the modeling of complex data sets by incorporating a computer program into the index. To search the index, the program reads an object, operates on that object, selects the data, and places the object back into the database.
The existing database architectures each possess flaws. Flat file systems are adequate to manage small sets of data with few requirements for complex searches or queries. However, flat file systems become increasingly cumbersome as the size of the database grows and the demand for more flexible queries increases. Hierarchical systems are well-suited for organizing information that breaks down logically into successively greater levels of detail. Random or complicated data structures, though, cannot utilize the features of a hierarchical system, and search times become increasingly slow. Relational systems allow the rapid retrieval of small sets of data, yet become cumbersome when attempting to manipulate large amounts of data. Finally, search and retrieval operations in object-oriented systems are too complicated, requiring the use of experienced computer programmers to write and execute programs specifically designed for each individual search.
Also, computer applications using the traditional index systems place the entire index to the database alongside the stored data in a single location, such as on a single disk drive or single database server. As the volume of data stored by the system increases, the database index similarly increases in size. A large index is unmanageable and does not permit rapid access to the stored data. Thus, a traditional database architecture can take several minutes to locate a particular item of data within an index that is many terabytes in size. A system designed to process tens of thousands of requests a day would not be feasible if individual searches required several minutes of access time.
For the foregoing reasons, there is a need for a fast, efficient, and reliable method for storing and retrieving digital data transmissions. Such a method would combine the advantages of directly storing data from one computer system to another computer system with an indexing system that allows the rapid search of data yet does not become more cumbersome to manipulate with each successive stored record.
A viable method for storing and retrieving digital data transmissions would possess the following capabilities: the ability to receive, examine and transfer transmissions of data into storage with a minimum amount of material or labor; the ability to automatically create an index keyed to certain data fields; the ability to use the index in such a manner that retrieval times are held constant independent of the volume of stored data; and the ability to automatically transmit the requested data to a user.