Organizing and referencing large amounts of data is a problem that has faced mankind for centuries. With the development and widespread availability of computers, the task of processing, organizing and referencing these large amounts of data has been dramatically simplified.
Significant computer power is often required to organize and process some particularly large or complex data structures. Conventional computing systems typically include a high-speed, central processing unit (CPU), read-only memory (ROM) containing the computer's basic input/output system (BIOS), random access memory (RAM) for temporarily storing data and executing programs, and usually some type of memory storage media such as a hard disk drive on which programs and data are stored when not in use. Further, the computing system may contain extended RAM, arithmetic processors, various levels of cache memory and other system accelerators all designed to enhance the computer's computational and processing capabilities. Additionally, the computing system may contain various expansion slots and ports for communication with peripheral devices such as monitors and printers. The overall computing system, therefore, has the capability of receiving data, storing or retrieving data, processing data through the application of various algorithms and programs, and outputting data.
On a typical computing system, data can be permanently stored on memory media such that the data is not lost when the system is powered down or not in use. The most common form of permanent storage is magnetic storage media such as magnetic disks. Magnetic disks are generally available either as hard disks or as removable "floppy" disks. Floppy disks are desirable because they are transportable, inexpensive and capable of use with a variety of systems. However, floppy disks generally lack the large storage capacity of their hard counterparts and require comparatively more time to store and retrieve data. Hard disks, on the other hand, can store and retrieve data quickly and have the capacity to store large amounts of data. However, hard disks are generally not portable.
Despite their differences, both types of magnetic disks utilize a common notation for storing data and a similar system for filing the data so that it can be retrieved. The language in which all data is recorded on the magnetic disks is binary, which is simply an arrangement of 1s and 0s. The surface of any magnetic disk is divided into microscopic areas which can be altered so that each area represents either the character 1 or the character 0. Each character is referred to as a bit. Eight bits make up a byte. Magnetic disks are also divided into larger areas, which allow the computing system to process and organize data in an orderly manner. Magnetic codes are embedded in the surface of the disks to divide the surface into sectors and tracks. The number of sectors and tracks that fit on a disk determines the disk capacity. Further, sectors may be designated as elements which comprise clusters. Clusters are logical units of memory which vary in size from a single sector to many sectors combined in sequence. Two or more sectors which comprise a cluster must be physically adjacent on the memory media.
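The relationship between tracks, sectors, and capacity described above can be illustrated with a short sketch. The function name and the geometry used here (two surfaces, 80 tracks per surface, 18 sectors per track, 512-byte sectors, which corresponds to a common high-density 3.5-inch floppy disk) are illustrative assumptions and are not taken from the description above.

```python
def disk_capacity_bytes(surfaces, tracks_per_surface, sectors_per_track,
                        bytes_per_sector=512):
    """Return the raw capacity of a disk from its geometry.

    All parameters are hypothetical inputs chosen for illustration.
    """
    total_sectors = surfaces * tracks_per_surface * sectors_per_track
    return total_sectors * bytes_per_sector


# Assumed geometry of a high-density 3.5-inch floppy disk.
floppy_capacity = disk_capacity_bytes(surfaces=2, tracks_per_surface=80,
                                      sectors_per_track=18)
print(floppy_capacity)  # 1474560 bytes
```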
However, once data is processed and organized on a storage medium, the same level of computational power and ability of the computing system may no longer be required. In other words, when a user wants to perform a simple task, such as browsing a previously organized database, this task may be accomplished with a much smaller, and typically less expensive, computing system with little decrease in performance. Accordingly, one object of this invention is to provide a method for preparing data on a first computer for use on a second computer, wherein the second computer is comparatively inexpensive and provides only simple, rudimentary computing capacity including reading, translating, and presenting data.
Another characteristic common to conventional computing systems having data storage media is the binary form in which data is written to the database. As mentioned above, all data must be represented in some binary form to be manipulated by computers. Most computers utilize ASCII as the standard code for representing characters as binary numbers. In this form, a binary number containing eight digits is used to represent each character. Therefore, eight bits, i.e., one byte, of memory are required to store each character in ASCII form. However, data can be represented in other encoded forms, such as binary coded decimal (BCD), which require fewer bits to represent each character (four bits per decimal digit in the case of BCD). Memory space can be saved by converting data to a form which requires fewer bits. Therefore, by constructing and utilizing such other forms of data representation, a smaller portion of memory is required to store a finite quantity of data than would be required if the data were written in ASCII form.
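As a minimal sketch of the space saving described above, the following packs a string of decimal digits into BCD, two digits per byte, and unpacks it again. The function names and the use of the nibble value 0xF as padding for odd-length strings are assumptions made for this illustration. BCD as shown applies only to the digits 0 through 9, not to arbitrary characters.

```python
PAD = 0xF  # assumed filler nibble for odd-length digit strings


def pack_bcd(digits):
    """Pack a string of decimal digits into bytes, two digits per byte."""
    if not all(c in "0123456789" for c in digits):
        raise ValueError("BCD can only encode the digits 0-9")
    nibbles = [int(c) for c in digits]
    if len(nibbles) % 2:
        nibbles.append(PAD)
    pairs = zip(nibbles[0::2], nibbles[1::2])
    return bytes((high << 4) | low for high, low in pairs)


def unpack_bcd(packed):
    """Recover the digit string from bytes produced by pack_bcd."""
    digits = []
    for byte in packed:
        for nibble in (byte >> 4, byte & 0xF):
            if nibble != PAD:
                digits.append(str(nibble))
    return "".join(digits)


number = "8005551212"
packed = pack_bcd(number)
print(len(number.encode("ascii")), len(packed))  # 10 5
```

Under these assumptions a ten-digit string occupies five bytes instead of the ten needed at one byte per character.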
Another characteristic common to conventional computing systems utilizing a file system is that the computing systems often allocate entire database files to pre-allocated units of memory even though the file does not require the full amount of space allocated. As mentioned above, two or more adjacent sectors can be designated as a cluster. The memory capacity of the memory media dictates in part the degree of clustering, such that in some computing systems, a cluster represents the minimum logical unit of storage for the memory media. Therefore, it is possible that even though a file may have a size of only 1 byte of data, an entire cluster which is made of many bytes may be allocated for storage of the file. This type of data storage is inefficient since it may not utilize the full memory space available in each cluster. Although this practice is suitable for systems with large amounts of memory space, it is not a desirable practice for systems which have very limited amounts of memory space.
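The unused space in a partially filled cluster can be quantified with a short calculation. The sector size of 512 bytes and the cluster size of eight sectors are assumed values chosen only for illustration, as are the function names.

```python
def allocated_bytes(file_size, sector_size=512, sectors_per_cluster=8):
    """Return the space consumed when storage is granted in whole clusters."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    cluster_size = sector_size * sectors_per_cluster
    clusters_needed = -(-file_size // cluster_size)  # ceiling division
    return clusters_needed * cluster_size


def slack_bytes(file_size, sector_size=512, sectors_per_cluster=8):
    """Return the allocated but unused bytes for a file of the given size."""
    return allocated_bytes(file_size, sector_size, sectors_per_cluster) - file_size


print(allocated_bytes(1), slack_bytes(1))  # 4096 4095
```

With these assumed sizes, a 1-byte file occupies a full 4096-byte cluster, leaving 4095 bytes unused.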
In this same vein, conventional computing systems often write a large file's data to several clusters located in different areas of the database, while maintaining a record of the logical order of the clusters. Physical addressing is a technique which specifically addresses a particular track and sector of the memory media. Logical addressing is a method wherein the computing system writes single files to multiple clusters located in separate, non-contiguous areas of the database and tracks the files by the address of each cluster. Therefore, when a logically stored file is retrieved, the read/write head of the system must continually jump between clusters. These jumps are time consuming and inefficient because the head must physically move across the disk to retrieve a file. It would be much more desirable to devise a method of data storage which utilizes contiguous physical files such that a read/write head could retrieve an entire file without the need for jumps.
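The cost of non-contiguous storage can be made concrete by counting the jumps needed to read a file. The sketch below is a simplification that assumes consecutively numbered clusters are physically adjacent. The function name and the cluster numbers are hypothetical.

```python
def count_jumps(cluster_chain):
    """Count transitions between clusters that are not numerically adjacent.

    Assumes cluster n + 1 physically follows cluster n on the media.
    """
    return sum(1 for current, following in zip(cluster_chain, cluster_chain[1:])
               if following != current + 1)


contiguous_file = [10, 11, 12, 13]
fragmented_file = [10, 57, 11, 203]
print(count_jumps(contiguous_file), count_jumps(fragmented_file))  # 0 3
```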
Yet another characteristic present in many databases is the use of fixed length fields to store data. A common method for storing data is to allocate a predetermined number of bytes in the database for each character string. The size of each memory space is determined by the length of the longest character string, such that each character string is allocated the same amount of space. In essence, the database is divided into equal segments, one per character string, with the longest character string determining the length, or number of bits, of each segment. The character strings are written to the predetermined locations for each segment even though a given character string may not require all of the bits assigned to its particular segment. This method is inefficient in that it does not utilize all the available memory space if a character string requires less than all the bits allocated. Since the length of each segment is fixed and each segment is located at a predetermined address, the bytes which are not used are essentially wasted. Therefore, it is desirable to utilize a method of data storage which is dynamic in its allocation of memory space for character strings within a particular field. The size of each space for character strings within a field should be a function of the individual character strings.
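The difference between fixed length and dynamically sized fields can be sketched as follows. The variable length encoding shown, a one-byte length prefix ahead of each string, is only one possible scheme and is an assumption of this example. The function names and sample strings are likewise hypothetical.

```python
def fixed_length_size(strings):
    """Bytes used when every string gets the width of the longest one."""
    width = max(len(s.encode("ascii")) for s in strings)
    return width * len(strings)


def encode_variable(strings):
    """Store each string as a one-byte length followed by its characters."""
    out = bytearray()
    for s in strings:
        data = s.encode("ascii")
        if len(data) > 255:
            raise ValueError("string too long for a one-byte length prefix")
        out.append(len(data))
        out += data
    return bytes(out)


def decode_variable(blob):
    """Recover the list of strings written by encode_variable."""
    strings = []
    position = 0
    while position < len(blob):
        length = blob[position]
        start = position + 1
        strings.append(blob[start:start + length].decode("ascii"))
        position = start + length
    return strings


names = ["Li", "Smith", "Featherstonehaugh"]
encoded = encode_variable(names)
print(fixed_length_size(names), len(encoded))  # 51 27
```

Here the fixed length layout reserves 17 bytes for each of the three strings, while the length-prefixed layout uses only as many bytes as each string requires plus one.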
Another technique often employed by conventional computing systems is the use of file system mapping to keep a record of both the location of a file and the location of the individual clusters making up the file. All of this data is usually contained in a single fixed location in the database, typically at the beginning of the database, in a file allocation table (FAT). The FAT is where the information about the disk's directory structure and the clusters used to store files is kept. Each time a file is read, the operating system must first move the read/write head to the database's FAT to determine in which cluster a preexisting file begins and the address, i.e., track and sector, of that cluster. If the clusters of a file are not adjacent on the same track, the read/write head must move back to the FAT each time an additional cluster in the file is to be read. Moving back to the FAT in this manner to determine the subsequent address of a cluster is time consuming and inefficient. It would be desirable to provide a method of data storage that minimizes the use of a file allocation table by including address data concerning a file within the file itself, precluding the need to refer back to a central file allocation table.
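The contrast between a central table and address data carried within the file can be modeled in memory. The sketch below is a simplified illustration and not a description of any actual on-disk format; the dictionaries, the END marker, and both function names are assumptions introduced for this example.

```python
END = -1  # assumed marker for the last cluster of a file


def read_via_fat(fat, clusters, start):
    """Read a file by consulting a central table for every next cluster.

    Returns the file data and the number of table lookups performed.
    """
    data = []
    lookups = 0
    current = start
    while current != END:
        data.append(clusters[current])
        current = fat[current]  # a trip back to the central table
        lookups += 1
    return b"".join(data), lookups


def read_self_describing(blocks, start):
    """Read a file whose blocks each carry the address of the next block."""
    data = []
    current = start
    while current != END:
        payload, next_block = blocks[current]
        data.append(payload)
        current = next_block  # address data is found within the file itself
    return b"".join(data)


fat = {2: 5, 5: 3, 3: END}
clusters = {2: b"AB", 5: b"CD", 3: b"EF"}
blocks = {2: (b"AB", 5), 5: (b"CD", 3), 3: (b"EF", END)}

print(read_via_fat(fat, clusters, start=2))    # (b'ABCDEF', 3)
print(read_self_describing(blocks, start=2))   # b'ABCDEF'
```

Both routines return the same data; the second does so without any reference to a separate table.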
In computing systems which utilize some or all of the above mentioned hardware and data management techniques, it is generally recognized that read times and seek times are inversely related to the expense of the computing system, such that the more expensive the system, the faster the data can be addressed and retrieved. Conversely, as the cost of the computing system decreases, data addressing and retrieval efficiency is sacrificed. This is especially true of rudimentary computing systems which are utilized with large databases.
Therefore, the need exists for a method and a system in which a computer of greatly reduced capability can utilize large amounts of data in an already processed and organized form in order to present the data in a simple, efficient, and highly useful form. While it is obvious that a computer with normal computational capabilities could also easily achieve this task, such a method and system would provide a much more efficient and cost-effective way to accomplish the information dissemination task by pairing a single high-level computer (used for generating the required database through processing, encoding, compressing, organizing, and referencing data) with multiple inexpensive, rudimentary computers capable of reading, translating, and presenting the data.