The large improvement in the processing power of personal computers and work stations has created an incentive to port many mainframe applications to these newer machines. However, many large applications reside on mainframes not only because of the processing power they need but also because mainframes can access and control large storage devices, which makes them suitable for applications that require access to large databases. One such application is the nationwide radio frequency coordination and engineering system owned, operated and maintained by Bell Communications Research Inc. (Bellcore), which uses the U.S. Geological Survey three second database. The system needs this database, which is over 9.4 gigabytes in size, to produce signal maps and to conduct spectral analysis for the placement of radio receivers and transmitters in a given area. The telephone company users of this system have been required to access it remotely, which is expensive and sometimes presents problems because of vagaries in the performance of long distance transmission facilities. As a result, a work station based version of the system is desirable. For such a version to be practical, however, the U.S. Geological Survey database must be compressed to a size that can be stored within a work station, in a manner that is conducive to fast and accurate expansion of segments of the data when needed.
The U.S. Geological Survey three second terrain database is a vital component of radio engineering applications, which use it to generate the terrain profiles for the signal level evaluations needed in placing radio transmitters and receivers. Porting such an application to a work station or personal computer platform places restrictions on the allowable size and structure of the database. These restrictions are as follows: small file sizes, internal memory limited to 640K bytes, relatively low computing speed, interactive operation, and assurance of data portability using a low density magnetic storage medium, i.e. floppy disks.
A terrain profile is an ordered collection of elevation values along a radial. The radial is the shortest path between two points on the surface of the earth; thus it follows the geodesic line passing through both points. It follows that, except in an extremely small number of cases, elevation data will not be accessed as individual values but as a set of values lying along the same geodesic line. The organization best suited to that kind of data is the square matrix. The database was therefore organized in records, each composed of a three minute by three minute square matrix containing 3721 (61 × 61) three second elevation values. The elevations on the borders of each square matrix are repeated in the neighboring matrices. In this way, any interpolation required to compute the elevation at a point that does not fall on the 3 second by 3 second raster can be done by accessing only one data record. A group of 25 records is enclosed in the same file, and a 1 degree by 1 degree square is made up of 16 different files. Each file is named for the coordinates of its southeast corner. Each file has a header with 25 entries defining the position in the file where each record is stored.
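The record geometry described above can be sketched as follows. This is only an illustration: the names `locate` and `interpolate`, the use of arc seconds measured along two axes that increase away from a record's origin corner, the row-major list-of-rows layout, and the choice of bilinear interpolation are all assumptions made for the example and are not taken from the database definition.

```python
RECORD_SPAN = 180                      # a 3 minute record spans 180 arc seconds
STEP = 3                               # raster spacing in arc seconds
SIDE = RECORD_SPAN // STEP + 1         # 61 samples per side, shared borders included

# Arithmetic implied by the text: 20 x 20 records per degree square, 25 per file.
RECORDS_PER_DEGREE = (3600 // RECORD_SPAN) ** 2    # 400
FILES_PER_DEGREE = RECORDS_PER_DEGREE // 25        # 16


def locate(y_sec, x_sec):
    """Split a position in arc seconds into (record origin, offset inside record)."""
    rec_y, off_y = divmod(y_sec, RECORD_SPAN)
    rec_x, off_x = divmod(x_sec, RECORD_SPAN)
    return (rec_y * RECORD_SPAN, rec_x * RECORD_SPAN), (off_y, off_x)


def interpolate(record, off_y, off_x):
    """Bilinear interpolation inside one 61 x 61 record (hypothetical layout)."""
    row = int(off_y // STEP)
    col = int(off_x // STEP)
    fy = off_y / STEP - row
    fx = off_x / STEP - col
    # Offsets lie in [0, 180), so row + 1 and col + 1 never exceed index 60:
    # the repeated border row and column keep all four corners in this record.
    low = record[row][col] * (1 - fx) + record[row][col + 1] * fx
    high = record[row + 1][col] * (1 - fx) + record[row + 1][col + 1] * fx
    return low * (1 - fy) + high * fy
```

Because every offset inside a record has its four surrounding raster points in the same 61 × 61 matrix, the interpolation never needs to open a neighboring record.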
Each 3 minute record comprises a 2 byte integer representing the smallest elevation value found in that record, a one byte length flag with the value of 1 or 2, and 3721 integer values stored as one or two byte integers, the values of which are relative to the smallest elevation value contained in the record. If the maximum relative elevation is greater than 255, the flag is set to 2 and the relative elevations are represented as two byte integers; if the maximum relative elevation is smaller than 256, the flag is set to 1 and the relative elevations are represented as one byte integers. The problem presented by this large database was to compress the data into a form that can both be segmented according to a user's specific geographic needs (e.g. users in one state need only the geological data for that state) and be loaded into a personal computer limited in size as described above.
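A minimal sketch of this record layout is given below. The text does not state the byte order or whether the minimum elevation is signed, so the example assumes little-endian storage and a signed 2 byte minimum; the function names `pack_record` and `unpack_record` are hypothetical.

```python
import struct

SAMPLES = 61 * 61  # 3721 elevation values per 3 minute record


def pack_record(elevations):
    """Encode one record: minimum (2 bytes), length flag (1 byte), relative values."""
    if len(elevations) != SAMPLES:
        raise ValueError("a record holds exactly 3721 elevations")
    base = min(elevations)
    relative = [e - base for e in elevations]
    flag = 2 if max(relative) > 255 else 1
    code = "H" if flag == 2 else "B"      # unsigned 2 byte or unsigned 1 byte
    # "<" is an assumption: little-endian, no padding between fields.
    return struct.pack(f"<hB{SAMPLES}{code}", base, flag, *relative)


def unpack_record(blob):
    """Decode a record produced by pack_record back into absolute elevations."""
    base, flag = struct.unpack_from("<hB", blob, 0)
    code = "H" if flag == 2 else "B"
    relative = struct.unpack_from(f"<{SAMPLES}{code}", blob, 3)
    return [base + r for r in relative]
```

Under these assumptions a record occupies 3 + 3721 = 3724 bytes when the flag is 1 and 3 + 2 × 3721 = 7445 bytes when the flag is 2.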
In general, data compression algorithms are based on the simple idea of mapping the representation of data from one group of symbols to another, more concise series of symbols. Two schemes form the basis of many of the data compression algorithms currently known in the art: Huffman coding and LZW coding (named for Lempel and Ziv, its creators, and Welch, who made substantial contributions). Both are lossless compression techniques, meaning that no information is lost as a result of the compression and expansion process, and both work by reducing redundant information in the input data. Huffman coding, originally proposed in 1952, reduces the number of bits used to represent characters that occur frequently in the data and increases the number of bits for characters that occur infrequently. Compression by Huffman coding requires that the compressor know or learn the probability of each symbol in the data to be compressed. To learn the probabilities, Huffman coding performs two passes over the data, which requires temporary storage of the entire data block and is memory intensive, especially for large databases. The LZW method, on the other hand, encodes strings of characters: it uses the input data to build an expanded alphabet in which the additional characters represent strings of regular characters. The key to the algorithm is the establishment of a table that matches character strings with the code words representing them. This table must exist as an index for translating between the stored or transmitted code and the original symbols. The use of such a table is also memory intensive.
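The two schemes can be illustrated with the textbook sketches below. They are simplified for exposition and are not the implementation of any particular product: the Huffman function stops after the first (counting) pass and returns only code lengths, the LZW table is allowed to grow without limit, and all function names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(data: bytes):
    """First Huffman pass: count symbols, then derive a code length per symbol."""
    counts = Counter(data)                 # requires a full pass over the block
    if not counts:
        return {}
    if len(counts) == 1:
        return {symbol: 1 for symbol in counts}
    # Heap items: (weight, unique tie-breaker, {symbol: depth in the tree}).
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]                      # a second pass would emit the codes


def lzw_compress(data: bytes):
    """Return (list of codes, final table size) for a byte string."""
    table = {bytes([i]): i for i in range(256)}
    current, codes = b"", []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = len(table)  # new string joins the alphabet
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes, len(table)


def lzw_expand(codes):
    """Rebuild the same table while decoding the codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):           # code defined by the current step
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)
```

The table size returned by `lzw_compress` shows how the string table grows with the input, which is the memory cost referred to above.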
Another approach to data compression is disclosed in U.S. Pat. No. 4,796,003 by Bentley et al., entitled "Data Compaction". Bentley et al. disclose an algorithm based on the redundancy of words (i.e. partitioned segments of data). It employs a word list in which the position of each word is encoded in a variable length code; the shortest code represents the word at the beginning of the list. The list is created dynamically during the compression process. Each word from the data stream to be compressed is compared to the words in the list. If the word is found, the variable length code representing its position is stored instead of the word itself, and the word is moved to the head of the list. If the word is not on the list, the word itself is stored and is then placed at the head of the list. This compaction method requires the development and maintenance of a word list separate from the actual data. For expansion, the word list has to be regenerated, which is not conducive to fast expansion of the compressed data.
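The word list scheme described above can be sketched as follows. This is a simplified illustration and not the method of the cited patent: list positions are emitted as plain integers counted from zero rather than as variable length codes, the list is never trimmed, and the names `mtf_compress` and `mtf_expand` are hypothetical.

```python
def mtf_compress(words):
    """Replace repeated words by their current position in a move-to-front list."""
    recent = []                      # most recently seen word sits at index 0
    tokens = []
    for word in words:
        if word in recent:
            position = recent.index(word)
            tokens.append(position)  # the position stands in for the word
            recent.pop(position)
        else:
            tokens.append(word)      # first occurrence: store the word itself
        recent.insert(0, word)       # either way, the word moves to the head
    return tokens


def mtf_expand(tokens):
    """Regenerate the word list step by step to recover the original words."""
    recent = []
    words = []
    for token in tokens:
        word = recent.pop(token) if isinstance(token, int) else token
        recent.insert(0, word)
        words.append(word)
    return words
```

The expander has to replay every list update from the start of the stream, which illustrates why a segment in the middle of the data cannot be expanded on its own.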
One object of the present invention is to compress large databases to a size that can be used in work stations. A second object of the present invention is to compress large scale databases without having to generate separate translation tables or word lists. A third object of the invention is to achieve a high rate of compression while still being able to expand segments of the database without complete knowledge of the database. A fourth object of the invention is to compress the database in a manner that enhances rapid data expansion.