Many fields of human endeavor now use computing devices. Some of these fields collect and process vast amounts of data. As an example, the volume of collected medical data can grow very rapidly. A medical facility may attach several sensors to a patient, e.g., a heart rate monitor, blood pressure monitor, electrocardiograph (EKG) monitor, blood content monitor, urine analysis monitor, brain activity monitor, or various other electrodes. When samples are taken from these sensors at a high frequency, the data storage requirements can become immense.
In the field of telemedicine, a surgeon or physician may operate on or interact with a patient who is located at a distance, e.g., many miles away. To properly operate on or diagnose the patient, the surgeon or physician may need nearly real-time access to the remotely collected data. However, network bandwidth may be insufficient to communicate all of the collected data rapidly from the patient sensors to the physician's computing device.
Some of the collected data can require many thousands of terabytes of data storage space, if not more. It is now commonplace for even home computer users to purchase hard disk drives for personal computing devices that provide a storage capacity of 1 terabyte or more. To reduce the amount of storage space that is needed to store data, various compression methods exist.
Compression methods store data using fewer bits than are needed to represent the same data prior to compression (“uncompressed data”). Compressed data can thus require less storage space to store and less network bandwidth to transmit than the equivalent uncompressed data.
Compression methods can be lossy or lossless. When a lossy compression method is used to compress data, the original data generally cannot be reproduced with complete fidelity when the compressed data is expanded. In contrast, when a lossless compression method is used to compress data, the compressed data can be expanded to reproduce the original data with complete fidelity.
Different compression methods are more efficient at compressing different kinds of data. Two commonly employed compression methods are the symbol-based compression method and the run-length encoding (“RLE”) compression method. The symbol-based compression method uses a symbol (e.g., a short sequence of bits) to represent a longer sequence of bits of data. For example, the symbol “1” can represent “140/90” and the symbol “2” can represent “120/80,” which are two common values for blood pressure readings. When compressing a large set of medical data, “140/90” and “120/80” may occur frequently. The symbol-based compression method may substitute “1” whenever “140/90” occurs or “2” whenever “120/80” occurs.
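The substitution described above can be sketched in a few lines of Python. This is only an illustrative sketch: the symbol table is taken from the example in the text, while the function names `compress` and `expand` and the sample readings are hypothetical. It also assumes that no raw reading is itself equal to a symbol such as “1”; a practical method would need some way to tell symbols apart from uncompressed values.

```python
# Hypothetical symbol table, using the two blood pressure values from the text.
SYMBOLS = {"140/90": "1", "120/80": "2"}


def compress(readings, symbols=SYMBOLS):
    """Replace each reading that has a defined symbol with that symbol."""
    return [symbols.get(reading, reading) for reading in readings]


def expand(tokens, symbols=SYMBOLS):
    """Reverse the substitution; tokens without a symbol pass through as-is."""
    reverse = {symbol: value for value, symbol in symbols.items()}
    return [reverse.get(token, token) for token in tokens]


# Illustrative readings (not real patient data).
readings = ["140/90", "120/80", "120/80", "135/85"]
compressed = compress(readings)
restored = expand(compressed)
```

Here the three readings that have symbols shrink to one character each, the reading “135/85” is stored unchanged, and expanding the result gives back the original list, so this particular substitution is lossless.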
In contrast, the RLE compression method may be more efficient for compressing data when the data includes long sequences of identical values. As an example, when a set of data that is to be compressed includes the values “11111111122222,” which may be a sequence of periodic readings from a sensor, the RLE compression method may replace this set of data with “9152” because the data contains nine “1”s followed by five “2”s. However, the RLE compression method may be less efficient than a symbol-based compression method when the data values fluctuate rapidly.
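A minimal sketch of run-length encoding in Python follows, using the count-then-value layout of the “9152” example. The function names are hypothetical. The flat string form is ambiguous once a run reaches ten or more values, so the sketch also keeps the runs as (value, count) pairs, which is an assumption made here to allow unambiguous expansion.

```python
from itertools import groupby


def rle_pairs(data):
    """Return the runs in data as (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(data)]


def rle_encode(data):
    """Flatten the runs into a count-then-value string, as in the text."""
    return "".join(f"{count}{value}" for value, count in rle_pairs(data))


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original string."""
    return "".join(value * count for value, count in pairs)


steady = "11111111122222"       # long runs: 14 characters become 4
fluctuating = "1212"            # no runs: 4 characters become 8

steady_encoded = rle_encode(steady)
fluctuating_encoded = rle_encode(fluctuating)
steady_restored = rle_decode(rle_pairs(steady))
```

The second input shows the weakness noted above: when values change at every sample, each one-element run still costs a count and a value, so the encoded form is longer than the input.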
Even among various symbol-based compression methods, the selection of symbols can affect compression performance. In the example of symbol-based compression provided above for blood pressure readings, the data would not be compressed well if a patient's blood pressure readings rarely contained the values “140/90” or “120/80” and there were no symbols defined for other values.
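The effect of symbol selection can be illustrated with a rough size comparison in Python. This is a sketch under stated assumptions: size is measured in characters rather than bits, the function name `compressed_size` is hypothetical, and both sets of readings are made-up values chosen only to show the contrast.

```python
# Symbol table from the blood pressure example in the text.
SYMBOLS = {"140/90": "1", "120/80": "2"}


def compressed_size(readings, symbols):
    """Total characters needed after substituting symbols where defined."""
    return sum(len(symbols.get(reading, reading)) for reading in readings)


def raw_size(readings):
    """Total characters needed to store the readings without compression."""
    return sum(len(reading) for reading in readings)


# Illustrative data: one patient whose readings match the symbols,
# and one whose readings never do.
matching = ["140/90", "120/80", "120/80", "140/90"]
non_matching = ["118/76", "122/79", "131/84", "125/82"]

matching_sizes = (raw_size(matching), compressed_size(matching, SYMBOLS))
non_matching_sizes = (raw_size(non_matching), compressed_size(non_matching, SYMBOLS))
```

With these inputs the matching readings shrink from 24 characters to 4, while the non-matching readings stay at 24 characters because no symbol applies to any of them.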