Many software applications include large quantities of resource strings, such as menu labels, feature descriptions, and other character strings that may be displayed in a user interface to an application. One or more resource files are typically used to store the resource strings for an application or suite of applications. At runtime, other components of the application may access the resource file when a particular string or set of strings is needed for display in the user interface.
Compression and encoding technology may be employed during the build process to reduce the size of a resource file. A reduced file size is advantageous in view of the bandwidth and storage constraints that may be encountered when provisioning and delivering an application. For example, a smaller file may make downloading an application package faster than it otherwise would be. In addition, a smaller file requires less local storage space once it has been downloaded to a local environment. Compression may be especially beneficial for applications that support language localization, since a given menu label or other user interface item may be represented by multiple character strings, each in a different language.
While a variety of compression technologies exist for compressing text files, many are not well suited to compressing relatively short text strings, such as resource strings, because such strings usually do not exhibit repetitive patterns. In addition, most compression technologies compress an entire file and then, during decompression, decompress the entire file at once. In contrast, resource strings are decompressed on a per-string basis when a string is needed, rather than by decompressing the entire resource file at that time.
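Both points can be illustrated with a short Python sketch. It uses the standard-library zlib module only as a representative general-purpose compressor, not as the technique described in this disclosure. Each string is compressed independently so that a single entry can be decompressed on demand; for short strings, the fixed per-stream overhead of the format makes the compressed output larger than the input. The resource names and the UTF-16LE encoding are assumptions chosen for illustration.

```python
import zlib

# Hypothetical resource table: resource name -> display string.
resources = {
    "MenuFileOpen": "Open",
    "MenuFileSave": "Save",
    "MenuFileSaveAs": "Save As...",
}

# Compress each string on its own (UTF-16LE assumed) so that any one
# string can later be decompressed without touching the others.
compressed = {
    name: zlib.compress(text.encode("utf-16-le"))
    for name, text in resources.items()
}


def load_string(name: str) -> str:
    """Decompress only the requested string."""
    return zlib.decompress(compressed[name]).decode("utf-16-le")


for name, text in resources.items():
    raw_size = len(text.encode("utf-16-le"))
    print(f"{name}: {raw_size} bytes raw, {len(compressed[name])} bytes compressed")

print(load_string("MenuFileSaveAs"))
```

For each of these entries the compressed form is larger than the raw UTF-16LE bytes, which is the mismatch between general-purpose compressors and short resource strings described above.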
Decompressing resource strings on a per-string basis requires that a particular resource string be located quickly in a resource file. How strings are named affects the speed with which they can be found. Assigning resource strings numerical identifiers in an index allows fast look-up at runtime, but such identifiers are difficult to maintain over time, especially across multiple development and build platforms. Using resource names may improve ease of use and maintainability, but results in slower look-up at runtime.
A balance is therefore continuously sought between the storage gains achieved by resource string compression and encoding, and the performance load presented by decompression, decoding, and various naming constructs at runtime.
Overview
Provided herein are various implementations describing enhanced technology for compressing, encoding, and otherwise reducing the size of resource files. In addition, implementations are disclosed that relate to naming strings and to accelerating string location and retrieval. Any particular implementation disclosed below may be considered independently or in combination with any one or more of the other implementations.
In at least one implementation, similarity compression is employed to reduce the size of a resource file. Resource strings in the file are compressed based on their similarity to one or more other strings in the file. Each compressed string comprises a similarity value representing the extent to which the string is similar to another string, together with the remaining portion of the string not represented by that value.
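A minimal Python sketch of one possible similarity scheme follows. It assumes that the similarity value is the length of the prefix a string shares with the string stored just before it (a technique commonly called front coding) and that the remaining portion is the unshared suffix. The source does not specify the similarity measure or which string serves as the reference; both choices here are illustrative, and the function names are hypothetical.

```python
from typing import List, Tuple


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def similarity_compress(strings: List[str]) -> List[Tuple[int, str]]:
    """Encode each string as (similarity value, remaining portion),
    using the preceding string as the reference (an assumption)."""
    encoded = []
    previous = ""
    for s in strings:
        k = common_prefix_len(previous, s)
        encoded.append((k, s[k:]))
        previous = s
    return encoded


def similarity_decompress(encoded: List[Tuple[int, str]]) -> List[str]:
    """Rebuild each string from its predecessor's prefix plus its suffix."""
    strings = []
    previous = ""
    for k, suffix in encoded:
        s = previous[:k] + suffix
        strings.append(s)
        previous = s
    return strings


print(similarity_compress(["Save", "Save As...", "Save All", "Select All"]))
# [(0, 'Save'), (4, ' As...'), (6, 'll'), (1, 'elect All')]
```

Because each entry depends on its predecessor, recovering one string may require walking back as far as the nearest entry with a similarity value of 0. An implementation that needs fast per-string access might therefore store a complete string at regular intervals to bound that walk.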
In another implementation, map-less encoding is employed to reduce the number of bytes used to represent a resource string. The high byte of each character in a string is eliminated, while the low byte is preserved. In some cases, the low byte may be shifted so that it does not overlap with the byte value of another character or characters.
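The source does not state how the eliminated high byte is restored during decoding, so the Python sketch below makes an explicit assumption: ASCII characters keep their low byte unchanged, every other character in the string must share a single high byte, and that high byte is stored once in a hypothetical two-byte string header together with a shift base. Low bytes of non-ASCII characters are shifted into the range 0x80 to 0xFF so that they cannot overlap with ASCII byte values. Strings that violate these assumptions are rejected rather than encoded.

```python
def mapless_encode(text: str) -> bytes:
    """Hypothetical map-less encoding of a BMP string.

    Layout (assumed): [high byte H, shift base, one byte per character].
    ASCII characters keep their low byte (0x00-0x7F). Every other
    character must have high byte H; its low byte is shifted into
    0x80-0xFF so it cannot collide with an ASCII value.
    """
    others = [ord(c) for c in text if ord(c) >= 0x80]
    if any(cp > 0xFFFF for cp in others):
        raise ValueError("only BMP characters are supported")
    high_bytes = {cp >> 8 for cp in others}
    if len(high_bytes) > 1:
        raise ValueError("non-ASCII characters use more than one high byte")
    high = high_bytes.pop() if high_bytes else 0
    base = min((cp & 0xFF for cp in others), default=0)
    out = bytearray([high, base])
    for c in text:
        cp = ord(c)
        if cp < 0x80:
            out.append(cp)                   # ASCII: low byte unchanged
            continue
        shifted = 0x80 + (cp & 0xFF) - base  # high byte dropped, low byte shifted
        if shifted > 0xFF:
            raise ValueError("low bytes span more than 128 values")
        out.append(shifted)
    return bytes(out)


def mapless_decode(data: bytes) -> str:
    """Restore each character from the header's high byte and shift base."""
    high, base = data[0], data[1]
    return "".join(
        chr(b) if b < 0x80 else chr((high << 8) | (b - 0x80 + base))
        for b in data[2:]
    )


encoded = mapless_encode("Файл")  # Russian for "File"
print(encoded.hex())             # 0424808c9597
print(mapless_decode(encoded))   # Файл
```

Here the four Cyrillic characters occupy six bytes, including the two-byte header, instead of the eight bytes they would take in UTF-16; because the header is paid once per string, the relative saving grows with string length.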
Bit-level compression is employed in another implementation to reduce the number of bits used to encode each character in a string. In bit-level encoding, a string-specific dictionary is created. Each character in the string is then encoded based on either its position in the dictionary or its relative position within a range that covers the other characters in the string.
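The Python sketch below shows one way the two options could be realized. The selection rule (prefer the range when it needs no more bits per character than the dictionary), the fields of the hypothetical BitEncoded record, and the most-significant-bit-first packing are all assumptions rather than details taken from this disclosure. Dictionary mode stores the string's sorted distinct characters and encodes each character as its index; range mode stores only the lowest code point and encodes each character as its offset from it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BitEncoded:
    mode: str       # "dictionary" or "range"
    table: str      # sorted distinct characters (dictionary mode only)
    base: int       # lowest code point (range mode only)
    width: int      # bits per encoded character
    length: int     # number of characters in the string
    payload: bytes  # character codes packed MSB-first


def bits_needed(count: int) -> int:
    """Bits needed to represent the values 0..count-1 (at least 1)."""
    return max(1, (count - 1).bit_length())


def pack(codes: List[int], width: int) -> bytes:
    acc = 0
    for code in codes:
        acc = (acc << width) | code
    total = len(codes) * width
    pad = -total % 8  # zero-fill to a whole byte
    return (acc << pad).to_bytes((total + pad) // 8, "big")


def unpack(payload: bytes, width: int, length: int) -> List[int]:
    acc = int.from_bytes(payload, "big") >> (len(payload) * 8 - length * width)
    mask = (1 << width) - 1
    return [(acc >> (width * (length - 1 - k))) & mask for k in range(length)]


def bit_encode(text: str) -> BitEncoded:
    if not text:
        return BitEncoded("range", "", 0, 1, 0, b"")
    table = "".join(sorted(set(text)))  # string-specific dictionary
    lo, hi = min(map(ord, text)), max(map(ord, text))
    dict_width = bits_needed(len(table))
    range_width = bits_needed(hi - lo + 1)
    if range_width <= dict_width:  # range mode stores no dictionary
        codes = [ord(c) - lo for c in text]
        return BitEncoded("range", "", lo, range_width, len(text), pack(codes, range_width))
    index = {c: k for k, c in enumerate(table)}
    codes = [index[c] for c in text]
    return BitEncoded("dictionary", table, 0, dict_width, len(text), pack(codes, dict_width))


def bit_decode(enc: BitEncoded) -> str:
    codes = unpack(enc.payload, enc.width, enc.length)
    if enc.mode == "range":
        return "".join(chr(enc.base + c) for c in codes)
    return "".join(enc.table[c] for c in codes)


enc = bit_encode("Save")
print(enc.mode, enc.width, enc.payload.hex())  # dictionary 2 1e
print(bit_decode(enc))                         # Save
```

For "Save", the four distinct characters fit in a 2-bit dictionary index, so the eight bytes of UTF-16 text pack into a single payload byte, with the four-character dictionary stored alongside it.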
Lastly, resource strings are stored in association with hash values that are generated from the resource names for the strings. A resource string is retrieved at runtime based on a proportionality relationship between the hash value for the resource string, the total number of possible hash values, and the quantity of strings in a resource file.
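A minimal Python sketch of such a lookup follows. It assumes 32-bit hashes, so the total number of possible hash values is 2^32; it uses CRC-32 from zlib purely as a stand-in hash function; and it keeps the resource name alongside each entry so that hash collisions can be resolved. None of these choices come from the source. Entries are sorted by hash, and the starting position is estimated from the proportion hash / 2^32 ≈ index / n, where n is the number of strings. With roughly uniform hashes the estimate usually lands close to the target, so only a short walk is needed.

```python
import zlib
from typing import Dict, List, Optional, Tuple

HASH_SPACE = 1 << 32  # total number of possible 32-bit hash values

Entry = Tuple[int, str, str]  # (name hash, resource name, string)


def name_hash(name: str) -> int:
    # CRC-32 is only a stand-in; any well-distributed 32-bit hash would do.
    return zlib.crc32(name.encode("utf-8"))


def build_table(resources: Dict[str, str]) -> List[Entry]:
    """Store each string with the hash of its name, sorted by hash."""
    return sorted((name_hash(name), name, text) for name, text in resources.items())


def lookup(table: List[Entry], name: str) -> Optional[str]:
    n = len(table)
    if n == 0:
        return None
    h = name_hash(name)
    # Proportional estimate: h / HASH_SPACE is roughly i / n.
    i = h * n // HASH_SPACE
    # Move left until every entry before i has a smaller hash ...
    while i > 0 and table[i - 1][0] >= h:
        i -= 1
    # ... then right until i is the first entry whose hash is >= h.
    while i < n and table[i][0] < h:
        i += 1
    # Check every entry with this hash in case of collisions.
    while i < n and table[i][0] == h:
        if table[i][1] == name:
            return table[i][2]
        i += 1
    return None


table = build_table({
    "MenuFileOpen": "Open",
    "MenuFileSave": "Save",
    "MenuFileSaveAs": "Save As...",
})
print(lookup(table, "MenuFileSave"))  # Save
print(lookup(table, "MenuEditUndo"))  # None
```

The cost of a lookup is proportional to the distance between the estimated and actual positions rather than to the number of strings, which is what makes name-based retrieval competitive with numeric identifiers in this sketch.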
This Overview is provided to introduce a selection of concepts in a simplified form that are further described below in the Technical Disclosure. It may be understood that this Overview is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.