1. Field of the Invention
This invention relates to the organization of computer data and more particularly to a method and system for translating between data structures used in different computer architectures. The present invention enables one computer to read and reorder data bytes which are generated by another computer system using a differing byte ordering scheme.
2. Description of the Background Art
One of the most common problems in data interchange between heterogeneous computer systems is breaking up larger quanta of data into smaller quanta of data, and conversely, assembling larger quanta of data from smaller quanta. Data contained in a data stream is generally arranged as either big-endian, little-endian, or some hybrid of the two. In a "big-endian" (from the description "big end") arrangement, the most significant unit of a data word is transmitted first, followed by units of descending radix value until the least significant unit is transmitted. It should be noted that a unit is commonly defined as some number of bits or a byte (where a byte is eight bits) and that a word is some number of bytes. Big-endian sequencing is motivated in part by the western tradition of reading written text from left to right. Since the most significant unit of a number read from left to right is encountered first, transmission of numbers from left to right as they might appear on a display terminal or a printed page is a natural sequencing. A second motivation for the big-endian sequencing scheme arises from the efficiency associated with transmitting the most significant unit first. From the standpoint of transmitting the greatest amount of information in the shortest amount of time, it makes sense to transmit the most significant data first, since gross decisions can be made based on order-of-magnitude information contained in the high-order bytes. In many mathematical operations, some processing can occur on the most significant components of a data stream even as the processor is waiting for the lower-order bytes of the data components to arrive. The line of Macintosh computers (manufactured by Apple Computer Company of Cupertino, Calif.) uses the big-endian data structure.
Little-endian data packing, derived from "little end", is the converse of the big-endian scheme. The least significant unit is transmitted first, followed by units representing values of increasing numerical significance. A motivation for the use of the little-endian scheme is that the data is organized conveniently and logically as a function of increasing radix. The first unit transmitted carries a weight of 2^(0*n), the second unit a weight of 2^(1*n), and so on, where n represents the number of bits per unit (n=8 when the unit is a byte). This sequencing is particularly useful since all addition functions require a carry calculation on the least significant bits before higher-order bits can be processed.
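The two orderings described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names split_units and join_little_endian are hypothetical and do not come from this disclosure, and the unit is assumed to be an n-bit field with n=8 by default.

```python
def split_units(value, unit_count, bits_per_unit=8):
    """Split an unsigned integer into units, returned in both orderings.

    Returns (big_endian_units, little_endian_units). Hypothetical helper
    for illustration only.
    """
    mask = (1 << bits_per_unit) - 1
    # Unit k carries a weight of 2^(k*n); k = 0 is the least significant unit.
    little = [(value >> (k * bits_per_unit)) & mask for k in range(unit_count)]
    # Big-endian is the same list of units in reverse transmission order.
    big = little[::-1]
    return big, little


def join_little_endian(units, bits_per_unit=8):
    """Reassemble an integer from units that arrive least significant first."""
    return sum(unit << (k * bits_per_unit) for k, unit in enumerate(units))


big, little = split_units(0x12345678, 4)
print([hex(u) for u in big])     # most significant unit first
print([hex(u) for u in little])  # least significant unit first
```

Reassembling the little-endian list with join_little_endian returns the original value, which shows that the two orderings carry the same information and differ only in sequence.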
Other organization systems commonly combine features of the big-endian and little-endian conventions to produce hybrid data packing schemes. For purposes of comparing various hybrid structures, it is useful to define the big-endian scheme as a sequence of bytes arranged as:
Big-endian: 0 1 2 3 4 5 6 7
where 0 represents the most significant byte of an eight-byte word, and 7 represents the least significant byte. Similarly, the little-endian structure would be represented as:
Little-endian: 7 6 5 4 3 2 1 0
Under the same labeling, 0 again represents the most significant byte of the eight-byte word and 7 the least significant byte, so the least significant byte appears first. A single byte is alternatively defined as a "Word8", a 16-bit quantity as a "Word16", a 32-bit word as a "Word32", and so on, to simplify the forthcoming discussion. The PDP-11 (formerly manufactured by Digital Equipment Corporation of Maynard, Mass.), using the same representations of most and least significant bytes (Word8's), is represented as:
PDP-11 little-endian: 1 0 3 2 5 4 7 6
The VAX line of computers, also manufactured by Digital Equipment Corporation, is represented as:
VAX little-endian: 3 2 1 0 7 6 5 4
A pure big-endian packing mechanism packs Word8's into Word16's in a big-endian manner; that is, the most significant Word8 is packed first, then the least significant Word8 is packed. Likewise, Word16's are packed into Word32's, Word32's into Word64's, and so on, in a big-endian manner. A pure little-endian packing mechanism packs Word8's into Word16's in a little-endian manner, starting with the least significant Word8 and followed by the most significant Word8. Similarly, Word16's are packed into Word32's, Word32's into Word64's, and so on, in a little-endian manner. The PDP-11 structure shown above packs Word8's into Word16's in a little-endian manner, but packs Word16's into Word32's and Word32's into Word64's in a big-endian manner. The VAX, which succeeded the PDP-11, packs Word8's into Word16's and Word16's into Word32's in a little-endian manner, but packs Word32's into Word64's in a big-endian manner. The motivation for this change in structure between the PDP-11 and the VAX is that, at the time of the PDP-11, 32-bit data types were not supported by the hardware, and during the time of the VAX, 64-bit data types were not supported by the hardware. Currently, 64 bits is the largest primitive data type supported on most computers, although this will likely change in the future.
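The level-by-level packing just described can be modeled with a brief sketch that derives each eight-byte sequence from the endianness chosen at each packing level. This is a hedged illustration and not the method of the invention: the names byte_order and reorder are hypothetical, and byte labels follow the convention above in which 0 is the most significant Word8 and 7 the least significant.

```python
def byte_order(level_endianness):
    """Build the byte sequence for a word from per-level packing rules.

    level_endianness lists "big" or "little" for each packing level, from
    the smallest (Word8's into Word16's) to the largest. Label 0 is the
    most significant Word8. Hypothetical helper for illustration only.
    """
    order = [0]  # a single Word8
    for endian in level_endianness:
        half = len(order)
        high = order                              # more significant half
        low = [label + half for label in order]   # less significant half
        # Big-endian packs the more significant half first.
        order = high + low if endian == "big" else low + high
    return order


def reorder(data, source_order, target_order):
    """Rearrange the bytes of one word from source_order to target_order."""
    return bytes(data[source_order.index(label)] for label in target_order)


BIG = byte_order(["big", "big", "big"])
LITTLE = byte_order(["little", "little", "little"])
PDP11 = byte_order(["little", "big", "big"])
VAX = byte_order(["little", "little", "big"])

for name, order in [("Big-endian", BIG), ("Little-endian", LITTLE),
                    ("PDP-11", PDP11), ("VAX", VAX)]:
    print(name, order)
```

Because each arrangement is a permutation of the same eight labels, translating a word from one arrangement to another amounts to looking up where each label sits in the source and emitting it at its position in the target, which is what the hypothetical reorder helper does.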
The major problem associated with the various arrangements of data strings used by different computers is that communication between these computers is extremely cumbersome at best, and at worst impossible in the normal course of network communication. What is needed is a method and system for efficiently translating data from a known-endian arrangement to an alternative-endian scheme.