The present invention relates generally to the field of data processing and more particularly to a receiver of data from a sender being able to interpret the sender's native data structure layout.
Since their inception, the basic components of computers have remained the same: a computer processor and a memory. A computer processor is the active component of the computer system and manipulates data retrieved from the computer's memory to process tasks, programs, or processes assigned to the computer system. Computer memory stores information used by the computer and works in much the same way as the memory of a person. For example, just as people memorize lists, poetry and events, a computer system stores words, numbers, pictures, etc. in its several memories. Similarly, specialized hardware within a computer processor reads and interprets information from computer memory, analogous to a human reading and interpreting printed words. Moreover, just as the arrangement of words on a page is important to human readers, the arrangement of information in the computer's memory is important to the computer system. For example, words in English are written from left to right, words in Hebrew are written from right to left, and words in Japanese are written top to bottom and right to left. One arrangement is not better than the other; it is only different. Similarly, bits are arranged in different formats in a computer system and, for the most part, one arrangement is not better than another, which permits many approaches to organizing information in computer systems. Computer system designers have thus developed different schemes for organizing computer data.
The basic building block of all computer data is the bit, any number of which, usually a multiple of two, may comprise a byte and any number of bytes, again usually a multiple of two, may comprise a word. In the examples provided herein, a byte of data is eight bits. Four bytes or thirty-two bits of data is a word; a half-word is two bytes or sixteen bits; and a double word is eight bytes or sixty-four bits. One difference in data representation, affecting two particular forms of computer data called floating point information and binary integer information, is the choice between big endian and little endian byte order. If a binary number is implemented using four eight-bit bytes, little endian presents the low order eight bits first whereas big endian has the high order eight bits first. Illustrated in FIG. 1 are examples of the big and little endian formats of a byte, a half-word, and a word. The decimal integer one hundred twenty four (124) can be represented in hexadecimal notation as 7C and is one byte of computer data. As shown in FIG. 1, there is no difference in the byte order between little endian and big endian formats for the integer 124. The next integer, fifty thousand (50,000), however, comprises two bytes or is a half-word. In hexadecimal notation, the little endian format of the integer is 50 C3, whereas in the big endian format, the integer is represented as C3 50. The difference in endian format is illustrated even more vividly for larger integers, e.g., one billion, which is a word of data. In little endian format, one billion is represented as hexadecimal 00 CA 9A 3B and in big endian, it is represented as 3B 9A CA 00. The little endian arrangement is used in computer processors by INTEL Corporation that were incorporated into IBM PC and compatible personal computers. The big endian arrangement was adopted by other computer systems such as those manufactured by APPLE Corporation having processors designed by MOTOROLA and IBM Corporations.
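The byte sequences quoted above for 124, 50,000 and one billion can be reproduced with a short Python sketch. It uses the standard `struct` module, where the `<` and `>` prefixes request little endian and big endian packing; the helper name `hex_bytes` and the `examples` list are illustrative only and do not appear in the specification.

```python
import struct

def hex_bytes(data):
    """Render a bytes object as space-separated two-digit hexadecimal values."""
    return " ".join(f"{b:02X}" for b in data)

# (label, struct format code, value): B = 1 byte, H = 2 bytes, I = 4 bytes, all unsigned.
examples = [
    ("byte", "B", 124),
    ("half-word", "H", 50_000),
    ("word", "I", 1_000_000_000),
]

for name, code, value in examples:
    little = struct.pack("<" + code, value)  # low order byte first
    big = struct.pack(">" + code, value)     # high order byte first
    print(f"{name:9} {value:>13,}  little: {hex_bytes(little):11}  big: {hex_bytes(big)}")
```

For the single byte the two orders coincide, while the half-word and word come out byte-reversed relative to one another, matching the values given in the text.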
In the past, the choice of endian format was not a significant problem because computers seldom interchanged data or did so in ways that were not dependent upon binary data formats. As the networking of computers increased, however, the endian problem became more severe because operating systems, programming languages, and computer architectures maintain a preference for a particular endian format. For example, and as is known all too well, persons who used IBM PC or compatible computers with the Intel processors could not generally share computer programs and information with persons who had APPLE Macintosh computers, and vice versa. Large corporations which used both types of computers found it difficult to distribute information among employees. Some businesses, moreover, found that they could not easily share information with suppliers or buyers whose computers did not have the same data format. Consequently, computer software developers devoted additional time and resources to develop multiple versions of the same software to support different types of computer data formats and different computer systems.
The explosive growth of stand-alone computers used in businesses and homes in conjunction with the world wide web now demands that there be compatibility between the different types of operating and computer systems. In today's computing environment, a program on one machine or in one language may analyze data from another machine or in another language on the same machine. Each machine and/or application program may have not only a different endian format but also its own peculiar data dependencies. Thus, it is not uncommon for two different computers to want to exchange data over a network wherein the computers have different processors, different data representations and different operating systems.
Today data is transferred through computer networks using formal and rigidly defined protocols. Protocol information is often defined by international standards such as the ISO/OSI protocol stack, CCITT recommendations for data communications and telephony, IEEE 802 standards for local area networking and ANSI standards. Other examples include the TCP/IP protocol stack, defined by the U.S. Department of Defense, plus other military and commercial data processing standards such as the U.S. Navy SAFENET standard, the Xerox Corporation XNS protocol suite, the Sun Microsystems NFS protocol, and compression standards for HDTV and other video formats. In these formats byte order and other features of the data structure layout are fixed. But the problem of data transfer between or among different systems is only exacerbated by these rigid protocols because now data transfer is not only constrained by computer processor formats, operating systems, and programming languages but also data transfer protocols. For many environments, this is not a satisfactory solution.
Endianness, which is just byte order, has already been discussed but is actually only one aspect of data structure layout or data dependency. Some computer languages have abstractions called integers, which represent a multi-byte binary number, and pointers, which contain a memory address, both of which may vary in size. For example, in the C standard, the relative sizes of integers and pointer addresses are presented below:
sizeof(short) ≤ sizeof(int) ≤ sizeof(long) ≤ sizeof(long long)
sizeof(void*): undefined
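As a rough illustration, the ordering constraint can be checked on whatever machine runs the following Python sketch. It relies on the standard `struct` module's native mode (`@`), which reports the sizes chosen by the C compiler that built the interpreter; the names `NATIVE_SIZES` and `ordering_holds` are hypothetical helpers, not part of the specification.

```python
import struct

# Native ("@") sizes, in bytes, of C types on the machine running this script.
NATIVE_SIZES = {
    "short": struct.calcsize("@h"),
    "int": struct.calcsize("@i"),
    "long": struct.calcsize("@l"),
    "long long": struct.calcsize("@q"),
    "void*": struct.calcsize("@P"),  # pointer size; not ordered relative to the integers
}

def ordering_holds(sizes):
    """Check sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)."""
    chain = [sizes[name] for name in ("short", "int", "long", "long long")]
    return all(a <= b for a, b in zip(chain, chain[1:]))

print(NATIVE_SIZES)
print(ordering_holds(NATIVE_SIZES))
```

The printed sizes will differ between platforms, which is exactly the variation a receiver of foreign data has to account for.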
In several de facto standards, the sizes of integer types and pointers differ according to the table below:
"Rounding" rules comprise yet another local data dependency or feature of data structure layouts. Many languages have data "records" or "structures" to indicate a logical, and sometimes physical, aggregation of related data. Some data records have a more favorable bit alignment in a given machine for better performance. Many languages allow the declared aggregation of data items to have extra storage inserted between them to achieve favorable alignment. In C/C++, for example, whether and when to round up or down is determined by each compiler. Such aggregations are typically declared with the C keyword "struct."
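The effect of such inserted storage can be observed with a small Python sketch, again using the standard `struct` module in native mode. The record layout `cih` (a char, an int and a short) and the helper `layout_sizes` are made up for illustration; the actual padding depends on the platform and compiler.

```python
import struct

def layout_sizes(fields):
    """Return (unpadded, padded) sizes in bytes for a record of native C types.

    `fields` is a string of struct format codes, e.g. "cih" for
    struct { char tag; int count; short flag; } (a hypothetical record).
    """
    # Sum of the individual member sizes, with nothing inserted between them.
    unpadded = sum(struct.calcsize("@" + code) for code in fields)
    # Size with native alignment between members. The struct module adds no
    # trailing padding, so a C compiler's sizeof may be larger still.
    padded = struct.calcsize("@" + fields)
    return unpadded, padded

unpadded, padded = layout_sizes("cih")
print(f"members alone: {unpadded} bytes, with alignment padding: {padded} bytes")
```

Any difference between the two figures is storage that carries no data but must still be skipped correctly by a program reading the record from another machine.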
It is possible, especially in a language like C or C++, to have source code that adapts to each of these independent data structure layouts through standard recompilation. For many programs, that is the end of the story. As long as the data stays on the same kind of machine and the programs use the same compiler, differences in byte order, rounding, and the like cause no problem. If, however, the purpose of an application program is to analyze data from a variety of sources, the program must now cope with the wide variety of byte orders, rounding, and integer sizes of various incoming data streams. That is, the size and byte order of integers and pointers will vary depending on the particular machine and compiler chosen. Yet, at the source code level, the data is identical.
JAVA™ is a programming language that blossomed in the mid to late 1990s. The concept of Java was to remove software dependence on individual processors and move into the realm of consumer electronics. That is, despite the fanaticism towards certain computer processors and operating systems exhibited by some in the computer industry, most consumers of electronic devices are indifferent to which particular processor or operating system runs on their device so long as it works reliably and seamlessly. Java has also flourished with the burgeoning expansion of network computer technology and the rise of world wide web browser technology on the Internet. Java applications can be written on one computer and transferred over a network, such as the Internet, to any device containing a computer with a Java interpreter regardless of the operating system or the processor in that machine.
Because of Java's independence from a particular processor or operating system, it straddles the endian problem. C/C++ sets endianness at compile time, and the endianness of the underlying computer is visible to the programmer. A Java interpreter can run in whatever endian format the processor supplies; the Java virtual machine, however, presents the illusion of a big endian machine. In a Java virtual machine, the processor's endian ordering doesn't matter so long as it presents the data in proper format for arithmetic operations. Another reason Java is endian independent is that, in order to freely but securely exchange code and information among the electronic devices in the network, pointers in memory are excluded to eliminate the possibility of malicious programs accessing arbitrary addresses in memory. Java doesn't allow certain casts, i.e., it doesn't allow storage to be reinterpreted. Yet another reason that Java provides a big endian format is that in key interfaces relating to data input stream and data output stream the data is always presented an element at a time in big endian order on external media.
Many programs, nevertheless, follow the C/C++ policies and so data in either endian format and with varying rules for integers and structure padding may appear on external media or be transmitted "as is" over a network. Thus, there exists a need in the computer industry for a computer tool which enables correct interpretation of the data structure layout of incoming data, such as byte ordering, integer/pointer sizes, and padding rules, which may differ from that of the computer to which the data is input, and which adapts to processor formats, operating system preferences, and programming language idiosyncrasies.
Thus, an embodiment of the invention may be described as a method of computing which comprises the steps of receiving a first data stream in a sender's native data structure layout from a sender, reading a prefix word of the first data stream, and from that prefix word, deriving the sender's native data structure layout. The method may further comprise the step of dynamically reconstructing data of the data stream based on the prefix word in a receiver. The step of deriving the native data structure layout from the prefix word may further comprise determining the endianness of the data stream based on the prefix word and/or determining the existence and size of at least one integer type in the data stream and/or determining the existence and size of pointers and/or determining padding rules/byte alignment of the data stream and/or determining bit alignment. When determining the existence and size of at least one integer type, the method further comprises determining the existence and size of short integers, integers, and long integers. The prefix word interpreted as a byte array may be of the format FFyyxxFE or FExxyyFF in which FFyyxxFE indicates that the sender's native data structure layout is of one endianness, preferably little endian, and FExxyyFF indicates that the sender's native data structure layout is of the other endianness, preferably big endian.
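A minimal Python sketch of the endianness determination might look as follows. It assumes the preferred assignment named above (FFyyxxFE for little endian, FExxyyFF for big endian) and returns the xx and yy bytes uninterpreted, since how they encode integer sizes, pointer sizes and padding is not stated in this passage. The function name `parse_prefix` is hypothetical.

```python
def parse_prefix(prefix: bytes):
    """Classify a 4-byte prefix word read as a byte array.

    Returns (endianness, xx, yy). The meaning of xx and yy is left to the
    caller; this sketch only separates them from the marker bytes.
    """
    if len(prefix) != 4:
        raise ValueError("prefix word must be exactly four bytes")
    first, second, third, last = prefix
    if (first, last) == (0xFF, 0xFE):
        # Byte array FF yy xx FE: assumed to mark a little endian sender.
        return "little", third, second
    if (first, last) == (0xFE, 0xFF):
        # Byte array FE xx yy FF: assumed to mark a big endian sender.
        return "big", second, third
    raise ValueError("not a recognized prefix word")

print(parse_prefix(bytes.fromhex("FF2211FE")))
print(parse_prefix(bytes.fromhex("FE1122FF")))
```

Both calls describe the same 32-bit value written by senders of opposite byte order, so they yield the same xx and yy but different endianness.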
Another embodiment of the invention contemplates a method of computing, comprising the steps of receiving a first data stream in a sender's native data structure layout from a sender, reading a prefix word of the first data stream wherein the prefix word has a format FFyyxxFE or FExxyyFF in which FFyyxxFE indicates that the sender's native data structure layout is of one endianness and FExxyyFF indicates that the sender's native data structure layout is of the other endianness, deriving the sender's native data structure layout from the prefix word, the sender's native data structure layout further comprising the existence and size of short integers, integers, and long integers, the existence and size of pointers, padding rules/byte alignment of the data stream, bit alignment, and dynamically reconstructing data of said data stream based on said prefix word.
Another embodiment of the invention contemplates a computer system, comprising a sender central processing unit (CPU) processing a data stream in a native data structure layout, a prefix word generator to encode the native data structure layout in a prefix word and attach the prefix word to the data stream. The computer system further comprises a receiver CPU to receive the data stream with the attached prefix word and a communication system connecting the sender CPU and the receiver CPU over which the data stream is transmitted, a prefix word decoder to decode the prefix word and dynamically adapt the data so the receiver can use the data in the data stream. Preferably, the prefix word generator is in the sender CPU.
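The generator and decoder roles can be sketched in Python as a round trip. This is only an illustration under several assumptions not found in the text: the payload is a sequence of 32-bit signed integers, the xx and yy bytes are left as zero, the sender's byte order is simulated with a `<` or `>` argument, and the names `MARKER`, `attach_prefix` and `decode_stream` are invented for the example.

```python
import struct

# Prefix word value with xx = yy = 0 (placeholders; their encoding is assumed unknown).
MARKER = 0xFE0000FF

def attach_prefix(values, byte_order):
    """Sender side: write the prefix word and the data in the sender's byte order.

    byte_order is "<" (little endian) or ">" (big endian), simulating the
    sender's native layout.
    """
    prefix = struct.pack(byte_order + "I", MARKER)
    body = struct.pack(f"{byte_order}{len(values)}i", *values)
    return prefix + body

def decode_stream(stream):
    """Receiver side: read the prefix word, then rebuild the integers."""
    first = stream[0]
    if first == 0xFF:      # low order byte came first: little endian sender
        order = "<"
    elif first == 0xFE:    # high order byte came first: big endian sender
        order = ">"
    else:
        raise ValueError("stream does not start with a prefix word")
    body = stream[4:]
    count = len(body) // 4  # assumes 32-bit integers, as stated above
    return list(struct.unpack(f"{order}{count}i", body))

for order in ("<", ">"):
    stream = attach_prefix([124, 50_000, 1_000_000_000], order)
    print(stream[:4].hex().upper(), decode_stream(stream))
```

The receiver recovers the same integers from either stream without being told in advance which byte order the sender used.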
Yet another version of the invention may be considered a computer system for the transfer of data, comprising an application means to generate data in a first computer having a native data structure layout, a means to create a prefix word encoding the native data structure layout, a means to append the prefix word to the data, a means to transmit the appended prefix word and the data over a transmission network, a means to receive the appended prefix word and data in a second computer, and a means to read and decode the prefix word and adaptively reconstruct the data based on the prefix word.
It is further contemplated that an embodiment of the invention may be considered a method to interpret computer data which dynamically adapts to an unknown data structure of the computer data by interpreting a prefix word in which the data structure is encoded.