The present invention relates to the encoding of binary data for transfer between computers connected via a network.
While computer networks, such as local area networks, have existed for many years, the problem of encoding binary data for transmission between client computers has become a widespread global consideration since the advent of the Internet and e-mail.
By way of introduction, it is known that a single byte contains 8 bits of information and can therefore take a value between 0 and 255, as in the IBM EBCDIC (Extended Binary Coded Decimal Interchange Code). It is further known that, in the ASCII system, each of the 128 characters has a numerical code from 0 to 127, such that, for example, the letter "B" has a value of 66, the letter "D" has a value of 68, and so on.
Additional characters may be assigned, in an 8 bit system, predetermined numerical values from 128 to 255. However, as some data receiving protocols are based on a 7 bit data system, 8 bit data must be divided into two bytes.
A problem stemming from transmission of data between computers is caused by the fact that all computer files which contain other than the standard 95 printable ASCII characters, whether these files contain text, image or sound data, must not only be encoded so as to render them transmittable, but must also be decodable by software resident on recipient computers which may have different operating systems employing different compilers. Examples of some of the different systems used are DOS and Windows 95/98, which are used by PCs, as well as those employed by Unix and Macintosh computers.
Basically, solutions which exist for preparing binary data for transmission over the Internet all entail the use of encoding routines which, due to the transformation of the data into a decodable binary format, expand the data. The expanded data also requires a certain transmission time corresponding, inter alia, to the volume of transmitted data. Clearly, a certain time is also taken to encode the data at the computer of origin, and to decode the data at the recipient computer, prior to rendering the data accessible thereat.
Currently, two kinds of encoding system are used for facilitating binary data transmission: 3 to 4 byte systems (herein referred to as "3to4"), such as the so-called UUencode, "XX," "MIME64," and "BinHex," wherein 3 bytes are encoded into a 4 byte form, so as to have an expansion ratio of approximately 33% in the volume of data; and the "BtoA" system, wherein 4 bytes are encoded into a 5 byte form, so as to have an expansion ratio of only 25%.
While the 3to4 systems operate on the basis of a 64 character table, the BtoA system operates on the basis of a table of 85 characters, and is thus more flexible. According to this system, when encoding 4 bytes of data, each of which may be any of the 256 characters of the EBCDIC, 5 bytes are produced, each having a value selected from one of 85 different characters.
As known, both prior art systems evaluate a binary expression (be this 3 bytes in a 3to4 system or 4 bytes in the BtoA system) and encode it by multiplying each byte in a selected block by 256 taken to a power corresponding to the position of the byte in the block, and thereafter adding the results so as to obtain a single multiple digit number. This number is then divided successively by the base number (64 or 85) so as to obtain a series of remainders, or moduli, each of which is stored in a selected byte in a predetermined sequence, thereby to obtain a preliminary encoded information block. In BtoA, so as to bring the information contained in each byte into the readable/printable range (32-126), and thereby obtain the final encoded block which is to be transferred to a recipient computer, a value of 33 is added to each byte in the block. Decoding is essentially achieved by performing the above operations in reverse.
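By way of illustration only, the prior art 4 to 5 byte procedure described above can be sketched in Python as follows. The function names, the choice of treating the first byte of the block as the most significant, and the storing of the most significant remainder first are assumptions made for this sketch and are not taken from any particular BtoA implementation.

```python
def btoa_style_encode(block: bytes, base: int = 85, offset: int = 33) -> bytes:
    """Encode one 4-byte block into 5 printable bytes by successive division."""
    if len(block) != 4:
        raise ValueError("expected a block of exactly 4 bytes")
    # Combine the bytes into a single number: each byte is weighted by
    # 256 raised to a power given by its position (first byte highest; assumed).
    number = 0
    for byte in block:
        number = number * 256 + byte
    # Divide successively by the base and keep each remainder.
    remainders = []
    for _ in range(5):
        number, remainder = divmod(number, base)
        remainders.append(remainder)
    # Store the most significant remainder first (assumed order) and add the
    # offset so that every byte falls in the printable range.
    return bytes(r + offset for r in reversed(remainders))


def btoa_style_decode(encoded: bytes, base: int = 85, offset: int = 33) -> bytes:
    """Reverse the operations of btoa_style_encode for one 5-byte block."""
    if len(encoded) != 5:
        raise ValueError("expected a block of exactly 5 bytes")
    number = 0
    for byte in encoded:
        number = number * base + (byte - offset)
    return number.to_bytes(4, "big")
```

Each block thus requires multiplications and divisions on a number spanning the whole block, which is the numerical manipulation whose cost is discussed in the following paragraph.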
It is thus seen that data is both encoded and decoded by time and resource intensive manipulation of the numerical code sequences. Accordingly, while BtoA has a relative advantage over 3to4 systems due to its lower expansion ratio, a disadvantage inherent in the BtoA system is its relative slowness of encoding and decoding operations.
With the advent of the commercialization of the Internet, more powerful and faster computers have become commonly available. However, notwithstanding the increased computing power now available, the relative slowness of the above-described known methods of encoding and decoding binary data has become much more noticeable and thus much more of a problem. This problem continues to constitute a bottleneck in data transmission as the available transmission bandwidths increase.
Throughout the specification and claims, the term "binary unit" is used to mean any portion of a binary data block, of which a byte is merely an example in which the binary unit has 8 bits.
The present invention seeks to provide an improved method of encoding binary data for transmission between two or more computers, having an encoding/decoding speed at least an order of magnitude greater than the 3to4 system.
There is thus provided, in accordance with a preferred embodiment of the invention, a method of encoding an unencoded block of binary data having a known number of binary units, into an encoded block of binary data having a number of binary units greater than the number of binary units in the unencoded block, for transfer from a computer of origin to a recipient computer and decoding thereat, including the following steps:
evaluating the data contained in each binary unit of the unencoded block, thereby to obtain, for the unencoded data in each binary unit, a primary number and a secondary number;
entering the primary number for the unencoded data in each binary unit into an encoded binary unit of an encoded binary unit block, wherein for the data in each binary unit, the position of the encoded binary unit corresponds to the position of the unencoded binary unit; and
entering the secondary number for the unencoded data in each binary unit into one or more additional control binary units in the encoded block, whereby the secondary number for each unencoded binary unit contains a value and position identifier for the unencoded data contained in each binary unit of the unencoded block.
Additionally in accordance with a preferred embodiment of the invention, the step of entering the secondary number includes the steps of:
evaluating a control number as a function of the secondary numbers, including value and position identifiers for the unencoded data contained in each binary unit of the unencoded block; and
entering the control number into the one or more additional control binary units in the encoded block.
Further in accordance with a preferred embodiment of the present invention, the primary number for the unencoded data in each binary unit is MOD(B/b), in which B is the data in the binary unit, and b is the encoding base; and the secondary number for the unencoded data in each binary unit is INT(B/b).
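As a minimal sketch, assuming that MOD(B/b) denotes the remainder of B divided by b and that INT(B/b) denotes the integer part of the quotient, the primary and secondary numbers of a single 8 bit binary unit may be computed as follows. The function name and the example base of 90 are hypothetical and chosen only for illustration.

```python
def split_unit(value: int, base: int) -> tuple:
    """Return (primary, secondary) for one 8-bit binary unit.

    primary   = MOD(B/b), read here as the remainder of B divided by b (assumed)
    secondary = INT(B/b), read here as the integer quotient of B by b (assumed)
    """
    if not 0 <= value <= 255:
        raise ValueError("value must fit in one byte")
    if base <= 0:
        raise ValueError("base must be positive")
    secondary, primary = divmod(value, base)
    return primary, secondary


# Example with a hypothetical base of 90: 200 = 2 * 90 + 20.
primary, secondary = split_unit(200, 90)
assert (primary, secondary) == (20, 2)
# The original unit is recovered as secondary * base + primary.
assert secondary * 90 + primary == 200
```

Under this reading, each binary unit is processed on its own with a single division, and no number spanning the whole block need be formed.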
Additionally in accordance with a preferred embodiment of the invention, the control number is represented by the following expression:
$$\sum_{i=1}^{L} \left[ \mathrm{INT}(B_i / b) \cdot \left( \mathrm{INT}\left( (Cnc_{max} - 1) / b \right) + 1 \right)^{(L - i)} \right]$$
wherein
L=number of binary units in the unencoded block,
Cncmax=the maximum number of character codes used for encoding, and
i=position of binary unit in the unencoded block.
Preferably, the system of the invention is a 4 to 5 system, such that L=4, Cncmax=256, 86 ≤ b ≤ 95, and INT((Cncmax − 1)/b) + 1 = 3.
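The 4 to 5 system with these preferred values can be illustrated by the following sketch. It assumes that the single control binary unit is appended as the fifth unit of the encoded block, that MOD and INT denote remainder and integer quotient, and that b = 90 is chosen from the permitted range; the function names are hypothetical, and any mapping of the resulting values onto printable characters is left outside the sketch.

```python
L = 4          # number of binary units in the unencoded block
CNC_MAX = 256  # maximum number of character codes


def encode_block(block: bytes, b: int = 90) -> list:
    """Encode 4 bytes into 5 values: 4 primary numbers plus 1 control number."""
    if len(block) != L:
        raise ValueError("expected a block of exactly 4 bytes")
    if not 86 <= b <= 95:
        raise ValueError("b must satisfy 86 <= b <= 95")
    radix = (CNC_MAX - 1) // b + 1  # equals 3 for every b in the range
    # Primary numbers keep the positions of their unencoded units.
    primaries = [B % b for B in block]
    # Control number: the sum over i of INT(B_i/b) * radix**(L - i).
    control = sum((B // b) * radix ** (L - i)
                  for i, B in enumerate(block, start=1))
    return primaries + [control]  # control unit placed last (assumed)


def decode_block(units: list, b: int = 90) -> bytes:
    """Recover the 4 original bytes from the 5 encoded values."""
    if len(units) != L + 1:
        raise ValueError("expected exactly 5 encoded values")
    radix = (CNC_MAX - 1) // b + 1
    *primaries, control = units
    out = []
    for i, primary in enumerate(primaries, start=1):
        # Extract the base-3 digit of the control number for position i.
        secondary = (control // radix ** (L - i)) % radix
        out.append(secondary * b + primary)
    return bytes(out)


encoded = encode_block(bytes([200, 66, 255, 90]))
assert encoded == [20, 66, 75, 0, 61]
assert decode_block(encoded) == bytes([200, 66, 255, 90])
```

With L=4 and a radix of 3, the control number cannot exceed 2*(27+9+3+1)=80, which is less than any b in the stated range, so under these assumptions it fits in one encoded binary unit alongside the four primary numbers.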