In many digital communication networks and storage systems, data is transmitted or stored in packets. A packet is a formatted unit of data. To help a receiver device or reader device organize and locate the received data, each packet is assigned a sequentially incrementing identification (ID). A packet may include two kinds of data: control information and user data (also known as payload). The control information provides the data the network needs to deliver the user data, for example, source and destination addresses, error detection codes (e.g., checksums), and sequencing information. Typically, control information is found in packet headers and trailers, with user data in between. Different communications protocols use different conventions for distinguishing between the elements of the packet and for formatting the data.
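The header, payload, and trailer arrangement described above can be sketched in Python as follows. The field widths, the field order, and the names `build_packet` and `parse_packet` are assumptions chosen for illustration and do not correspond to any particular protocol; CRC-32 stands in for whatever error detection code a real format would use.

```python
import struct
import zlib

# Hypothetical layout (not taken from any real protocol):
#   header  = packet ID (4 bytes) | source (2) | destination (2) | payload length (2)
#   trailer = CRC-32 computed over header + payload (4 bytes)
HEADER = struct.Struct(">IHHH")
TRAILER = struct.Struct(">I")


def build_packet(packet_id, src, dst, payload):
    """Wrap the payload between a control header and a checksum trailer."""
    header = HEADER.pack(packet_id, src, dst, len(payload))
    crc = zlib.crc32(header + payload)
    return header + payload + TRAILER.pack(crc)


def parse_packet(packet):
    """Split a packet into its fields and verify the trailer checksum."""
    packet_id, src, dst, length = HEADER.unpack_from(packet, 0)
    body_end = HEADER.size + length
    payload = packet[HEADER.size:body_end]
    (crc,) = TRAILER.unpack_from(packet, body_end)
    if zlib.crc32(packet[:body_end]) != crc:
        raise ValueError("checksum mismatch")
    return packet_id, src, dst, payload
```

In this sketch, a receiver would call `parse_packet` on each received unit and use the returned packet ID to place the payload in order.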
In one technique for reliable transmission, the Transmission Control Protocol (TCP) uses a sequence number to identify each byte of data. The sequence number identifies the order of the bytes sent from each computer so that the data can be reconstructed in order, regardless of any fragmentation, disordering, or packet loss that may occur during transmission. The sequence number is incremented for every payload byte transmitted. In some embodiments, the communicating computers exchange an initial sequence number (ISN). This number can be arbitrary and should be unpredictable to defend against TCP sequence prediction attacks. For example, if a computer sends 4 bytes with a sequence number of 100 (conceptually, the four bytes are assigned sequence numbers 100, 101, 102, and 103), then the receiver sends back an acknowledgment of 104, since that is the sequence number of the next byte it expects to receive.
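The sequence number arithmetic in this example can be sketched in a few lines of Python. The function names are hypothetical. The sketch covers payload bytes only, and it adds one detail not stated above: TCP sequence numbers are 32-bit values, so the arithmetic wraps around modulo 2^32.

```python
SEQ_MODULUS = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def byte_sequence_numbers(seq, payload_len):
    """Conceptual sequence number of each payload byte in one segment."""
    return [(seq + i) % SEQ_MODULUS for i in range(payload_len)]


def next_expected(seq, payload_len):
    """Acknowledgment number: the sequence number of the next byte expected."""
    return (seq + payload_len) % SEQ_MODULUS
```

For the example above, `byte_sequence_numbers(100, 4)` returns `[100, 101, 102, 103]` and `next_expected(100, 4)` returns `104`.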
The packet ID is usually protected by a checksum such as a cyclic redundancy check (CRC) or by an error-control code (ECC) such as a Reed-Solomon (RS) code. Specialized forms of RS codes, specifically Cauchy-RS and Vandermonde-RS, can be used to overcome the unreliable nature of data transmission over channels. The encoding process assumes an RS(N, K) code, which results in N codewords being generated, wherein each codeword has a length of N symbols and stores K symbols of data; the codewords are then sent over a channel.
Any combination of K codewords received at the other end is enough to reconstruct all of the N codewords. The code rate K/N is generally set to ½ unless the channel's erasure likelihood can be adequately modeled and is seen to be lower. Consequently, N is usually 2K, meaning that at least half of all the codewords sent must be received in order to reconstruct all of the codewords sent.
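As a rough illustration of the "any K of N" property, the following Python sketch builds a toy Reed-Solomon-style erasure code by polynomial interpolation. It is a simplification under stated assumptions, not the Cauchy-RS or Vandermonde-RS construction named above: arithmetic is done modulo the prime 257 instead of in GF(2^8), each transmitted unit is reduced to a single symbol, and the names `encode` and `reconstruct` are hypothetical. It requires Python 3.8 or later for the modular inverse `pow(den, -1, P)`.

```python
P = 257  # prime modulus; an assumption that keeps the field arithmetic simple


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the given (xi, yi) points, modulo P (Lagrange form)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n):
    """Systematic encoding: the K data symbols sit at x = 0..K-1 and the
    remaining N-K symbols are further evaluations of the same polynomial.
    Parity symbols may take the value 256, so they do not fit in one byte."""
    points = list(enumerate(data))
    return [_interpolate(points, x) for x in range(n)]


def reconstruct(received, k, n):
    """Rebuild all N symbols from a dict {index: symbol} holding at least K."""
    if len(received) < k:
        raise ValueError("need at least K symbols")
    points = sorted(received.items())[:k]
    return [_interpolate(points, x) for x in range(n)]
```

With K = 4 and N = 8 (a code rate of ½), encoding four data symbols with `encode(data, 8)` and then passing any four of the eight resulting symbols, keyed by their positions, to `reconstruct(received, 4, 8)` returns the full list of eight symbols.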
For example, data on a DVD is grouped into sectors of 2048 user bytes. Each sector is assigned a 4-byte Physical Sector Number (PSN). Each PSN is protected by an RS(6, 4) Reed-Solomon code, which has the capability to correct one byte in error or to detect two or more bytes in error. The 6 bytes of PSN+ECC are stored at the beginning of each sector. However, miscorrections may occur due to corrupted data, and disc seek operations may cause jumps in the packet identifier sequence, which makes it difficult to determine the actual packet identifier efficiently.
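The one-byte correction capability of the RS(6, 4) code follows from standard Reed-Solomon parameters, which the short Python sketch below computes. The function name `rs_parameters` is hypothetical, and the formulas are the general textbook bounds rather than anything specific to the DVD format.

```python
def rs_parameters(n, k):
    """Basic parameters of an RS(n, k) code with n total and k data symbols."""
    parity = n - k
    d_min = parity + 1  # Reed-Solomon codes meet the Singleton bound
    return {
        "parity_symbols": parity,
        "min_distance": d_min,
        "correctable_errors": (d_min - 1) // 2,
    }
```

For the PSN code, `rs_parameters(6, 4)` gives 2 parity bytes, a minimum distance of 3, and 1 correctable byte error, which matches the 4-byte PSN plus 2 bytes of ECC described above.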