Digital data is typically transmitted from some type of transmitter to some type of receiver. Transmitters typically include an encoder that encodes the data for transmission, and receivers typically include a decoder that decodes the data they receive. There are different types of digital data such as video data, audio data, audio/video data, text data, computer executable program data, archival data, database information, and the like. When digital data is transmitted, it is typically transmitted in some type of channel. For purposes herein, computer memory or any storage device or storage medium can likewise be considered a transmission channel.
When digital data is transmitted, it is important to be able to find specific points within the data in the channel. This is done for various purposes, such as to locate points that enable recovery from errors or losses in the transmission of the data through the channel, points that enable starting the decoding process at a location other than the start of the entire stream, or points that enable searching for different types of data that are utilized for different purposes. Thus, for example, on the decoder side, decoders and other components that process digital data often need to know the context of the data so that the data can be properly processed. This would not be so important if one were able to start with the first bit that was sent and the decoder were able to run without any errors. In this situation, ideally, the decoder could simply track the information that was being sent based on its knowledge of the data format. Unfortunately, this ideal situation often does not occur. Errors and other contingencies do occur that present challenges to those who design and use systems that transmit and receive digital data. In some cases, such as when tuning into an ongoing broadcast stream of data, the decoder cannot start at the beginning of the data transmission. Locating points by data format parsing may also require a significant amount of complex processing in a decoder.
In many types of channel environments, such issues are addressed by providing, in the data, so-called resynchronization markers. Resynchronization markers provide a mechanism by which a system can start its decoding process or recover from an error. For example, when digital data is streamed as a series of bits or bytes, having resynchronization markers in the stream can provide a decoder with a point of reference from which to recover in the event an error occurs in the transmission.
One way that resynchronization markers can be employed is in the context of start codes. A start code is a string of bits or bytes having a specific value. Generally, many systems tend to carry bytes (e.g. H.222.0/MPEG-2 Systems), so that start codes can be defined as a uniquely-valued string of bytes. The unique string of bytes provides a pattern the presence of which indicates a resynchronization point. A resynchronization point typically indicates the start or boundary of some independently decodable amount of data. For example, in H.262/MPEG-2 Video data, resynchronization points can indicate the start of a slice (i.e. an independently decodable region of a picture), the start of a picture, the start of a GOP (i.e., “Group of Pictures” or independently decodable sequence of pictures), or the start of a new video sequence. Digital video streams can also include so-called ancillary or supplemental data which can be preceded by a start code.
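The idea of scanning a byte stream for a uniquely-valued byte string can be sketched as follows. This is a minimal illustration, not a decoder: it assumes the three-byte prefix 0x00 0x00 0x01 (23 zero bits followed by a one bit) used by MPEG-2 style streams, and the name find_start_codes is hypothetical.

```python
# Assumed prefix: 23 zero bits followed by a single one bit, byte aligned.
START_CODE_PREFIX = b"\x00\x00\x01"


def find_start_codes(stream):
    """Return the byte offsets at which the start code prefix occurs."""
    offsets = []
    pos = stream.find(START_CODE_PREFIX)
    while pos != -1:
        offsets.append(pos)
        # Resume the search one byte later; the prefix cannot overlap itself.
        pos = stream.find(START_CODE_PREFIX, pos + 1)
    return offsets
```

A decoder that joins a stream midway could call such a function on its input buffer and begin decoding at the first returned offset. Note that the scan cannot tell a genuine start code from payload bytes that happen to contain the same pattern.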
Sometimes, start codes are used not only within a data stream such as a video stream, but also at a system's multiplex level. The H.222.0/MPEG-2 Systems specification is an example of a system that uses start codes; it carries streams of video data interleaved with system-level information and audio information.
Since start codes can be important insofar as providing resynchronization points within a data stream, it is a good idea to avoid emulating start codes in the data stream in places that are not, in fact, intended to represent start codes.
For example, consider the following. Start codes define a specific pattern of bits or bytes that can identify the start of a new unit of data. If one is sending arbitrary data in between the start codes, then it is possible that the arbitrary data may, in and of itself, contain the same pattern that one is using as a start code. For example, if one assumes that the data that is being carried is completely random, then if a start code is K bits long, the probability of accidentally emulating the start code in the bits starting at some particular bit location is (1/2)^K.
In some cases, the judgment can be made that if the number of bits in the start code is large, then it may be fairly unlikely for the start code to be accidentally emulated. In such a situation, if the consequences of an accidental start code emulation are not too severe, it may be judged unnecessary to take measures to ensure prevention of accidental start code emulations. This is the case with respect to some audio data formats. Typically, these formats do not utilize a very high bit rate measured in bits per second, so it is not too likely that a start code will be accidentally emulated during any particular interval of time. With respect to video data, this is generally not the case, as the bit rate is ordinarily much higher for transmission of video data.
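The dependence on bit rate can be made concrete with a rough calculation. The sketch below assumes completely random payload bits, so that each bit position emulates a K-bit start code with probability (1/2)^K, and approximates the expected number of emulations per second as the bit rate multiplied by that probability. The function names and the example bit rates are illustrative assumptions, not values taken from any standard.

```python
def expected_emulations_per_second(bit_rate, start_code_bits):
    """Approximate expected accidental emulations per second for random data."""
    # One candidate position per transmitted bit, each matching with
    # probability (1/2)^K.
    return bit_rate * 0.5 ** start_code_bits


def mean_seconds_between_emulations(bit_rate, start_code_bits):
    """Average time between accidental emulations under the same assumption."""
    return 1.0 / expected_emulations_per_second(bit_rate, start_code_bits)


# Hypothetical rates: 128 kbit/s audio versus 6 Mbit/s video, 24-bit start code.
audio_interval = mean_seconds_between_emulations(128_000, 24)
video_interval = mean_seconds_between_emulations(6_000_000, 24)
```

Under these assumptions the audio stream would see an accidental emulation about once every 131 seconds, while the video stream would see one about every 2.8 seconds. Real coded data is not uniformly random, so these figures only indicate the order of magnitude.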
In past major video coding standards (with perhaps one exception), the video syntax format within the data payload has been designed to avoid start code emulation. That is, if one knows what kind of data elements will make up the video syntax, then one can carefully design the syntax so that no accidental start codes can occur. For example, a start code in traditional video coding standards begins with a long string of 0-bits, followed by a 1-bit. For instance, the start code may begin with 23 0-bits followed by one 1-bit. Assume that most of the data that is sent is entropy coded using variable length codes (often referred to informally as Huffman codes). Variable length codes (VLCs) are defined for example purposes herein as variable-depth tree-structured codes that are utilized to select among a set of represented symbols. One technique using binary-tree VLCs is to make sure that the path in the tree from the root to every leaf that represents a valid symbol always has a “1” in it somewhere, and that the tree structure is not too deep.
Thus, for example, if one knows that every variable length code string is no longer than 10 bits and that every such string will have at least one 1-valued bit in it, then one knows that there is no way that a sequence of coded data from the VLC can ever contain more than 18 consecutive zero-valued bits. That is, the worst-case scenario would be 1000000000 followed by 0000000001, in which the nine trailing 0's of the first string run into the nine leading 0's of the second. Thus, if one designs the syntax carefully and inspects the location of every 0- and every 1-valued bit to ascertain how many 0's can occur in a row, one can use a start code that contains a longer string of 0's than can ever occur in the syntax. For example, the syntax can be designed so that valid syntax can never contain 23 0's in a location that is not a start code. Thus, every occurrence of 23 0's should be a start code and the decoder should be able to accurately detect start codes.
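The worst-case reasoning above can be checked mechanically for a given code table. The following sketch computes the longest run of zeros that any concatenation of codewords can produce, assuming (as a simplification) that any codeword may follow any other. The function name max_zero_run and the representation of codewords as strings of '0' and '1' are choices made for illustration only.

```python
def max_zero_run(codewords):
    """Longest run of zeros in any concatenation of the given codewords."""
    if any("1" not in cw for cw in codewords):
        # An all-zero codeword could be repeated indefinitely.
        raise ValueError("an all-zero codeword allows unbounded zero runs")
    # Zeros at the end of one codeword can join zeros at the start of the next.
    longest_trailing = max(len(cw) - len(cw.rstrip("0")) for cw in codewords)
    longest_leading = max(len(cw) - len(cw.lstrip("0")) for cw in codewords)
    # Zeros enclosed by ones inside a single codeword cannot be extended.
    longest_internal = max(
        len(run) for cw in codewords for run in cw.split("1")
    )
    return max(longest_internal, longest_trailing + longest_leading)
```

For the two 10-bit strings in the example, max_zero_run(["1000000000", "0000000001"]) evaluates to 18, so a start code containing 23 zeros could not be produced by such a table on its own. A real syntax mixes several tables and fixed-length fields, which is why the inspection is much harder in practice.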
While the above-described operation appears straightforward, the operation can be a fairly difficult undertaking because one has to inspect all of the possible data (at the bit level) that is going to be sent, in every possible order in which it is going to be sent to ensure that a start code pattern cannot accidentally be sent. This is an arduous method of syntax design that is prone to mistakes.
This bit-level inspection design process describes, generally, the way that many video coding specifications have been designed in the past (i.e. H.261, MPEG-1, H.262/MPEG-2, most of H.263, and MPEG-4). The one exception to this is Annex E of ITU-T Recommendation H.263, which uses a technique called arithmetic coding to generate compressed bits in an algorithmic fashion from a mathematical specification. Here, there is an extra process at the end of the entropy encoder which inspects the bits that are generated and, on the encoder side, if there are too many 0's in a row, a “marker” bit (a 1-bit) is inserted before a pre-determined number of 0's are encountered. On the decoder side, the decoder counts the 0's and, if it encounters the critical number of 0's, it knows that it has encountered a real start code. If the decoder sees one fewer 0 than the critical number, it knows that the following 1-bit is a marker bit inserted to avoid start code emulation, discards that bit, and takes the following bits as the continuation of the real data.
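The marker-bit mechanism can be illustrated with a simplified sketch. It is not the actual H.263 Annex E procedure: it operates on strings of '0' and '1' for readability, treats the critical number of zeros as a parameter, and uses hypothetical function names. The encoder inserts a 1-bit whenever it has emitted one fewer zero than the critical number, whatever the next payload bit is, so that the decoder can discard the bit unconditionally.

```python
def insert_marker_bits(payload, critical_zeros):
    """Encoder side: cap every zero run at critical_zeros - 1."""
    if critical_zeros < 2:
        raise ValueError("critical_zeros must be at least 2")
    out = []
    zero_run = 0
    for bit in payload:
        out.append(bit)
        zero_run = zero_run + 1 if bit == "0" else 0
        if zero_run == critical_zeros - 1:
            out.append("1")  # marker bit, not part of the payload
            zero_run = 0
    return "".join(out)


def remove_marker_bits(coded, critical_zeros):
    """Decoder side: drop the bit that follows critical_zeros - 1 zeros."""
    if critical_zeros < 2:
        raise ValueError("critical_zeros must be at least 2")
    out = []
    zero_run = 0
    i = 0
    while i < len(coded):
        bit = coded[i]
        out.append(bit)
        zero_run = zero_run + 1 if bit == "0" else 0
        if zero_run == critical_zeros - 1:
            if i + 1 < len(coded) and coded[i + 1] == "0":
                # The critical number of zeros was reached: a real start code.
                raise ValueError("start code encountered inside payload")
            i += 1  # skip the marker bit
            zero_run = 0
        i += 1
    return "".join(out)
```

With a critical number of 5, the payload "1" + "0" * 10 + "1" is coded as "10000100001001", which never contains five zeros in a row, and decoding restores the original payload. Both functions walk the data one bit at a time, which is the cost that the following paragraph objects to.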
The problem with this solution is that it makes the encoder and the decoder inspect and process the incoming data at the bit level. Analyzing and shifting the location of the data that is being processed by single bit positions becomes difficult and can undesirably tax the decoder. Bit-wise shifting is also a processor-intensive operation.
Accordingly, this invention arose out of concerns associated with providing improved methods and systems for preventing start code emulation.