1. Technical Field
The invention relates to a multi-byte search, for example to locate a pattern in streaming data. More particularly, the invention relates to a high-speed mechanism for scanning a real-time stream of data to locate a start-code-prefix in an MPEG-2 data stream, and in streams that use the same start-code paradigm. The same general algorithm can be applied to search patterns of any byte length, and is especially useful in cases where the length of the search pattern is an odd number of bytes.
2. Description of the Prior Art
The MPEG-2 standard is often used for the formatting and transmission of audio and video information (for more information on the MPEG-2 standard, see http://www.mpeg.org). Due to the widespread use of the MPEG-2 standard, it has been desirable to implement a high-speed mechanism for scanning a real-time (i.e. no flow control) stream of MPEG-2 data to locate start code prefixes (i.e. byte patterns that delimit the structural components or packets which constitute the data stream). This is necessary because the data in such a stream are passed to a system for processing in real time. Accordingly, each elementary unit of the stream must be reliably identified in real time so that the unit can be parsed into its constituent parts and processed at a rate commensurate with the incoming data rate. Such start code prefix scanning typically represents a very significant portion of the overall CPU use in MPEG-2 processing programs, in some cases exceeding 50% of CPU use.
In the MPEG-2 standard (see ISO/IEC 13818-1:1996(E) Table 2-17 and ISO/IEC 13818-1:1996(E), Tables 2-2, 2-6, 2-25, 2-27, and 2-28), a start-code-prefix is represented by the 8-bit-aligned byte pattern 0x00 0x00 0x01, with the following additional constraints:
If the three bytes following the 0x00 0x00 0x01 pattern are all 0x00, it is not a valid start code and should be so rejected by the scanner.
For an audio packetized elementary stream (PES), the only valid values for the byte following the 0x00 0x00 0x01 pattern lie in the range 0xC0 through 0xDF, inclusive. For video, the reverse is true: if the byte following the 0x00 0x00 0x01 pattern in a video stream lies in the range of 0xC0 through 0xDF, inclusive, it is not a valid start code, and the scanner must reject it.
Because some present MPEG-2 implementations use MPEG-1 format in their audio PES streams, it is possible for a seemingly valid start code to appear in the payload, because it is not illegal for an MPEG-1 audio stream to contain 0x00 0x00 0x01 as part of its compressed payload. The next few bytes following the start code pattern can be examined to validate further the presence of a real start code, but there is no guaranteed algorithm to ensure that audio start code synchronization is maintained during scanner operation on an MPEG-1-conforming audio channel. This problem does not exist for conformant MPEG-2 video and audio channels. In fact, the MPEG-2 specification guarantees that this can never occur.
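By way of illustration only, the two constraints listed above can be expressed as a small validation routine in C. This is a minimal sketch and not the implementation of the invention: the names is_valid_start_code and stream_kind are hypothetical, and the choice to reject a candidate that lies too close to the end of the buffer for all three following bytes to be examined is an assumption made for simplicity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stream selector; not part of the MPEG-2 standard. */
typedef enum { STREAM_AUDIO, STREAM_VIDEO } stream_kind;

/*
 * Returns true if buf[pos..pos+2] holds 0x00 0x00 0x01 and the bytes that
 * follow satisfy the constraints described in the text.
 * Assumption: a candidate with fewer than three following bytes in the
 * buffer is rejected rather than deferred to the next buffer.
 */
static bool is_valid_start_code(const uint8_t *buf, size_t len,
                                size_t pos, stream_kind kind)
{
    if (len < 6 || pos > len - 6)
        return false;                 /* prefix plus three bytes needed */

    const uint8_t *p = buf + pos;
    if (p[0] != 0x00 || p[1] != 0x00 || p[2] != 0x01)
        return false;                 /* not a start-code-prefix at all */

    /* Constraint 1: three following bytes all 0x00 is not a start code. */
    if (p[3] == 0x00 && p[4] == 0x00 && p[5] == 0x00)
        return false;

    /* Constraint 2: 0xC0..0xDF is required for audio, forbidden for video. */
    bool in_audio_range = (p[3] >= 0xC0 && p[3] <= 0xDF);
    return (kind == STREAM_AUDIO) ? in_audio_range : !in_audio_range;
}
```

A scanner would call such a routine on each candidate offset it finds, passing the kind of elementary stream being scanned. As noted above, passing this check does not guarantee a real start code in an MPEG-1-conforming audio channel.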
There are two approaches that have been used in the art to address the issue of MPEG-2 start code scanning:
A serial read of the incoming bytes looking for a 0x00 0x00 0x01 pattern. This approach is processor intensive.
Reading the MPEG-2 data into a buffer, and then examining every third byte to see if it is a 0x00 or 0x01. When either value is found, the neighboring bytes are examined to see if they constitute a 0x00 0x00 0x01 pattern. This byte-wide search is also processor intensive, although considerably less so than the first approach.
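For purposes of comparison, the two prior-art approaches described above may be sketched in C as follows. The function names find_prefix_serial and find_prefix_stride3 and the NOT_FOUND sentinel are hypothetical, and the sketch only locates the first 0x00 0x00 0x01 pattern in a buffer; it does not apply the additional validity constraints. The second routine relies on the observation that any three-byte pattern covers exactly one of the sampled positions 2, 5, 8, and so on.

```c
#include <stddef.h>
#include <stdint.h>

#define NOT_FOUND ((size_t)-1)   /* hypothetical "no match" sentinel */

/* Approach 1: serial read, testing every byte position in turn. */
static size_t find_prefix_serial(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 2 < len; i++) {
        if (buf[i] == 0x00 && buf[i + 1] == 0x00 && buf[i + 2] == 0x01)
            return i;
    }
    return NOT_FOUND;
}

/*
 * Approach 2: examine every third byte. A sampled 0x01 can only be the
 * last byte of the pattern; a sampled 0x00 can be its first or second
 * byte, so the neighbors on both sides are checked.
 */
static size_t find_prefix_stride3(const uint8_t *buf, size_t len)
{
    for (size_t i = 2; i < len; i += 3) {
        if (buf[i] == 0x01) {
            if (buf[i - 1] == 0x00 && buf[i - 2] == 0x00)
                return i - 2;         /* sampled byte is the third byte */
        } else if (buf[i] == 0x00) {
            if (i + 1 < len && buf[i - 1] == 0x00 && buf[i + 1] == 0x01)
                return i - 1;         /* sampled byte is the second byte */
            if (i + 2 < len && buf[i + 1] == 0x00 && buf[i + 2] == 0x01)
                return i;             /* sampled byte is the first byte */
        }
    }
    return NOT_FOUND;
}
```

Both routines return the same offset for the same input. The second reads roughly one third as many bytes in the common case where no candidate value is seen, but still pays for the neighbor checks on every sampled 0x00 or 0x01.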
Neither of the above-cited approaches is particularly efficient because a significant amount of processor time is expended not only in looking for the target byte value(s), but also in further qualifying the neighboring bytes to determine whether a complete start code has been discovered. The first technique cited above can be coded in assembly language for processors having so-called string instructions, such as the Intel 80x86/Pentium family, and thereby achieve a performance boost.
Nonetheless, it would be advantageous to provide an algorithm that is more efficient than either of the two above-cited techniques, even when they are coded in assembly language.
In the preferred embodiment of the invention, a word-wise search is performed. Such a strategy requires the same number of clock cycles per comparison as a byte-wide search, but involves one-half as many addresses, thereby cutting execution time in half. For every start code, either the first or second byte is on a word boundary, so that by searching the data twice, first for the 0x00 0x00 word, then again for the 0x00 0x01 word, every start code in the data can be found (each successful search “hit” requires an examination of the neighboring bytes to ensure that it is really a start code, but the rules are simple and can be coded efficiently). There are normally no wait states in the second search because the second search is executed out of the machine's data cache (if it has one).
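The two-pass word-wise search described above may be illustrated by the following C sketch. It is offered only as an example under stated assumptions: the names load_word and scan_wordwise are hypothetical, a word is taken to be 16 bits, word boundaries are taken to be even byte offsets from the start of the buffer, and plain C loops stand in for whatever word-string instructions a given processor provides. Words are compared against patterns built from the same byte sequences, so the sketch does not depend on byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load a 16-bit word from two bytes without alignment assumptions. */
static uint16_t load_word(const uint8_t *p)
{
    uint16_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/*
 * Two-pass word-wise scan. Offsets of 0x00 0x00 0x01 patterns are stored
 * in out[] (at most max_out of them); the count stored is returned.
 * Hits appear in pass order: even offsets first, then odd offsets.
 */
static size_t scan_wordwise(const uint8_t *buf, size_t len,
                            size_t *out, size_t max_out)
{
    static const uint8_t zz[2] = {0x00, 0x00};
    static const uint8_t zo[2] = {0x00, 0x01};
    const uint16_t word_00_00 = load_word(zz);
    const uint16_t word_00_01 = load_word(zo);
    size_t n = 0;

    /* Pass 1: a start code at an even offset begins with the word 00 00;
       the byte after the word must be 0x01. */
    for (size_t w = 0; w + 2 < len; w += 2) {
        if (load_word(buf + w) == word_00_00 && buf[w + 2] == 0x01
            && n < max_out)
            out[n++] = w;
    }

    /* Pass 2: a start code at an odd offset ends with the word 00 01;
       the byte before the word must be 0x00. */
    for (size_t w = 2; w + 1 < len; w += 2) {
        if (load_word(buf + w) == word_00_01 && buf[w - 1] == 0x00
            && n < max_out)
            out[n++] = w - 1;
    }
    return n;
}
```

Each pass touches half as many addresses as a byte-wide scan, and each hit costs a single neighboring-byte comparison. A caller that needs the offsets in stream order would merge the two groups of hits, and would apply the validity constraints on the bytes following each prefix separately.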
The invention provides an algorithm that is more efficient at start code scanning than either of the two above-cited techniques, even when they are coded in assembly language. The CPU time made available by the efficiency of this algorithm allows the processor to handle other tasks, thereby allowing a more complex and rich product to be deployed on machines of given processing power. For example, the invention makes more CPU time available to make it easier to implement such features as software picture-within-picture, visually powerful but CPU-intensive Sarnoff video effects, simultaneous view and record, interactive Web access, background Electronic Program Guide (EPG) processing, HDTV video rendering, and software motion compensation.