This invention relates to a method of and apparatus for decoding an encoded, time-division-multiplexed signal for display to a user in reverse playback, slow-reverse playback, and frame-by-frame-reverse playback modes of operation. Specific embodiments provide for the decoding of encoded audio and video time-division-multiplexed signals.
Devices for reproducing video signals from a storage device, such as a video cassette recorder (VCR), commonly feature user-controlled reproduction functions. Such functionality includes reverse playback, slow-reverse playback, and frame-by-frame-reverse playback in addition to standard playback, fast-forward, and fast-reverse capabilities. With the development of digital video signal recording technology, it is expected that digital video signal reproduction devices will provide similar playback functionality with improved image quality. However, such functionality, coupled with enhanced image quality, is difficult to achieve due to the inherent operation of prevalent digital video signal encoding schemes. Typical encoding schemes, such as those developed by MPEG (Moving Picture Experts Group), generally operate to highly compress video information to facilitate its transmission over channels of very limited bandwidth.
According to the MPEG system, video and audio data are compressed and recorded on a storage device in a time-division-multiplexed format. FIGS. 1A, 1B, and 1C illustrate an MPEG data format. FIG. 1A shows a unit of multiplexed data comprised of at least one "pack" of information and an end code. Each pack includes a pack header and at least one "packet" of information. In a unit of multiplexed data, the length of each pack may vary.
As depicted in FIG. 1B, a pack header can include a pack start code, a system clock reference (SCR) indication, and an indication of the multiplexing rate (MUX RATE). Each packet is typically comprised of a packet header and a segment of coded packet data. FIG. 1C illustrates a sample packet header comprised of a packet start code prefix, a stream identification code (ID), an indication of the length of the packet or the length of following packets (LENGTH), a decoding time stamp (DTS), and a presentation time stamp (PTS). The stream identification code is utilized to identify the packet, indicate the type of the packet, and/or indicate the particular type of data in the packet. For example, stream identification codes may indicate an audio stream, a video stream, a reserved stream, a reserved data stream, a private stream, a padding stream, or the like.
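The role of the stream identification code can be illustrated with a short Python sketch. The numeric ranges below are not given in this description; they are an assumption based on the stream identification table of the MPEG1 systems standard (ISO 11172-1) and should be checked against that standard before being relied upon. The function name classify_stream_id is hypothetical.

```python
def classify_stream_id(stream_id: int) -> str:
    """Map a one-byte stream identification code to a coarse stream type.

    The numeric ranges are assumed from the MPEG1 systems stream_id table;
    they do not appear in the description itself.
    """
    if not 0 <= stream_id <= 0xFF:
        raise ValueError("stream_id must fit in one byte")
    if stream_id == 0xBC:
        return "reserved stream"
    if stream_id in (0xBD, 0xBF):
        return "private stream"
    if stream_id == 0xBE:
        return "padding stream"
    if 0xC0 <= stream_id <= 0xDF:
        return "audio stream"
    if 0xE0 <= stream_id <= 0xEF:
        return "video stream"
    if stream_id >= 0xF0:
        return "reserved data stream"
    # Values below 0xBC are start codes for other structures, not packets.
    return "not a packet stream id"
```

Under these assumed ranges, a code of 0xE0 would be reported as a video stream and a code of 0xC0 as an audio stream.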
According to a straightforward MPEG implementation, given a set of video images divided into a series of frames, each frame can be coded as an intraframe-coded picture (I picture), an interframe forward-predictive-coded picture (P picture), or an interframe bidirectionally-predictive-coded picture (B picture). Intraframe coding is achieved by compressing data representing a particular frame solely with respect to the data of that frame. Consequently, an I picture can be fully decoded from the data representing the I picture to produce the original frame of video data.
In contrast, interframe forward-predictive coding of a frame is obtained by determining the differences between the frame and a preceding (base) frame which is to be encoded as an I picture or as a P picture. The frame to be coded is represented by data corresponding to these differences to produce a P picture. To decode the P picture, the base frame (I picture or P picture) with reference to which it was coded must be decoded first. The decoded base frame is then modified according to the data of the P picture to recover the original frame. The advantage of interframe forward-predictive coding is that it usually achieves greater compression efficiency than intraframe coding.
A frame can be bidirectionally-predictive-coded by determining differences between it and a combination of an immediately preceding frame which is to be coded as an I or P picture and an immediately succeeding frame which is to be coded as an I or P picture. The frame to be coded is represented by data corresponding to these differences to produce a B picture. To decode the B picture, the preceding and succeeding frames with reference to which it was coded must be decoded first. A combination of the decoded preceding and succeeding frames is then modified according to the data of the B picture to recover the original frame. The advantage of interframe bidirectionally-predictive coding is that it often achieves greater compression efficiency than interframe forward-predictive coding.
An example of the interrelationships among I pictures, P pictures, and B pictures produced according to the MPEG standard is provided in FIG. 2A. In this example, a group of pictures (Group A) is comprised of 15 pictures produced by encoding 15 frames of image data (not shown). The interrelationships, specifically the pattern of predictive coding, are indicated by arrows in this diagram.
Intraframe-coded picture I.sub.2 is coded with respect to only the data of that frame. Interframe forward-predictive-coded picture P.sub.5 is coded with respect to the data used to produce picture I.sub.2. Picture P.sub.8 is coded with respect to the data used to produce picture P.sub.5. Interframe bidirectionally-predictive-coded pictures B.sub.3 and B.sub.4 are each coded with respect to the data used to produce pictures I.sub.2 and P.sub.5. Similarly, pictures B.sub.6 and B.sub.7 are each coded with respect to the data used to produce pictures P.sub.5 and P.sub.8. In this manner, each of the pictures in Group A is produced. Note also that the data used to produce the last P picture of Group A, P.sub.14, is also used to code pictures B.sub.0 ' and B.sub.1 ' of a succeeding group.
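The pattern of predictive coding described above can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions: pictures are represented by hypothetical labels such as "I2" or "B3", Group A is assumed to run from B0 to P14 in display order, each P picture is assumed to reference the nearest preceding I or P picture, and each B picture the nearest I or P picture on either side. References into a neighboring group are simply omitted.

```python
# Assumed display order of Group A (labels are illustrative only).
GROUP_A = ["B0", "B1", "I2", "B3", "B4", "P5", "B6", "B7",
           "P8", "B9", "B10", "P11", "B12", "B13", "P14"]

def reference_map(display_order):
    """Return {picture: [pictures it was coded with reference to]}."""
    refs = {}
    prev_ref = None   # most recent I or P picture seen so far
    pending_b = []    # B pictures waiting for their succeeding reference
    for pic in display_order:
        if pic[0] == "B":
            pending_b.append(pic)
            continue
        # An I or P picture closes the run of B pictures before it.
        for b in pending_b:
            refs[b] = [r for r in (prev_ref, pic) if r is not None]
        pending_b = []
        if pic[0] == "I":
            refs[pic] = []
        elif prev_ref is None:
            raise ValueError(f"{pic} has no preceding I or P picture")
        else:
            refs[pic] = [prev_ref]
        prev_ref = pic
    for b in pending_b:  # succeeding reference lies in the next group
        refs[b] = [prev_ref] if prev_ref is not None else []
    return refs

def pictures_needed(target, refs):
    """List every picture that must be decoded to recover target, in order."""
    needed = []
    def visit(pic):
        for r in refs[pic]:
            visit(r)
        if pic not in needed:
            needed.append(pic)
    visit(target)
    return needed
```

With these definitions, pictures_needed("B6", reference_map(GROUP_A)) lists I2, P5, P8, and then B6, which mirrors the chain of dependencies described for FIG. 2A.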
According to the MPEG1 video standard (ISO11172-2) and the MPEG2 video standard (ISO13818-2), the pictures of FIG. 2A are rearranged for decoding left-to-right, as shown in FIG. 2B, for normal (forward) video playback. This rearrangement ensures that each predictive-coded frame (P picture or B picture) is decoded only after the intraframe-coded picture (I picture) or interframe forward-predictive-coded picture (P picture) with reference to which it was coded has been decoded. For example, picture I.sub.2 must be decoded before picture P.sub.5 can be decoded because the coding of picture P.sub.5 depends upon the uncoded frame of data used to produce picture I.sub.2. As a further example, both pictures I.sub.2 and P.sub.5 must be decoded before pictures B.sub.3 and B.sub.4 can be decoded because the coding of pictures B.sub.3 and B.sub.4 depends upon the uncoded frames of data used to produce pictures I.sub.2 and P.sub.5. The different grouping of pictures in Group B reflects this rearrangement. Further, in FIG. 2B, the pictures indicated at B.sub.12 " and B.sub.13 " are from a group that preceded Group A in FIG. 2A prior to the rearranging for decoding.
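The rearrangement from display order to decoding order can be sketched as follows. This is an illustrative Python sketch, not the procedure of either standard: picture labels such as "I2" are hypothetical, and the display order of Group A is assumed to run from B0 to P14. Each I or P picture is emitted first, followed by the B pictures that precede it in display order.

```python
def decoding_order(display_order):
    """Reorder pictures so every reference picture precedes its dependents."""
    out = []
    pending_b = []  # B pictures held back until their succeeding reference
    for pic in display_order:
        if pic[0] == "B":
            pending_b.append(pic)
        else:
            out.append(pic)        # I or P picture goes first
            out.extend(pending_b)  # then the B pictures that depend on it
            pending_b = []
    # Trailing B pictures would wait for a reference in the next group;
    # this sketch simply appends them.
    out.extend(pending_b)
    return out

# Assumed display order of Group A (labels are illustrative only).
group_a = ["B0", "B1", "I2", "B3", "B4", "P5", "B6", "B7",
           "P8", "B9", "B10", "P11", "B12", "B13", "P14"]
reordered = decoding_order(group_a)
```

Here reordered begins I2, B0, B1, P5, B3, B4, so that I.sub.2 and P.sub.5 are available before B.sub.3 and B.sub.4 are decoded. Pictures B12 and B13 fall after P14, at the position that the B.sub.12 " and B.sub.13 " pictures of the preceding group occupy at the start of FIG. 2B.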
FIG. 3 illustrates a series of coded video data as it may be stored on a recording medium. The series is comprised of groups of pictures, Groups #0, 1, . . . , J, wherein each group includes pictures coded in accordance with an MPEG standard, e.g. I pictures, P pictures, and B pictures. As depicted in this example, each group begins with an I picture which is followed by an alternating series of B pictures and P pictures. Each group may also include a group header (not shown). A typical group header is comprised of a group start code (GSC), a time code (TC), a closed group of pictures indication (CG), and a broken link indication (BL).
A simple apparatus proposed for decoding time-division-multiplexed data is illustrated in FIG. 4. The apparatus is comprised of a digital storage device 100, a signal separating unit 21, a video decoder 25, and an audio decoder 26. Device 100 stores data in the general time-division-multiplex format depicted in FIGS. 1A, 1B, and 1C. Signal separating unit 21 accesses and reads the stored data, separates the data into audio and video components, and supplies the components to respective signal decoders. Video decoder 25 and audio decoder 26 decode coded video and coded audio signals, respectively, to produce respective video output signals and audio output signals.
Signal separating unit 21 includes a header separating circuit 22, a switch 23, and a control apparatus 24. Header separating circuit 22 detects pack header and packet header data in the stream of data read from device 100 and supplies the headers to control apparatus 24. The time-division-multiplexed data is supplied to an input of switch 23. One output of switch 23 is coupled to video decoder 25 while the other output is coupled to audio decoder 26.
Control apparatus 24 issues commands controlling the accessing of data in storage device 100 and controlling the operation of switch 23. The control apparatus 24 reads the stream identification code contained in each packet header and controls switch 23 to route the corresponding packet of data to the appropriate decoder. Specifically, when the stream identification code indicates that a packet contains video signals, the packet is routed to video decoder 25. When the stream identification code indicates that a packet contains audio signals, the packet is routed to audio decoder 26. In this manner, time-division-multiplexed data is separated into audio and video components and appropriately decoded.
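The routing performed by control apparatus 24 and switch 23 can be modeled in software. The sketch below is an assumption-laden illustration: each packet is reduced to a hypothetical (stream identification code, coded packet data) pair, and the audio and video code ranges are assumed from the MPEG1 systems standard rather than taken from this description.

```python
from typing import Iterable, List, Tuple

# Hypothetical packet model: (stream identification code, coded packet data).
Packet = Tuple[int, bytes]

def route_packets(packets: Iterable[Packet]) -> Tuple[List[bytes], List[bytes]]:
    """Split multiplexed packets into (video payloads, audio payloads)."""
    video: List[bytes] = []
    audio: List[bytes] = []
    for stream_id, payload in packets:
        if 0xE0 <= stream_id <= 0xEF:    # assumed video stream range
            video.append(payload)        # would go to video decoder 25
        elif 0xC0 <= stream_id <= 0xDF:  # assumed audio stream range
            audio.append(payload)        # would go to audio decoder 26
        # Packets of any other type are ignored in this sketch.
    return video, audio
```

Packet order is preserved within each output list, so each decoder receives its data in the order in which it was multiplexed.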
If the video data stored in storage device 100 is coded and arranged according to an MPEG standard as shown in FIG. 3, then the operations of accessing specific video frames (random access) and searching or scanning through the video frames will be inherently limited by the decoding speed of video decoder 25. To achieve faster frame accessing and image reproduction, it has been proposed that the video decoder skip certain coded pictures during such decoding operations.
Since only I pictures can be decoded independent of other frames of image data, video decoder 25 may decode and output only the stored I pictures to achieve a video search (video scan) function. Alternatively, the signal separating unit 21 may be modified to pass only I pictures to video decoder 25 during a search (scan) operation. Control apparatus 24 controls data storage device 100 to supply the signal separating unit those portions of video data containing I pictures of interest. Typically, in search (scan) mode, the audio decoder 26 is muted.
To randomly access a particular stored video frame for decoding and display, it has been proposed that the two I pictures located immediately adjacent to the selected frame, i.e. one before and one after, be decoded. From these two I pictures, and, in certain instances, a number of the intermediate P pictures, the desired frame can be decoded. Of course, where the selected frame has been coded as an I picture, only that picture need be decoded. In an application utilizing a fixed data coding rate and a regular coding pattern, the location of each I picture can be obtained by direct calculation.
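The direct calculation mentioned above can be illustrated as follows. The sketch assumes, beyond what is stated here, that every group of pictures contains the same number of frames, occupies the same number of bytes, and begins with its I picture as in FIG. 3. The function name and parameters are hypothetical.

```python
def bracketing_i_offsets(frame_index, frames_per_group, bytes_per_group,
                         first_group_offset=0):
    """Return byte offsets of the I pictures before and after a frame.

    Valid only for a fixed coding rate and a regular coding pattern in
    which each group starts with its I picture (an assumption).
    """
    if frame_index < 0 or frames_per_group <= 0 or bytes_per_group <= 0:
        raise ValueError("arguments must be non-negative and sizes positive")
    group = frame_index // frames_per_group  # group containing the frame
    before = first_group_offset + group * bytes_per_group
    after = before + bytes_per_group         # start of the next group
    return before, after
```

For instance, with 15 frames and 60,000 bytes per group, frame 37 lies in the third group, so the two offsets are 120,000 and 180,000. No stored picture needs to be examined to obtain them.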
However, where the rate of data encoding varies or a varying coding pattern is utilized, the locations of the I pictures cannot be determined with the same direct calculation and instead additional information must be considered. Generally, MPEG systems encode data at a varying rate. Therefore, a system such as that of FIG. 4, in carrying out a random data access or searching through stored data by displaying only the I pictures, would need to examine each stored picture to determine the locations of the I pictures. Such a process is necessarily time consuming.
To minimize the time required to search stored video data encoded at a varying rate, two different data systems have been proposed which associate additional information with the stored data to facilitate the determination of I picture locations.
One such system is illustrated in FIG. 5 and is comprised of a digital storage device 100, a signal separating circuit 64, a video decoder 25, an audio decoder 26, and a main controller 67. In this system, a "table of contents" is stored in device 100 which identifies the location of each I picture of video data stored in device 100. By consulting this table of contents, the main controller determines the locations of I pictures quickly, enabling quick accessing, decoding, and display of such pictures. As a result, searching and random access functions can be achieved with reduced processing time.
Device 100 stores video data in a time-division-multiplex format and stores a table of contents identifying the locations of I pictures included in the stored video data. Signal separating unit 64 accesses and reads the stored data, separates the data into audio, video, and table-of-contents components, supplies the audio and video components to respective signal decoders, and supplies the table-of-contents data to main controller 67. Video decoder 25 and audio decoder 26, in response to control signals from main controller 67, decode coded video signals and coded audio signals, respectively, to produce respective video output signals and audio output signals.
Main controller 67 supplies access command signals to digital storage device 100 to cause the device to access and supply specified segments of stored data to signal separating circuit 64. In turn, the storage device provides the main controller with position information (data retrieval information), which may be in the form of actual data addresses within the device, regarding the data to be accessed. Also, the controller supplies command signals to each of video decoder 25 and audio decoder 26 to control the decoding operations performed by each. Additionally, controller 67 includes a table-of-contents (TOC) memory 68 for storing table-of-contents data.
Signal separating unit 64 includes a header separating circuit 22, a switch 23, a control apparatus 66, and a table-of-contents (TOC) separator 65. Circuit 22 and switch 23 operate in the same manner as described in connection with FIG. 4. Apparatus 66 is the same as control apparatus 24 with the exception that control apparatus 66 does not control the accessing of stored data. Table-of-contents (TOC) separator 65 detects table-of-contents information supplied with the audio and video data and supplies the table-of-contents information to TOC memory 68.
In response to a search command from a user, main controller 67 issues a command to initiate the supply of stored data from digital storage device 100 to signal separating circuit 64. Table-of-contents data is detected by TOC separator 65 and supplied to TOC memory 68. Utilizing the table-of-contents data to determine the locations of I pictures in the video data, main controller 67 controls video decoder 25 to decode only I picture data and skip other data. Audio decoder 26 is muted. Alternatively, main controller 67 controls digital storage device 100 to access and supply only I picture video data to signal separating circuit 64. By either method, the location of I picture data is identified relatively quickly and only I picture data is decoded and output for display.
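The use of the table of contents during a search can be sketched in Python. The representation is an assumption: the contents of TOC memory 68 are modeled as a sorted list of I picture positions, and the function name and the step parameter are hypothetical.

```python
import bisect

def toc_scan_positions(toc, start, step=1):
    """Return the I picture positions to visit in a forward search.

    toc   -- sorted list of I picture positions (assumed representation)
    start -- position from which the search begins
    step  -- visit every step-th I picture; larger values scan faster
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    first = bisect.bisect_left(toc, start)  # first I picture at or after start
    return toc[first::step]
```

Because the list is sorted, the first I picture to display is found by binary search rather than by examining each stored picture.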
Unfortunately, the storage of table-of-contents data requires significant storage capacity in some video data applications. As a consequence, the storage of the location of every I picture has been determined to be impractical. Proposed systems which store only some of the I picture locations have also been contemplated. Inherently, these systems are unable to conduct precise search operations, resulting in significant search delay. Such delay is undesirable.
According to a second proposed data decoding system for accessing stored I pictures with greater speed, data is stored according to the format illustrated in FIGS. 6 and 7 and is decoded by an apparatus depicted in FIG. 8.
In FIG. 6, a data pack (or sector) is constructed of a pack header, a first video packet, an entry packet, a second video packet, and an audio packet, in that order. Each video packet includes a video packet header and a segment of video data. Each audio packet includes an audio packet header and a segment of audio data. An I picture, the location of which is referred to as an "entry point," is located at the beginning of the video data segment in the second video packet. The entry packet stores information regarding the location of one or more I pictures in that pack, the locations of I pictures in any number of packs, or like information.
FIG. 7 illustrates an entry packet format in which information regarding the locations of six consecutive entry points, three before the packet and three after, is stored in the packet. The entry packet includes a packet header, as described hereinabove, formed of a packet start code prefix, an identification code, and an indication of the length of the packet. The entry packet further includes additional identification information (ID), packet type information, an indication of the current number of data streams, an indication of the current number of video streams, and an indication of the current number of audio streams. At the end of the packet, position information for six entry points is stored.
The decoding apparatus of FIG. 8 is comprised of a digital storage device 100, a signal separating circuit 70, a video decoder 25, and an audio decoder 26. Signal separating circuit 70 includes a header separating circuit 71, a switch 23, a control apparatus 72, and an entry point memory 73.
In response to an access command from control apparatus 72, device 100 supplies stored data to header separating circuit 71. Header separating circuit 71 detects pack header data, packet header data, and entry packet data in the stream of data read from device 100 and supplies such data to control apparatus 72. The time-division-multiplexed data is supplied to an input of switch 23. One output of switch 23 is coupled to video decoder 25 while the other output is coupled to audio decoder 26.
Control apparatus 72 issues commands controlling the accessing of data in storage device 100 and controlling the operation of switch 23. The control apparatus reads the stream identification code contained in each packet header and controls switch 23 to route the corresponding packet of data to the appropriate decoder. Specifically, when the stream identification code indicates that a packet contains video signals, the packet is routed to video decoder 25. When the stream identification code indicates that a packet contains audio signals, the packet is routed to audio decoder 26. In this manner, time-division-multiplexed data is separated into audio and video components and appropriately decoded.
Further, control apparatus 72 receives entry packet data, analyzes the data, and supplies entry point information derived from the entry packet data to entry point memory 73 for storage. Control apparatus 72 also receives data retrieval information from storage device 100. Depending upon the application, data retrieval information might be correlated with entry point information to determine actual locations of the entry points within the storage device. These actual locations may also be stored in memory 73 as entry point information. In this manner, entry point memory 73 is loaded with the locations of I pictures stored in storage device 100.
In a search mode, control apparatus 72 determines the current data retrieval position of storage device 100 from the data retrieval information supplied therefrom. The control apparatus then retrieves from entry point memory 73 information pertinent to the entry point located nearest to but before the current data retrieval position of the storage device. Data storage device 100 is controlled by apparatus 72 to immediately change its data retrieval position to that of the identified entry point. Data is reproduced from that point, e.g. the I picture is reproduced, and supplied to signal separating circuit 70 for processing and, thereafter, display.
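The lookup performed against entry point memory 73 can be illustrated with a short sketch. It assumes that the memory holds a sorted list of entry point positions, and it treats an entry point located exactly at the current position as acceptable; both choices are assumptions made for illustration, as is the function name.

```python
import bisect

def nearest_entry_point_before(entry_points, current_position):
    """Return the last entry point at or before current_position.

    entry_points is assumed to be a sorted list of positions. None is
    returned when no entry point precedes the current position.
    """
    i = bisect.bisect_right(entry_points, current_position)
    if i == 0:
        return None
    return entry_points[i - 1]
```

If the stored entry points are at positions 100, 500, and 900 and the current data retrieval position is 600, the sketch selects 500 as the position to which the storage device would jump.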
For example, if the entry packet of FIG. 6 is simply a marker indicating that the succeeding video packet begins with an entry point, then data retrieval can be started at a point located immediately after the location of the entry packet. If, instead, the entry packet is constructed as in FIG. 7, the entry point information is processed to determine the next data retrieval location. Subsequent entry points are determined either from further information retrievals from entry point memory 73 or from analysis of entry packet information stored at the currently accessed entry point. In this manner, I picture data are rapidly retrieved and reproduced in an efficient search operation.
Although the proposed systems described hereinabove can display I pictures in a rapid manner, none are able to effectively achieve reverse playback, slow-reverse playback, and frame-by-frame-reverse playback modes of operation utilizing B pictures and P pictures, as well as I pictures, so as to produce high resolution search mode images for display.