FIG. 1 illustrates a conventional video system 100 including a recorder 102 and a player 104. Recorder 102 is able to record video and audio data and to transmit data 106 that is based on the recorded video and audio data to player 104. In some cases, transmission of data 106 may be via hard wire transmission, for example when recorder 102 and player 104 are in the same device, such as a video camera with a playback feature. In other cases, transmission of data 106 may be via wireless transmission, for example between cell phones. In still other cases, transmission of data 106 may be via delivery of a recording medium, such as a disk or flash memory. Regardless of the manner of delivery, player 104 is then able to play back video and audio data corresponding to the recorded video and audio data based on data 106.
As illustrated in the figure, recorder 102 includes a controller 108, a detector 110 and an encoder 112, whereas player 104 includes a controller 114, a decoder 116 and a display 118.
In operation of recorder 102, controller 108 instructs detector 110, via instruction line 120, to record images and audio data. In response, detector 110 records a first image 122 at time t1 and serially outputs data corresponding thereto and then records a second image 124 at a time t2 and serially outputs data corresponding thereto. A conventional detector of this type may include, for example, a charge coupled device. Detector 110 can further detect sound, for example via a microphone, and serially output audio data corresponding thereto. A bitstream of data 126 corresponding to first image 122 detected at time t1, to second image 124 detected at time t2, and to the audio data is then received by encoder 112. Controller 108 instructs encoder 112, via instruction line 128, to encode bitstream of data 126 to create a compressed bitstream 106 as discussed in more detail below.
Typically, the amount of data that any particular system may transmit and deliver is limited by physical parameters of the components of the system. Further, image data is very large compared to audio data. Therefore, to transmit or receive video data in its entirety may strain, or go beyond, the limits of a particular system. To avoid this situation, conventional techniques have been developed to compress video, and even audio, data.
One specific conventional video/audio compression technique follows the standard set and maintained by the Moving Picture Experts Group (MPEG). This compression technique supported by the MPEG standard is able to transform video data corresponding to a plurality of consecutively recorded individual images, each image of which comprises a large amount of image data, into a Group of Pictures (GOP). The compression technique supported by the MPEG standard is further operable to interleave audio data within video data.
FIG. 2 illustrates an exemplary bitstream 200 encoded with a compression technique supported by the MPEG standard. Bitstream 200 includes a plurality of GOPs 202 in addition to audio packets 204.
Each GOP 202, according to the MPEG standard, has a specific structure, which will be described with reference to FIG. 3.
FIG. 3 illustrates an exemplary GOP 300, which includes a Video Object Layer (VOL) header portion 302 and a plurality of Video Object Plane (VOP) portions, or Frames. For simplicity of discussion, in this example, GOP 300 includes a first VOP portion (or Frame) 304 and a second VOP portion (or Frame) 306.
VOL header portion 302 is a sequence level header associated with all VOP portions within GOP 300, which in this case are first VOP portion 304 and second VOP portion 306. VOL header portion 302 includes a Time-Increment-Resolution Code (TIRC) portion 316 and a user data portion 318. TIRC portion 316 comprises a 16-bit unsigned integer that represents the resolution of video time stamps for playback. A video time stamp is the time that is associated with a video frame in the encoded video bit stream that indicates the relative time of occurrence of that video frame with respect to the start of the recording. Specifically, TIRC portion 316 includes data corresponding to the time resolution of the video data or the number of units or “ticks” per second. User data portion 318 has additional information that can be used in the reassembly of the compressed digital video.
First VOP portion 304 includes a first VOP header portion 308 and a first VOP data portion 310. First VOP header portion 308 includes a first Time-Increment Code (TIC) portion 320 and a first user data portion 322. The TIC portion consists of a modulo time base and a time increment. The modulo time base is an integral second counter that represents the integral seconds elapsed since the last integral second in a previous frame modulo time base. The time increment is the time difference between the current frame and the last integral second. It is represented as a number of ticks, as defined in the TIRC. In the case of first TIC portion 320, there is no previous frame, so both the modulo time base and the time increment will be zero.
Second VOP portion 306 includes a second VOP header portion 312 and a second VOP data portion 314. Second VOP header portion 312 includes a second TIC portion 324 and a second user data portion 326. The second TIC portion consists of a modulo time base, which indicates the integral seconds elapsed since the beginning of the sequence, and a time increment, which indicates the time difference between the second VOP and the last integral second.
As an example, if controller 108 instructs encoder 112 to encode the video data 126 at a frame rate of 30 frames per second (fps), and if controller 108 instructs encoder 112 to set TIRC portion 316 to 300, the TIC portions will be calculated as follows, assuming no frame skips:
        Time for Frame 1=0 sec;
        Modulo time base Frame 1=integral seconds elapsed since beginning of sequence=0 sec;
        Time increment Frame 1=0;
        Time for Frame 2=33.3 msec;
        Modulo time base Frame 2=integral seconds elapsed since beginning of sequence=0 sec; and
        Time increment Frame 2=10.
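The arithmetic of this example can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the MPEG standard or of any encoder: the function name `tic_fields` and its parameters are hypothetical, the frame index is zero-based (Frame 1 is index 0), and the sketch assumes a constant frame rate, no frame skips, and a modulo time base counted in whole seconds since the beginning of the sequence.

```python
def tic_fields(frame_index, frame_rate, tirc):
    """Return (modulo_time_base, time_increment) for a zero-based frame index.

    Hypothetical helper: assumes a constant frame rate, no frame skips, and
    a TIRC value (ticks per second) that is a multiple of the frame rate.
    """
    # Elapsed time of this frame measured in ticks since the start of the sequence.
    total_ticks = (frame_index * tirc) // frame_rate
    modulo_time_base = total_ticks // tirc  # whole seconds elapsed
    time_increment = total_ticks % tirc     # ticks past the last whole second
    return modulo_time_base, time_increment


print(tic_fields(0, 30, 300))  # Frame 1: (0, 0)
print(tic_fields(1, 30, 300))  # Frame 2: (0, 10)
```

With 300 ticks per second and 30 fps, each frame advances the time stamp by 10 ticks, which corresponds to 33.3 msec.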
Returning back to FIG. 1, encoder 112 encodes bitstream of data 126 with a compression technique supported by the MPEG standard to create a compressed bitstream 106.
In operation of player 104, decoder 116 receives compressed bitstream 106. Controller 114 instructs, via instruction line 130, decoder 116 to decompress the data in accordance with the MPEG standard to generate data stream 132 corresponding to first image 122 detected at time t1, to second image 124 at a time t2, and the audio data. Controller 114 further enables playback, via instruction line 134, of the video data at various playback speeds, as discussed in more detail below. Display 118 then plays back first image 136, second image 138 and the audio data.
Some conventional recorders, such as video cameras, camera phones and digital cameras, may record at various frame rates, selectable by the recorder operator. In fact, some offer the capability to record high frame rate video, i.e., frame rates higher than 30 fps, with a bitstream format having an apparent record frame rate that is a normal frame rate, e.g., 30 fps. The apparent frame rate must be no greater than 30 fps for the bitstream to be compliant. The video time stamps can be scaled in the recorded video by a slow-motion factor that makes the video “look like” it was recorded at a different rate. When played back, the result is a slow-motion playback, i.e., a video playback speed that is slower than the original live view, which has a slow-motion factor associated therewith. The slow-motion factor is the factor by which the video is played back slower than the original live view in a conventional video player, e.g., a 30 second video recording that is recorded at twice the playback frame rate and then played back at a constant frame rate over 60 seconds has a slow-motion factor of 2×.
For example, if the video is recorded at 120 fps and played back at 30 fps, the video appears to have a 4× speed reduction during playback because it takes 4 seconds to play back every 1 second that was recorded (120 frames). This has an advantage over simply playing back 30 fps video at 7.5 fps. Specifically, when 30 fps video is played back at 7.5 fps, frames must be duplicated to achieve a 30 fps display rate, which results in jerkiness in the video. However, playing back high frame rate video at 30 fps results in every frame in the 30 fps playback being unique, resulting in much smoother motion.
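The relationship between record rate, playback rate and slow-motion factor described above can be expressed as a short Python sketch. The function names `slow_motion_factor` and `playback_seconds` are hypothetical and are used only for illustration; the sketch assumes constant record and playback frame rates.

```python
def slow_motion_factor(record_fps, playback_fps):
    """Factor by which playback is slower than the original live view."""
    return record_fps / playback_fps


def playback_seconds(recorded_seconds, record_fps, playback_fps):
    """Time needed to play back every recorded frame at the playback rate."""
    total_frames = recorded_seconds * record_fps
    return total_frames / playback_fps


print(slow_motion_factor(120, 30))   # 4.0
print(playback_seconds(1, 120, 30))  # 4.0 seconds per recorded second
print(playback_seconds(30, 60, 30))  # 60.0 seconds, a slow-motion factor of 2
```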
True-speed playback is the video playback speed that looks like the original live view, i.e., a recording of a clock second hand would show one elapsed second for each second of playback. The problem in existing implementations is that true-speed playback cannot easily be achieved because the original recording frame rate is not known. Since the bitstream format “looks” like 30 fps (for example), one would need to “know” that the video was recorded at 120 fps and then manually configure the video player to display only 1 out of every 4 frames at 30 fps. The video could instead have been recorded at 90 fps, in which case the video player must display 1 out of every 3 frames at 30 fps.
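The manual frame selection described above can be sketched in Python as follows. This is an illustrative sketch only: the function name `true_speed_frames` is hypothetical, the original record frame rate is assumed to be known in advance (which is exactly what a conventional player lacks), and the record rate is assumed to be an integer multiple of the display rate.

```python
def true_speed_frames(frames, record_fps, display_fps):
    """Select the frames to display for true-speed playback.

    Hypothetical helper: keeps 1 out of every (record_fps / display_fps)
    frames, so that one second of recording maps to one second of playback.
    """
    if record_fps % display_fps != 0:
        raise ValueError("record rate must be an integer multiple of display rate")
    step = record_fps // display_fps
    return frames[::step]


one_second_at_120 = list(range(120))  # frame indices for 1 second recorded at 120 fps
shown = true_speed_frames(one_second_at_120, 120, 30)
print(len(shown), shown[:3])  # 30 frames kept: indices 0, 4, 8, ...
```

For a 90 fps recording the same call with `record_fps=90` keeps every third frame, which is why the player cannot choose the correct step without knowing the original record rate.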
What is needed is a system and method to play back a recorded video with audio at its true recording speed while remaining compatible with existing playback systems.