1. Field of the Invention
The present invention relates generally to digital image playback, and more particularly to techniques for synchronizing playback of two or more digital streams based on renderable content of those streams.
2. Description of the Related Art
Personal video recorders (PVRs) are video recording devices that may be used in conjunction with virtually every television broadcast system (e.g. cable, digital cable, satellite, antenna, or combinations thereof), as well as to record from VHS, DVD, Internet sources, etc. PVRs may be programmed to automatically find and record a user's favorite television program or programs so that the user may watch what he or she wants, when he or she wants. Typically, PVRs contain a substantial amount of memory and are capable of recording more than thirty hours of programming. The recorded programming may then be retrieved and viewed by the user as desired.
Current PVR technology allows users to time shift the content of the program being recorded (typically television broadcasts). In other words, a user may record a television broadcast and watch it several hours, or even days, later. Alternatively, a user may decide to record a program and begin watching it a predetermined amount of time after the start of the program so that the user has the ability to skip any commercials that may be dispersed throughout the program. Thus, the user would be watching the program during virtually the same time period as people who are watching the live broadcast. However, given the capability to skip through the commercials, the user of the PVR will watch the same program as someone else in less time.
Notwithstanding the above benefits of PVRs, there are disadvantages associated with their use. For example, many people enjoy watching various programs simultaneously (e.g., sporting events, talk shows or dramas) even though the people may be physically located in different locations. These people will often communicate with each other during the program by other communication means such as the telephone or Internet. Therefore, they are able to discuss the program as the events materialize. However, as people time shift content, they lose the ability to simultaneously watch shows “together” while at their respective locations. Inevitably, the two users will be watching the same program out of synch and therefore one user will know the results of a dramatic scene or sporting event, for example, prior to the other user.
Copending U.S. patent application Ser. No. 09/894,060, entitled “Synchronized Personal Video Recorders”, filed Jun. 28, 2001, assigned to the assignee of the instant application, incorporated herein by reference, and not admitted to be prior art by its mention in the background section, discloses a system in which one PVR synchronizes with another PVR by sending out a status message to the other PVR. The message issues when the user of the initiator PVR operates a PVR function such as start up, fast forward or rewind, to allow the recipient of the message to perform the counterpart function to keep the presentation on both PVRs synchronized. The message is also transmitted periodically, to update the synchronization. Within the message is an identifier of the program being watched or to be watched, an indicator of the mode of watching (e.g. normal play, fast forward, pause, etc.), and the time or frame into the program. The time or frame allows the recipient PVR to synchronize its replay with that of the sending PVR, by comparing the time or frame in the message with its own current time or frame.
In expanding on this concept of synchronizing a sending PVR with a recipient PVR by transmitting a time or frame from the sending PVR to the recipient PVR, it will be initially assumed, for purposes of illustrating the present invention, that both PVRs are playing back respective, identical copies of a video. The frame of the sending PVR is part of the sender's copy of the video, which resides in a bit stream that is stored in a storage medium. Similarly, frames of the recipient PVR's copy of the video reside in a bit stream that is stored in the recipient's storage medium.
It will also be initially assumed that, when the video timer of one PVR shows the same time as the other PVR's video timer, the respective videos are at the same point content-wise in their respective playbacks. When any PVR fast forwards or rewinds, this correspondingly and synchronously advances or rolls back the time count of its respective video timer.
If, for example, the destination PVR's video timer reads 1 hour, 1 minute and 1 second at a time when the destination PVR receives from the sending PVR a message having as its output time stamp 1 hour, 1 minute and 2 seconds (set according to the sending PVR's video timer), this might indicate the destination PVR's playback is one second behind that of the sending PVR. It might be the case, for example, that, according to the timing of a single reference clock, the destination PVR started its playback one second after the sending PVR started its playback. Based on that premise, the destination PVR can take corrective action to compensate for the one second time difference. Specifically, if the transmission time of the message was negligible, e.g. one millisecond while the time difference is one second, the full one second time difference can be relied on to take corrective action to synchronize the respective playbacks on the PVRs. The destination PVR would, for example, “fast forward” its local copy of the program by a full second and increment its video timer by a second. By this action, the destination's playback would catch up content-wise with that of the sender, and the respective video timers of the sender and destination PVRs would become synchronized.
If, however, the transmission time was not negligible, it needs to be taken into account in comparing the output time stamp of the incoming message with the time the message is received at the recipient PVR so that the corrective compensation applied appropriately reflects the extent to which the respective video timers are out-of-synch and, correspondingly, the extent to which the respective playbacks are content-wise out-of-synch.
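The timer comparison described above can be illustrated with a minimal sketch. The function names `playback_offset` and `hms` and the parameter `transit_s` are hypothetical and do not come from the referenced application. The sketch assumes that the sending PVR continues in normal play while the message is in transit and that the transmission time is known to the recipient.

```python
def hms(hours, minutes, seconds):
    """Convert a video timer reading to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def playback_offset(sender_stamp_s, local_timer_s, transit_s=0.0):
    """Return how far the recipient lags (positive) or leads (negative), in seconds.

    Assumption: the sender keeps playing at normal speed during transit, so
    by the time the message arrives its timer has advanced by transit_s.
    """
    sender_now_s = sender_stamp_s + transit_s
    return sender_now_s - local_timer_s


# Example from the text: the message is stamped 1:01:02, the recipient's
# timer reads 1:01:01 on receipt, and the transit time is one millisecond.
lag = playback_offset(hms(1, 1, 2), hms(1, 1, 1), transit_s=0.001)

# With a non-negligible transit time of half a second, the correction grows.
lag_slow_link = playback_offset(hms(1, 1, 2), hms(1, 1, 1), transit_s=0.5)
```

A positive result would correspond to the recipient fast forwarding by that amount; a negative result would correspond to the recipient holding back.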
However, the above technique alone will not always synchronize the video presentation, i.e. make concurrent the playback of corresponding frames in the respective playbacks. The assumption made above, that the content being shown on either PVR at any arbitrarily-selected, common video timer time, is identical, does not strictly hold. The programs recorded on the two PVRs may, for example, begin at the same nominal video start time, but differ slightly, perhaps a second or so, as to the actual point in the video at which they respectively start. As a result, if both playbacks were to be viewed side-by-side, one would lag the other. Thus, even if the video timers of both respective PVRs were perfectly synchronous, the respective showings of the video might be out-of-synch.
Also, even if the two playbacks were to be in synch initially, the presentation may drift out-of-synch as it progresses. For example, the speed at which the respective PVRs play back their respective copies of the video may differ. These differences become more significant if the two PVRs have different actual speeds in the fast forward or rewind mode, and may cause the viewings to fall out-of-synch after one of the PVRs fast forwards or rewinds, commanding the other to follow concurrently and synchronously.
Lack of synchronization may also occur from time to time, due, for example, to different commercials, and thus different commercial time periods, in the two playbacks. Both viewers, for instance, may be watching the same network, e.g. National Broadcasting Company (NBC), but through different cable or satellite providers, e.g. RCN or Time Warner.
If, on the other hand, it is the current frame, rather than the current time, that is conveyed in the message, non-negligible transmission time still needs to be taken into account to synchronize presentation. If, for example, the source PVR sends the destination PVR a message that indicates that frame number “n” is currently playing on the source PVR, the destination PVR needs to know the transmission time, if non-negligible, in comparing the frame number that it is playing at the time of receipt of the message to the frame number indicated in the message.
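The frame-based comparison can be sketched in the same manner. The function `frame_offset` and its parameters are hypothetical. The sketch assumes a known, constant frame rate and a common frame numbering in both copies of the video, an assumption that does not always hold in practice.

```python
def frame_offset(sender_frame, local_frame, transit_s, frames_per_second):
    """Return how many frames the recipient lags (positive) or leads (negative).

    Assumptions: both copies number their frames identically and the sender
    plays at a constant frames_per_second while the message is in transit.
    """
    frames_in_transit = round(transit_s * frames_per_second)
    return (sender_frame + frames_in_transit) - local_frame


# The sender reports frame 1000; the recipient is at frame 990 on receipt;
# the message took 0.2 seconds to arrive at 30 frames per second.
lag_frames = frame_offset(1000, 990, transit_s=0.2, frames_per_second=30)
```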
In the frame-based technique, even if the playbacks are in synch or transmission time is accounted for to bring the playbacks in synch, the PVRs may use different service providers that employ different compression schemes. One scheme might afford higher image quality than the other by including more frames; thus, the assumption above that the sender's copy of the video is the same as the recipient's copy cannot be strictly relied upon. In addition, adaptive techniques are often used to vary the number of frames capturing a moving image based on the amount of movement in the image, time instant to time instant. The difference in the frame numbers for corresponding video content makes synchronization based on frame numbers problematic.
For many situations, these synchronization errors are of such small magnitude that the viewers of respective playbacks do not notice them.
Yet, there exist viewing configurations in which “out-of-synch” effects are significant and interfere with viewing enjoyment. Moreover, in some scenarios where, for instance, people at mutually remote locations are jointly executing a task simultaneously, e.g., using a manual pre-recorded in video form to repair a large online system, precise synchronization of a telephone message, the presentation, and action based on the message and/or the presentation may be necessary.
To achieve precise synchronization, the present invention compares corresponding content or “landmarks” of pairs of video playbacks to be synchronized, determines video replay “distance” between the landmark pairs, and slows down or speeds up selected playbacks in accordance with these distances.
U.S. Pat. No. 5,870,754 to Dimitrova et al. (“Dimitrova”), entitled “Video Retrieval of MPEG Compressed Sequences Using DC and Motion Signatures”, and incorporated herein by reference, compares “DC+M signatures” of a query video clip to DC+M signatures in a database to retrieve a video sequence whose content is similar to that of the query video clip, where a video clip is defined as a sequence of video frames.
In one Dimitrova embodiment, DC coefficient information from an I frame and motion vector information from the following frame are combined to form a digital signature, hence the term “DC+M signature”.
An “I frame”, under the MPEG (Moving Picture Experts Group) compression standard, is an intraframe coded frame, which is a coding of a single snapshot of a moving image. Interspersed between I frames are interframe coded frames comprised of information that represents merely a difference between the current state of the moving image and a reference state of the moving image as it existed at a previous moment.
The signature embodies characteristics of the frames it represents, but uses less data. Signatures of respective I frames in a query video clip are compared to respective I frame signatures in a database video clip. The total Hamming distance between the signatures of the query clip and the signatures of a database clip is calculated. (The total Hamming distance is the sum of the Hamming distances between respective signatures of query/database frame pairs of a current query clip and database clip, where the Hamming distance between two signatures is based on a bit-by-bit comparison between the signatures, as explained in Dimitrova).
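The distance calculation just described can be illustrated as follows. This is a minimal sketch, not Dimitrova's implementation. It assumes that each signature is held as a non-negative integer whose bits are the signature bits, and that the Hamming distance is the plain count of differing bits. The function names are hypothetical.

```python
def hamming_distance(sig_a, sig_b):
    """Count the bit positions in which two signatures differ.

    Assumption: signatures are non-negative integers of equal nominal width.
    """
    return bin(sig_a ^ sig_b).count("1")


def total_hamming_distance(query_sigs, database_sigs):
    """Sum the Hamming distances over corresponding query/database frame pairs."""
    if len(query_sigs) != len(database_sigs):
        raise ValueError("clips must contain the same number of signatures")
    return sum(hamming_distance(q, d) for q, d in zip(query_sigs, database_sigs))


# Two clips of two representative frames each, with 4-bit toy signatures.
query_clip = [0b1011, 0b0110]
database_clip = [0b1001, 0b0111]
total = total_hamming_distance(query_clip, database_clip)
```

In this toy example each frame pair differs in one bit, so the total Hamming distance is the sum of the two per-pair distances.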
Specifically, the methodology shifts iteratively along the database to define a current database video clip, simultaneously adding, to the clip, database frames (I frames) and dropping database frames (I frames), with each iteration. The total Hamming distance is recalculated at each iteration, and the minimum Hamming distance over all iterations identifies the database video clip that most resembles the query video clip.
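The iterative search can be sketched as a sliding window over the database signatures. The sketch below is illustrative only. It assumes integer signatures and a plain bit-count Hamming distance, and it recomputes the total for every window rather than updating it incrementally as frames are added and dropped. The name `best_matching_clip` is hypothetical.

```python
def hamming_distance(sig_a, sig_b):
    """Count the bit positions in which two integer signatures differ."""
    return bin(sig_a ^ sig_b).count("1")


def best_matching_clip(query_sigs, database_sigs):
    """Slide a query-sized window along the database, one frame per iteration.

    Returns (start_index, total_distance) of the window whose total Hamming
    distance to the query clip is smallest; the earliest window wins a tie.
    """
    n = len(query_sigs)
    if n == 0 or n > len(database_sigs):
        raise ValueError("query must be non-empty and no longer than the database")
    best_start, best_total = 0, None
    for start in range(len(database_sigs) - n + 1):
        window = database_sigs[start:start + n]
        total = sum(hamming_distance(q, d) for q, d in zip(query_sigs, window))
        if best_total is None or total < best_total:
            best_start, best_total = start, total
    return best_start, best_total


# Toy database of five I-frame signatures and a two-frame query clip.
database = [0b0000, 0b1111, 0b1011, 0b0110, 0b0001]
query = [0b1011, 0b0111]
start, total = best_matching_clip(query, database)
```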
An advantage to using Dimitrova signatures is that they can be derived without the overhead of fully decompressing the image bit stream. Thus, the Huffman or arithmetic coding can be decoded to leave the bit stream in partially decoded form so that, without further decompression, frequency components such as DC coefficients of an image transform such as the discrete cosine transform (DCT) can be utilized in forming the signature, as explained more fully in Dimitrova.
In the Dimitrova embodiment described above, I frames are used as the “representative frames”, i.e. frames for which signatures are derived, if key frames have not been identified in the video sequence previous to Dimitrova's processing. Key frames are frames at shot boundaries, where a shot is a video sequence of a scene. Typically, there are a thousand or more shots in a movie. In another embodiment, Dimitrova uses all frames as representative frames.
The present invention has a goal similar to that of Dimitrova, to compare characteristics of two video streams, except that the present invention uses the comparison to synchronize presentation of renderable content of the streams, whereas Dimitrova merely seeks a video clip similar to the query video clip. To adapt Dimitrova matching for the present invention, query frames are compared not against database frames, as in Dimitrova, but against frames in the participant's copy of the video, so that presentation of the video by the initiator and by the participant can, as a result, be made synchronous. Also, for the sake of processing speed, preferably a single query signature, corresponding primarily to a single frame, is transmitted to the participant for comparison, rather than transmitting all the signatures of a Dimitrova query video clip, which correspond primarily each to a separate frame. Accordingly, since a single query signature is compared, in each iteration, to a single candidate participant frame, the Hamming distance between the signatures of that pair of frames is calculated. The overhead of a “total Hamming distance” calculation is thereby avoided.
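The single-signature comparison, and its use to decide whether a playback should speed up or slow down, can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation. Signatures are taken to be integers, the Hamming distance is a plain bit count, positions are indices into the participant's list of representative frames, and transmission time is ignored. The names `closest_frame` and `playback_adjustment` are hypothetical.

```python
def hamming_distance(sig_a, sig_b):
    """Count the bit positions in which two integer signatures differ."""
    return bin(sig_a ^ sig_b).count("1")


def closest_frame(query_sig, participant_sigs):
    """Compare one query signature against each candidate participant frame.

    Returns (index, distance) of the closest candidate; the earliest
    candidate wins a tie.
    """
    if not participant_sigs:
        raise ValueError("participant has no candidate signatures")
    distances = [hamming_distance(query_sig, s) for s in participant_sigs]
    best_index = min(range(len(distances)), key=distances.__getitem__)
    return best_index, distances[best_index]


def playback_adjustment(matched_index, current_index):
    """Decide how the participant should adjust, and by how many frames.

    If the matched content lies ahead of the participant's current position,
    the initiator is ahead and the participant speeds up, and vice versa.
    """
    distance = matched_index - current_index
    if distance > 0:
        return "speed up", distance
    if distance < 0:
        return "slow down", -distance
    return "in synch", 0


# The initiator transmits one signature; the participant searches its own copy.
participant_sigs = [0b0001, 0b0110, 0b1011, 0b0101]
matched_index, distance = closest_frame(0b1010, participant_sigs)
action = playback_adjustment(matched_index, current_index=0)
```

Because only one pair of signatures is compared in each iteration, no summation over frame pairs is required.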