Existing Internet streaming media protocols transport audio and video data in “raw” form. The audio and video data are “raw” in the sense that the data stream consists primarily of the information that a computing device (e.g., a personal computer) needs in order to render the audio or video for a listener or viewer.
Several media distribution software packages are currently available for transmitting and receiving audio and video content across the Internet. These packages include server software that receives audio and video information from a media source, such as a database or a live source (e.g., a live feed), converts the audio and video information into data packets that are compliant with Internet protocols, and transmits or broadcasts the data packets across the Internet to end users. Client software (e.g., a media player) is also provided to the end user for receiving the media stream (e.g., audio and video data packets) and for rendering the audio and video through a speaker and a display, respectively. For example, two popular media client software packages are the Windows Media Player available from Microsoft Corporation and the RealPlayer available from RealNetworks, Inc.
Unfortunately, current media streams offer only limited facilities for delivering personalized content based on the preferences of the end user. Furthermore, current media streams have no mechanism for providing the precise time synchronization needed for applications such as the insertion of local broadcasts or advertising.
Accordingly, it would be desirable to have a mechanism that can directly convey program structure and identity with both precision and granularity.
There have been some proposals to develop a mechanism to synchronize processing streams. A first approach utilizes a reference clock to start/stop the recording of a scheduled program. Unfortunately, this approach requires that the programs be precisely scheduled and leaves little or no opportunity for stations to transmit unplanned live content.
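By way of illustration only, the limitation of the first approach can be sketched as follows. The sketch assumes a hypothetical schedule expressed in seconds on a shared reference clock; the names `ScheduledProgram` and `captured_seconds` and all of the times are assumptions introduced for this example and do not appear in any cited system.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScheduledProgram:
    """One entry of a hypothetical, pre-published program schedule."""
    name: str
    start_s: int  # scheduled start, in seconds on the reference clock
    stop_s: int   # scheduled stop, in seconds on the reference clock


def captured_seconds(schedule: List[ScheduledProgram],
                     segment_start_s: int,
                     segment_stop_s: int) -> int:
    """Return how many seconds of an actually transmitted segment fall
    inside any scheduled recording window."""
    total = 0
    for program in schedule:
        overlap = (min(program.stop_s, segment_stop_s)
                   - max(program.start_s, segment_start_s))
        if overlap > 0:
            total += overlap
    return total


# Hypothetical schedule: news for 30 minutes, then music starting at 35 minutes.
schedule = [
    ScheduledProgram("news", 0, 1800),
    ScheduledProgram("music", 2100, 3900),
]

# The planned news segment is captured in full.
planned = captured_seconds(schedule, 0, 1800)

# An unplanned live bulletin sent between the two scheduled programs
# is not captured at all, because no scheduled window covers it.
unplanned = captured_seconds(schedule, 1800, 2100)
```

In this sketch the recorder acts only on the clock and the schedule, so content transmitted outside a scheduled window is simply lost.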
A second approach uses a pre-existing agreement about a sequence of numbers or timestamps. However, this approach requires complex protocols to exchange this information. Furthermore, the control protocol can fail. An example of the second approach is described in a publication entitled, “Program Insertion in Real-Time IP Multicasts.” This publication describes a program insertion system architecture for mixing real-time audio and video streams originating from multiple, physically separated sources. The mixing of streams is decentralized and relies on new protocols to coordinate the transfer of session control between IP multicast sources.
Unfortunately, this approach suffers from the following disadvantages. First, the synchronization software is complex, thereby increasing system overhead and costs. Second, the approach operates only in networks that are capable of IP multicasting. Third, this approach may require extensive media packet buffering that may not be available at a particular stream processing point.
A third approach uses the initiation or suspension of packet flow to indicate program initiation or termination. However, this approach amounts to guesswork as to what is about to happen in a program, and there are cases where the system guesses incorrectly. For example, a silent segment during which no packets are sent in order to preserve bandwidth can be incorrectly interpreted as a program change that calls for action when, in fact, no action is needed.
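By way of illustration only, the misinterpretation described above can be sketched as follows. The sketch assumes a hypothetical detector that treats any gap between packet arrivals longer than a fixed threshold as a program boundary; the function name, the threshold, and the arrival times are assumptions chosen for this example.

```python
from typing import List


def detect_boundaries_by_gap(arrival_times_ms: List[int],
                             gap_threshold_ms: int) -> List[int]:
    """Return the arrival times at which packet flow resumes after a gap
    longer than gap_threshold_ms. Each such time is a guessed boundary."""
    boundaries = []
    for previous, current in zip(arrival_times_ms, arrival_times_ms[1:]):
        if current - previous > gap_threshold_ms:
            boundaries.append(current)
    return boundaries


# Hypothetical arrivals (milliseconds): packets every 20 ms, a 1.5 s pause
# caused by silence suppression within the same program, and then a 2 s
# pause that precedes a genuine program change.
arrivals_ms = [0, 20, 40, 1540, 1560, 1580, 3580, 3600]
true_changes_ms = [3580]

guessed_ms = detect_boundaries_by_gap(arrivals_ms, gap_threshold_ms=1000)

# Guessed boundaries that do not correspond to a real program change.
false_positives_ms = [t for t in guessed_ms if t not in true_changes_ms]
```

Under these assumptions the detector reports two boundaries, one of which is only the end of a silent segment. Raising the threshold would suppress that false report but would also delay, or miss, genuine program changes that follow a short pause.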
In summary, the prior art approaches offer only tolerable results, and do so at the expense of injecting complex mechanisms into the system that increase system overhead and costs. Furthermore, many of these approaches fail to maintain precise time synchronization when processing streams, thereby resulting in undesirable perceptible artifacts (e.g., visible and audible artifacts).
Consequently, it would be desirable to have a facility for generating, detecting and using program cues without requiring synchronized clocks, IP multicast, complex control protocols, or guesswork about program changes.
Based on the foregoing, there remains a need for a method and system for embedding, in Internet media streams, program timing and identification cues that indicate events whose timing is significant to receivers, which method and system overcome the disadvantages set forth previously.