The present invention relates to the field of digital communications systems, and more particularly to systems transporting multiple media (multimedia) and/or communicating such multimedia through a plurality of connections to multiple callers.
In the prior art, multimedia communications, such as videoconferencing systems providing two-way video and audio, are well known. Given sufficient bandwidth and dedicated independent channels (e.g. 6 MHz for an analog video channel, 3 kHz for an audio link over a standard analog telephone line, etc.), videoconferencing between two callers can be realized. However, communication channels providing 6 MHz of video bandwidth are not generally or universally available. A major obstacle to the widespread implementation and acceptance of multiple media conferencing systems is the limited bandwidth of the available communication channels. In addition, typical communication channels available on packet switched networks, such as AppleTalk from Apple Computer, California, USA, or NetWare from Novell Inc., Utah, USA, do not provide the continuous real time analog or digital connection of a telephone line or modem. Instead, packet switched networks deliver data in non-real time bursts, each switched packet carrying a burst of digital data. Thus, in addition to bandwidth limitations, packet switched networks impose delay limitations on the implementation of real time multiple media conferencing systems. The same bandwidth and time delay limitations apply to all time division multiple access (TDMA) communication systems and similar schemes, and present obstacles to achieving real time multimedia communications.
Typically, the problem of videoconferencing between two callers is approached by compressing the composite video signal so that the resulting transmitted data rate is compatible with the available communication channel, while still permitting acceptable video and audio to be received at the other end of the channel. However, past solutions using lossy compression techniques have been limited to compromising quality in order to obtain acceptable speed. Recently, non-lossy compression techniques have become available. The problem remains of how to match the bandwidth and timing constraints of available digital formats to the available communication channels, both present and future.
The present invention is embodied in a digital communication system in which data from multiple media sources is time multiplexed into a packetized data stream. At both the transmit side and the receive side, audio packets are given priority processing over video packets, which in turn have priority over text/graphics data packets. Continuous real time audio playback is maintained at the receiver by delaying the playback of received audio in a first in/first out (FIFO) buffer providing a delay at least equal to the predicted average packet delay for the communication system. Optionally, the average system delay is continuously monitored, and the audio and video playback delay time, as well as audio and video quality, are adjusted accordingly. In another embodiment of the invention, a conference of three or more callers is created by broadcasting a common packetized data stream to all conference callers. The present invention further permits an all-software implementation of a multimedia system.
1. In accordance with a first aspect of the present invention, multiple data sources forming data packets are combined into a prioritized data stream.
The present invention is embodied in a method and apparatus for combining data from a plurality of media sources into a composite data stream capable of supporting simultaneous transmission of multiple video and graphic signals and real time audio. Video, audio and other signals are integrated in a non-standard transmission format determined by a novel streaming algorithm and prioritization scheme, designed to provide the best balance between transmission quality and real time rendition of each medium.
For example, each packet at the transmitter is assigned a priority according to its data type, on a scale from 0 to 10000, with 0 being the highest priority and 10000 the lowest. Audio packets are given priority 20 and video packets priority 50. Screen data packets and file transfer data packets are both given priority 180.
Before transmission on the communication channel, packets are placed in a queue according to priority order. As new packets are generated, the queue is reorganized so that the new packet is placed into its proper priority order.
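The transmit queue described above can be sketched as a list kept sorted at all times. This is a minimal illustration, assuming the example priority values given above; the names `PRIORITY`, `QueuedPacket` and `TransmitQueue` are hypothetical, and the arrival-order tie-break between equal priorities is an assumption not stated in the text.

```python
import bisect
import itertools
from dataclasses import dataclass, field

# Example priorities from the text: a lower number means a higher priority.
PRIORITY = {"audio": 20, "video": 50, "screen": 180, "file": 180}


@dataclass(order=True)
class QueuedPacket:
    priority: int
    seq: int                                  # assumed tie-break: arrival order
    kind: str = field(compare=False)
    payload: bytes = field(compare=False)


class TransmitQueue:
    """Hypothetical outgoing queue, always held in priority order."""

    def __init__(self):
        self._queue = []                      # sorted by (priority, seq)
        self._seq = itertools.count()

    def put(self, kind, payload):
        pkt = QueuedPacket(PRIORITY[kind], next(self._seq), kind, payload)
        # Re-organize the queue: the new packet lands in its proper position.
        bisect.insort(self._queue, pkt)

    def get(self):
        # The head of the queue is the highest-priority packet.
        return self._queue.pop(0)

    def __len__(self):
        return len(self._queue)
```

For example, queuing a file chunk, a video frame, an audio block and a screen update in that order yields transmission order audio, video, file, screen. A binary heap (`heapq`) would give the same ordering with cheaper removal; the sorted list simply mirrors the text's description more literally.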
At the receiver, each task runs according to its assigned priority. Packets with priorities between 0 and 100 are processed first, to the exclusion of packets with priorities 101 through 10000. Audio, having the highest priority (20), is processed first, to the exclusion of all other packets. Within the class of packets with priorities between 101 and 10000, packets are processed according to relative priority; that is, higher priority tasks do not completely shut out tasks of lower priority. How often a task runs is inversely proportional to its priority number: a priority 200 task runs half as often as a priority 100 task, and conversely, a priority 100 task runs twice as often as a priority 200 task. Tasks with priorities between 0 and 100 always run until completion. Thus, video, screen data and file data processing tasks are completed after audio processing, in accordance with the relative priorities of the packets.
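One way to realize the receiver policy above is sketched below. Tasks with priority 100 or less run to completion in strict priority order; tasks from 101 to 10000 share processing time through stride scheduling, a standard technique swapped in here (not named in the text) because it makes run frequency inversely proportional to the priority number. `Task`, `work_units` and `run_receiver` are hypothetical names, and a "work unit" is an assumed slice of processing.

```python
from dataclasses import dataclass

EXCLUSIVE_LIMIT = 100  # priorities 0..100 run to completion, per the text


@dataclass
class Task:
    name: str
    priority: int          # lower number = higher priority
    work_units: int        # remaining slices of work (hypothetical unit)
    pass_value: int = 0    # stride-scheduling bookkeeping


def run_receiver(tasks):
    """Return the order in which work slices are executed."""
    trace = []
    # 1. Exclusive class: highest priority first, each task runs to completion.
    exclusive = sorted((t for t in tasks if t.priority <= EXCLUSIVE_LIMIT),
                       key=lambda t: t.priority)
    for t in exclusive:
        trace.extend([t.name] * t.work_units)
        t.work_units = 0
    # 2. Shared class: stride scheduling. Each slice advances a task's pass
    #    value by its priority number, so a task whose number is twice as
    #    large is selected half as often, and no task is shut out entirely.
    shared = [t for t in tasks if t.priority > EXCLUSIVE_LIMIT and t.work_units > 0]
    while shared:
        t = min(shared, key=lambda x: (x.pass_value, x.priority))
        trace.append(t.name)
        t.work_units -= 1
        t.pass_value += t.priority
        if t.work_units == 0:
            shared.remove(t)
    return trace
```

With a priority 150 task and a priority 300 task both busy, the first runs twice as often as the second, matching the 2:1 relationship described in the text.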
A multi-tasking executive dynamically reassigns task priorities in order to complete all tasks efficiently within the available time, while performing the highest priority tasks first. At any given time, multiple tasks at different priorities are pending, yielding to one another. In general, a task yields to a higher priority task unless it is running an uninterruptible sequence. When the current task completes its cycle, it is reassigned a lower priority. If two or more tasks have equal priority, the multi-tasking executive executes them in round robin fashion, performing a portion of each task in turn until all tasks at that priority are complete.
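The round robin behavior of the executive can be sketched with Python generators, where each `yield` marks the end of one portion of a task. This is an illustrative model only: `executive` and `work` are hypothetical names, and the sketch omits the demotion of a task's priority after it completes a cycle.

```python
from collections import deque


def executive(tasks):
    """Run (priority, name, generator) tasks; return the execution trace.

    The lowest priority number that still has pending tasks always runs
    first; tasks sharing that priority are served round robin.
    """
    levels = {}
    for priority, name, gen in tasks:
        levels.setdefault(priority, deque()).append((name, gen))
    trace = []
    while levels:
        top = min(levels)                 # highest priority = lowest number
        queue = levels[top]
        name, gen = queue.popleft()
        try:
            next(gen)                     # perform one portion of the task
            trace.append(name)
            queue.append((name, gen))     # back of the line: round robin
        except StopIteration:
            pass                          # task finished; drop it
        if not queue:
            del levels[top]
    return trace


def work(n):
    """Hypothetical task body: n portions of work, yielding after each."""
    for _ in range(n):
        yield
```

For instance, a priority 50 video task with two portions runs first, after which two priority 180 tasks (screen with two portions, file with three) alternate: video, video, screen, file, screen, file, file.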
The assignment of packet priorities, and processing according to priority, assures that audio will be given precedence over video, while audio and video will be given precedence over both screen data and file transfer data.
As indicated above, continuous real time audio playback is maintained at the receiver by delaying the playback of received audio in a first in/first out (FIFO) buffer providing a delay at least equal to the predicted average packet delay for the communication system. Optionally, the delay of the audio FIFO may be made variable. A variable delay audio FIFO buffer at the receiver allows the system to shrink or grow the time delay between one machine and the other. The ability to shrink or grow the time difference between the sender and receiver permits the system of the present invention to compensate for indeterminate system delays. If the changes are slight, the resulting difference in pitch is not noticeable. For greater changes, audio resampling combined with pitch correction (time-scale modification) may be used to increase or decrease the rate of audio playback without changing the pitch of the audio content.
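A minimal sketch of a variable delay audio FIFO follows. It assumes audio arrives as lists of PCM samples and that the delay target is expressed in milliseconds; the 8000 Hz rate and 200 ms delay are illustrative values only, and the class and method names are hypothetical. Pitch-preserving rate changes are not shown.

```python
from collections import deque


class AudioPlayoutFifo:
    """Hypothetical receiver FIFO that delays audio playback by a target delay."""

    def __init__(self, sample_rate_hz, delay_ms):
        self.sample_rate_hz = sample_rate_hz
        self.fifo = deque()
        self.playing = False
        self.set_delay(delay_ms)

    def set_delay(self, delay_ms):
        # Variable delay: convert the target into buffered samples,
        # e.g. 8000 Hz * 200 ms / 1000 = 1600 samples.
        self.target_samples = self.sample_rate_hz * delay_ms // 1000

    def push(self, samples):
        # Samples unpacked from arriving (possibly late, bursty) packets.
        self.fifo.extend(samples)

    def pull(self, n):
        # Withhold playback until a full delay's worth of audio is buffered.
        if not self.playing:
            if len(self.fifo) < self.target_samples:
                return [0] * n              # play silence while pre-filling
            self.playing = True
        underrun = len(self.fifo) < n
        out = [self.fifo.popleft() if self.fifo else 0 for _ in range(n)]
        if underrun:
            self.playing = False            # FIFO ran dry: re-buffer first
        return out
```

Buffering 1000 samples against a 1600-sample target yields silence; once 600 more arrive, playback begins and continues from the FIFO while new packets refill it. Calling `set_delay` while running is how the delay would shrink or grow.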
Similarly, video playback continuity at the receiver may be improved by delaying the playback of received video in a first in/first out (FIFO) buffer providing a delay at least equal to the predicted average packet delay for the communication system. The delay of the video FIFO may be made variable, allowing the system to shrink or grow the time delay between one machine and the other to compensate for indeterminate system delays. Again, if the changes are slight, the change in frame rate is not noticeable. However, video data does not age as quickly as audio data, so a smaller video FIFO can be used. Also, a video image may have short discontinuities without a perceived loss of the video connection. Audio playback, on the other hand, is more sensitive to discontinuities, making continuity at the receiver more important. Ideally, when both audio and video are used in a multimedia conference, the audio and video delays should be equal so that the two remain synchronized. In that case, the actual system delay is calculated by finding the maximum delay of both the audio and video packets.
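Continuous monitoring of the system delay, and the choice of a common audio/video delay, could look like the sketch below. The exponentially weighted moving average and its weight `alpha = 0.125` are assumptions chosen for illustration, since the text does not specify how the average delay is computed; the class and function names are hypothetical.

```python
class DelayEstimator:
    """Running average of observed packet delay (assumed EWMA method)."""

    def __init__(self, alpha=0.125):
        self.alpha = alpha          # weight given to the newest sample (assumed)
        self.average_ms = None

    def observe(self, delay_ms):
        # First sample seeds the average; later samples nudge it.
        if self.average_ms is None:
            self.average_ms = float(delay_ms)
        else:
            self.average_ms += self.alpha * (delay_ms - self.average_ms)
        return self.average_ms


def common_playout_delay(audio_est, video_est):
    # Use the larger delay for both streams so audio and video stay in sync.
    return max(audio_est.average_ms, video_est.average_ms)
```

With audio delays of 100 ms then 200 ms, the audio average becomes 112.5 ms; if video has averaged 150 ms, both streams would be played out with a 150 ms delay.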
Data from media sources tends to come in bursts. For example, audio data rates rise when speaking, and fall to zero during silence. In the present embodiment, the silence between words gives the present system an opportunity to catch up by refilling the audio FIFO buffer before it empties. In this manner, the present system compensates for the delay inherent in a packet switched communication channel with time-varying delay.
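The drop of the audio data rate to zero during silence can be illustrated with a simple energy threshold detector. This is a sketch under assumptions: the text does not say how silence is detected, and the mean-square energy measure, frame layout and threshold value are illustrative. Frames that are not sent free channel capacity, which is what lets delayed packets arrive and the receiver FIFO refill.

```python
def frame_energy(frame):
    # Mean squared amplitude of one frame of PCM samples.
    return sum(s * s for s in frame) / len(frame)


def packetize_speech(frames, threshold):
    """Yield (index, frame) only for frames at or above the energy threshold.

    Silent frames produce no packets, so the transmitted data rate falls
    to zero between words.
    """
    for i, frame in enumerate(frames):
        if frame_energy(frame) >= threshold:
            yield i, frame
```

Given four frames of which only the second and fourth contain speech-level energy, only those two would be packetized.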
Similarly, video sources, including graphic screen data, generate data in bursts. That is, the data rate for video ideally falls to zero when there is no motion, and the data rate for transmitting screen graphics falls to zero when there are no changes. When the caller changes the screen (such as the collaborative work document displayed on the screen), data is generated.
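The principle that screen data is generated only when the screen changes can be sketched by comparing successive frames tile by tile. The tile-based comparison, the 8-pixel tile size and the 2-D list frame representation are assumptions for illustration; `changed_tiles` is a hypothetical name.

```python
def changed_tiles(prev, curr, tile=8):
    """Return (row, col) origins of tiles that differ between two frames.

    prev and curr are equally sized 2-D lists of pixel values. An empty
    result means nothing changed, so no screen data needs to be sent.
    """
    height, width = len(curr), len(curr[0])
    changed = []
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            block_prev = [row[x:x + tile] for row in prev[y:y + tile]]
            block_curr = [row[x:x + tile] for row in curr[y:y + tile]]
            if block_prev != block_curr:
                changed.append((y, x))
    return changed
```

An unchanged 16 x 16 screen yields no tiles, while altering a single pixel at row 9, column 3 yields only the tile at origin (8, 0).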
Thus, following the priority scheme of the present invention, video is updated only when no speech data is being processed. However, processing of speech data does not include the playing of sound. Once the sound starts playing, no further time needs to be spent processing it, because sound playback needs no supervision. Therefore, video updating occurs while sound is playing. Once speech is playing close to real time (with a delay), video, text and graphics are updated in the background. Video, text, graphics and data files are updated at lesser priorities. Except for audio and video data, task priorities are reassigned to assure that all tasks will be completed, and that a higher priority task will not completely prevent the lower priority tasks from being completed.
2. In accordance with a second aspect of the present invention, multiple signal packets are broadcast to a plurality of callers to create a common multimedia conference.
In addition to having assigned priorities, data packets having multiple destination addresses are broadcast over a plurality of connections to multiple callers. Each caller receives the same data packets with assigned priorities, and processes the received packets in a similar manner. As new data is generated by each caller in the video conference, new data packets are broadcast to the other callers. Thus, due to the broadcast of data packets representing audio, video and screen data, all callers are conferenced together, each seeing and hearing the others while discussing the same screen document. Additional callers can be added to the conference over a plurality of connections without undue burden, because each caller generates its data only once, and that data is then transmitted to the other callers either simultaneously or sequentially, depending on the kind of connection.
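The "generate once, send to every other caller" property can be modeled as follows. This is a hypothetical sketch: each caller's connection is represented by an in-memory outbox list, and the `Packet` and `Conference` names are illustrative rather than taken from the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Packet:
    source: str
    priority: int
    payload: bytes


@dataclass
class Conference:
    """Hypothetical model: each caller's outbox stands in for a connection."""
    outboxes: dict = field(default_factory=dict)

    def join(self, caller):
        self.outboxes[caller] = []

    def broadcast(self, packet):
        # The packet is built once; the same object is queued for every other
        # caller, so its destination list is simply all remaining members.
        destinations = [c for c in self.outboxes if c != packet.source]
        for caller in destinations:
            self.outboxes[caller].append(packet)
        return destinations
```

With callers A, B and C joined, a packet originating at A is delivered to B and C as the same object, and adding a caller D only appends one more destination to the loop.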
3. In accordance with a third aspect of the present invention, data received on a first communication medium (for example, a broadband local area network such as Ethernet) is re-broadcast on a different communication medium (such as a telephone line) in order to join callers on the different communication media in a common multimedia conference. The present invention thereby provides the option of desktop videoconferencing over standard computer networks and telephone lines.
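Re-broadcasting between media can be sketched as a relay that forwards each packet received on one link to every other link. The duplicate suppression by (source, sequence number) is an assumption added so a packet relayed back from another medium is not forwarded again; `Bridge` and the medium names are hypothetical.

```python
class Bridge:
    """Hypothetical relay joining conference legs on different media."""

    def __init__(self):
        self.links = {}       # medium name -> list of packets delivered on it
        self.seen = set()     # (source, seq) pairs already relayed (assumed)

    def add_link(self, medium):
        self.links[medium] = []

    def receive(self, medium, source, seq, payload):
        key = (source, seq)
        if key in self.seen:
            return []                       # already relayed: avoid loops
        self.seen.add(key)
        relayed_to = []
        for other, queue in self.links.items():
            if other != medium:             # never echo back on the same medium
                queue.append((source, seq, payload))
                relayed_to.append(other)
        return relayed_to
```

A packet arriving on an "ethernet" link is forwarded to the "phone" link; if the same packet later comes back over the phone link, it is recognized and dropped.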