1. Field of the Invention
The present invention relates generally to systems and methods for transmitting voice data over computer networks such as the Internet and the World Wide Web (“Web”), and more particularly to systems and methods for enabling voice transmission over high level Internet protocols, such as those used by web browsers and web servers.
2. Related Art
This application hereby incorporates, in its entirety, U.S. Pat. No. 5,944,791 (“the '791 patent”), issued on Aug. 31, 1999 to Andrew W. Scherpbier. The '791 patent provides a system and method for allowing a first computer, or “pilot computer”, to direct the web browsing experience of one or more second computers, or “passenger computers”. The pilot computer controls what web pages are displayed on the passenger computers. The system and method performs this function without requiring modifications to the web browsers of the passenger computers. In addition, the '791 patent discloses how to allow multiple pilot computers to simultaneously direct the same web browsing session. Thus the '791 patent creates a collaborative web browsing session.
Such collaborative web browsing sessions can be highly useful on computer networks. They allow a company, government agency, or even an individual, to conduct a conference for a widely dispersed audience. Such conferences can be a simple slide presentation similar to a PowerPoint presentation, or they can be more detailed, and include the vast versatility of the Web.
However, in addition to the visual presentation, it is desirable to include an audio presentation as well. This is true not only for collaborative web browsing sessions, but also for any other network conferencing system requiring real-time interactivity. To accommodate this need, standard phone lines have been used to fulfill this audio requirement. While these sorts of conference calls can be easily accomplished for small groups, the difficulties of establishing such conference calls for large groups over standard phone lines can be extreme. These difficulties become even greater when some desired presenters or audience members for the conference are located overseas.
In addition to standard phone lines, computer networks, such as the Internet, have also been used to support an audio presentation. The problem is that traditional methods for transmitting audio over computer networks introduce unpredictable delays in the audio broadcast. Such delays are unacceptable for collaborative web browsing sessions and network conferencing systems in general, which must be coordinated and highly interactive. Thus, traditional methods of providing audio cannot be used with collaborative web browsing systems, or any other systems requiring real-time interactivity.
Examples of traditional network-based methods of providing conference audio include Microsoft's NetMeeting, Netscape's Conference, Internet Conference Professional, and CU-SeeMe. These types of traditional Internet audio broadcasting methods typically add ten to sixty seconds of latency to an audio stream so that any network problems can be smoothed out.
In addition to these latency problems, these systems are undesirable because their audio signals can be blocked by firewalls, proxy servers and the like. Firewalls and proxy servers may block traditional Internet audio broadcasts because they typically use UDP/IP (User Datagram Protocol/Internet Protocol) to send audio data. Because UDP has no control of multiple related packets, it is difficult to proxy UDP streams and firewalls tend to block them.
In an attempt to overcome these problems, traditional Internet broadcasting software is typically designed for specific types of computer architectures and is then installed on each customer's computer system. These machine-specific software products will typically use UDP/IP to send voice data. UDP provides faster transmission times over the Internet at the expense of the delivery guarantees of Transmission Control Protocol (TCP). The locally installed software solves the delivery problems by buffering, for example, ten to twenty seconds' worth of data, thus allowing the client time to reorder mixed-up packets, request re-transmission of lost packets, or ignore duplicate packets. In addition, as already mentioned, because UDP has no control of multiple related packets, it is difficult to proxy UDP streams and firewalls tend to block them. Thus, providers of traditional audio broadcasting systems also sell system-specific plug-ins to the firewalls and proxy servers to solve the transmission restrictions common to computer systems of the highly sought Fortune 1000 customers.
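The client-side buffering described above can be illustrated with a minimal sketch. It assumes each packet carries an integer sequence number, which the text does not specify; the function name `reorder_packets` and the packet layout are hypothetical and do not describe any particular product.

```python
from typing import Dict, List, Tuple


def reorder_packets(packets: List[Tuple[int, bytes]]) -> Tuple[List[bytes], List[int]]:
    """Sort buffered packets by sequence number, ignore duplicates,
    and report which sequence numbers would need re-transmission.

    Assumption: each packet is a (sequence_number, payload) pair.
    """
    by_seq: Dict[int, bytes] = {}
    for seq, payload in packets:
        # Keep the first copy of each sequence number; later copies are duplicates.
        by_seq.setdefault(seq, payload)
    if not by_seq:
        return [], []
    first, last = min(by_seq), max(by_seq)
    # Gaps between the lowest and highest sequence numbers are lost packets.
    missing = [s for s in range(first, last + 1) if s not in by_seq]
    ordered = [by_seq[s] for s in sorted(by_seq)]
    return ordered, missing
```

The larger the buffer, the more time the client has to fill the gaps reported in `missing`, which is why this approach trades latency for reliability.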
This system has at least three major drawbacks. First, it requires the installation of machine-specific software to overcome restrictions imposed by firewalls and proxy servers, and the use of UDP. This is something that many companies would prefer not to do. Most customers of Internet conferencing services would prefer to keep their company firewalls and proxy servers intact, and avoid unnecessary modifications.
Second, because this system relies on locally installed software, it requires that the audio broadcasting software be able to fall back to the slower Hypertext Transfer Protocol (HTTP)/TCP/IP to allow complete access to the conference over the Internet. This network protocol adds significant overhead to transmission times. With the traditional streaming audio signal, any network congestion will create cumulative delays which are significant. This limits the ability of the Internet conference to be interactive, which is a fundamental requirement of Internet conferencing.
Finally, even when the software is only using UDP/IP, there is still a major delay in voice transmission, typically at least ten seconds. This delay becomes worse with network congestion. Because presentations using Internet conferencing are interactive, excessive delay, or latency, from when a presenter says a word to when an audience member actually hears it, is unacceptable to most consumers of Internet conferencing services.
Therefore, what is needed is a system and method for providing voice data transmission over computer networks, such as the Internet, which minimizes transmission delays, bypasses firewalls and proxy servers, and avoids the installation of machine-specific software.
The present invention is directed toward a system and method for transmitting voice data over high level networking protocols, such as HTTP/TCP/IP.
A feature of the present invention is that it uses HTTP as its primary protocol to transmit voice data over the Internet. In this fashion, it cuts through firewalls and proxy servers used by many potential consumers of Internet conferencing services. It does this seamlessly, without installation of system-specific software. Preferably, the only requirement is a standard Java-enabled web browser and the temporary installation of a small Java client, which is done automatically by the web browser without user intervention. Additionally, because the present invention relies on TCP instead of UDP to transmit voice data, it has automatic guaranteed delivery of packets. This eliminates the need for a large buffer on the client computer to store incoming voice data, and thereby removes a source of fixed latency found in conventional systems.
Another feature of the present invention is that it utilizes variable compression based on silence detection. This silence detection is performed at a fine scale. By taking advantage of the natural silences and pauses in human speech, the present invention minimizes the amount of voice data that must be transmitted over the network. In so doing, it more than compensates for the transmission overhead added by using HTTP/TCP/IP, and thus significantly reduces delays in transmission. Therefore, this feature of the present invention enables a truly interactive Internet conference.
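As an illustration only, threshold-based silence detection of this kind might be sketched as follows. The frame size, the threshold value, and the use of mean absolute amplitude as the loudness measure are all assumptions made for the example; the text above does not specify them.

```python
from typing import List, Tuple


def silent_frames(samples: List[int], frame_size: int = 160,
                  threshold: float = 500.0) -> List[bool]:
    """Return one flag per frame: True when the frame counts as silence.

    Assumption: a frame is silent when its mean absolute amplitude
    falls below `threshold`.
    """
    flags = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        level = sum(abs(s) for s in frame) / len(frame)
        flags.append(level < threshold)
    return flags


def frames_to_send(samples: List[int], frame_size: int = 160,
                   threshold: float = 500.0) -> List[Tuple[int, List[int]]]:
    """Keep only non-silent frames, tagged with their frame index so the
    receiver can restore the timing of the skipped silence."""
    flags = silent_frames(samples, frame_size, threshold)
    return [(i, samples[i * frame_size:(i + 1) * frame_size])
            for i, silent in enumerate(flags) if not silent]
```

A smaller `frame_size` corresponds to detection at a finer scale, so shorter pauses can be skipped.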
In addition, according to one aspect of the present invention, the non-silence portions of voice data are bookended with small silent frames. This is done to ensure that the threshold detection mechanism employed during silence detection does not cut off the small beginning and ending sounds of each segment of non-silence. In this fashion, the voice data that is transmitted is not improperly truncated. Without this aspect of the invention, the voice of the speaker may sound unnatural.
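The bookending step can be sketched as a pass over per-frame silence flags. This is a hypothetical illustration: the function name `pad_speech` and the default of one padding frame on each side are assumptions, not values taken from the text.

```python
from typing import List


def pad_speech(silent: List[bool], pad: int = 1) -> List[bool]:
    """Return keep flags for each frame.

    A frame is kept if it is non-silent, or if it lies within `pad`
    frames of a non-silent frame. The kept silent frames are the
    "bookends" that preserve soft leading and trailing sounds.
    """
    n = len(silent)
    keep = [False] * n
    for i, is_silent in enumerate(silent):
        if not is_silent:
            for j in range(max(0, i - pad), min(n, i + pad + 1)):
                keep[j] = True
    return keep
```

For example, with flags `[True, True, False, False, True, True, True]` and `pad=1`, the frames at indices 1 through 4 are kept: the two non-silent frames plus one silent frame on either side.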
Another feature of the present invention is that it transmits voice data to each client computer independently, with a data structure that forces each client computer to stay current with the conference. This avoids the cumulative delays that can be caused by network traffic. It does this by not transmitting voice data that has become too old and irrelevant. If a particular client computer experiences local network problems, it will not affect the data received by other client computers, nor will it force the delayed client computer to receive stale voice data. Thus, this feature of the present invention ensures that each client stays current with the conference.
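One way to picture such a data structure is a separate queue per client that discards voice data older than a cutoff before each transmission. The sketch below is an assumption-laden illustration: the class name `ClientVoiceQueue`, the two-second `max_age`, and the use of timestamps in seconds are hypothetical and are not specified in the text.

```python
from collections import deque
from typing import Deque, List, Tuple


class ClientVoiceQueue:
    """Holds pending voice frames for a single client.

    Assumption: frames are pushed in timestamp order, and a frame older
    than `max_age` seconds is considered stale and is never sent.
    """

    def __init__(self, max_age: float = 2.0) -> None:
        self.max_age = max_age
        self._frames: Deque[Tuple[float, bytes]] = deque()

    def push(self, timestamp: float, data: bytes) -> None:
        self._frames.append((timestamp, data))

    def drain(self, now: float) -> List[bytes]:
        """Drop stale frames, then return and clear whatever is still fresh."""
        while self._frames and now - self._frames[0][0] > self.max_age:
            self._frames.popleft()
        fresh = [data for _, data in self._frames]
        self._frames.clear()
        return fresh
```

Because each client has its own queue, a client that polls late loses only its own stale frames; the queues of the other clients are unaffected.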