Video conferencing is a technique for providing both video and audio information from one or more users to a plurality of other users. Typically, a conference bridge connects the participants of the video conference, and the signal received at the conference bridge from each conferee is broadcast to the other conferees. A conferee using a conference station views a separate image from each of the other conference stations. FIG. 2 shows an example of a conference station as viewed by a conferee participating in a conference with four other conferees. As seen in FIG. 2, the video information from each of the four other conferees is displayed on the video monitor of the conference station, which is usually a personal computer. In this example, the image of conferee 2 is absent because the conference station shown is that of conferee 2. Of course, a conferee may choose to see his/her own image on the screen as well.
Recently, much of the available conferencing technology has become focused on digital techniques. More specifically, as Internet access becomes less expensive and more widespread, it has become possible to implement video conferences over the Internet or other similar data networks. Implementing such conferences in the digital domain offers advantages such as improved clarity and the availability of compression techniques. Additionally, as the price of personal computers falls and their speed increases, functions such as speech recognition and image processing can be implemented very inexpensively. To date, however, little advantage has been taken of the additional capabilities available in PC-based conference stations, and more particularly, of the ability of such conference stations to provide advanced signal processing functions.
Little research to date has focused on exploiting the additional capabilities that become available when video conferencing is implemented in the digital domain. Specifically, the prior art does not provide effective techniques for reducing confusion as to which participants in a video conference are speaking. In addition, the prior art does not utilize the combination of video and audio information for the purpose of voice activity detection.