1. Field of the Invention
The present invention relates generally to videoconferencing systems, and more particularly to microphone arrays used in videoconferencing systems.
2. Description of Related Art
Videoconferencing is rapidly becoming a popular choice of communication among corporations and individuals. Increasingly, business transactions, for example, are occurring between participants in widely different geographic locations. Since it is often difficult for all such participants to meet in a single location, many business participants rely on teleconferencing mechanisms such as videoconferencing systems. Videoconferencing systems are generally preferable to other teleconferencing mechanisms because these systems allow participants to view other participants, observe remote demonstrations, and more easily identify a speaking participant at any given moment. In effect, videoconferencing allows people at two or more locations to interact with each other. More importantly, information and communication are exchanged essentially in real time.
Referring to FIG. 1A, a conventional videoconferencing system 100 is shown. The videoconferencing system 100 includes a video display 102, speakers 106, a microphone 108, and a videoconference unit 110 further comprising a camera 112. The conventional videoconferencing system 100 may be used with a personal computer or, alternatively, may have the videoconference unit 110 coupled to a large display or projection system located in a large videoconferencing room.
A disadvantage of the conventional videoconferencing system 100 is that the videoconferencing system 100 does not have the ability to focus on an individual who is speaking. The focusing process requires determination of a position of the individual, movement (i.e., panning, tilting, and zooming) of the camera 112 to the proper position of the individual, and adjustment of lenses so that the camera 112 is in focus on the individual. When more than one individual is involved in a videoconference, it may be desirable to focus the camera 112 on each individual as each individual is speaking. This focusing task is often difficult, however, because the position of the individual speaking must be determined and the camera 112 moved to that position relatively quickly and smoothly. Therefore, the videoconferencing systems 100 are typically left in a stationary position, and thus capture an image of the entire room or of what is directly in front of the camera 112. Although there may be some videoconferencing systems 100 with the ability to pan and tilt to focus on individuals, the pan and tilt functions are usually manually controlled.
Further, some conventional videoconferencing systems 100 may have the ability to localize an acoustic source. These videoconferencing systems 100 often use a vertical and a horizontal microphone array to locate an acoustic source within a room. As shown in FIG. 1B, the videoconference unit 110 includes a plurality of microphones 120 arranged in a horizontal array 122 and a vertical array 124. In order to accurately determine the position of the sound source, both the horizontal array 122 and the vertical array 124 of microphones 120 must be used. The microphones 120 are typically placed so that the distance between the microphones 120 in each array 122 and 124 is precisely known. Further, the horizontal array 122 and the vertical array 124 are situated so that a relative angle between the arrays 122 and 124 is precisely known.
Typically, a processor (usually located within the videoconference unit 110) is used to determine the acoustic source location. Initially, the microphones 120 detect sound, produce signals representing the sound, and transmit these signals to the processor. The processor then uses this signal information, which may include signal strength, signal time, and position of the microphones 120, to calculate an acoustic source location. Conventional methods used to determine the sound source location, such as cross-correlation techniques, are typically slow, inaccurate, and unreliable. Further, because the information cannot be processed fast enough or accurately enough, camera manipulation is not smooth and focused.
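By way of illustration only, a conventional cross-correlation approach of the kind referred to above may be sketched as follows. The sketch is not taken from any particular system: the function names, the assumed speed of sound, the sample rate, the microphone spacing, and the far-field (plane-wave) assumption are all hypothetical choices made for the example. It estimates the delay between the signals from two microphones of one array and converts that delay to a bearing angle using the known microphone spacing.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: speed of sound in air at room temperature


def estimate_delay_samples(sig_a, sig_b, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive result means sig_b is delayed relative to sig_a.
    This brute-force search is O(N * max_lag).
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_from_delay(delay_samples, sample_rate_hz, mic_spacing_m):
    """Far-field angle of arrival in degrees, measured from array broadside."""
    path_difference_m = delay_samples / sample_rate_hz * SPEED_OF_SOUND_M_S
    # Clamp so that measurement noise cannot push the ratio outside asin's domain.
    ratio = max(-1.0, min(1.0, path_difference_m / mic_spacing_m))
    return math.degrees(math.asin(ratio))


# Hypothetical test signals: the same short pulse reaches the second
# microphone three samples later than the first.
pulse = [0.0, 1.0, 0.5, -0.5, 0.0]
mic_a = pulse + [0.0] * 11
mic_b = [0.0] * 3 + pulse + [0.0] * 8

lag = estimate_delay_samples(mic_a, mic_b, max_lag=8)
angle = bearing_from_delay(lag, sample_rate_hz=16000, mic_spacing_m=0.2)
```

In this sketch the delay can be resolved only to a whole sample, so the angular resolution is limited by the sample rate and the microphone spacing, and the exhaustive search over lags grows with both the signal length and the lag range. These properties are consistent with the speed and accuracy limitations noted above.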
Furthermore, accuracy in determining the sound source location increases with an increase in the number of microphones 120 used in the horizontal array 122 and the vertical array 124. Therefore, it is desirable to have as many microphones 120 as possible positioned in both the horizontal array 122 and the vertical array 124. Unfortunately, it is often not feasible or economical to have so many microphones 120.
Referring back to FIG. 1A, the conventional videoconferencing system 100 may have the horizontal and vertical arrays 122 and 124, respectively, mounted to a top section of the videoconference unit 110. Since the relative angle between the horizontal array 122 and the vertical array 124 must be precisely known in order for the camera 112 to track the acoustic source location, the horizontal array 122 and the vertical array 124 must be permanently mounted to the videoconference unit 110. This configuration has the further disadvantage of limiting the number of microphones 120 used, because increasing the number of microphones 120 would require making the videoconference unit 110 both taller and wider. Additionally, a larger videoconference unit 110 is more difficult to set up and support on top of the video display 102, and is therefore less appealing to consumers.
Therefore, there is a need for a videoconferencing system that uses horizontal and vertical microphone arrays which may be mounted in various locations. There is a further need for a method of analyzing data from these microphone arrays that is fast and accurate enough to properly manipulate a camera.