1. Technical Field
This invention is directed toward an integrated omni-directional camera and microphone array. More specifically, this invention is directed toward an integrated omni-directional camera and microphone array that can be used for teleconferencing and meeting recording.
2. Background Art
Video conferencing systems have had limited commercial success. This is due to many factors. In particular, there are typically numerous technical deficiencies in these systems. Poor camera viewpoints and insufficient image resolution make it difficult for meeting participants to see the person speaking. This is compounded by inaccurate speaker detection (especially for systems with pan-tilt-zoom cameras) that causes the camera not to be directed at the person speaking. Additionally, poor video compression techniques often result in poor video image quality and “choppy” image display.
The capturing devices of systems used for teleconferencing tend to focus on a few major sources of data that are valuable for videoconferencing and meeting viewing. These include video data, audio data, and electronic documents or presentations shown on a computer monitor. Given that numerous software solutions exist to share documents and presentations, the capture of audio and video data in improved ways is of special interest.
Three different methods exist to capture video data: pan/tilt/zoom (PTZ) cameras, mirror-based omni-directional cameras, and camera arrays. While PTZ cameras are currently the most popular choice, they have two major limitations. First, they can only capture a limited field of view. If they zoom in too closely, the context of the meeting room is lost; if they zoom out too far, people's expressions become invisible. Second, because the controlling motor takes time to move the camera, the camera's response to the meeting (e.g., switching between speakers) is slow. In fact, PTZ cameras cannot move too much or too fast; otherwise, people watching the meeting can be quite distracted.
Given these drawbacks and recent technological advances in mirror/prism-based omni-directional vision sensors, researchers have started to rethink the way video is captured and analyzed. For example, BeHere Corporation provides 360° Internet video technology in entertainment, news and sports webcasts. With its interface, remote users can control personalized 360° camera angles independent of other viewers to gain a “be here” experience. While this approach overcomes the two difficulties of limited field of view and slow camera response faced by PTZ cameras, these types of devices tend to be too expensive to build given today's technology and market demand. In addition, these mirror/prism-based omni-directional cameras suffer from low resolution (even with 1 MP sensors) and defocusing problems, which result in inferior video quality.
In another approach, multiple inexpensive cameras or video sensors are assembled to form an omni-directional camera array. For example, one known system employs four National Television System Committee (NTSC) cameras to construct a panoramic view of a meeting room. However, there are disadvantages with this design. First, NTSC cameras provide a relatively low quality video signal. In addition, the four cameras require four video capture boards to digitize the signal before it can be analyzed, transmitted or recorded. The requirement for four video capture boards increases the cost and complexity of such a system, and makes it more difficult to manufacture and maintain.
Besides the problems noted with video capture, capturing high-quality audio in a meeting room is also challenging. The audio capturing system needs to remove a variety of noises and reverberation. It also must adjust the gain for different levels of input signal. In general, there are three approaches to address these requirements. The simplest approach is to use close-up microphones (e.g., via headset), but this is cumbersome and intrusive to the user/speaker. A second approach is to place a microphone on the meeting room table. This prevents multiple acoustic paths and is currently the most common approach to recording meeting audio. These systems use several (usually three) hypercardioid microphones to provide omni-directional characteristics. The third approach is provided in a desktop teleconferencing system. In this approach, a unidirectional microphone is mounted on top of a PTZ camera, which points at the speaker. The camera/microphone group is controlled by a computer that uses a separate group of microphones to perform sound source localization. This approach, however, requires two separate sets of microphones.