This disclosure relates generally to improving the quality of video produced for viewing on destination displays during video conference calls and reducing human involvement in the process.
Video conferencing has become ubiquitous. Large companies with multiple work sites invest large sums of money to establish private communication networks within each site and between sites. The communication networks include packet-based networks, circuit-switched networks, or both.
To establish the private communication networks, large companies distribute a large number of multimedia endpoints throughout the company. Usually, one or more multipoint control units (MCUs) are purchased to serve the internal multipoint multimedia conferencing needs of these endpoints. The MCUs can be installed in one or more different company sites (e.g., at a particular company building, or within a city or region) to generally serve the multipoint needs of the endpoints local to that site. The result is that the various MCUs of the company may be distributed throughout a large region, such as throughout a country or throughout the globe.
As is known in the art, a multimedia endpoint is a terminal on a network. The multimedia endpoint is capable of providing real-time, two-way audiovisual communication with other terminals or an MCU. As is known in the art, an MCU is a conference control entity located in a node of the network or in a terminal. The MCU receives several media channels from access ports. According to certain criteria, the MCU processes audiovisual and data signals and distributes them to the connected channels. Examples of MCUs include those available from Polycom Inc. Additional information about MCUs can be found at www.polycom.com, which is incorporated herein by reference. A more thorough definition of an endpoint (terminal) and an MCU can be found in the International Telecommunication Union (“ITU”) standards such as, but not limited to, the H.320, H.324, and H.323 standards, which are incorporated herein by reference. (The ITU is the United Nations Specialized Agency in the field of telecommunications. Additional information regarding the ITU can be found at www.itu.int.) MCUs are used in various ways, including cascading, to establish multi-site video conferences both inside and outside of organizations.
Video conferences between two sites or multiple sites sometimes lack the intimacy and closeness of a TV production. The video streamed to far sites in video conferences is often far from optimal. The video image of the room captured and sent to the far site is typically determined by a fixed camera setting or by the last position of a pan-tilt-zoom camera. Generally, participants focus on the meeting and not so much on the format and content of the video they are sending to the far sites. The great majority of participants do not direct the camera to focus on the speakers in the room, often leaving the camera pointed at blank space, someone rustling papers, or a far-away view of the speaker.
Pointing the camera is still a manual operation, and the camera is usually zoomed out all the way so that everyone is in the picture, with little regard for a close-up that shows people's expressions clearly to the far site. The viewing experience is far from optimal at the receiving end of a conference when little or no attention is given to showing the participants who are talking or engaged in discussions.
This occurs for several reasons, including that many participants are unwilling or unable to operate the camera guidance systems and that most participants are not trained in their use. Further, when participants take the time and attention to direct the camera, their attention is drawn away from the subject matter of the conference. Assigning an extra staff person on-site to sit through the conference simply to direct the camera is cost prohibitive, inefficient, and can be ineffective if the staff person is not familiar with the subject matter, status of the participants, and people involved.
Tracking cameras (such as those sold by Polycom) may be used to locate people and track them via their voices and faces. This is better than leaving the camera still, but the quality is nowhere near what one would get with a TV production crew filming the meeting or event.
Video conferencing with a 360 degree camera or other types of cameras located in the middle of the room is now possible. Often circular or oval seating arrangements are used. Circular seating arrangements in video conference rooms allow participants to interact and communicate more comfortably with everyone in the room than they can in a traditional rectangular conference room. Each person can see the other individuals in the circle equally without having to turn his or her head. To capture this interaction for video conferencing, a 360 degree or like camera is placed in the middle of the room. When two people in the local conference room engage in a discussion, there is a need for the camera to capture both people at the same time even though they may not be seated next to each other. Preferably, the two speaking individuals are properly positioned in the composed video for the conference. These video conferencing systems face problems when participants are not looking at the camera.
Currently, composing video for a meeting of individuals seated in a circle or oval requires multiple camera operators to frame the talkers properly, as well as expensive video switching and mixing equipment to composite the two camera images together. A human director is needed to determine on which side of the screen to place each talker so that the talkers appear to be talking towards each other. These problems also exist for people seated in a rectangular arrangement.
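By way of illustration only, the director's left/right decision described above can be sketched in a few lines of Python. The sketch is not taken from this disclosure. It assumes that each talker's azimuth around a centrally located camera is known in degrees, that azimuth increases from left to right in the unrolled image, and that each talker faces the other. The function names `signed_arc` and `place_talkers` are hypothetical.

```python
def signed_arc(a_deg: float, b_deg: float) -> float:
    """Shortest signed angular difference from a to b, in [-180, 180)."""
    return (b_deg - a_deg + 180.0) % 360.0 - 180.0


def place_talkers(a_deg: float, b_deg: float) -> tuple:
    """Return (left, right) labels so the two talkers appear to face each other.

    Assumption: azimuth grows to the right in the camera image, so a talker
    turned toward a neighbor at a larger azimuth appears to look image-right
    and therefore belongs on the left half of the composed frame.
    """
    d = signed_arc(a_deg, b_deg)
    if -180.0 < d < 0.0:
        # b lies on the decreasing-azimuth side of a: b looks right, a looks left.
        return ("b", "a")
    # b lies on the increasing-azimuth side of a, or the talkers are exactly
    # opposite (d == -180), in which case the order is arbitrary.
    return ("a", "b")


if __name__ == "__main__":
    print(place_talkers(350.0, 20.0))
    print(place_talkers(90.0, 30.0))
```

The wrap-around case (350 degrees and 20 degrees) is handled by the modular arithmetic in `signed_arc`, so talkers seated on either side of the zero-azimuth seam are treated as neighbors rather than as nearly opposite.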
Also, in a conference room with a 360 degree panoramic camera, the video system has two video streams: an active talker window (or region of interest) and a panoramic view of the room. Currently these two video streams are transmitted separately to a far site and generally displayed separately. Combining these two video streams in a useful manner requires a user to manually place the panoramic strip somewhere on a video layout.
What is needed is a system and method to process video with reduced, and preferably little or no, human involvement. What is needed is a system and method for automatically processing video for video conferences using sensor data. What is needed is an automated video production crew. What is needed is an automatic system for processing video from 360 degree cameras and from 360 degree panoramic cameras.
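As a purely illustrative sketch, and not the method of this disclosure, the manual placement step can be expressed as a small layout computation. The Python below assumes one possible arrangement: the panoramic strip spans the full width at the bottom of the output frame, and the active talker window is scaled, with its aspect ratio preserved, into the remaining area. The names `Rect`, `fit_inside`, and `compose_layout` and the half-frame cap on the strip height are hypothetical choices.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def fit_inside(src_w: int, src_h: int, box: Rect) -> Rect:
    """Scale a src_w x src_h image to fit in box, centered, aspect preserved."""
    if box.w * src_h <= box.h * src_w:
        w, h = box.w, box.w * src_h // src_w   # width is the limiting side
    else:
        w, h = box.h * src_w // src_h, box.h   # height is the limiting side
    return Rect(box.x + (box.w - w) // 2, box.y + (box.h - h) // 2, w, h)


def compose_layout(frame_w: int, frame_h: int,
                   talker_size: Tuple[int, int],
                   pano_size: Tuple[int, int]) -> Tuple[Rect, Rect]:
    """Return (talker_rect, strip_rect) for one output frame."""
    pano_w, pano_h = pano_size
    # Strip height follows the panorama's aspect ratio at full frame width.
    # Assumed cap: at most half the frame. A panorama that hits the cap would
    # need cropping or letterboxing, which this sketch does not handle.
    strip_h = min(frame_h // 2, round(frame_w * pano_h / pano_w))
    strip = Rect(0, frame_h - strip_h, frame_w, strip_h)
    talker = fit_inside(talker_size[0], talker_size[1],
                        Rect(0, 0, frame_w, frame_h - strip_h))
    return talker, strip


if __name__ == "__main__":
    talker, strip = compose_layout(1280, 720, (640, 360), (1920, 240))
    print(talker, strip)
```

Integer arithmetic is used in `fit_inside` so that the resulting rectangle never exceeds its box because of floating-point rounding.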