The present invention relates to conference systems and, more specifically, to various methods and systems for using augmented and virtual reality to enhance conferencing activities including communication and content sharing.
Hereinafter, unless indicated otherwise, the term “meeting” will be used to refer to any gathering or linkage between two or more people in which the people communicate with each other including but not limited to conferences, gatherings, etc., regardless of whether or not all of the people that participate are collocated (e.g., one or more of the people in a meeting may be remotely located and linked into the meeting via a phone, video conference system, or other communication device). In addition, the term “attendee” will be used to refer to any person that communicates with another person or persons in a meeting.
Years ago there was a strong belief, and in some cases the belief still persists today, that the best way for one person to communicate with another or with a group of other people is via an in person face-to-face meeting. In addition to enabling meeting attendees to develop personal relationships, there are several other advantages associated with face-to-face meetings. First, face-to-face meetings enable all attendees to use both hearing and sight senses to discern what other attendees are attempting to communicate. To this end, as is well known, in many cases a person's actions, posture, facial expressions, etc., that can be visually observed by others, belie their words or at least provide a deeper meaning to those words such that true or better informed communication requires both visual as well as voice communication.
Second, the natural feedback afforded by both audio and visual senses allows an attendee, if he is paying attention, to ascertain the effects of his own communications on other attendees. Thus, for instance, after a first attendee makes a statement about something, the first attendee can visually and sometimes audibly sense reactions by other attendees to determine (1) if and which other attendees are paying attention to the statement, (2) if and which of the other attendees are in agreement or not in agreement with what was said, and (3) if and which other attendees understand or fail to understand what was said. Here, the visual feedback in many cases is multifaceted and may include a sense of where other attendees focus their attention, facial expressions and even body language, along with audible communications including words as well as non-language audible utterances (e.g., a grunt, a sigh, etc.).
Third, a person's simple presence at a meeting has the effect of demanding attention. In this regard, think of the different sense of presence one has when sitting in a room with another person as opposed to talking to that person via a voice phone call. When a person is present, other attendees are more respectful of their time and give greater attention and less divided attention to their communications.
Fourth, where a person is communicating with multiple attendees at the same time as in a multi-attendee team meeting, there is a dynamic between attendees that can only be sensed as a whole by sensing how all or subsets of attendees are acting and interacting during a meeting, even in the case of attendees that are not currently primarily active (e.g., attendees that are simply listening to others' voice communications). Thus, for instance, while in a meeting there may be a sense from overall activity and non-verbal communications that most if not all attendees are in agreement, disagreement, a state of confusion, etc., that cannot be discerned without an overall sense of what is happening in the conference space.
Fifth, where attendees are sharing content in some tangible form such as documents or even content on digital electronic or emissive surfaces, which content attendees are paying attention to is an important form of communication. For instance, assume three large common content sharing emissive surfaces are located in a conference space. At a first time during a meeting assume all six local attendees are looking at content on a first of the emissive surfaces even though different content is presented on all three surfaces. At a second time, assume that only one of the attendees is looking at content on the first emissive surface, two attendees are looking at content on the third surface and the fourth and fifth attendees are looking at the sixth attendee while the sixth attendee is looking toward content on a document in her hand. Clearly, simply sensing what and whom attendees are looking at is extremely informative about what is going on in a conference space and makes collocation particularly valuable.
While face-to-face collocated communications are still considered extremely important in many instances, two developments have occurred which have substantially reduced the percentage of total person-to-person communications where attendees are collocated. First, many companies are extremely large and employ people in many different and geographically disparate locations so that communications which used to be with a colleague down the hall or in an adjacent building on an employer's campus are now between employees in different states, countries and even continents. Disparate employee locations have made face-to-face communications cost prohibitive in many cases.
Second, technology has been developed that operates as a “substitute” for in-person meetings. Here, the term “substitute” is in quotations as, in reality, existing technology is a poor substitute for in person collocated meetings in many cases for several reasons.
The first real breakthrough in communication technology that had a substantial impact on the prevalence of collocated meetings was in phone communication systems where audible phone calls and conferencing computers and software enabled remote meeting attendees to have an audio presence for hearing spoken words as well as for voicing their own communications to one or several local phone conference attendees. Phone conferencing hardware and software have become ubiquitous in many offices and other employment facilities and especially in conference spaces fitted out to support multiple local employees as well as in private offices.
While voice phone systems have been useful and have reduced person-to-person communication costs appreciably, phone systems have several shortcomings. First, in phone systems, all the benefits of visual feedback during communication are absent. Instead of relying on visual feedback to assess meaning, attention level, level of understanding, group thinking, etc., a phone-linked meeting attendee has to rely solely on audio output. Inability to perceive meaning, attention level, understanding and other telltale signs of communication success is exacerbated in cases where there are several (e.g., 8-12) local attendees and even other remote attendees on a phone call where attendees may have difficulty discerning who is talking, when it is appropriate to talk (e.g., during a lull in a conversation), etc.
Second, in many cases audio for a remote attendee is provided by a single speaker or a small number of speakers (e.g., 2 on a laptop) where there is little if any ability to generate any type of directional sound (e.g., sound coming from any one of several different directions toward a remote attendee). Thus, here, any time any of 12 local attendees makes a comment, the remote attendee hears the comment from the one speaker or non-directional speakers at her location and is not able to rely on the direction of the sound to discern who is currently speaking or to distinguish one voice from others.
In part to address the shortcomings associated with phone systems, a second technological development in communications aimed at reducing the need for collocated meetings has been the addition of video to audio conferencing systems. Here, the idea is that remotely located meeting attendees use cameras to obtain video of themselves which is transmitted to and presented to other differently located attendees along with audio or voice signals so that employees can, in effect, see and hear each other during a meeting. In some cases video conferences may be set up between only two disparately located attendees and, in these cases, cameras are typically positioned along an edge (e.g., a top edge) of a video conferencing display or emissive surface at each attendee's station and are aimed from that edge location directly toward the attendee at the station. The video at each station is transmitted to the other remote attendee's station and presented on the emissive surface display screen adjacent the edge located camera at the receiving station.
In other cases several local attendees may be collocated in a conference room and a remote attendee, linked in via video conferencing, may be located alone at a personal workstation. Here, in most cases, an emissive surface or display screen is presented in the local conference space for presenting a video representation of the remote attendee and a camera is arranged adjacent an edge (e.g., a top edge) of the emissive surface that presents the video of the remote attendee, the camera directed toward the local area to obtain video of all of the local attendees in that area. Thus, the remote attendee has one perspective view of all local attendees in the local area from a location along the edge of the surface on which the representation of the remote attendee is presented. The local attendees see a view of the remote attendee from the perspective of the camera located along the edge of the remote attendee's emissive surface.
Hereinafter, unless indicated otherwise, a remote attendee's large field of view of a local conference space or area will be referred to as a “local area view” while a view of a remote attendee from the camera located along the edge of an emissive surface at the remote attendee's station will be referred to as a “station view”. Here, a “station view” may be had by a second remote attendee viewing a first remote attendee or via local attendees at a local conferencing area viewing a remote attendee.
In each of the station view and the local area view, because the cameras are offset along the edges of the emissive surfaces where those views are presented, attendees in those views appear to stare off into space as opposed to looking directly at other attendees observing those views. Thus, for instance, where first and second remote attendees are videoconferencing, as the first remote attendee looks at the representation of the second attendee on her station's emissive surface, the image of the first remote attendee that is presented to the second shows the first remote attendee with a sight trajectory (ST) that is misaligned with the camera at her station and her image at the second attendee's station is therefore misaligned with the ST of the second attendee. Similarly, as the second remote attendee looks at the representation of the first attendee on his station's emissive surface, the image of the second remote attendee that is presented to the first shows the second remote attendee with an ST that is misaligned with the camera at his station and his image at the first attendee's station is therefore misaligned with the ST of the first attendee. Unless indicated otherwise, this phenomenon where attendee sight trajectories are misaligned when obtained with edge positioned cameras will be referred to herein as “the misaligned view effect”.
Video conferencing systems, like voice conferencing systems, have several shortcomings that are often a function of which end of a conference an attendee is linked to, a remote single attendee end or a multi-attendee local conference area end. From the perspective of a remote attendee linked to a multi-attendee conference space, there are at least four shortcomings.
First, for various reasons, remote attendees have a very difficult time discerning whom or what other attendees that participate in a meeting, both local and other remote attendees, are looking at or paying attention to. For instance, while a remote attendee's local area view oftentimes enables the remote attendee to determine the general sight trajectories (e.g., the direction in which an attendee is looking) of local attendees, in known cases, it is difficult at best for a remote attendee to understand exactly whom or what a local attendee is looking at (e.g., the remote attendee cannot discern local attendees' precise sight trajectories). Thus, for instance, if first and second local attendees are adjacent each other along a right edge of a tabletop in a local conference space and a third local attendee is across from the first and second local attendees on the left edge of the tabletop as presented in the remote attendee's local area view, the remote attendee may have difficulty determining which of the first and second attendees the third attendee is looking at. This inability to discern local attendee sight trajectories is further complicated where the number of local attendees increases. As another instance, if a first local attendee is looking at a second local attendee that resides behind a third local attendee, the first attendee's sight trajectory is difficult at best to discern in a remote attendee's local area view.
As another example, in many cases other information like, for example, a station view of a second remote attendee at his workstation is presented immediately adjacent or near the station view of a first remote attendee's station in a local conference space and therefore it is difficult at best for a remote attendee to determine, based on the remote attendee's local area view, whether or not any local attendee is looking directly at the remote attendee or looking at some other adjacent information (e.g., the second remote attendee). Here, a remote attendee may mistakenly have a sense that a local attendee is looking directly at the remote attendee when in fact she is looking at other information posted adjacent the emissive surface that presents the view of the remote attendee. The inability to discern whether or not local attendees are looking directly at a remote attendee is exacerbated by the misaligned view effect which causes video of attendees to show them looking off into space generally as opposed to at a viewer of the video.
As yet one other instance, where at least first and second remote attendees link into a single local conference, no known system enables the first remote attendee to detect whom or what the second remote attendee is looking at. Here, in known configurations, the first remote attendee may have a head on view of the second remote attendee with a misaligned view effect and the second remote attendee may have a head on view of the first remote attendee with a misaligned view effect, but neither of those views enables either the first or second remote attendee to discern what the other is viewing. For instance, the second remote attendee may be viewing a local area view of a conference space that is adjacent a station view of the first remote attendee and, in that case, the first remote attendee would have difficulty discerning if the second remote attendee is looking at the local area view or the view of the first remote attendee that is presented to the second remote attendee.
Second, while the camera that generates the remote attendee's local area view is purposefully placed at a local conference room location at which video generated thereby should pick up representations of all local attendees, oftentimes, in great part because of local attendee preferences on where to arrange their chairs in the local space and where to fix their sight trajectories, the remote attendee cannot view all local attendees much of the time or, at best, has a skewed and imperfect view of many of the local attendees. Thus, for instance, where a first local attendee pushes her chair back 2 feet from an edge of a conference table while a second local attendee is up against the conference table edge and located between the camera and the first local attendee, the view of the first attendee in the remote attendee's local area view may be completely or at least partially blocked. Many other scenarios may result in one or more local attendees being hidden in the remote attendee's local area view.
Third, in many cases the quality of video generated for the remote attendee's local area view is too poor for a remote attendee to perceive or comprehend many non-verbal communication cues. For instance, where a local area view from an end of a conference table includes 12 local employees arranged about the table, the video is often too poor or representations of each employee are too small for the remote attendee to discern facial expressions or even body language. Inability to fully perceive communication like a local attendee places the remote attendee at a distinct communications disadvantage. While a local attendee can sense if there is general agreement on a point in the local space, for instance, the remote attendee often cannot. While a local attendee can sense if other attendees understand a position or an argument, the remote attendee often cannot. Here, the remote attendee may appear to be somewhat tone deaf when compared to local attendees that have the ability to be more empathetic and sensitive.
Fourth, while some systems enable a remote attendee to adjust her local area view at least somewhat, the process required to adjust the view is typically manual and burdensome (e.g., manipulation of a joystick or several directional buttons and zoom buttons, etc.). For this reason, in most cases, remote attendees simply accept the problems associated with the wide angle local area view and forego making any changes thereto during a meeting or, at most, may make one or two changes to zoom in on specific local speakers where those speakers talk for extended periods.
From the perspective of local attendees at the local conference space, a primary problem with existing systems is that local attendee views of remote attendees are such that the local attendees have no ability to discern whom or what remote attendees are looking at. In this regard, because the remote attendee's local area view often comprises the entire local area and includes several local attendees, the representations of the local attendees are relatively small in the local area view and therefore when the remote attendee shifts her eyes from one local attendee to another, the shift is difficult to detect in the station view presented to the local attendees. The misaligned view effect exacerbates the problem of detecting a remote attendee's sight trajectory.
Second, where a second remote attendee is linked to a session and video of the second attendee is presented adjacent the local area view, there is no way for local attendees to visually determine when a first remote attendee is looking at the second remote attendee.
Third, station views of remote attendees are often better than real life views of local attendees which can lead to disparate ability to present ideas and content. To this end, in many cases remote attendee representations in local conference areas are on centrally located emissive surfaces optimized for viewing from all locations in the local space. The central presentation of a remote attendee is typically better viewed by most local attendees than are local attendees which results in presence disparity.
A third technological development in communications aimed at reducing the need for face-to-face meetings has been software and systems that enable storage and sharing of digital content in local conference spaces and, in particular, with remotely linked meeting attendees. Thus, for instance, WebEx software and other software packages akin thereto have been developed to enable content and application sharing on multiple display screens for content presentation and development purposes. In many cases content sharing software has been combined with video conferencing systems so that remote and local conferees can share and develop content at the same time that they visually and audibly communicate.
While digital content sharing is invaluable in many cases, such sharing often exacerbates many of the problems described above with respect to video conferencing and presence disparity. To this end, content shared on large common display screens in a local conferencing space presents additional targets for local attendee sight trajectories and makes for more complex environments where presence disparity between local and remote attendees is exacerbated. For instance, all local attendees have the ability to determine which of three large common emissive surfaces and even which sections and hence which content subsets on which emissive surfaces each of the other local attendees is looking at. In known systems a remote attendee has no way of discerning which common surface, much less which content subset on a common surface, local attendees are instantaneously looking at. Similarly, where representations of some or all of the locally shared content are presented to a remote attendee, in known cases there is no way for local attendees to discern what remote attendees are looking at (e.g., which content, a representation of another attendee, etc.).
In addition to the problems with video conferencing and content sharing described above, there are other shortcomings with known systems. First, in most cases a remote attendee is limited in her ability to select views into a local conference space. For example, in most cases video of remote attendees is placed on a stationary emissive surface at one location in the space where a camera is located along the edge of the emissive surface so that the remote attendee's view into the space is limited to the camera location. Depending on where local attendees locate in the conference space and which local attendees locate where in the space, the remote attendee's view may be very good or poor or anywhere in between. Thus, for instance, if a primary presenter locates directly across from the camera that obtains the video provided to the remote attendee, the view may be very good but if the camera is directed at a side of the presenter, the view may be poor. Here, while local attendees can select and assume a best position option for viewing in the local space, in most cases remote attendees do not have that option.
Second, in most cases, local attendees have no ability to move the video representation of a remote attendee to some optimal location. Again, in most cases, the emissive surface that presents the remote attendee representation is stationary and therefore there is no option for repositioning the remote attendee representation.
Third, known systems provide only minimal ability to augment attendee video representations. For instance, in some cases the location of a remote attendee or the remote attendee's name may be presented below, above, etc., the video representation of the attendee so that others viewing the video can identify the attendee or the attendee's location. Location and identity represent minimal attendee associated content.
Fourth, in many cases attendees are associated with a large amount of “additional information” which can be used to add value to a meeting. A simple example of “additional information” is an attendee's name and title at a company or the attendee's current location. More complex additional information may include names and descriptions of projects an attendee is currently associated with or was associated with in the past, documents (e.g., text, graphical, images) the attendee is associated with (e.g., authored, previously presented, is mentioned in, etc.), multimedia materials the attendee is associated with, an attendee's resume or list of experiences, an attendee profile, an attendee's past, current or future schedule, an attendee's contact information, etc. Similarly, content shared among attendees may also have a set of related “additional information” which may add value in a meeting such as, for instance, the name(s) of an author or a person that generated the content, a history of the content or content development, links to other documents or content related to the content, etc. Where additional information is associated with attendees or with shared content, that information could be used to add value in meetings in many different ways which simply are not contemplated by known meeting and content sharing systems.
Thus, there is a need for a substantially better meeting and content sharing system that limits or even eliminates the presence disparity between local and remote meeting attendees in known systems. It would also be advantageous if the system could go beyond eliminating presence disparity to enable even better communication capabilities than those associated with collocated face-to-face meetings.