Modern computing and display technologies have facilitated the development of systems for so-called “mixed reality” (“MR”), “virtual reality” (“VR”) and/or “augmented reality” (“AR”) experiences. This can be done by presenting computer-generated imagery to the user through a head-mounted display. This imagery creates a sensory experience which immerses the user in the simulated environment. A VR scenario typically involves presentation of digital or virtual image information without transparency to actual real-world visual input.
AR systems generally supplement a real-world environment with simulated elements. For example, AR systems may provide a user with a view of the surrounding real-world environment via a head-mounted display. However, computer-generated imagery can also be presented on the display to enhance the real-world environment. This computer-generated imagery can include elements which are contextually related to the real-world environment. Such elements can include simulated text, images, objects, etc. MR systems also introduce simulated objects into a real-world environment, but these objects typically feature a greater degree of interactivity than in AR systems. The simulated elements can often be interactive in real time. VR/AR/MR scenarios can be presented with spatialized audio to improve user experience.
Various optical systems generate images at various depths for displaying VR/AR/MR scenarios. Some such optical systems are described in U.S. Utility patent application Ser. No. 14/738,877 and U.S. Utility patent application Ser. No. 14/555,585 filed on Nov. 27, 2014, the contents of which have been previously incorporated-by-reference herein.
Current spatialized audio systems can cooperate with 3-D optical systems, such as those in 3-D cinema, 3-D video games, virtual reality, augmented reality, and/or mixed reality systems, to render, both optically and sonically, virtual objects. Objects are “virtual” in that they are not real physical objects located in respective positions in three-dimensional space. Instead, virtual objects only exist in the brains (e.g., the optical and/or auditory centers) of viewers and/or listeners when stimulated by light beams and/or soundwaves respectively directed to the eyes and/or ears of audience members. Unfortunately, the listener position and orientation requirements of current spatialized audio systems limit their ability to create the audio portions of virtual objects in a realistic manner for out-of-position listeners.
Current spatialized audio systems, such as those for home theaters and video games, utilize the “5.1” and “7.1” formats. A 5.1 spatialized audio system includes left and right front channels, left and right rear channels, a center channel and a subwoofer. A 7.1 spatialized audio system includes the channels of the 5.1 audio system and left and right channels aligned with the intended listener. Each of the above-mentioned channels corresponds to a separate speaker. Cinema audio systems and cinema grade home theater systems include DOLBY ATMOS, which adds channels configured to be delivered from above the intended listener, thereby immersing the listener in the sound field and surrounding the listener with sound.
Despite improvements in spatialized audio systems, current spatialized audio systems are not capable of taking into account the location and orientation of a listener, not to mention the respective locations and orientations of a plurality of listeners. Therefore, current spatialized audio systems generate sound fields with the assumption that all listeners are positioned adjacent the center of the sound field and oriented facing the center channel of the system, and have listener position and orientation requirements for optimal performance. Accordingly, in a classic one-to-many system, spatialized audio may be delivered to a listener such that the sound appears to be backwards, if that listener happens to be facing opposite of the expected orientation. Such misaligned sound can lead to sensory and cognitive dissonance, and degrade the spatialized audio experience, and any VR/AR/MR experience presented therewith. In serious cases, sensory and cognitive dissonance can cause physiological side-effects, such as headaches, nausea, discomfort, etc., that may lead users to avoid spatialized audio experiences or VR/AR/MR experiences presented therewith.
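The orientation problem described above can be illustrated with a minimal sketch. It assumes a fixed-channel layout whose speaker azimuths are measured relative to the expected listener orientation; the specific angles, the sign convention, and the names `CHANNEL_AZIMUTH_DEG`, `wrap_deg` and `perceived_azimuth` are hypothetical and are not taken from any particular system.

```python
# Hypothetical nominal speaker azimuths in degrees, measured from the expected
# listener orientation (0 = straight ahead, positive = toward the listener's
# right). These values are illustrative assumptions only.
CHANNEL_AZIMUTH_DEG = {
    "center": 0.0,
    "front_left": -30.0,
    "front_right": 30.0,
    "rear_left": -110.0,
    "rear_right": 110.0,
}


def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def perceived_azimuth(channel, listener_yaw_deg):
    """Direction a channel is heard from, relative to the listener's nose.

    listener_yaw_deg is how far the listener is turned away from the
    expected orientation (positive = turned to the right).
    """
    return wrap_deg(CHANNEL_AZIMUTH_DEG[channel] - listener_yaw_deg)


if __name__ == "__main__":
    # A listener facing away from the center channel (yaw of 180 degrees).
    for name in CHANNEL_AZIMUTH_DEG:
        print(name, perceived_azimuth(name, 180.0))
```

Under these assumptions, a listener turned 180 degrees hears the front-left channel from behind and to the right, which is the "backwards" sound field described above.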
In a similar technology space, mixed media systems such as those found in theme park rides (e.g., DISNEY'S STAR TOURS) can add real life special effects such as lights and motion to 3-D film and spatialized audio. Users of 3-D mixed media systems are typically required to wear glasses that facilitate system generation of 3-D imagery. Such glasses may contain left and right lenses with different polarizations or color filters, as in traditional anaglyph stereoscopic 3-D systems. The 3-D mixed media system projects overlapping images with different polarizations or colors such that users wearing stereoscopic glasses will see slightly different images in their left and right eyes. The differences in these images are exploited to generate 3-D optical images. However, such systems are prohibitively expensive. Moreover, such mixed media systems do not address the inherent user position and orientation requirements of current spatialized audio systems.
To address these issues, some VR/AR/MR systems include head-mounted speakers operatively coupled to a spatialized audio system, so that spatialized audio can be rendered using a “known” position and orientation relationship between speakers and a user/listener's ears. Various examples of such VR/AR/MR systems are described in U.S. Provisional Patent Application Ser. No. 62/369,561, the contents of which have been previously incorporated-by-reference herein. While these VR/AR/MR systems address the listener position issue described above, the systems still have limitations related to processing time, lag and latency that can result in cognitive dissonance with rapid user head movements.
For instance, some VR/AR/MR systems deliver spatialized audio to a user/listener through head-mounted speakers. Accordingly, if a virtual sound source (e.g., a bird) is virtually located to the right of a user/listener in a first pose (which may be detected by the VR/AR/MR system), the VR/AR/MR system may deliver generated sound (e.g., chirping) corresponding to the virtual sound source that appears to originate from the right of the user/listener. The VR/AR/MR system may deliver the sound mostly through one or more speakers mounted adjacent the user/listener's right ear. If the user/listener turns her head to face the virtual sound source, the VR/AR/MR system may detect this second pose and deliver generated sound corresponding to the virtual sound source that appears to originate from in front of the user/listener.
However, if the user/listener rapidly turns her head to face the virtual sound source, the VR/AR/MR system will experience a lag or latency related to various limitations of the system and the method of generating virtual sound based on a pose of a user/listener. An exemplary virtual sound generation method includes, inter alia, (1) detecting a pose change, (2) communicating the detected pose change to the processor, (3) generating new audio data based on the changed pose, (4) communicating the new audio data to the speakers, and (5) generating virtual sound based on the new audio data. These steps between detecting a pose change and generating virtual sound can result in lag or latency that can lead to cognitive dissonance in a VR/AR/MR experience with associated spatialized audio when the user/listener rapidly changes her pose.
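The effect of the five steps listed above can be sketched as a simple latency budget. The per-stage durations and the head rotation speed below are hypothetical placeholders chosen only to make the arithmetic concrete; they are not measurements of any system, and the names `StageLatency`, `PIPELINE`, `total_latency_ms` and `angular_error_deg` are introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class StageLatency:
    name: str
    milliseconds: float


# One entry per step of the exemplary method; durations are assumed values.
PIPELINE = [
    StageLatency("detect pose change", 4.0),
    StageLatency("communicate pose change to processor", 2.0),
    StageLatency("generate new audio data", 10.0),
    StageLatency("communicate audio data to speakers", 2.0),
    StageLatency("generate virtual sound", 2.0),
]


def total_latency_ms(stages):
    """Sum the stage latencies, assuming the stages run strictly in sequence."""
    return sum(stage.milliseconds for stage in stages)


def angular_error_deg(head_speed_deg_per_s, latency_ms):
    """Angle the head rotates while the pipeline is still using the old pose."""
    return head_speed_deg_per_s * latency_ms / 1000.0


if __name__ == "__main__":
    latency = total_latency_ms(PIPELINE)
    # Assumed rapid head turn of 300 degrees per second.
    print(latency, angular_error_deg(300.0, latency))
```

With these assumed numbers, the virtual sound would be rendered for a pose that is several degrees out of date during a rapid head turn. The error grows with both the total latency and the speed of the pose change.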
Spatialized audio associated with a VR/AR/MR experience illustrates the cognitive dissonance because a virtual sound (e.g., a chirp) may appear to emanate from a location different from the image of the virtual object (e.g., a bird). However, all spatialized audio systems (with or without a VR/AR/MR system) can result in cognitive dissonance with rapid pose change because all spatialized audio systems include virtual sound sources with virtual locations and orientations relative to the user/listener. For instance, if a virtual bird is located to the right of the listener, the chirp should appear to emanate from the same point in space regardless of the orientation of the user's head, or how quickly that orientation changes.
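The requirement that a virtual sound source stay at the same point in space can be expressed with a short geometric sketch. It assumes a two-dimensional world frame in which +y is the reference heading and +x lies to its right, with head yaw measured clockwise from +y. The function name `head_relative_azimuth_deg` and this coordinate convention are illustrative assumptions.

```python
import math


def head_relative_azimuth_deg(listener_xy, head_yaw_deg, source_xy):
    """Azimuth of a world-fixed source relative to the listener's nose.

    Returns degrees in [-180, 180): 0 is straight ahead, positive is to the
    listener's right.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    # World bearing of the source, measured clockwise from the +y axis.
    bearing_deg = math.degrees(math.atan2(dx, dy))
    # Subtract the head yaw and wrap the result to [-180, 180).
    return (bearing_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    listener = (0.0, 0.0)
    bird = (1.0, 0.0)  # virtual bird located to the listener's right
    print(head_relative_azimuth_deg(listener, 0.0, bird))   # first pose
    print(head_relative_azimuth_deg(listener, 90.0, bird))  # head turned toward the bird
```

In this sketch the bird's world position never changes; only the head-relative azimuth used for rendering does. If the renderer were still using the yaw from the first pose after the head had turned, the chirp would be placed to the right of the user/listener rather than in front, which is the mismatch described above.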