Virtual reality (VR) allows simulation and training providers to deliver rich and immersive virtual content. Mixed reality blends virtual scenes and real scenes into a single three-dimensional immersive scene. Mixed reality generally utilizes a method of real-time video processing that extracts foreground imagery from the background and outputs to a user display a blended scene, which combines desired real-world foreground objects with a virtual background. Mixed reality user training enhances VR by engaging user muscle memory and providing tactile feedback, which are critical components of learning. Mixed reality allows a trainee to handle real equipment, which the trainee would use in the field, and allows for multi-user training scenarios where teammates can see each other in the same three-dimensional virtual environment.
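As a rough illustration of the blending step described above, the following minimal sketch composites a real camera frame over a virtual frame one pixel at a time. It assumes the background is identified by closeness to a single key color, which is only one possible way to extract foreground and is not stated in the text; the names `is_background`, `blend`, `key`, and `tolerance` are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each channel 0-255
Image = List[List[Pixel]]      # rows of pixels


def is_background(pixel: Pixel, key: Pixel, tolerance: int) -> bool:
    """Assumed rule: a pixel is background if every channel is within
    `tolerance` of the key color."""
    return all(abs(c - k) <= tolerance for c, k in zip(pixel, key))


def blend(real: Image, virtual: Image,
          key: Pixel = (0, 255, 0), tolerance: int = 40) -> Image:
    """Keep real foreground pixels; replace real background pixels
    with the corresponding virtual scene pixels."""
    same_shape = len(real) == len(virtual) and all(
        len(r) == len(v) for r, v in zip(real, virtual))
    if not same_shape:
        raise ValueError("real and virtual frames must have the same dimensions")
    return [
        [v if is_background(r, key, tolerance) else r
         for r, v in zip(real_row, virtual_row)]
        for real_row, virtual_row in zip(real, virtual)
    ]


# One-row example: the first pixel is key-colored background,
# the second is a real foreground object.
real_frame = [[(0, 255, 0), (200, 150, 120)]]
virtual_frame = [[(10, 10, 10), (20, 20, 20)]]
blended = blend(real_frame, virtual_frame)
```

In this example the background pixel is replaced by the virtual pixel and the foreground pixel passes through unchanged. A practical system would perform this work in dedicated hardware on full-resolution video rather than in interpreted code.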
Low-latency video processing is important to any mixed reality system and is especially important to systems that utilize occluded displays, such as the Oculus Rift or the Rockwell Collins Coalescence training system, where the user wears an opaque display that does not normally allow the user to view the real world. Currently implemented occluded displays for mixed reality typically require separate cameras to provide the real scene image portion of mixed reality images. The real scene images are captured by the cameras, transformed algorithmically, and transferred to the display before the user sees any of them. Humans can detect any significant latency caused by a video processing path, especially with the wide field of view of a head-worn display, because human peripheral vision is very sensitive to motion. For example, when a user shakes his or her hand in front of his or her eyes, the user's proprioceptive sense tells the user exactly when and where the hand should appear in the user's field of view. If camera and/or display latency is noticeable, the brain detects the lag, which negatively affects hand-eye coordination and can cause disorientation or even nausea. Experimentation has shown that display latencies of more than approximately 20 milliseconds (ms), “photon-to-pixel”, are perceptible and distracting to the user. Latencies of more than 20 ms negate the immersive benefits of mixed reality training. Currently, much of the 20 ms latency budget is consumed by the camera exposure time, which is typically in a range of 4-15 ms, and by the frame input/output (I/O) time, which is the time needed to transport the captured frame from the camera to the display. This leaves only a few milliseconds of the 20 ms latency budget to perform any video processing. Typically, video processing requirements are significant because the video processing involves rendering live real scene video that blends cohesively with the virtual scene video.
Additionally, such video processing is typically performed on very high-bandwidth video to get a sufficiently high resolution for creating an immersive mixed reality experience.
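The latency budget arithmetic described above can be sketched as follows. The 20 ms budget and the 4-15 ms exposure range come from the text; the frame I/O time of 3 ms is purely an assumed placeholder, since no I/O figure is given, and the function name `processing_headroom_ms` is hypothetical.

```python
BUDGET_MS = 20.0         # perceptibility threshold stated in the text
ASSUMED_FRAME_IO_MS = 3.0  # assumption for illustration only; not given in the text


def processing_headroom_ms(exposure_ms: float,
                           frame_io_ms: float = ASSUMED_FRAME_IO_MS,
                           budget_ms: float = BUDGET_MS) -> float:
    """Time left for video processing after exposure and frame I/O.
    A negative result means the budget is already exceeded."""
    return budget_ms - exposure_ms - frame_io_ms


# Headroom at both ends of the stated 4-15 ms exposure range.
best_case = processing_headroom_ms(exposure_ms=4.0)
worst_case = processing_headroom_ms(exposure_ms=15.0)
```

Under the assumed 3 ms of frame I/O, a 15 ms exposure leaves 2 ms for all video processing, which is consistent with the "few milliseconds" noted above; a different I/O figure would shift both results by the same amount.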