A long-standing problem in the field of human-computer interaction is that multi-dimensional virtual spaces (a.k.a. virtual environments, “VEs”) are far more difficult for people to navigate than the real world. The VE's navigational interface is an important factor because it dictates the type and scope of body-based feedback provided to a participant. The Association for Computing Machinery, Inc. (“ACM”) research paper “Walking improves your cognitive map in environments that are large-scale and large in extent” (“Ruddle”) provides an informative overview of historical research and theoretical foundations [Roy A. Ruddle, Ekaterina Volkova and Heinrich H. Bülthoff, ACM Transactions on Computer-Human Interaction, v.18 n.2, p.1-20, June 2011].
A particular design challenge arises when crafting natural interactions for systems in which participants are to be afforded control over both location and orientation in a VE. This is because only a limited number of ordinary gestures translate to intuitive proprioceptive and vestibular relations between people, objects and the environment. Any control variables not readily mapped to natural body movements are therefore mapped to artificial user interfaces, such as joysticks, keyboards, multi-touch gestures and on-screen buttons and slider controls.
FIGS. 4A-5F illustrate a range of primitive motions readily sensible by handheld computing devices. The difficulty lies in picking the right combination of sensor mappings to compose an aggregate set of affordances that provide for natural and rewarding human-computer interaction. The objectives of the present invention prioritize participant motions for peripatetic locomotion control and side-to-side orientation control over up-and-down orientation control.
Virtual reality (“VR”) applications have demonstrated different sets of tools and interaction methods in association with different types of devices for traversing fictional and/or simulated environments. While VEs on desktop computers provide almost no body-based information, head-mounted audio-visual displays (“HMDs”) replace the sights and sounds of the physical world with those of a virtual world. Movements of the participant's head, namely rotations around the x-axis (looking up/down), the y-axis (turning head left/right) and the z-axis (cocking head left/right), may be detected by the system and used as input to influence the drawing of the elements of the virtual world on a visual display from the perspective of a virtual camera that tracks with the participant's eyes. The body's own kinesthetic awareness in motion furnishes proprioceptive and vestibular cues, boosting the participant's navigational sense.
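By way of illustration only, the mapping from head rotation to the virtual camera's line of sight can be sketched as follows. The sketch assumes a right-handed, OpenGL-style convention in which the camera looks down the negative z-axis at rest, positive pitch (about the x-axis) looks up and positive yaw (about the y-axis) turns left; the function name camera_forward is hypothetical. Roll about the z-axis spins the image around the line of sight without changing where the camera points, so it is omitted here.

```python
import math


def camera_forward(pitch_deg: float, yaw_deg: float) -> tuple:
    """Unit look vector of a virtual camera driven by head rotation.

    Assumed convention (hypothetical): right-handed axes, camera looks
    down -z at rest, positive pitch (about x) looks up, positive yaw
    (about y) turns left. Roll (about z) does not move the look vector.
    """
    pitch = math.radians(pitch_deg)
    yaw = math.radians(yaw_deg)
    # Pitch the rest vector (0, 0, -1) about x, then yaw the result about y.
    x = -math.sin(yaw) * math.cos(pitch)
    y = math.sin(pitch)
    z = -math.cos(yaw) * math.cos(pitch)
    return (x, y, z)
```

For example, camera_forward(0, 90) points the camera along the negative x-axis, i.e. a quarter turn to the left.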
Participants' control of their apparent locomotion within the VE has been accomplished using physical locomotion (i.e. literally walking through physical space), hand gestures (e.g. pointing in a direction to move), as well as directional and omnidirectional treadmills. Treadmills enable a 1:1 translation of directional participant movement into virtual locomotion. This is advantageous because physically moving the body supports participants' formation of mental models of distances traversed in the VE. Such interaction techniques have been shown to improve the accuracy of participant cognitive/spatial maps.
Ruddle reported on the effect of rotational vs. translational body-based information on participants' navigational performance (distance traveled) and cognitive mapping (direction and straight line distance estimates). Ruddle's research suggested not only that physical locomotion for spatial translation was more important than physical rotation in establishing participants' sense of where they are and where they have been in VEs, but also that rotational body-based information had no effect on the accuracy of participants' cognitive maps compared with using a joystick. Ruddle teaches away from the invention disclosed herein by prioritizing physical locomotion over physical rotation.
Augmented reality (“AR”) applications have demonstrated sets of tools and interaction methods for bridging the physical and virtual worlds, overlaying fictional and/or modeled objects and information elements over live views of the physical world. This has typically been accomplished by adding a real camera to the device, enabling the device to composite rendered objects into or over the scene captured by the real-world camera. On mobile phones and tablet computing devices so configured, the participant's movement of the device drives the display of content. As the viewfinder of a camera tracks with the location and orientation of the camera's lens, the displayed environment and augmented objects and information on the device may track with the movement of the handheld computing device.
Camera-based AR applications on handheld computing devices naturally rely upon sensing rotation around a device's y-axis (as in FIGS. 4C and 4D) and rotation around the x-axis (as in FIGS. 4A and 4B) for enabling participants to fully rotate the camera left, right, down and up. And because AR applications rely upon the physical environment as a framework, movement through the augmented space generally depends on a participant's physical location in real space (leveraging Global Positioning System (“GPS”) data and magnetometer sensors). FIGS. 5E and 5F illustrate a participant moving a device backward and forward in physical space, which could translate to moving backward and forward, respectively, in augmented space.
A related class of applications is stargazing astronomy guides. Stargazing app interactions operate like AR app interactions, displaying labeled stars, constellations and satellites that correspond with the location (on Earth) and posture of the handheld device. Some stargazing apps operate in an AR mode, superimposing stellar information on a live camera feed from the device. Others forgo the camera feed to dedicate the entire visual display to stellar information. A participant located in Austin, Tex. sees representations of celestial objects in virtual space relative to their location in Austin. If, however, the participant desires to see the celestial objects as they would be seen from San Francisco, Calif., the participant would need to travel to San Francisco. This is impractical in the span of a single participant experience. Yet the interaction mappings employed by AR applications seem to logically preclude virtual locomotion based on device posture. Since pivoting up and down, for example, is used to look up and down in the sky, it would seem unavailable for mapping to locomotion backward and forward.
Physical locomotion can be impractical when using a virtual environment application on a handheld computing device—especially when operated in a space physically smaller than the virtual dimensions of a VE. Imagine, for example, walking around your living room in order to travel isometrically through a virtual museum. Your very real furniture and walls would present obstacles almost certainly not correlated to the galleries, corridors and three-dimensional sculptures available to be explored in the VE. Thus, a challenge is to develop techniques for providing participants as much body-based (proprioceptive and vestibular) sensory information as possible in the context of non-physical-locomotion-based interfaces for successful path traversal and path integration (i.e. cognitive mapping of a space based on navigational movements).
First-person shooter (“FPS”) games have made use of a subset of VR techniques, modeling a virtual world for a participant to traverse while attempting to kill the inhabitants of said virtual world. The rendered world is generally drawn from the perspective of the eyes of the protagonist, with the protagonist's weapon portrayed at the bottom of the visual display. As with VR applications, the camera tracks with the facing and point-of-view of the protagonist. Screen real estate is a limiting factor in the design of controls for FPS games on handheld devices, thus solutions that avoid touchscreen interactions are advantageous.
A “rail shooter” or “on-rail game” is a similar type of game in which participants cannot, however, control their direction of travel through the virtual environment, as if the course were confined to a fixed rail. A limited set of choices may promise a choose-your-own-adventure story, but the participant can neither deviate from the course nor backtrack along the way. Thus, the experience commonly focuses on shooting. Point of view is first-person or from just behind the protagonist, with a phallocentric gaze looking down the barrel of a gun as in FPS games. The participant does not need to worry about movement and generally does not have control over the camera.
In less-restrictive games, a participant may be afforded freedom to control both the location of a protagonist in space and the orientation of the protagonist's view. In such instances, the computer modifies both the coordinates and facing of the virtual camera in the virtual space to render the scene on the visual display. On handheld computing devices, motion sensors have been used to enable participants to aim weapons and simultaneously adjust the viewing orientation in the space. Movement through the virtual space, however, has generally been limited to on-screen directional controls.
Driving games on handheld devices have been designed to simulate driving real cars in a VE, providing both a physical interface for rotation and techniques for virtual locomotion not based on physical movement. Such games often use accelerometer and/or gyro sensors to detect rotation of a device around the z-axis (i.e. like an automobile steering wheel as in FIGS. 4E and 4F) for steering a virtual racecar. But despite the direct and seemingly obvious analogy to steering a real automobile, the present inventors have observed participants attempting to steer a virtual racecar in a racing game by swinging the device left or right around their own bodies. This performance error seems surprising, especially on the part of skilled drivers. While unsolicited, such reflex behaviors hint at the intelligence of body rotation as an actuator for rotation in more “intuitive” VEs.
SEGA Corporation's Super Monkey Ball 2: Sakura Edition is an example of a sensor-based navigation game that runs on Apple iPhone and iPad devices (collectively “iOS devices”) [available at the time of writing in the Apple iTunes app store via http://www.sega.com/games/super-monkey-ball-2-sakura-edition/]. A participant controls the movement of an animated monkey sprite enclosed in a translucent ball (a “monkey ball”) through a series of mazes in a VE by pivoting the device simultaneously around two axes. Pivoting up (i.e. rotating the device around its x-axis as in FIG. 4A) causes the monkey ball to roll forward, and the velocity of the monkey ball is related to the degree of pivot up from an origin. Pivoting down (i.e. rotating the device around its x-axis as in FIG. 4B) while the monkey ball is rolling forward causes the monkey ball to slow down. Pivoting down while the monkey ball is stationary causes it to turn around and face in the opposite direction. Pivoting right (i.e. rotating the device around its y-axis as in FIG. 4C) causes the monkey ball to rotate right; and pivoting left (i.e. rotating the device around its y-axis as in FIG. 4D) causes the monkey ball to rotate left.
Exemplary patent documents material to the consideration of sensor-based human interfaces for virtual space and video navigation on handheld devices include, but are not limited to, U.S. Pat. No. 5,602,566 (“Motosyuku” et al.), WO 98/15920 (“Austreng”), U.S. Pat. No. 6,201,544 (“Lands”), WO 01/86920 A2 and WO 01/86920 A3 (collectively “Lapidot”), WO 03/001340 A2 (“Mosttov” et al.), GB 2378878 A (“Gaskell”), U.S. Pat. No. 7,631,277 (“Nie” et al.), U.S. Pat. No. 7,865,834 (“van Os” et al.), U.S. Pat. No. 7,688,306 (“Wehrenberg” et al.), WO 2008/094458 A1 (“Cook” et al.) and U.S. patent application Ser. No. 12/831,722 (“Piemonte”). Note that the summaries below are not meant to be exhaustive descriptions of each set of teachings, and the present inventors acknowledge that they may have unintentionally overlooked disclosed aspects that may be relevant to the present invention. Furthermore, these citations are not to be construed as a representation that a search has been made or that additional information may or may not exist that is material or that any of the items listed constitute prior art.
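A minimal sketch of the control scheme described above follows. It is a hypothetical reconstruction from the observed behavior, not SEGA's implementation; the tuning constants, the dead band and the names BallState and step are assumptions. Pivot angles are measured from the origin posture, positive for pivot up (FIG. 4A) and pivot right (FIG. 4C).

```python
from dataclasses import dataclass

# Hypothetical tuning constants; not taken from the game.
SPEED_GAIN = 0.1   # forward speed per degree of pivot up
BRAKE = 0.5        # speed removed per tick while pivoted down
TURN_GAIN = 2.0    # heading change (deg) per degree of side pivot per tick
DEAD_BAND = 3.0    # pivots smaller than this many degrees are ignored


@dataclass(frozen=True)
class BallState:
    speed: float = 0.0         # forward speed, arbitrary units per tick
    heading_deg: float = 0.0   # facing in the maze, clockwise degrees
    was_down: bool = False     # pivoted down on the previous tick?


def step(s: BallState, pivot_up_deg: float, pivot_right_deg: float) -> BallState:
    """Advance one tick. pivot_up_deg > 0 is pivot up (FIG. 4A), < 0 is
    pivot down (FIG. 4B); pivot_right_deg > 0 is pivot right (FIG. 4C)."""
    speed, heading = s.speed, s.heading_deg
    down = pivot_up_deg < -DEAD_BAND
    if pivot_up_deg > DEAD_BAND:
        speed = SPEED_GAIN * pivot_up_deg        # roll forward, faster with more pivot
    elif down and speed > 0:
        speed = max(0.0, speed - BRAKE)          # slow down while rolling
    elif down and not s.was_down:
        heading += 180.0                         # turn around once when stationary
    if abs(pivot_right_deg) > DEAD_BAND:
        heading += TURN_GAIN * pivot_right_deg   # pivot right turns right, left turns left
    return BallState(speed, heading % 360.0, down)
```

The was_down flag is a design choice in this sketch: it makes the turn-around fire once per pivot down rather than on every tick the device is held down.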
Motosyuku teaches scrolling a two-dimensional document on a display screen in accordance with pivot of a device. Rotation around the device's x-axis (as in FIGS. 4A and 4B) causes the document to scroll up or down. Rotation around the device's y-axis (as in FIGS. 4C and 4D) causes the document to scroll right or left.
Austreng teaches a method of storing and retrieving a series of two-dimensional images of a three-dimensional object taken along different viewing angles. In response to directional input, varying two-dimensional images are displayed, thereby creating the appearance of three-dimensional rotation of the displayed object. Austreng mentions in passing that “it is to be understood that the invention can be used with other digital data, such as digitized video,” but it is unclear how said rotation simulation teachings could relate to video.
Lands teaches a modal use of device pivot to control operations selected from a group consisting of document paging, document zoom, device volume control and device brightness control. Sensor(s) are configured to measure changes in rotation of the device around the device's x-axis (as in FIGS. 4A and 4B) or around the device's y-axis (as in FIGS. 4C and 4D). Variables are changed by an amount proportional to the change in pivot of the device relative to a reference pivot.
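The proportional, reference-relative behavior attributed to Lands might be sketched as follows, assuming one PivotControl object per mode (e.g. one for volume, one for brightness). The class, its gain and its clamping range are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PivotControl:
    """Hypothetical modal control: one variable (e.g. volume) follows pivot."""
    value: float
    gain: float                     # units of change per degree of pivot
    lo: float = 0.0                 # clamp range for the variable
    hi: float = 100.0
    _base: float = field(init=False, default=0.0)
    _reference_deg: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self._base = self.value

    def engage(self, pivot_deg: float) -> None:
        """Enter the mode: capture the current value and the reference pivot."""
        self._base = self.value
        self._reference_deg = pivot_deg

    def update(self, pivot_deg: float) -> float:
        """Set the value in proportion to pivot relative to the reference."""
        delta = pivot_deg - self._reference_deg
        self.value = min(self.hi, max(self.lo, self._base + self.gain * delta))
        return self.value
```

Modality then amounts to routing pivot updates to whichever control is active, for instance a dict such as {"volume": PivotControl(50.0, 2.0), "brightness": PivotControl(80.0, 1.0)}.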
Lapidot teaches a modal use of device movement to control selection of one of multiple options, to control panning within a document or to control zoom within a document (i.e. changing the resolution of a displayed image or the size of displayed text or picture). Lapidot teaches sensing movement of the device along the x-axis (as in FIGS. 5A and 5B), along the y-axis (as in FIGS. 5C and 5D) and along the z-axis (as in FIGS. 5E and 5F) using either accelerometers or a camera mounted on the device. Lapidot also teaches ignoring movements measuring below a pre-defined threshold value, and relating the rate of change of control variables to the speed or acceleration of the movement of the device.
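A sketch of the threshold and speed-to-rate relation described for Lapidot follows, assuming per-axis velocity estimates in m/s are already available from the sensors. THRESHOLD, the gains and the function names are hypothetical. Subtracting the threshold, rather than merely gating on it, is a design choice that avoids a sudden jump in rate just above the threshold.

```python
import math

THRESHOLD = 0.05  # m/s; hypothetical pre-defined threshold


def shaped(speed: float, gain: float) -> float:
    """Ignore movement below THRESHOLD; beyond it, the rate grows with speed."""
    excess = abs(speed) - THRESHOLD
    return 0.0 if excess <= 0 else math.copysign(gain * excess, speed)


def apply_motion(view: dict, mode: str, v: tuple, dt: float) -> dict:
    """Return an updated view for one sample; v = (vx, vy, vz) in m/s."""
    vx, vy, vz = v
    out = dict(view)
    if mode == "pan":       # movement along x and y (FIGS. 5A-5D) pans
        out["x"] += shaped(vx, 1000.0) * dt
        out["y"] += shaped(vy, 1000.0) * dt
    elif mode == "zoom":    # movement along z (FIGS. 5E and 5F) zooms
        out["zoom"] = max(0.1, out["zoom"] + shaped(vz, 2.0) * dt)
    return out
```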
Mosttov teaches gesture recognition techniques for a handheld device, discriminating between and prioritizing interpretation of inertial sensor data according to a hierarchy of classes of gestures. Each of one or more discriminators is configured to recognize a specific class of gestures, and each discriminator is associated with an interpreter that identifies specific gestures in the class. In one embodiment, if a discriminator detects linear or planar motion, then motion data is transferred to a planar gesture recognizer. But if no linear or planar motion is detected, then motion data may be transferred to a pivot gesture recognizer that determines the direction and degree of pivot of the device.
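The discriminator and interpreter hierarchy described for Mosttov can be sketched as an ordered list of (discriminator, interpreter) pairs, where the first discriminator that accepts a window of sensor data hands it to its interpreter. The sample format, thresholds and names below are hypothetical; each sample is assumed to carry gravity-removed linear acceleration and net pivot angles already derived from the inertial sensors.

```python
# Hypothetical thresholds for the two gesture classes.
LINEAR_MIN = 1.0   # m/s^2 of gravity-removed acceleration
PIVOT_MIN = 5.0    # degrees of net pivot in the window
AXES = "xyz"


def is_planar(s: dict) -> bool:
    return max(abs(a) for a in s["accel"]) >= LINEAR_MIN


def is_pivot(s: dict) -> bool:
    return max(abs(p) for p in s["pivot"]) >= PIVOT_MIN


def planar_interpreter(s: dict) -> tuple:
    # Report the dominant translation axis and its signed magnitude.
    i = max(range(3), key=lambda k: abs(s["accel"][k]))
    return ("translate", AXES[i], s["accel"][i])


def pivot_interpreter(s: dict) -> tuple:
    # pivot = (about x, about y); report the dominant axis with direction and degree.
    i = max(range(2), key=lambda k: abs(s["pivot"][k]))
    return ("pivot", AXES[i], s["pivot"][i])


# Ordered by priority: linear/planar motion is checked before pivot.
HIERARCHY = [(is_planar, planar_interpreter), (is_pivot, pivot_interpreter)]


def classify(s: dict):
    for discriminator, interpreter in HIERARCHY:
        if discriminator(s):
            return interpreter(s)
    return None  # no gesture class recognized
```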
Gaskell teaches interaction techniques for simultaneously zooming and scrolling a two-dimensional image on a handheld device. The image is enlarged when the device, held parallel to the ground in a horizontal posture, is moved down (i.e. along the device's z-axis (as in FIG. 5F) perpendicular to the ground); and reduced in size or resolution when the device is moved up (as in FIG. 5E). The image is scrolled in any of four directions when the device is pivoted around the device's x-axis (as in FIGS. 4A and 4B) or y-axis (as in FIGS. 4C and 4D). The direction of scrolling corresponds to the direction of pivot, and the speed(s) of effect(s) are responsive to the speed(s) of movement of the device. Gaskell also makes a passing remark about “altering the apparent nature of the horizontal ‘dead band’ in which the moving stops” without further elaboration; and it is unclear what is meant.
Nie teaches techniques for creation of a three-dimensional VE scene containing layers of two-dimensional sprite objects that are displayed in such a way as to appear three-dimensional. Visual representations of each object corresponding to different orientations are assembled from a series of still images, animations or video clips to give each two-dimensional object three-dimensional characteristics. The source content for each sprite can be a single bitmap, a bitmap image sequence, a vector image, a video track, a live stream or a source specified by a uniform resource locator (“URL”). Nie is clear that object movies are “not truly movies” and “not truly 3D.”
Nie also teaches manipulation of the VE scene using desktop-computing interaction techniques (e.g. mouse movement, mouse clicking and keyboard input) to effectuate rotating, panning, pivoting and zooming of objects and/or scenes using three-dimensional data translation vectors and rotation matrices. In this context, Nie teaches associating audio with scenes and objects, such that a soundtrack plays upon participant selection of an object. When the location of an object in the VE is changed, the three-dimensional location of the audio associated with the object may also be changed.
van Os teaches techniques for simultaneously displaying multiple video panes of videoconference streams in a single user interface designed to simulate a three-dimensional VE without the need for participant navigation or manipulation of said simulated VE. Apple first distributed this feature in the application iChat as part of Mac OS X v10.4 (a.k.a. Tiger). One participant hosts a group videoconference with up to three other participants, and everyone in the videoconference sees and hears the other participants. Video panes are displayed with perspective projection relative to the participant so as to impart a sense of depth. Side panes are angled inwardly towards a center location and foreground reflections are used to enhance the sense of presence of the participants, as if they are seated around a table. Animation is used to transition between events when participants enter and leave a videoconference, sliding video panes on or off screen. Otherwise, the video panes and the virtual camera remain in fixed locations.
Wehrenberg teaches techniques for performing a variety of functions in response to accelerometer-detected movement and orientation of a handheld device without a participant having to press and/or click buttons. Exemplary functions include reorienting a displayed document, triggering display of a page of a document, navigating an object or document that normally cannot be displayed entirely at once within the visual display of the device, activating/deactivating a device, motion compensation, impulse detection for controlled momentum transfer and other applications based on an accelerometer. With regard to navigating an image, Wehrenberg teaches zooming out in response to a participant pivoting the device up and zooming in when pivoting down.
In gaming contexts, Wehrenberg teaches holding and turning a device like a steering wheel; accelerating a vehicle when pivoting the device up and decelerating when pivoting down; aiming an airplane in a flying game up and down when pivoting up and down; and using pivot to look up, down and/or around. With regard to a VE, Wehrenberg teaches using a handheld device as a “window into a virtual reality image database. For example, a user holding the tablet can turn around and see the view looking backward from a position in a two or three dimensional image or object database as if the user walks into a virtual reality game space.” That said, the present inventors do not believe such an interaction can be accomplished reliably, if at all, using the accelerometer-based techniques taught by Wehrenberg, due in part to the fact that accelerometers do not separate gravitational and inertial forces.
Cook explains another problem with Wehrenberg's enablement-lacking comment about looking backward: “accelerometers suffer from an inability to detect rotation around the force vector. So, for example, a motion application that depended on measuring rotation of a stationary device around the device's Y axis would work quite well when the device is horizontal, would become less accurate as the angle between the Y axis and the horizontal plane increases, and would become unpredictable as the Y axis becomes aligned vertically with the gravity vector.” To address this problem, Cook uses camera data to help detect changes in orientation.
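The limitation Cook describes can be demonstrated numerically. A stationary accelerometer senses only the reaction to gravity, so composing any rotation about the vertical (gravity) axis with a fixed tilt leaves the reading unchanged. The sketch assumes a world frame with +Z up and a device orientation formed by a heading about the world vertical followed by a tilt about the device x-axis; the helper names are hypothetical.

```python
import math

G = 9.81  # standard gravity, m/s^2


def rot_x(a: float) -> list:
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_z(a: float) -> list:
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(A: list, B: list) -> list:
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def accel_reading(heading: float, tilt: float) -> tuple:
    """Device-frame accelerometer output for a stationary device (radians).

    Orientation (device -> world) is a heading about the world vertical (+Z)
    composed with a tilt about the device x-axis.
    """
    R = matmul(rot_z(heading), rot_x(tilt))
    up = (0.0, 0.0, G)  # at rest the sensor reads 1 g pointing up
    # Express the world-frame vector in the device frame: R^T @ up.
    return tuple(sum(R[k][i] * up[k] for k in range(3)) for i in range(3))
```

Because the reading depends only on tilt, changes in heading about the gravity vector must be observed by some other means, such as a gyroscope, a magnetometer or, in Cook's case, camera data.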
Cook teaches interaction techniques for circumscribing a virtual object using a handheld device. Pivoting the device (i.e. rotating around the device's x-axis or y-axis, as in FIGS. 4A-4D) controls the angle of view of the image, and moving the device perpendicular to the screen (as in FIGS. 5E and 5F) controls the magnification. The result is analogous to orbiting around a real object in the physical world with a camera while looking through the camera's viewfinder. Yet in Cook's case, the visual display shows a virtual object that may not exist in the physical world. When the user moves the device, the view on the display moves; and when the user pivots the device, either the view pivots so that the image is displayed at an angle related to the pivot angle or the image scrolls in the direction of the pivot. A maximum pivot viewing threshold angle may be used to prevent the view from pivoting past a certain angle and to switch between control modes, such as between pivoting the view and scrolling the view. Cook teaches that pivot angle may be mapped to velocity using a linear, exponential or geometric equation and that “it may be useful to have the viewing angle change more or less than the [pivot] angle depending on usability factors.” Cook also teaches techniques for centering the virtual camera view on a desired center-point of an image, bringing the line of sight perpendicular to that point on the image, using a motion of the device, a button push, a screen tap and/or a voice command.
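A sketch of the threshold-based mode switch and pivot-to-velocity mapping described for Cook follows, under stated assumptions: MAX_PIVOT_DEG, VIEW_SCALE and the curve constants are hypothetical, and only linear and exponential curves are shown because the form of Cook's geometric equation is not specified here. Within the threshold, the view angle tracks the pivot angle scaled by VIEW_SCALE, so the view changes more than the pivot; beyond it, the excess pivot drives scrolling.

```python
import math

MAX_PIVOT_DEG = 30.0   # hypothetical maximum pivot viewing threshold
VIEW_SCALE = 1.5       # view angle changes more than the pivot angle
CURVES = {
    "linear": lambda e: 2.0 * e,                          # deg/s per excess degree
    "exponential": lambda e: math.expm1(e / 10.0) * 5.0,  # gentle start, steep finish
}


def cook_style_control(pivot_deg: float, curve: str = "linear") -> tuple:
    """Return ("pivot_view", view angle) or ("scroll", signed scroll velocity)."""
    if abs(pivot_deg) <= MAX_PIVOT_DEG:
        return ("pivot_view", VIEW_SCALE * pivot_deg)
    # Past the threshold the view stops pivoting and the excess pivot scrolls.
    excess = abs(pivot_deg) - MAX_PIVOT_DEG
    return ("scroll", math.copysign(CURVES[curve](excess), pivot_deg))
```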
Piemonte teaches use of orientation data from one or more sensors to navigate a three-dimensional perspective projection without a participant touching the visual display. As the participant pivots the device left or right around its y-axis (as in FIGS. 4C and 4D), the virtual camera view is turned left or right to reveal the left or right sides of a three-dimensional user interface VE, respectively. As the participant pivots the device down or up around its x-axis (as in FIGS. 4A and 4B), the virtual camera view is angled down or up to reveal the floor or ceiling of the VE, respectively. Angular rotation “can be measured or estimated from data provided by gyro sensors, accelerometers, magnetometers or any combination of sensor data that can provide an estimate of the orientation of mobile device relative to a reference axis of rotation.” Piemonte also teaches constraining and scaling sensor data so that small rotations cause small virtual camera view changes while large rotations or motions (such as shaking the device) result in a “snap-to” jump of the virtual camera view to a predetermined orientation.
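A sketch of constraining and scaling sensor data with a snap-to, in the spirit of Piemonte, is shown below. The tanh curve, the limits, the shake-rate threshold and the HOME orientation are all assumptions; tanh is used here because it is nearly linear for small angles and saturates smoothly at the limit, so small rotations cause small view changes.

```python
import math

YAW_LIMIT = 45.0     # degrees; hypothetical constraint on camera yaw
PITCH_LIMIT = 30.0   # degrees; hypothetical constraint on camera pitch
GAIN = 0.5           # small rotations give proportionally small view changes
SHAKE_RATE = 360.0   # deg/s; faster motion is treated as a shake (hypothetical)
HOME = (0.0, 0.0)    # predetermined (yaw, pitch) orientation to snap to


def constrain(angle_deg: float, limit: float) -> float:
    """Nearly linear (GAIN * angle) for small angles, saturating at +/- limit."""
    return limit * math.tanh(GAIN * angle_deg / limit)


def camera_view(yaw_deg: float, pitch_deg: float, rate_deg_s: float) -> tuple:
    """Map device yaw/pitch to a constrained camera view, snapping on a shake."""
    if abs(rate_deg_s) > SHAKE_RATE:
        return HOME
    return (constrain(yaw_deg, YAW_LIMIT), constrain(pitch_deg, PITCH_LIMIT))
```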