The present invention relates generally to the field of character animation, and more particularly to a method of and system for animating a character using a combination of motion capture and virtual model or “rag-doll” physics.
Many video games, including sports titles, use motion capture (mo-cap) data as the source of animation for character models. In most video games, a game engine runs according to the rules of the game, takes user input into account, and presents an animated display that responds to that input. For example, if the user presses a button that is a “jump” button according to the game rules, then the game engine would animate a character such that it appears to jump.
The display of a video game is generally a video sequence presented on a display device. The video sequence typically comprises a plurality of frames. When the frames are shown in succession in sequence order, simulated objects appear to move. The game engine typically generates frames in real-time response to user input, so rendering time is often constrained.
As used herein, “frame” refers to an image of the video sequence. In some systems, such as interlaced displays, the frame might comprise multiple fields or more complex constructs, but generally a frame should be thought of as a view into a computer-generated scene at a particular time or short time window. For example, with 60 frame-per-second video, if one frame represents the scene at t=0, then the next frame would represent the scene at t=1/60 second. In some cases, a frame might represent the scene from t=0 to t=1/60, but in the simple case, the frame is a snapshot in time.
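The frame timing arithmetic above can be sketched as follows. This is a minimal illustration only; the function names `frame_time` and `frame_window` are hypothetical and do not come from any particular game engine.

```python
from fractions import Fraction

def frame_time(frame_index: int, fps: int = 60) -> Fraction:
    """Scene time, in seconds, represented by a snapshot-style frame."""
    return Fraction(frame_index, fps)

def frame_window(frame_index: int, fps: int = 60) -> tuple:
    """(start, end) times for a frame that represents a short time window."""
    return frame_time(frame_index, fps), frame_time(frame_index + 1, fps)

# Frame 0 is the scene at t=0; frame 1 is the scene at t=1/60 second.
first, second = frame_time(0), frame_time(1)
```

Exact fractions are used here only to keep the 1/60 step free of rounding; a real engine would more likely use floating-point seconds or integer ticks.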
A “scene” comprises those simulated objects that are positioned in a world coordinate space within a view pyramid, a view rectangular prism, or another shaped view space. In a common approach, the scene comprises all objects (that are not obscured by other objects) within a view pyramid defined by a view point and a view rectangle, with the boundaries being the perspective planes through the view point and each edge of the view rectangle, possibly truncated by a background.
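As a rough sketch of the view pyramid test described above, the following assumes (for illustration only) that the view point sits at the origin looking along the +z axis, that the view rectangle is centered on that axis at distance `rect_distance`, and that the background truncates the pyramid at `background_distance`. The function name and parameters are hypothetical, and occlusion by other objects is not handled.

```python
def in_view_pyramid(point, half_width, half_height,
                    rect_distance, background_distance):
    """Return True if a point lies inside the truncated view pyramid."""
    x, y, z = point
    # Points behind the view point or beyond the background are outside.
    if z <= 0 or z > background_distance:
        return False
    # The pyramid's cross-section grows linearly with distance from the
    # view point, matching the view rectangle at z == rect_distance.
    scale = z / rect_distance
    return abs(x) <= half_width * scale and abs(y) <= half_height * scale

visible = in_view_pyramid((0.5, 0.2, 5.0), 1.0, 1.0, 1.0, 100.0)
```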
The simulated objects can be generated entirely from mathematical models describing the shape of the objects (such as arms and a torso described by a set of planar and/or curved surfaces), generated from stored images (such as the face of a famous person), or a combination thereof. It should be noted that if a game engine (or more specifically, a rendering engine that is part of the game engine or used by it) has data as to where each object or portion of a flexible object is in a scene, the frame for that scene can be rendered using standard rendering techniques. The more relevant aspect of a game is therefore how to determine where each object is in the scene so that the rendered video sequence is appropriate.
A scene may comprise several objects, with some of the objects being animate in that they appear to move either in response to game engine rules or user input. For example, in a basketball game, a character for one of the basketball players might shoot a basket in response to user input, while a defending player will attempt to block the shooter in response to logic that is part of the game rules (e.g., an artificial intelligence component of the game rules might include a rule that defenders block shots when a shot attempt is detected). When the ball moves through the net, the net will move in response to the ball. The net is expected to be inanimate, but the players' movements are expected to be animate and natural-appearing. Animate objects are typically referred to herein generically as characters and, in specific examples, such as animation of a football, soccer, baseball, basketball, or other sports game, the characters are typically simulated players in the game. In many cases, the characters correspond to actual sports figures and those actual sports figures might have contributed motion capture data for use in animating their corresponding character. Players and characters might also be nonhuman, such as simulated robots or other character types.
Animation is the process of generating successive scenes such that when the corresponding frames are displayed in sequence, characters or other objects in the scene appear to move. Where the character represents a life form, it preferably moves in a natural-looking manner.
Generally, movement (or more precisely, the animation of simulated movement) can be inanimate movement or animate movement. In many cases, inanimate movement, such as the movement of a net in response to receiving a basketball, can be simulated using a physics engine that determines movement based on interactions between simulated objects. Animate movement is more complicated, as users and viewers of the video game animation expect natural movement, and with some characters, especially simulated human beings, it is difficult to convey natural movement in real-time response to user input.
One well-known approach to conveying natural movement is to use motion capture (mo-cap) data. In a typical process, actual physical characters, such as sports players, have sensors attached to their bodies and proceed through various motions while a computer captures the movement of the sensors. For example, a professional football player may perform a run, catch, kick or other move while wearing sensors, and that motion is captured (recorded). A simulated character can then be easily animated to move naturally by having the body parts of the simulated character follow the motion recorded from the actual physical character.
Mo-cap has its limitations, as it only allows for movements that were prerecorded or small variations thereof. Mo-cap provides extremely natural and believable motion, but it is not very dynamic in that it is replayed exactly as recorded or only slightly modified through warping, blending, or combination with inverse kinematics (IK). As a result, some games are forced to prevent or ignore dynamic situations that may occur in real life. Examples in a football video game include dynamic player-player collisions, dynamic player-player pile-ups, dynamic tackles and the like. Situations like these are often avoided or, at times, handled incorrectly, resulting in unrealistic animation and game-play.
A character is often modeled as a skeleton comprising a plurality of body parts with joint constraints. Joint constraints might include attachment points (some of which might be pivot-able as described elsewhere herein), range of motion, degrees of freedom, masses, and possibly strength limits and distribution of masses. For example, a football player might be modeled with a torso, upper arms, forearms, head, waist, legs, fingers, etc., with the upper arm constrained to remain joined to the forearm at the elbow with less than 180 degrees of joint movement at the elbow. A skeleton can thus be represented in game data structures as a collection of body part data structures and a collection of joint constraints. A skeleton in motion might further include, as part of its state, the positions/orientations of skeleton parts, velocity/angular momentum, and a set of force/torque vectors on some or all body parts. A skeleton data structure might include data to hierarchically link body parts, such as pointers to parent and child body parts.
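One possible shape for such a skeleton data structure is sketched below. All class names, field names, masses, and the 150-degree elbow limit are illustrative assumptions (the text only requires less than 180 degrees at the elbow), and a single-axis angle range stands in for the richer constraints listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass(eq=False)  # compare parts by identity; the links are cyclic
class BodyPart:
    name: str
    mass: float  # illustrative value, in kilograms
    parent: Optional["BodyPart"] = None
    children: List["BodyPart"] = field(default_factory=list)

@dataclass
class JointConstraint:
    parent: str           # name of the parent body part
    child: str            # name of the child body part
    min_angle_deg: float  # lower end of the range of motion
    max_angle_deg: float  # upper end of the range of motion

    def clamp(self, angle_deg: float) -> float:
        """Limit a requested joint angle to the allowed range of motion."""
        return max(self.min_angle_deg, min(self.max_angle_deg, angle_deg))

@dataclass
class Skeleton:
    parts: Dict[str, BodyPart] = field(default_factory=dict)
    joints: List[JointConstraint] = field(default_factory=list)

    def add_part(self, name: str, mass: float,
                 parent_name: Optional[str] = None) -> BodyPart:
        """Create a body part and link it to its parent, if any."""
        parent = self.parts[parent_name] if parent_name else None
        part = BodyPart(name, mass, parent)
        if parent is not None:
            parent.children.append(part)
        self.parts[name] = part
        return part

skeleton = Skeleton()
skeleton.add_part("torso", 30.0)
skeleton.add_part("upper_arm", 2.0, "torso")
skeleton.add_part("forearm", 1.5, "upper_arm")
elbow = JointConstraint("upper_arm", "forearm", 0.0, 150.0)
skeleton.joints.append(elbow)
```

Storing both parent and child links makes it cheap to walk the hierarchy in either direction, at the cost of keeping the two in sync whenever a part is added.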
Motion capture data can be stored as a description of where each body part is at a given frame time, but motion capture data is commonly stored as descriptions of joint angles between joined body parts. The skeleton is often stored as a hierarchical structure, with some body part being the “root” part and each of the other body parts being connected directly or indirectly to that root part. Each body part might also have an associated mass and mass distribution with that information stored as part of the skeleton data structure. A position of the body parts is often referred to as a “pose”. Thus, in a given frame, a character has a particular pose. With mo-cap data, the pose might be one specified by the mo-cap data.
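To illustrate the relationship between the two storage forms, the sketch below converts a pose stored as joint angles into body part positions by walking the hierarchy from the root. It assumes a simplified planar (2D) chain with hypothetical part names and lengths; real mo-cap data would use 3D rotations.

```python
import math

# Hypothetical planar chain: (part name, parent name or None, length in meters).
# Parents must be listed before their children.
CHAIN = [("upper_arm", None, 0.30), ("forearm", "upper_arm", 0.25)]

def pose_to_positions(chain, joint_angles_deg, root=(0.0, 0.0)):
    """Convert joint angles (relative to the parent part) into the
    world-space end point of each body part."""
    world_angle = {}
    end_point = {}
    for name, parent, length in chain:
        base_angle = world_angle[parent] if parent else 0.0
        start = end_point[parent] if parent else root
        # A child's orientation is its parent's orientation plus its joint angle.
        angle = base_angle + math.radians(joint_angles_deg[name])
        world_angle[name] = angle
        end_point[name] = (start[0] + length * math.cos(angle),
                           start[1] + length * math.sin(angle))
    return end_point

# A pose for one frame: upper arm along the x axis, elbow bent 90 degrees.
pose = {"upper_arm": 0.0, "forearm": 90.0}
positions = pose_to_positions(CHAIN, pose)
```

Because each angle is relative to the parent, changing only the root's angle moves the whole chain, which is one reason the joint-angle form is convenient for hierarchical skeletons.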
Other approaches to animation have been tried. One such approach is rag-doll physics. In this approach, a character is modeled by a skeleton with joint constraints and the body parts move according to physics rules. A physics engine might operate on skeletons to indicate movement based on the laws of physics. For example, the input to a physics engine might include data structures representing one or more character skeletons, a current position of each of the body parts of the skeletons, a current velocity of each of the body parts, and a set of external force vectors. From those inputs, the physics engine would output a new position of each of the body parts of the input skeletons. The external forces might include gravity, resistance, and the impact forces from other characters or other objects in the scene. This is referred to as rag-doll physics because in the absence of forces, the character acts as a limp body and floats in space (or in the presence of a gravity force, just flops to the ground like a rag-doll).
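The input/output relationship of such a physics engine can be sketched as a single time step. This is a deliberately simplified illustration, not the method of any particular engine: it uses semi-implicit Euler integration in 2D, treats each body part as an independent point mass, and omits joint constraints and collisions entirely. The names `PartState` and `physics_step` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec = Tuple[float, float]

@dataclass
class PartState:
    mass: float
    position: Vec
    velocity: Vec

def physics_step(parts: List[PartState], forces: List[Vec],
                 dt: float) -> List[PartState]:
    """Advance each body part by one time step under its external force."""
    new_parts = []
    for part, (fx, fy) in zip(parts, forces):
        # Newton's second law: acceleration = force / mass.
        ax, ay = fx / part.mass, fy / part.mass
        # Semi-implicit Euler: update velocity first, then position.
        vx = part.velocity[0] + ax * dt
        vy = part.velocity[1] + ay * dt
        px = part.position[0] + vx * dt
        py = part.position[1] + vy * dt
        new_parts.append(PartState(part.mass, (px, py), (vx, vy)))
    return new_parts

torso = PartState(mass=2.0, position=(0.0, 1.0), velocity=(0.0, 0.0))
# With no forces the part stays where it is (it "floats in space").
floating = physics_step([torso], [(0.0, 0.0)], dt=1 / 60)[0]
# With a gravity force (mass * -9.81) the part starts to fall.
falling = physics_step([torso], [(0.0, -9.81 * torso.mass)], dt=1 / 60)[0]
```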
Examples of physics engines include those provided by Havok and Touchdown Entertainment. Using this approach, unscripted physics-based animation can interact correctly with a dynamic environment. Unfortunately, rag-doll physics does not solve the age-old problem of generating natural-appearing human motion. By itself, it only works well on dead or unconscious characters or in cases where the external forces are so large that they overpower muscular forces.
Some approaches to combining motion-capture controlled animation with rag-doll physics have been tried, but they tend to lose realism in the process. For example, some physics engines might use “powered joint constraints” wherein joints are simulated with motors attached to the joints that attempt to move the limbs attached at the joint to a target orientation. The target orientation is difficult to match exactly, and much of the fidelity of the mo-cap movement is lost.
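One common way to model a joint motor of this kind is a proportional-derivative (PD) controller with a torque limit. The sketch below is an assumption made for illustration, not a description of how any specific physics engine implements powered joint constraints; the gains, torque limit, and inertia are arbitrary, and the joint is reduced to a single rotation axis.

```python
def motor_torque(angle, angular_velocity, target,
                 stiffness, damping, max_torque):
    """PD motor: push toward the target angle, limited by motor strength."""
    torque = stiffness * (target - angle) - damping * angular_velocity
    return max(-max_torque, min(max_torque, torque))

def drive_joint(angle, angular_velocity, target, inertia, dt, steps,
                stiffness=50.0, damping=5.0, max_torque=10.0):
    """Simulate the motor driving one joint for a number of time steps."""
    for _ in range(steps):
        torque = motor_torque(angle, angular_velocity, target,
                              stiffness, damping, max_torque)
        angular_velocity += (torque / inertia) * dt
        angle += angular_velocity * dt
    return angle, angular_velocity

# Target orientation (radians) taken from a hypothetical mo-cap frame.
target = 1.0
# After a single 1/60 s frame the joint has barely started to move.
one_frame_angle, _ = drive_joint(0.0, 0.0, target, inertia=1.0,
                                 dt=1 / 60, steps=1)
# After ten simulated seconds it has settled near the target.
settled_angle, _ = drive_joint(0.0, 0.0, target, inertia=1.0,
                               dt=1 / 60, steps=600)
```

In this sketch the joint lags well behind the target after one frame and only converges over many frames, which illustrates why a motor-driven limb tends to trail the mo-cap pose it is chasing.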