1. Field of the Invention
The invention is directed toward a system and method for determining the positions of plural objects over time, i.e., to capturing or tracking motion of objects. More particularly, the invention is directed toward an optical motion capture (MC) system and method.
2. Description of the Related Art
Motion capture is one of the most active topics in computer graphics animation today. Motion capture involves measuring at least one object's position and orientation in physical space, then recording that information over time in a computer-usable form. Objects of interest include human and non-human bodies, facial expressions, hand gestures, camera or light positions, and other elements in a scene.
Once data is recorded in computer-usable form, animators can use it to control elements in a computer generated scene. Such a scene can be used in a biomechanical analysis, e.g., in a reproduction of a golfer's swing for slow motion analysis from a variety of computer-generated viewing angles by an instructor. Such a scene might also form the basis of an animated sequence in a science fiction movie.
Animation which is based purely on motion capture uses the recorded positions and orientations of real objects to generate the paths taken by synthetic objects within the computer-generated scene. However, because of mismatched geometry between the real and synthetic objects, limits on the quality of motion capture data, and creative requirements, animation rarely is purely motion capture-based.
Data from real-time motion capture devices can be used interactively (assuming minimal transport delay) to provide real-time feedback regarding the character and quality of the captured data. Non-real-time motion capture devices either provide data that requires additional post-processing before it can be used in an animation or computer graphic display or provide data that is merely a snapshot of the measured objects.
Motion capture systems reflect a balancing of a number of competing considerations, or tradeoffs. Those tradeoffs are: the number of points that can be tracked, resolution, ease of use, affordability, convenience and flexibility in terms of how readily the system can adapt to different motion capture tasks. The typical motion capture systems are either magnetically or optically based.
MAGNETIC MOTION CAPTURE SYSTEMS
Magnetic motion capture systems use sensors to accurately measure the magnetic field created by a source. Such systems are real-time, in that they can provide from 15 to 120 samples per second (depending on the model and number of sensors) of 6 degree-of-freedom data (position and orientation) with minimal transport delay.
A typical magnetic motion capture system has one or more electronic control units into which the source(s) and 10 to 20 sensors are cabled. The electronic control units are, in turn, attached to a host computer through a network or serial port. The motion capture or animation software communicates with these devices via a driver program. The sensors are attached to the scene elements being tracked. The source is set either above or to the side of the active area. There can be no metal in the active area, because it can interfere with the motion capture.
The ideal approach for magnetic motion capture is to place one sensor at each joint of a body. However, the physical limitations of the human body (the arms must connect to the shoulder, etc.) allow an exact solution with significantly fewer sensors. Because a magnetic system provides both position and orientation data, it is possible to infer joint positions by knowing the limb lengths of the motion-capture subject.
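The inference of a joint position from a sensor's position and orientation can be sketched as follows. This is only an illustrative sketch, not the method of any particular magnetic system: it assumes the sensor's orientation is available as a 3x3 rotation matrix and that the joint sits at a known, fixed offset (derived from the measured limb length) in the sensor's own frame. The names `rotation_z` and `joint_position` are hypothetical.

```python
import math

def rotation_z(angle_rad):
    """Rotation matrix for a rotation of angle_rad about the z axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0],
            [s,  c, 0.0],
            [0.0, 0.0, 1.0]]

def joint_position(sensor_pos, sensor_rot, offset_in_sensor_frame):
    """Rotate the known sensor-to-joint offset into the source frame
    and add it to the measured sensor position."""
    rotated = [
        sum(sensor_rot[i][j] * offset_in_sensor_frame[j] for j in range(3))
        for i in range(3)
    ]
    return tuple(p + r for p, r in zip(sensor_pos, rotated))

# Assumed example: a forearm sensor at (0.5, 1.0, 0.0) m, rotated 90 degrees
# about z, with the wrist 0.15 m along the sensor's local x axis.
wrist = joint_position((0.5, 1.0, 0.0),
                       rotation_z(math.pi / 2),
                       (0.15, 0.0, 0.0))
print(wrist)  # approximately (0.5, 1.15, 0.0)
```

A position-only measurement could not resolve the direction of the offset, which is why the orientation data reduces the number of sensors needed.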
The typical magnetic motion capture session is run much like a film shoot. Careful rehearsal ensures that the performers are familiar with the constraints of the tethers and the available "active" space for capture. Rehearsal often includes the people handling the cables to ensure that their motion aligns with the motion of the performers. The script is broken down into manageable shot lengths and is often storyboarded prior to motion capture. Each shot may be recorded several times, and an audio track is often used as a synchronizing element.
Because the magnetic systems provide data in real-time, the director and actors can observe the results of the motion capture both during the actual take and immediately after, with audio playback and unlimited ability to adjust the camera for a better view. This tight feedback loop makes magnetic motion capture ideally suited for situations in which the motion range is limited and direct interaction between the actor, director, and computer character is important.
ADVANTAGES OF TYPICAL MAGNETIC MOTION CAPTURE
A magnetic motion capture system has several advantages. It provides position and orientation information, and so requires fewer sampling locations and less inferred information. Distances and rotations are measured in relation to a single object, the source, so there is less device calibration. Registration with other data requires only a knowledge of the source location (and obviously the measurement accuracy). Real-time interactive display and verification of the captured data is made possible, providing a closed loop model where the actor(s), director and production staff can all participate directly in the capture session. The cost of a typical magnetic system is roughly one-sixth to one-third of the cost of a typical optical system.
DISADVANTAGES OF TYPICAL MAGNETIC MOTION CAPTURE
A magnetic motion capture system has several disadvantages.
The commercially available magnetic motion capture systems are so sensitive to metal that they cannot be considered office or production environment friendly devices. Care must be taken that the stage, walls, and props for a motion capture session are non-metallic. The maximum effective range of these devices is substantially less than the maximum possible for optical systems, although for longer ranges optical system accuracy decreases linearly (or nearly so).
The subject of a magnetic system is encumbered by cables. The sensors (rather than the sources) are located on the subject(s) and are connected to the control units via fairly thick cables. The sampling rate is too low for many sports motions. For body tracking applications, magnetic systems tend to have 30 to 60 Hz effective sampling rates. A fastball pitcher's hand moves at roughly 40 meters per second, so it travels roughly 0.7 to 1.3 meters between samples, i.e., approximately a meter per sample. Also, filtering is typically used to compensate for measurement jitter, reducing the effective frequency range to 0 to 15 Hz.
OPTICAL MOTION CAPTURE SYSTEMS
Typical optical motion capture systems are based on high contrast video imaging of 20 to 32 markers which are attached to the object whose motion is being recorded. The typical passive markers are retroreflective, e.g., small spheres covered with reflective material. The typical active markers are light emitting diodes (LEDs). A typical active optical system only permits one marker at a time to provide light so as to make it trivial to identify the marker.
The markers of an optical system are typically imaged by standard or HDTV high speed, black and white digital cameras. Even at a mere 30 frames per second, i.e., a 30 Hz sampling rate, the typical system must process between 307,200 pixels per frame, for a 640 × 480 camera, and 1,024,000 pixels per frame, for a 1280 × 800 camera. Some optical systems increase the frame rate, i.e., the sampling rate, by decreasing the resolution, but that is an undesirable compromise.
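The pixel load implied by these figures can be checked with a short calculation. The sketch below simply multiplies out the resolutions and frame rate given above; the function names are illustrative only.

```python
def pixels_per_frame(width, height):
    """Number of pixels in one camera frame."""
    return width * height

def pixels_per_second(width, height, frames_per_second):
    """Pixels one camera delivers each second at the given frame rate."""
    return pixels_per_frame(width, height) * frames_per_second

for width, height in ((640, 480), (1280, 800)):
    print(width, height,
          pixels_per_frame(width, height),
          pixels_per_second(width, height, 30))
# 640 480 307200 9216000
# 1280 800 1024000 30720000
```

Since the load scales with both resolution and frame rate, raising one at a fixed processing budget forces the other down, which is the compromise noted above.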
The number of cameras used in a typical optical system depends on the type of motion to be captured. Facial motion capture usually uses one camera, sometimes two. Full body motion capture may use four to six (or more) cameras to provide full coverage of the active area. To enhance contrast, each camera is equipped with infrared (IR) emitting LEDs and IR (pass) filters are placed over the camera lens. The cameras are attached to controller cards, typically in a PC chassis.
Depending on the system, either high-contrast (1 bit) video or the marker image centroids are recorded on the PC host during motion capture. Before motion capture begins, a calibration frame, namely a carefully measured and constructed three dimensional (3D) array of markers, is recorded. This defines the frame of reference for the motion capture session.
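For illustration, marker image centroids might be computed from a high-contrast (1 bit) frame roughly as follows. This is a minimal sketch under stated assumptions, not the procedure of any particular commercial system: the frame is assumed to be a list of rows of 0/1 values, each marker image is assumed to be a 4-connected group of 1-pixels, and `marker_centroids` is a hypothetical name.

```python
def marker_centroids(image):
    """Return the (x, y) centroid of every 4-connected group of 1-pixels,
    in the order the groups are first met in a row-by-row scan."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Flood-fill one marker image, collecting its pixels.
            stack = [(r, c)]
            seen[r][c] = True
            pixels = []
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            n = len(pixels)
            centroids.append((sum(x for _, x in pixels) / n,
                              sum(y for y, _ in pixels) / n))
    return centroids

frame = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(marker_centroids(frame))  # [(1.5, 1.5), (4.0, 3.0)]
```

Storing only the centroids rather than the full 1-bit video reduces the recorded data to a few numbers per marker per frame.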
After a motion capture session, the recorded motion data must be post-processed or tracked. The centroids of the marker images (either computed then, or recalled from disk) are matched in images from pairs of cameras, using a triangulation approach to compute the marker positions in 3D space. Each marker's position from frame to frame is then identified. Several problems can occur in the tracking process, including marker swapping, missing or noisy data, and false reflections.
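The triangulation step can be sketched as follows. The source does not specify the triangulation formula, so this example substitutes a common midpoint method: each camera contributes a ray (an origin and a direction toward the marker centroid, both assumed to be expressed in the calibrated frame of reference), and the marker is placed midway between the closest points of the two rays. The name `triangulate_midpoint` is hypothetical.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(origin1, dir1, origin2, dir2, eps=1e-12):
    """Estimate a 3D marker position as the midpoint of the shortest
    segment joining two camera rays (origin + s * direction)."""
    w = [p - q for p, q in zip(origin1, origin2)]
    a = _dot(dir1, dir1)
    b = _dot(dir1, dir2)
    c = _dot(dir2, dir2)
    d = _dot(dir1, w)
    e = _dot(dir2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        raise ValueError("rays are parallel; no unique triangulation")
    s = (b * e - c * d) / denom  # parameter along ray 1
    t = (a * e - b * d) / denom  # parameter along ray 2
    p1 = [o + s * v for o, v in zip(origin1, dir1)]
    p2 = [o + t * v for o, v in zip(origin2, dir2)]
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))

# Assumed example: cameras at (0, 0, 0) and (4, 0, 0), both sighting
# a marker located at (2, 1, 5).
marker = triangulate_midpoint((0.0, 0.0, 0.0), (2.0, 1.0, 5.0),
                              (4.0, 0.0, 0.0), (-2.0, 1.0, 5.0))
print(marker)  # approximately (2.0, 1.0, 5.0)
```

With noisy centroids the two rays rarely intersect exactly, so the length of the segment between them gives a rough indication of how trustworthy a given match is.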
Tracking can be an interactive and time-consuming process, depending on the quality of the captured data and the fidelity required. For straightforward data, tracking can take anywhere from one to two minutes per captured second of data (at a sampling rate of 120 Hz). For complicated or noisy data, or when the tracked data is expected to be used as is, tracking time can climb to 15 to 30 minutes per captured second, even with expert users. First-time users of tracking software can encounter even higher tracking times.
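To put these rates in perspective, the sketch below scales them to a whole take. The rate ranges are the ones quoted above; the ten-second take and the function name are assumptions chosen only for illustration.

```python
# Tracking effort in minutes per captured second, as quoted above.
STRAIGHTFORWARD = (1.0, 2.0)
COMPLICATED = (15.0, 30.0)

def tracking_minutes(captured_seconds, rate_range):
    """Return the (low, high) tracking time in minutes for one take."""
    low, high = rate_range
    return (captured_seconds * low, captured_seconds * high)

print(tracking_minutes(10, STRAIGHTFORWARD))  # (10.0, 20.0)
print(tracking_minutes(10, COMPLICATED))      # (150.0, 300.0)
```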
For a human body (excluding the face), typical setup involves 20 to 32 markers glued (preferably) to the subject's skin or to snug fitting clothing. Markers range from 1 to 5 cm in diameter, depending on the motion capture hardware and the size of the active area. Marker placement depends on the data desired. A single marker is attached at each point of interest, such as the hips, elbows, knees, feet, etc.
A simple configuration would attach three markers to the subject's head on a hat or skull cap, one marker at the base of the neck and the base of the spine, a marker on each of the shoulders, elbows, wrists, hands, hips, knees, ankles and feet, a total of 21 markers. However, if detailed rotational information, such as ulnar roll (the rotation of the wrist relative to the forearm and elbow), is desired, additional markers may be needed. To measure ulnar roll, one approach is to replace the single marker on the wrist with two markers attached as a dumbbell to the wrist.
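The marker count for the simple configuration can be tallied as in the sketch below. The dictionary keys are merely labels for the placements listed above, and the dumbbell variant assumes that each wrist marker is replaced by two markers.

```python
# Markers per placement in the simple configuration described above.
SIMPLE_CONFIGURATION = {
    "head": 3,
    "base of neck": 1,
    "base of spine": 1,
    "shoulders": 2,
    "elbows": 2,
    "wrists": 2,
    "hands": 2,
    "hips": 2,
    "knees": 2,
    "ankles": 2,
    "feet": 2,
}

def marker_total(configuration):
    """Total number of markers in a configuration."""
    return sum(configuration.values())

print(marker_total(SIMPLE_CONFIGURATION))  # 21

# Ulnar roll variant: a two-marker dumbbell on each wrist.
with_dumbbells = dict(SIMPLE_CONFIGURATION, wrists=4)
print(marker_total(with_dumbbells))  # 23
```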
ADVANTAGES OF TYPICAL OPTICAL MOTION CAPTURE
An optical motion capture system has several advantages. Depending on the system used and the precision required, the motion capture volume can be much larger than for a typical magnetic system. The subject is unencumbered, i.e., not physically attached to the motion capture system. This allows for the long in-run paths (for the subject to get up to speed) and long out-run paths (for the subject to slow down) required for full-speed running motion.
The sampling rate of typical optical systems is fast enough for most sports motions. At a typical sampling rate of 120 to 250 Hz, most human motions are easily measured. However, two classes of motions, pitching (hitting or throwing) and impact, are on the fringes of this sampling range. When throwing a 90 m.p.h. fastball, the human hand travels 33 cm in 1/120 second. For impact events such as drumming, hitting, and hard falling, accelerations may have frequency components well beyond 120 Hz. Thankfully, these motions are a blur for human observers and the loss of accuracy is usually imperceptible.
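The distance a marker travels between samples follows directly from its speed and the sampling rate. The sketch below reproduces the fastball figure at 120 Hz and, for comparison, at 250 Hz; the function names are illustrative, and the only constant introduced is the exact definition of the mile per hour (0.44704 meters per second).

```python
MPS_PER_MPH = 0.44704  # exact: 1 mile per hour in meters per second

def mph_to_mps(speed_mph):
    """Convert miles per hour to meters per second."""
    return speed_mph * MPS_PER_MPH

def travel_per_sample_m(speed_mps, sampling_rate_hz):
    """Distance in meters covered during one sampling interval."""
    return speed_mps / sampling_rate_hz

hand_speed = mph_to_mps(90.0)  # about 40.2 m/s
for rate_hz in (120.0, 250.0):
    centimeters = 100.0 * travel_per_sample_m(hand_speed, rate_hz)
    print(f"{rate_hz:.0f} Hz: {centimeters:.1f} cm per sample")
# 120 Hz: 33.5 cm per sample
# 250 Hz: 16.1 cm per sample
```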
DISADVANTAGES OF TYPICAL OPTICAL MOTION CAPTURE
An optical motion capture system has several disadvantages. It is three to six times the cost of a typical magnetic system. The costs to operate are also higher, being more similar to film or video production. Current optical systems are contrast based. As such, backgrounds, clothing, and ambient illumination may all present sources of non-negligible noise. Wet or shiny surfaces (mirrors, floors, jewelry, and so on) can cause false marker readings.
In a typical optical system, a marker must be seen by at least two cameras (for 3D data). Total or partial occlusion caused by the subject, props, floor mats, or other markers can therefore result in lost, noisy, displaced, or swapped markers. Common occlusions are hand versus hip (standing), elbow versus hip (crouched), or hand versus prop in hand or opposite hand.
Tracking time (required to convert the captured video or centroid information into 3D position data, typically involving filtering and/or data repair) for a typical optical system can be much greater than the time elapsed for the corresponding capture session and may vary unpredictably, depending on accuracy requirements, motion difficulty, and the quality of the raw data captured. Because there is no immediate feedback regarding the quality and character of captured data, it is impossible to know during the session whether a given motion has been adequately captured. As such, if additional capture sessions to acquire missed data are not feasible, two to three acceptable takes must be completed to ensure a reasonable probability of success.
A typical optical system provides position data only. Joint angles must be inferred from the rays connecting the joint-attached markers. Recent developments in tracking software allow the creation of rotational data within the tracking process, removing the position-only restriction from optical data. However, this does add complexity to the tracking process.
A typical optical system is sensitive to calibration. Because multiple cameras are used, the frame of reference for each camera must be accurately measured. If a camera is misaligned (due to partial marker occlusion, or a simple bump of the tripod, i.e., camera mount), markers measured by that camera will be placed inconsistently in 3D space relative to markers measured by other cameras. This is particularly troubling at hand-off, namely the time at which a marker is passed from one camera's field of view into another's, as duplicate points may be created from the same marker or the marker path may jump.