Studies of the human vision system show that the analysis of dynamic scenes involves both low-level processing at the retina and high-level knowledge processing in the brain, see P. Buser and M. Imbert, Vision, translated by R. H. Kay, pp. 137-151, The MIT Press, 1992, and Su-Shing Chen, Structure from Motion without the Rigidity Assumption, Proceedings of the 3rd Workshop on Computer. For motion analysis, it has been shown that the human vision system captures both high-level structures and low-level motions of a dynamic scene, see D. Burr and J. Ross, Visual Analysis during Motion, Vision, Brain, and Cooperative Computation, pp. 187-207, edited by M. A. Arbib and A. R. Hanson, The MIT Press, 1987. Unfortunately, current vision and graphics systems do not satisfy this requirement. Popular representations such as those taught by M. Kass, A. Witkin and D. Terzopoulos, Snakes: Active Contour Models, International Journal of Computer Vision, pp. 321-331, Kluwer Academic Publishers, Boston, 1988, and D. N. Lee, The Optical Flow Field: The Foundation of Vision, Philosophical Transactions of the Royal Society of London, B290, pp. 169-179, 1980, do not enable symbolic or knowledge manipulation. These systems generally lack the capability of automatically learning unknown types of motion and movements of modeled objects.
In the fast-growing virtual reality community, the need for realistic visual modeling of virtual objects has never been greater. Previous modeling techniques mainly address geometrical appearance or physical features. However, as pointed out in J. Bates, Deep Structure for Virtual Reality, Technical Report, CMU-CS-91-133, Carnegie Mellon University, May, 1991, and G. Burdea and P. Coiffet, Virtual Reality Technology, John Wiley and Sons, Inc., 1994, an ideal object model has at least the following requirements: appearance modeling for geometric shapes; kinematics modeling for the rotations and translations of objects; physical modeling of various properties such as the mass, inertia and deformation factors of objects, to mention just a few; and behavioral features such as intelligence and emotions.
Similar requirements arise from many Internet applications where there is a fast-growing interest in 3D Web content. Current 3D Web methodologies (such as VRML, see D. Brutzman, M. Pesce, G. Bell, A. Dam and S. Abiezzi, VRML, Prelude and Future, ACM SIGGRAPH '96, pp. 489-490, New Orleans, August 1996) depend heavily on Internet bandwidth to transmit 3D data. Therefore, an efficient 3D representation is required to “compress” the complex and multi-aspect 3D data.
To satisfy all these requirements, a generic structure is needed that enables symbolic operations for the modeling and manipulation of real and virtual world data with different types of information. The representation of visual objects has preoccupied computer vision and graphics researchers for several decades. Before the emergence of computer animation, research mainly focused on the modeling of rigid shapes. Despite the large body of work, most techniques lacked the flexibility to model non-rigid motions. Only after the mid 1980's were a few modeling methodologies proposed for solving deformation problems.
An effective object modeling methodology should characterize the object's features under different circumstances in the application scope. In early research, the focus was placed on appearance modeling since the objects involved in vision and graphics applications were mostly simple and stationary. Currently, simple unilateral modeling can no longer satisfy the requirements of dynamic vision and computer animation. As discussed in G. Burdea and P. Coiffet, Virtual Reality Technology, John Wiley and Sons, Inc., 1994, a complete 3D object model should ideally comprise at least the following components.
1) Geometrical modeling: this is the basic requirement for any vision or graphics system. It describes an object's geometrical properties, namely, the shape (e.g., polygon, triangle or vertex) and the appearance (e.g., texture, surface reflection and/or color).
2) Kinematics modeling: this specifies an object's motion behaviors that are vital for dynamic vision and animation. 4×4 or 3×4 homogeneous transformation matrices can be used to specify translations, rotations and scaling factors.
3) Physical modeling: physical modeling is required for complex situations where the object is elastic and/or deformations and collisions are involved. Objects can be modeled physically by specifying their mass, weight, inertia, compliance, deformation parameters, etc. These features are integrated with the geometrical modeling along with certain physical laws to form a realistic model.
4) Behavior modeling: this is the least studied aspect of modeling. In intelligent modeling, an object can be considered as an intelligent agent that has a degree of intelligence. It can actively respond to its environment based on its behavioral rules.
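To make the kinematics component concrete, the following is a minimal sketch of 4×4 homogeneous transformation matrices in plain Python. The function names (translation, rotation_z, scaling, compose, apply) are illustrative assumptions and do not come from the cited references; a 3×4 form would simply omit the constant bottom row.

```python
import math

def translation(tx, ty, tz):
    """4x4 homogeneous matrix that translates by (tx, ty, tz)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_z(theta):
    """4x4 homogeneous matrix that rotates by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def scaling(sx, sy, sz):
    """4x4 homogeneous matrix that scales each axis independently."""
    return [[sx, 0.0, 0.0, 0.0],
            [0.0, sy, 0.0, 0.0],
            [0.0, 0.0, sz, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def compose(a, b):
    """Matrix product a * b: b is applied to a point first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def apply(m, point):
    """Apply a homogeneous matrix to a 3D point (implicit w = 1)."""
    v = (point[0], point[1], point[2], 1.0)
    out = [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]
    return (out[0], out[1], out[2])

# Rotate 90 degrees about z, then translate by (1, 2, 3).
motion = compose(translation(1.0, 2.0, 3.0), rotation_z(math.pi / 2))
moved = apply(motion, (1.0, 0.0, 0.0))
```

Composing the matrices once and applying the product to every vertex is what makes this form convenient for describing the motion of a whole object.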
Hereinafter, current modeling methodologies are reviewed and classified into three categories: continuous modeling, discrete modeling and graph-based modeling.
Continuous Modeling
This type of modeling approximates either the whole or a functional part of the 3D object by a variety of geometrical primitives, such as blocks, polyhedrons, spheres, generalized cylinders or superquadrics. These geometrical primitives can be expressed as continuous or piecewise continuous functions in 3D space. Kinematic and physical features can be easily combined with the geometrical shapes. Among the large body of geometrical primitives, generalized cylinders and superquadrics are popular because they can readily handle deformations. Barr is considered one of the first to borrow techniques from linear mechanical analysis to approximate visual 3D objects, see A. Barr, Superquadrics and Angle-preserving Transformations, IEEE Computer Graphics Applications, 18:21-30, 1981, and A. Barr and A. Witkin, Topics in Physically Based Modeling, ACM SIGGRAPH '89, Course Note 30, New York, 1989. He defined the angle-preserving transformations on superquadrics. Although the original approach was intended only for computer graphics, it has also proved useful in vision tasks. As a dynamic extension of superquadrics, the deformable superquadrics proposed by D. Terzopoulos and D. Metaxas, Dynamic 3D models with local and global deformations: Deformable Superquadrics, IEEE Transactions on PAMI, 13(7):703-714, 1991, constitute a physical feature-based approach. It fits complex 3D shapes with a class of dynamic models that can deform both globally and locally. The model incorporates the global shape parameters of a conventional superellipsoid with the local degrees of freedom of a spline. The local/global representation simultaneously satisfies the requirements of 3D shape reconstruction and 3D recognition. In animation, the behaviors of the deformable superquadrics are governed by motion equations based on physics. In 3D model construction, the model is fitted with 3D visual information by transforming the data into forces and simulating the motion equations through time.
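As an illustration of how such a primitive is expressed as a continuous function in 3D space, the sketch below evaluates the commonly used superellipsoid inside-outside function and its parametric surface. The parameter names (a for the three axis lengths, e1 and e2 for the squareness exponents) and the function names are assumptions chosen for this example; the sketch covers only the static global shape, not the local spline deformations or the physics-based motion equations of deformable superquadrics.

```python
import math

def inside_outside(p, a=(1.0, 1.0, 1.0), e1=1.0, e2=1.0):
    """Superellipsoid inside-outside function F(x, y, z).

    F < 1 inside the surface, F == 1 on it, F > 1 outside.
    a = (a1, a2, a3) are axis lengths; e1, e2 > 0 control squareness.
    """
    x, y, z = p
    a1, a2, a3 = a
    xy = abs(x / a1) ** (2.0 / e2) + abs(y / a2) ** (2.0 / e2)
    return xy ** (e2 / e1) + abs(z / a3) ** (2.0 / e1)

def _signed_pow(v, e):
    """sign(v) * |v|**e, so non-integer exponents stay real-valued."""
    return math.copysign(abs(v) ** e, v)

def surface_point(eta, omega, a=(1.0, 1.0, 1.0), e1=1.0, e2=1.0):
    """Point on the surface for latitude eta in [-pi/2, pi/2]
    and longitude omega in [-pi, pi)."""
    a1, a2, a3 = a
    ce, se = math.cos(eta), math.sin(eta)
    co, so = math.cos(omega), math.sin(omega)
    return (a1 * _signed_pow(ce, e1) * _signed_pow(co, e2),
            a2 * _signed_pow(ce, e1) * _signed_pow(so, e2),
            a3 * _signed_pow(se, e1))

# With e1 = e2 = 1 and unit axes the primitive is a unit sphere.
f_center = inside_outside((0.0, 0.0, 0.0))
f_outside = inside_outside((2.0, 0.0, 0.0))
```

Only five numbers (three axis lengths and two exponents) describe the global shape, which is why such primitives combine easily with kinematic and physical parameters.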
In animation tasks, it is easy to detect, attach and apply geometrical, kinematic and physical parameters to continuously modeled objects. However, attaching behavioral features is difficult since the model lacks a symbolic structure on which behavioral languages can be based. Furthermore, pre-defined primitives such as generalized cylinders or superquadrics cannot approximate arbitrary real world objects.
Discrete Modeling
A wide variety of computer vision applications involve highly irregular, unstructured and dynamic scenes. They are characterized by rapid and non-uniform variations in spatially irregular feature density and physical properties. It is difficult to model such objects with continuous elements in any of the four aspects mentioned above. The difficulty arises from the unpredictable behaviors of the objects. Discrete modeling is able to approximate the surfaces or volumes of such objects by large numbers of very simple primitives, such as polygons or tetrahedrons.
Since most graphics applications use polygons as the fundamental building block for object description, a polygonal mesh representation of curved surfaces is a natural choice for surface modeling as disclosed in G. Turk, Re-Tiling Polygonal Surfaces, Computer Graphics (SIGGRAPH '92), 26(2):55-64, 1992. Polygonal approximation of sensory data is relatively simple and sampled surfaces can be approximated to the desired precision, see M. A. Khan and J. M. Vance, A Mesh Reduction Approach to Parametric Surface Polygonization, 1995 ASME Design Automation Conference Proceedings, Boston, Mass., September 1995. Physical and kinematic features can be associated more flexibly with either a single element (a polygon) or a group of elements (a patch of polygons).
The triangular mesh is a special case of the polygonal mesh. It has been recognized as a powerful tool for surface modeling due to its simplicity, its flexibility and the abundance of manipulation algorithms. Triangular meshes are used in many general vision and graphics applications, which provide fast preprocessing, data abstraction and mesh refinement techniques, see for example R. E. Fayek, 3D Surface Modeling Using Topographic Hierarchical Triangular Meshes, Ph.D Thesis, Systems Design Eng., University of Waterloo, Waterloo, Ontario, Canada, April 1996, L. De Floriani and E. Puppo, Constrained Delaunay Triangulation for Multi-resolution Surface Description, Pattern Recognition, pp. 566-569, 1988, and S. Rippa, Adaptive Approximation by Piecewise Linear Polynomials on Triangulations of Subsets of Scattered Data, SIAM Journal on Scientific and Statistical Computing, pp. 1123-1141, 1992.
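A minimal sketch of a triangular mesh follows, showing how physical attributes can be attached to a single triangle or to a patch of triangles. The layout (a vertex list, index triples for faces, one attribute dictionary per face) and the "density" attribute are assumptions made for illustration and are not taken from the cited works.

```python
import math

# A unit square in the z = 0 plane, split into two triangles.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]          # each face indexes three vertices
face_attrs = [{"density": 2.0},         # hypothetical mass per unit area
              {"density": 4.0}]

def triangle_area(a, b, c):
    """Area of triangle abc: half the norm of the edge cross product."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

def face_area(face_index):
    """Area of one mesh element."""
    i, j, k = faces[face_index]
    return triangle_area(vertices[i], vertices[j], vertices[k])

def patch_area(patch):
    """Total area of a patch, given as an iterable of face indices."""
    return sum(face_area(f) for f in patch)

def patch_mass(patch):
    """Mass of a patch: each face's area times its own density."""
    return sum(face_area(f) * face_attrs[f]["density"] for f in patch)

whole = range(len(faces))
total_area = patch_area(whole)   # the two triangles tile the unit square
total_mass = patch_mass(whole)
```

Note that nothing in this structure records what the patch represents; each element carries only local information.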
Based on the triangular mesh, Terzopoulos and Waters successfully attached physical constraints to a human facial model as disclosed in D. Terzopoulos and K. Waters, Analysis and Synthesis of Facial Image Sequences Using Physical and Anatomical Models, IEEE Transactions on PAMI, 15(6):569-579, 1993. From sequences of facial images, they built mesh models with anatomical constraints. An impressive advantage of the methodology is its capability to model behavioral features. With the support of anatomical data, different emotions can be modeled and applied to arbitrary human faces.
The main drawback of discrete methodologies is that they lack the high level structure needed to control the modeling or to perform symbolic operations. Furthermore, the discrete primitives are unstructured and only contain information about local features. When applied to dynamic cases, even though the geometrical modeling can be highly precise, abstracting high level information from the data is still problematic.
Graph-Based Symbolic Modeling
In graph-based approaches, a complex object is usually represented explicitly by a set of primitives and the relations among them in the form of a graph. If the primitives are regular blocks such as cylinders, cubes or superquadrics, such a model can be viewed as a variant of continuous modeling. If the model consists of a vast number of primitives that are discrete polygons or polygonal meshes, it is an extension of the discrete model.
The graph representation was first put forward in 1970, not for computer vision or graphics applications but for the script description of a scene in artificial intelligence, see A. C. Shaw, Parsing of Graph-Representation Pictures, Journal of ACM, 17:453-481, 1970. In the early 1980's, Shapiro (L. G. Shapiro, Matching Three-dimensional Objects using a Relational Paradigm, Pattern Recognition, 17(4):385-405, 1984) applied relational graphs for object representation. The quadtree (for surface) and octree (for solid) encoding algorithms given by Meagher (D. J. Meagher, Octree encoding: a new technique for the representation, manipulation, and display of arbitrary three-dimensional objects by computer, Technical Report, IPL-TR-80, 111, Image Processing Laboratory, Rensselaer Polytechnic Institute, Troy, N.Y., Apr. 1982) can be viewed as a special case since they form trees (directed graphs) as the hierarchical structure for representation. In the same period, Wong et al. introduced the attributed hypergraph representation for 3D objects, see A. K. C. Wong and S. W. Lu, Representation of 3D Objects by Attributed Hypergraph for Computer Vision, Proceedings of IEEE S.M.C International Conference, pp. 49-53, 1983. Later, random graphs (A. K. C. Wong and M. You, Entropy and Distance of Random Graphs with Applications to Structural Pattern Recognition, IEEE Transactions on PAMI, 7(5):599-609, 1985) and more sophisticated attributed hypergraphs were presented as geometry and knowledge representations for general computer vision tasks (A. K. C. Wong and R. Salay, An Algorithm for Constellation Matching, Proceedings of the 8th International Conf. on Pattern Recognition, pp. 546-554, Oct. 1986, and A. K. C. Wong, M. Rioux and S. W. Lu, Recognition and Shape Synthesis of 3-D Objects Based on Attributed Hypergraphs, IEEE Transactions on PAMI, 11(3):279-290, 1989).
In 3D model synthesis, random graphs were applied to describe the uncertainties brought by sensors and image processing, see A. K. C. Wong and B. A. McArthur, Random Graph Representation for 3-D Object Models, SPIE Milestone Series, MS72:229-238, edited by H. Nasr, in Model-Based Vision, 1991. In Wong et al., an attributed hypergraph model was constructed based on model features (A. K. C. Wong and W. Liu, Hypergraph Representation for 3D Object Model Synthesis and Scene Interpretation, Proceedings of the 2nd Workshop on Sensor Fusion and Environment Modeling (ICAR), Oxford, U.K., 1991). The representation had a four-level hierarchy that characterizes: a) the geometrical model features; b) the characteristic views induced by local image/model features, each of which contains a subset of the model features visible from a common viewpoint; c) a set of topologically equivalent classes of the characteristic views; and d) a set of local image features wherein domain knowledge could be incorporated into the representation for various forms of decision making and reasoning.
Since graph-based modeling approaches introduce the concepts of primitives and their relations, constructing a hierarchical representation is straightforward. At lower levels, geometrical, kinematic and physical features can be encoded as the primitives and their associated attributes. At higher levels, entities such as edges and hyperedges can be used to represent symbolic information. A graph structure lends itself to symbolic operations. With the aid of machine intelligence, domain knowledge can be learned and later recalled and processed together with other data. However, traditional graph-based methods lack the representational power for dynamic objects. Although they have the potential to handle deformations and transformations, they have so far been applied only to rigid object modeling, due to their rigid structure and the lack of transformation operators on graphs.
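The idea of primitives with attributes at the lower level and hyperedges carrying symbolic information at the higher level can be sketched as a small data structure. The class below is a simplified illustration only: the class name, method names and example attributes are assumptions, and it does not reproduce the formal attributed hypergraph definitions of the cited works.

```python
class AttributedHypergraph:
    """Vertices and hyperedges, each carrying a dictionary of attributes.

    A hyperedge groups any number of vertices, unlike an ordinary graph
    edge, which joins exactly two.
    """

    def __init__(self):
        self.vertices = {}    # vertex id -> attribute dict
        self.hyperedges = {}  # hyperedge id -> (frozenset of vertex ids, attribute dict)

    def add_vertex(self, vid, **attrs):
        self.vertices[vid] = dict(attrs)

    def add_hyperedge(self, eid, members, **attrs):
        members = frozenset(members)
        missing = members.difference(self.vertices)
        if missing:
            raise KeyError(f"unknown vertices: {sorted(missing)}")
        self.hyperedges[eid] = (members, dict(attrs))

    def incident(self, vid):
        """Ids of all hyperedges that contain the given vertex, sorted."""
        return sorted(eid for eid, (members, _) in self.hyperedges.items()
                      if vid in members)


# Lower level: surface patches with geometric and physical attributes.
g = AttributedHypergraph()
g.add_vertex("top", shape="plane", area=1.2, mass=4.0)
g.add_vertex("leg1", shape="cylinder", length=0.7, mass=0.5)
g.add_vertex("leg2", shape="cylinder", length=0.7, mass=0.5)

# Higher level: hyperedges carrying symbolic labels over groups of patches.
g.add_hyperedge("supports", ["leg1", "leg2"], role="support")
g.add_hyperedge("table", ["top", "leg1", "leg2"], label="table")

edges_of_leg1 = g.incident("leg1")
```

A symbolic query such as "which groupings does this patch belong to" then becomes a lookup over hyperedges rather than a search through raw geometry.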
The fast-paced developments in computer vision, computer graphics and the Internet motivate the need for a new modeling methodology using a unified data structure and manipulation processes directed by machine intelligence.