A 2D triangular mesh refers to a tessellation of a 2D visual object plane into triangular patches. The vertices of the triangular patches are called "node points." The straight-line segments joining the node points are called "edges."
A dynamic 2D mesh consists of a temporal sequence of 2D triangular meshes, where each mesh has the same topology (i.e., structure), but node positions may differ from one mesh to the next. Thus, a dynamic 2D mesh may be defined by the geometry of the initial 2D mesh and motion vectors at the node points for subsequent meshes, where each motion vector points from a node point of the previous mesh in the sequence to a node point of the current mesh. The dynamic 2D mesh may be used to create 2D animations by mapping texture from a still image onto successive 2D meshes via well-known texture mapping methods. For example, the dynamic mesh may be used to render a waving flag from a still image of a flag. The local deformations of the texture in time are captured by the motion of mesh nodes from one mesh to the next. Hence, different animations of the same texture may be achieved by different sets of node motion vectors.
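The representation described above, an initial mesh geometry plus per-node motion vectors for each subsequent mesh, can be illustrated with a minimal sketch. The function name `reconstruct_meshes`, the data layout, and the sample coordinates are assumptions made for illustration only; they are not taken from any particular codec or standard.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Vector = Tuple[float, float]


def reconstruct_meshes(initial_nodes: List[Point],
                       motion_vectors: List[List[Vector]]) -> List[List[Point]]:
    """Rebuild the node positions of every mesh in the sequence.

    Each entry of motion_vectors holds one vector per node, pointing from the
    node's position in the previous mesh to its position in the current mesh.
    (Hypothetical layout, chosen for this sketch.)
    """
    meshes = [list(initial_nodes)]
    for frame_vectors in motion_vectors:
        if len(frame_vectors) != len(initial_nodes):
            raise ValueError("one motion vector is required per node")
        previous = meshes[-1]
        current = [(x + dx, y + dy)
                   for (x, y), (dx, dy) in zip(previous, frame_vectors)]
        meshes.append(current)
    return meshes


# The topology is shared by all meshes in the sequence, so it is stored once.
triangles = [(0, 1, 2), (1, 3, 2)]
initial_nodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
motion_vectors = [
    [(0.0, 0.0), (0.0, 0.5), (0.0, 0.0), (0.0, -0.5)],   # mesh 0 -> mesh 1
    [(0.0, 0.0), (0.0, -0.5), (0.0, 0.0), (0.0, 0.5)],   # mesh 1 -> mesh 2
]
meshes = reconstruct_meshes(initial_nodes, motion_vectors)
```

Because only the node positions change, the same triangle list can be reused to map the still-image texture onto every mesh in `meshes`.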
Texture mapping utilizes the structure of the mesh, i.e., the way the nodes of the mesh are connected with each other by the edges of the mesh. A mesh may have a specified implicit structure, such as uniform structure or Delaunay structure, as described in S. M. Omohundro, "The Delaunay triangulation and function learning," International Computer Science Institute Technical Report TR-90-001, University of California Berkeley, January 1990.
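As an illustration of an implicit uniform structure, the following sketch derives both the node positions and the triangle connectivity from a handful of parameters. The parameter names (`cols`, `rows`, `dx`, `dy`) and the fixed choice of diagonal are assumptions made for this example; an actual coding scheme may parameterize a uniform mesh differently.

```python
def uniform_mesh(cols, rows, dx, dy):
    """Build a uniform triangular mesh from a grid of cols x rows nodes.

    Nodes are numbered row by row. Each grid cell is split into two triangles
    along the diagonal from its first node to the diagonally opposite node
    (an assumed convention for this sketch).
    """
    nodes = [(c * dx, r * dy) for r in range(rows) for c in range(cols)]
    triangles = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            a = r * cols + c      # first corner of the cell
            b = a + 1             # next node in the same row
            d = a + cols          # node directly in the next row
            e = d + 1             # corner diagonally opposite a
            triangles.append((a, b, e))
            triangles.append((a, e, d))
    return nodes, triangles


nodes, triangles = uniform_mesh(cols=3, rows=2, dx=1.0, dy=1.0)
```

No edge list needs to be stored or transmitted in this case: the four parameters, together with the agreed splitting convention, determine the connectivity.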
Efficient coding of an animation sequence may be achieved by separately coding the still image texture and the associated 2D mesh, i.e., the geometry and node motion vectors. The associated 2D mesh is represented, and accordingly encoded, by the geometry of the first mesh and the motion vectors of the nodes of this first and subsequent meshes.
The mesh geometry compression technique described here is limited to 2D triangular meshes with implicit topology, specifically meshes with uniform and Delaunay topology. In these cases, the mesh topology is defined implicitly, given the locations of the mesh nodes (also called vertices) and some additional information to be specified in detail later. Algorithms to implement Delaunay triangulations are available in the literature and are not described here. It should be noted that Delaunay triangulations are uniquely defined except if the nodes to be triangulated contain certain degeneracies in their locations. Here, it is assumed that both the mesh encoder and decoder use an agreed-upon technique to handle such degeneracies. Such techniques are well known to those of skill in the art. The mesh geometry compression technique described here allows a high compression ratio for these constrained classes of meshes.
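To illustrate how a Delaunay topology follows from node locations alone, the sketch below applies the empty-circumcircle property by brute force: a triangle is kept if no other node lies strictly inside its circumcircle. This is not one of the efficient algorithms from the literature and is not presented as the technique used by any particular encoder or decoder; the function names and sample coordinates are assumptions chosen for this example, and the approach is practical only for very small node sets.

```python
from itertools import combinations


def orientation(a, b, c):
    """Twice the signed area of triangle abc; positive if counterclockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of counterclockwise abc."""
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 0


def delaunay_bruteforce(nodes):
    """Return index triples of all triangles with an empty circumcircle."""
    triangles = []
    for i, j, k in combinations(range(len(nodes)), 3):
        a, b, c = nodes[i], nodes[j], nodes[k]
        turn = orientation(a, b, c)
        if turn == 0:
            continue              # collinear nodes do not form a triangle
        if turn < 0:
            b, c = c, b           # make the triangle counterclockwise
        others = (nodes[m] for m in range(len(nodes)) if m not in (i, j, k))
        if not any(in_circumcircle(a, b, c, d) for d in others):
            triangles.append((i, j, k))
    return triangles


# Integer coordinates keep the determinant test exact in this example.
nodes = [(0, 0), (4, 0), (0, 3), (5, 4)]
triangles = delaunay_bruteforce(nodes)
```

For these four nodes the test keeps exactly two triangles, so the connectivity need not be transmitted. Four co-circular nodes, such as the corners of a square, are one example of a degeneracy: the test then accepts all four candidate triangles, and an agreed-upon tie-breaking rule is needed so that encoder and decoder arrive at the same mesh.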
Representing mesh motion efficiently is important for describing mesh-based animations. Here, we describe a technique for compression of mesh motion in the 2D case, although the principle may be extended straightforwardly to the case of 3D meshes with 3D motion. Furthermore, the mesh motion compression technique described here is directly applicable to meshes with general topology, although the examples provided herein describe meshes with constrained topology. Finally, the principles of the invention with respect to motion coding may also be applied straightforwardly to the coding of surface appearance attributes.
The coding methods described here may, for instance, be employed in the context of MPEG-4. MPEG-4 is an object-based multimedia compression standard being developed by the Moving Picture Experts Group, which allows for encoding of different audio-visual objects (AVOs) in the scene separately, as an extension of the previous MPEG-1/2 standards. These AVOs are decoded and then composited at the user terminal according to a transmitted scene description script and/or user interaction to form display frames. The objects may have natural or synthetic content, including audio, video, 3D graphics models, scrolling text and graphics overlay, and so on.