Large 3D engineering models like architectural designs, chemical plants and mechanical computer-aided design (CAD) designs are increasingly being deployed in various virtual world applications, such as Second Life™ and Google Earth™. In most engineering models there are a large number of small to medium sized connected components, each having up to a few hundred polygons on average. Moreover, these types of models have a number of geometric features that are repeated in various positions, scales and orientations. Computer and video games use 3D models as does the motion picture (movie) industry. The movie industry uses 3D models as characters and objects in animated and real life motion pictures. 3D models are also used in medicine and architecture.
Various algorithms have been proposed to compress 3D meshes efficiently since the early 1990s. Early work, however, mostly concentrated on compressing single connected 3D models with smooth surfaces and small triangles. For multi-connected 3D models, such as large 3D engineering models, the components are compressed separately. This results in relatively ineffective compression. In fact, the compression performance can be greatly increased by removing the redundancy between different connected components. In the motion picture industry, compression of 3D models is extremely important for transmitting 3D motion pictures over broadband to consumers and to theaters. 3D mesh models (e.g., for movies or motion pictures) consume a very large amount of bandwidth.
A method for automatically discovering repeating geometric features in large 3D engineering models was proposed in D. Shikare, S. Bhakar and S. P. Mudur, “Compression of Large 3D Engineering models Using Automatic Discovery of repeating geometric Features”, 6th International Fall Workshop on Vision, Modeling and Visualization (VMV2001), Nov. 21-23, 2001, Stuttgart, Germany (hereinafter “Shikare”). However, much room was left for more efficient compression of 3D engineering models. For example, no compression solution was provided that covered transformation information of repeated instances, which is necessary for restoring the original model. Considering the large size of connected components that a 3D engineering model usually has, this kind of information also consumes a large amount of storage. Further, if PCA (Principal Component Analysis) of positions of vertices of a component is used, components with the same geometry and different connectivity will have the same mean and same orientation axes. Thus, the state of the art is not suitable for detecting repeating patterns in various scales. Two components that differ only in scale (i.e. size) are not recognized as repeating features of the same equivalence class. Further, it is desirable to achieve a higher compression ratio than described in Shikare.
O. Devillers, P. Gandoin, “Geometric Compression for Interactive transmission”, in IEEE Visualization, 2000, pp. 319-326 (hereinafter “Devillers”) describes a KD-tree based compression algorithm to encode the means of all connected components of a mesh model. At each iteration, this algorithm subdivides a cell into two child cells, and encodes the number of vertices in one of the two child cells. If the parent cell contains p vertices, the number of vertices in one of the child cells can be encoded using log2(p+1) bits with an arithmetic coder. This subdivision is recursively applied, until each non-empty cell is small enough to contain only one vertex and enables a sufficiently precise reconstruction of the vertex position. It is mentioned in Devillers that the algorithm is most efficient for non-uniform distributions, with regular distribution being the worst case.
A sequence of symbols, wherein the symbols are chosen from an alphabet or a symbol set, can be compressed by entropy coding. An entropy coding engine assigns codewords for symbols based on the statistical model, i.e., the probability distributions of symbols. In general, more frequently used symbols are entropy coded with fewer bits and less frequently occurring symbols are entropy coded with more bits.
Entropy coding has been studied for decades. Basically, there are three types of entropy coding methods: variable length coding (VLC), like Huffman coding, arithmetic coding, and dictionary-based compression, like Lempel-Ziv (LZ) compression or Lempel-Ziv-Welch (LZW) compression.
VLC codes use an integral number of bits to represent each symbol. Huffman coding is the most widely used VLC method. It assigns fewer bits to a symbol with greater probability, while assigning more bits to a symbol with a smaller probability. Huffman coding is optimal when the probability of each symbol is an integer power of ½. Arithmetic coding can allocate a fractional number of bits to each symbol so that it can approach the entropy more closely. Huffman coding and arithmetic coding have been widely used in existing image and video compression standards, e.g., JPEG, MPEG-2, H.264/AVC. LZ and LZW methods utilize a table based compression model where table entries are substituted for repeated strings of data. For most LZ methods, the table is generated dynamically from earlier input data. Dictionary based algorithms have been employed in, for example, the GIF, Zip, and PNG formats.
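By way of illustration only, the statement that Huffman coding is optimal for probabilities that are integer powers of ½ can be checked numerically. The following Python sketch is not part of any cited method; the function names and the two example distributions are assumptions chosen for illustration.

```python
import heapq
import math

def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_lengths(probs):
    """Return the Huffman codeword length (in bits) of each symbol."""
    if len(probs) == 1:
        return [1]
    # Heap entries: (probability, unique tie-breaker, symbol indices in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol below them.
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

def average_length(probs):
    """Expected Huffman codeword length in bits per symbol."""
    return sum(p * n for p, n in zip(probs, huffman_lengths(probs)))

dyadic = [0.5, 0.25, 0.125, 0.125]   # every probability is a power of 1/2
skewed = [0.9, 0.1]                  # hypothetical non-dyadic distribution

print(average_length(dyadic), entropy(dyadic))
print(average_length(skewed), entropy(skewed))
```

For the dyadic distribution the average codeword length equals the entropy. For the skewed distribution Huffman coding still spends one whole bit per symbol, which is the gap that arithmetic coding can narrow by allocating fractional bits.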
Spatial tree based approaches can be used to compress geometry data, such as random point positions and vertex positions of watertight 3D models. A watertight 3D model is a model in which the vertices are evenly and densely distributed. Spatial tree based approaches organize input spatial points by an octree or a KD-tree. The tree is traversed and the information required for tree restoration is stored.
Initially, a bounding box is constructed around all points of a 3D model. The bounding box of all 3D points is regarded as a single cell in the beginning. To build the spatial tree, a cell is recursively subdivided until each non-empty cell is small enough to contain only one vertex and enable a sufficiently precise reconstruction of the vertex position. As vertex positions can be restored from central coordinates of corresponding cells, the spatial tree based algorithms may achieve multi-resolution compression with the same compression ratio as single-resolution compression algorithms.
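By way of illustration only, the restoration of a coordinate from the center of its cell can be sketched for one axis as follows. The function name `cell_center`, the half-open interval convention and the sample coordinate are assumptions made for this sketch.

```python
def cell_center(x, lo, hi, depth):
    """Center of the cell that contains x after `depth` binary splits of [lo, hi)."""
    n = 1 << depth                      # number of cells along this axis
    width = (hi - lo) / n
    index = min(int((x - lo) / width), n - 1)
    return lo + (index + 0.5) * width

# Decoding more tree layers refines the same coordinate step by step.
x = 0.3
for depth in (1, 2, 3):
    approx = cell_center(x, 0.0, 1.0, depth)
    bound = 1.0 / 2 ** (depth + 1)      # half a cell width
    print(depth, approx, abs(approx - x) <= bound)
```

Each additional layer halves the cell width, so the worst-case reconstruction error is halved as well. This is what allows a decoder to stop at any layer and obtain a coarser version of the same model.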
FIG. 1 shows the principle of KD-tree coding in a 2D case. The 2D model is enclosed by a bounding box 10, which is called the parent cell. Seven vertices are positioned within the parent cell. The KD-tree encoding algorithm starts with encoding the total number of vertices using a predefined number of bits, and then subdivides the cells recursively. Each time it subdivides a parent cell into two child cells, it encodes the number of vertices in one of the two child cells. By convention, this may be the left child cell (after vertical splitting) or the upper cell (after horizontal splitting). If the parent cell contains p vertices, the number of vertices in one of the child cells can be encoded using log2(p+1) bits with an arithmetic coder. This subdivision is recursively applied, until each non-empty cell is small enough to contain only one vertex and enable a sufficiently precise reconstruction of the vertex position. For compressing the positions of all repeated instances, the entire bounding box 10 of all the positions is regarded as a parent cell in the beginning. In the example of FIG. 1, the total number of vertices (seven) is encoded using 32 bits. Then vertical splitting is applied, so that a left child cell V1 and a right child cell V2 are obtained. In the next coding step, the number of vertices in the left child cell V1, which is four, is encoded. The number of bits used for the encoding is determined by the number of vertices within the parent cell: in this example, it is log2(7+1)=3 bits. From the number of vertices in the parent cell and the number of vertices in the left child cell V1, the number of vertices in the right child cell V2 can be deduced, and therefore need not be encoded.
In the next step, horizontal splitting is applied. The left child cell V1, which is now a parent cell V1, is split into an upper child cell V1H1 and a lower child cell V1H2. The right child cell V2, which is now a parent cell V2, is split into an upper child cell V2H1 and a lower child cell V2H2. The encoding continues with the upper left child cell V1H1, which has two vertices. Thus, the number 2 is encoded next, wherein log2(4+1)≈2.32 bits are used in an arithmetic coder. As described above, the number of vertices in the lower left child cell V1H2 need not be encoded, since it can be deduced from the number of vertices in the left cell V1 and in the upper left child cell V1H1. Then, the same procedure is applied to the right cell V2, which contains the remaining three vertices. This results in encoding a zero using log2(3+1)=2 bits. As shown in FIG. 1, two more splitting steps are necessary until each vertex is in a separate cell, and even more steps are necessary until each vertex is sufficiently localized within its cell. Each step requires the encoding of a growing number of ones or zeros. Depending on the required accuracy, the number of additional steps may be high.
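By way of illustration only, the bit cost of the KD-tree scheme described above can be sketched in Python. The vertex coordinates of FIG. 1 are not given in the text, so the three points used below are hypothetical. The names `count_bits` and `kd_split_counts`, the integer grid, the level-by-level processing order and the choice of the lower-coordinate half as the encoded child are likewise assumptions of this sketch rather than details of Devillers.

```python
import math
from collections import deque

def count_bits(p):
    """Bits for one child count when the parent holds p vertices: log2(p + 1)."""
    return math.log2(p + 1)

def kd_split_counts(points, size):
    """List (parent_count, first_child_count) for every split of [0, size)^2.

    points are distinct integer (x, y) pairs and size is a power of two.
    Splitting alternates between the two axes and stops at empty cells
    and at 1x1 cells.
    """
    counts = []
    queue = deque([(list(points), 0, 0, size, size)])
    while queue:
        pts, x0, y0, w, h = queue.popleft()
        if not pts or (w == 1 and h == 1):
            continue
        if w >= h:   # vertical split: the left child count is encoded
            half = w // 2
            first = [q for q in pts if q[0] < x0 + half]
            second = [q for q in pts if q[0] >= x0 + half]
            queue.append((first, x0, y0, half, h))
            queue.append((second, x0 + half, y0, half, h))
        else:        # horizontal split: the lower-y child count is encoded
            half = h // 2
            first = [q for q in pts if q[1] < y0 + half]
            second = [q for q in pts if q[1] >= y0 + half]
            queue.append((first, x0, y0, w, half))
            queue.append((second, x0, y0 + half, w, half))
        counts.append((len(pts), len(first)))
    return counts

HEADER_BITS = 32                        # total vertex count, as in the text
example = [(0, 0), (3, 1), (2, 2)]      # hypothetical vertices on a 4x4 grid
counts = kd_split_counts(example, 4)
total_bits = HEADER_BITS + sum(count_bits(p) for p, _ in counts)

print(count_bits(7), round(count_bits(4), 2))   # the two costs quoted for FIG. 1
print(counts, total_bits)
```

Only the count of one child is stored per split, because the decoder obtains the other count by subtraction from the parent count. Once cells hold a single vertex, every further split costs log2(1+1)=1 bit, which corresponds to the growing run of ones and zeros mentioned above.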
On the other hand, an octree based approach subdivides, in each iteration, a non-empty cell into eight child cells. For ease of illustration, 2D examples describing a quadtree are shown in FIGS. 2 and 3. The traversal orders are denoted by arrows. For encoding, a current parent cell is split into four child cells that are traversed in a pre-defined order, and a single bit per child cell indicates whether or not there is a point within the child cell. For example, in FIG. 2, the child cells of two parent cells 1 and 2 are traversed as shown by the arrows, with non-empty child cells being colored gray. Child cells 210, 211, 212, and 213 of the first parent cell 1 are represented by a first sequence ‘1010’. Since the first and third child cells 210, 212 of the traversal are non-empty (i.e., contain one or more points), they are indicated by ‘1’s. Since the second and fourth child cells 211, 213 are empty (i.e., contain no points), they are indicated by ‘0’s. FIG. 3 shows the same cells using different traversals and resulting sequences.
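By way of illustration only, the one-bit-per-child occupancy code of a quadtree cell can be sketched as follows. The traversal orders of FIGS. 2 and 3 are not reproduced in the text, so the two orders and the point coordinates below are hypothetical, as is the function name `quadtree_symbol`.

```python
def quadtree_symbol(points, x0, y0, size, order):
    """Return the 4-bit occupancy string of the cell [x0, x0+size) x [y0, y0+size).

    `order` lists the child offsets (dx, dy), each 0 or 1, in traversal order.
    """
    half = size / 2
    bits = []
    for dx, dy in order:
        cx, cy = x0 + dx * half, y0 + dy * half
        occupied = any(cx <= x < cx + half and cy <= y < cy + half
                       for x, y in points)
        bits.append('1' if occupied else '0')
    return ''.join(bits)

# Two hypothetical points that share the same column of child cells.
points = [(0.2, 0.3), (0.4, 1.5)]
row_major = ((0, 0), (1, 0), (0, 1), (1, 1))     # assumed traversal A
column_major = ((0, 0), (0, 1), (1, 0), (1, 1))  # assumed traversal B

print(quadtree_symbol(points, 0, 0, 2, row_major))
print(quadtree_symbol(points, 0, 0, 2, column_major))
```

The same occupied cells produce ‘1010’ under the first assumed order and ‘1100’ under the second, which is the effect of changing the traversal. Encoder and decoder only need to agree on the order.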
FIG. 4 shows parent and child cells of an octree scheme. In the octree scheme, a parent cell is split into eight child cells 40, . . . , 46 (one hidden child cell behind lower left cell 42 is not shown). A possible traversal order could be left-right, up-down and front-back, resulting in a traversal sequence of cells 40-41-42-43-44-45 (hidden cell behind lower left cell 42)-46. Correspondingly, in the octree case the non-empty child cell configuration is denoted by 8-bit binary numbers, covering all the 255 possible combinations of empty and non-empty child cells. Separate encoding of the number of non-empty child cells is not required. TABLE 1 is an example of a sequence.
TABLE 1. An exemplary sequence.
11111111
01100110
00111011
11001100
00010000
00000010
00000010
10000000
00000001
Note that the specific traversal order of child cells within a parent cell is not very relevant for the present embodiments. In principle, any traversal order can be used for the present embodiments. In the following, the string of bits used to represent a child cell configuration is denoted as a symbol. In the example of TABLE 1, 8 bits are used for each symbol. In other implementations, the number of bits in a symbol may vary. For example, a 4-bit string is used to represent the child cell configuration for a quadtree, and thus, the number of bits for a symbol in the example of FIG. 2 is 4.
FIG. 5 shows an example of an octree structure. Each node is associated with a symbol and each layer corresponds to a certain precision of the tree representation. The initial cell is divided into eight cells. Child cells 1, 2, 5, 6, and 7 contain one or more vertices and child cells 3, 4, and 8 are empty, resulting in an 8-bit symbol 11001110 (510) to represent the child cell configuration at layer 0. Each non-empty child cell is further divided and the corresponding child cell configurations are represented in layer 1. The subdivision may continue until each non-empty cell contains only one vertex.
TABLE 2. An exemplary probability distribution.
Symbol    p       Symbol    p       Symbol    p       Symbol    p       Symbol    p
00000100  0.1280  00000101  0.0034  10100000  0.0020  00001010  10^-3   01000100  10^-3
00000010  0.1275  00001001  0.0030  00000011  0.0015  00001011  10^-3   01100010  10^-3
00001000  0.1167  01100000  0.0025  00010001  0.0015  00001111  10^-3   01101000  10^-3
10000000  0.1162  10000010  0.0025  00010010  0.0015  00011000  10^-3   10111011  10^-3
01000000  0.1128  10001000  0.0025  00101000  0.0015  00011100  10^-3   11001100  10^-3
00010000  0.1118  00000110  0.0020  00110000  0.0015  00100110  10^-3   11010000  10^-3
00000001  0.1108  00001100  0.0020  01010000  0.0015  00111011  10^-3   11111111  10^-3
00100000  0.1098  00100010  0.0020  11000000  0.0015  01000010  10^-3   00000111  5 · 10^-4
Using a breadth-first traversal of the octree, the vertex positions of a 3D mesh can be organized into a sequence of symbols. For the example in FIG. 5, the sequence of symbols becomes: 11001110, 11000000, 10010100, 00100110, 00001000, and 00001000.
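By way of illustration only, the generation of a breadth-first symbol sequence from vertex positions can be sketched as follows. The vertex coordinates of FIG. 5 are not given, so the two points below are hypothetical. The integer grid, the mapping from axis bits to child index and the function name `octree_symbols` are also assumptions of this sketch.

```python
from collections import deque

def octree_symbols(points, size):
    """Breadth-first list of 8-bit occupancy symbols.

    points are distinct integer (x, y, z) triples in [0, size)^3 and size is
    a power of two. Subdivision stops when a cell reaches unit size.
    """
    symbols = []
    queue = deque([(list(points), (0, 0, 0), size)])
    while queue:
        pts, (x0, y0, z0), s = queue.popleft()
        if s == 1:                       # unit cell: position fully resolved
            continue
        half = s // 2
        children = [[] for _ in range(8)]
        for x, y, z in pts:
            # One bit per axis selects the child (assumed ordering).
            idx = (int(x >= x0 + half) << 2) | (int(y >= y0 + half) << 1) \
                  | int(z >= z0 + half)
            children[idx].append((x, y, z))
        symbols.append(''.join('1' if c else '0' for c in children))
        for idx, c in enumerate(children):
            if c:                        # only non-empty children are refined
                origin = (x0 + half * ((idx >> 2) & 1),
                          y0 + half * ((idx >> 1) & 1),
                          z0 + half * (idx & 1))
                queue.append((c, origin, half))
    return symbols

print(octree_symbols([(0, 0, 0), (3, 3, 3)], 4))
```

The first symbol describes layer 0 and is followed by one symbol for each non-empty child, in traversal order. This mirrors the FIG. 5 sequence, in which the layer-0 symbol with five ‘1’s is followed by five layer-1 symbols.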
The probability distribution of the most frequently occurring symbols in a complex 3D model is shown in TABLE 2, in a descending order of the probability. As can be seen from TABLE 2, the symbols having only one ‘1’ in the binary representation occur with a dominant probability (>93%). The geometric explanation may be that the vertices seldom share a cell after several subdivisions. That is, the bottom layers of the octree are dominated by symbols with only one ‘1’, and other symbols occur more often at the top layers.
According to the present embodiments, two symbol sets are defined: a universal symbol set, S0={1, 2, 3, . . . , 255}, including all possible symbols, and a symbol set, S1={1, 2, 4, 8, 16, 32, 64, 128}, including only symbols having one ‘1’, i.e., the most frequently occurring symbols. Note that, for ease of representation, 8-bit binary strings are written as decimal numbers. A symbol is called an S1 symbol if it belongs to symbol set S1, and is called a non-S1 symbol otherwise.
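By way of illustration only, membership in symbol set S1 and the combined probability of the S1 symbols listed in TABLE 2 can be checked as follows. The helper name `is_s1` is an assumption; the probabilities are copied from the first column pair of TABLE 2 and the symbol sequence is the one given for FIG. 5.

```python
S0 = set(range(1, 256))            # all non-empty child cell configurations
S1 = {1 << k for k in range(8)}    # symbols with exactly one '1'

def is_s1(symbol):
    """True if the 8-bit symbol, given as an integer, has exactly one bit set."""
    return symbol != 0 and (symbol & (symbol - 1)) == 0

# Probabilities of the eight S1 symbols as listed in TABLE 2.
table2_s1 = {
    '00000100': 0.1280, '00000010': 0.1275, '00001000': 0.1167,
    '10000000': 0.1162, '01000000': 0.1128, '00010000': 0.1118,
    '00000001': 0.1108, '00100000': 0.1098,
}
s1_share = sum(table2_s1.values())

# Classification of the symbol sequence given for FIG. 5.
fig5 = ['11001110', '11000000', '10010100', '00100110', '00001000', '00001000']
flags = [is_s1(int(s, 2)) for s in fig5]

print(sorted(S1), round(s1_share, 4), flags)
```

The listed probabilities sum to 0.9336, consistent with the stated dominant probability of more than 93%. In the FIG. 5 sequence only the last two symbols are S1 symbols, which matches the observation that non-S1 symbols are concentrated in the top layers.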
To benefit from the statistical property of an octree, PCT application No. PCT/CN2011/077279, entitled “A Model-Adaptive Entropy Coding Method for Octree Compression,” proposes partitioning the sequence represented by an octree into several sub-sequences which are coded with S0 or S1 adaptively. The indices of sub-sequence boundaries are coded as supplemental information. Because of the overhead of the supplemental information (e.g., two bytes for each index), generally large sub-sequences of consecutive S1 symbols are coded with symbol set S1.
When S1 symbols and non-S1 symbols both occur in a portion of the sequence, with S1 symbols having much higher probabilities, it is not efficient to divide such a portion into several sub-sequences because of the overhead. On the other hand, it is also not efficient to code such a portion with symbol set S0 as non-S1 symbols occur with low probabilities.
In 3D mesh coding, the geometry data is usually compressed by spatial tree decomposition based approaches, e.g., the KD-tree based approach described in Devillers or the octree based approaches described in J. L. Peng, C. C. Jay Kuo, “Geometry Guided Progressive Lossless 3D Mesh Coding with Octree Decomposition”, ACM SIGGRAPH (ACM Transactions on Graphics 24 (3)), pp 609-616, 2005 (hereinafter “Peng”) and Y. Huang, J. Peng, C. C. J. Kuo, and M. Gopi, “A Generic Scheme for Progressive Point Cloud Coding”, IEEE Transactions on Visualization and Computer Graphics 14, pp 440-453, 2008 (hereinafter “Huang”). Besides supporting progressive coding, the methods of Devillers, Peng and Huang also achieve a considerable compression gain. These coders recursively subdivide the smallest axis-aligned bounding box of a given 3D model into two or eight children in a KD-tree or octree data structure, respectively. A cell is recursively subdivided until each non-empty cell is small enough to contain only one vertex and enable a sufficiently precise reconstruction of the vertex position. For each cell subdivision, whether or not each child cell is empty is signified by some symbols. A symbol sequence describing the KD-tree or octree, called a traversal symbol sequence herein, is generated by traversing the tree breadth-first and collecting the symbols representing the subdivision of the nodes encountered. Then an entropy coder-decoder (codec) is utilized to compress that symbol sequence. To reduce the entropy of the symbol sequence and thereby improve the coding efficiency, both Peng and Huang perform child-cell reordering based on some neighborhood-based predictor.
For each cell subdivision, Peng encodes the number, T (1<=T<=8), of non-empty-child cells and the index of its non-empty-child cell configuration among all possible combinations. The geometry information is taken into consideration during the non-empty-child cell representation, resulting in better compression but greater complexity.
PCT/CN2011/077279 and PCT/CN2011/078936 propose discarding the number of non-empty-child cells T. In such cases, the non-empty-child-cell configuration is denoted by 8-bit binary numbers, covering all 255 combinations. These 8-bit binary numbers are compressed by entropy coding.
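By way of illustration only, the relationship between the two representations can be checked by counting configurations. For a given number T of non-empty child cells there are C(8, T) possible configurations, and summing over T=1..8 gives the 255 combinations covered by the 8-bit symbols. The uniform index cost shown below is an assumption made for this sketch and does not model the geometry-guided coding actually used in Peng.

```python
from math import comb, log2

# Number of child cell configurations for each count T of non-empty children.
configs = {T: comb(8, T) for T in range(1, 9)}
total = sum(configs.values())

for T, n in configs.items():
    # log2(n) bits would index one configuration if all were equally likely.
    print(T, n, round(log2(n), 2))
print(total)
```

The total equals 2^8 - 1 = 255, since only the all-empty configuration is excluded. Discarding T therefore loses no information: T can always be recovered by counting the ‘1’s in the 8-bit symbol.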
The statistics-based approaches proposed in PCT/CN2011/077279 and PCT/CN2011/078936 have much lower computational complexity and better robustness in coding randomly distributed positions than Devillers and Peng. The reverse is the case for the vertex compression of watertight 3D models. The reason is that PCT/CN2011/077279 and PCT/CN2011/078936 do not remove the geometry redundancy, which costs a considerable number of bits.