It is widely accepted that prior knowledge about a target shape is important and should be used in shape detection. How to use this prior knowledge effectively has long been an active research topic in non-rigid shape detection. The Active Contour Model (ACM) and other energy minimization approaches have become standard tools for non-rigid shape detection, where the prior knowledge is encoded into an energy function. An active contour is driven by external and internal forces. The external force is derived from the input image, while the internal force incorporates the prior knowledge of the target shape. In a standard setting, ACMs use two parameters to adjust the elasticity and stiffness of the shape. With such limited flexibility, very little prior knowledge can be exploited by ACMs, and the contour often converges to an unrealistic shape.
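The role of the two ACM parameters can be illustrated with a minimal sketch of the internal energy of a discretized closed contour. This follows the commonly used snake formulation, in which one weight penalizes stretching and the other penalizes bending; the function name `internal_energy`, the parameter names `alpha` and `beta`, and the finite-difference discretization are assumptions made for illustration and do not come from the text above.

```python
import numpy as np

def internal_energy(contour, alpha, beta):
    """Discrete internal energy of a closed contour (illustrative sketch).

    contour: (N, 2) array of points; the last point connects back to the first.
    alpha:   weight on the first-difference term (elasticity), assumed name.
    beta:    weight on the second-difference term (stiffness), assumed name.
    """
    prev_pts = np.roll(contour, 1, axis=0)
    next_pts = np.roll(contour, -1, axis=0)
    first = next_pts - contour                     # finite-difference first derivative
    second = next_pts - 2.0 * contour + prev_pts   # finite-difference second derivative
    stretch = np.sum(first ** 2)
    bend = np.sum(second ** 2)
    return 0.5 * (alpha * stretch + beta * bend)

# Example: a unit square sampled at its four corners.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
elastic_only = internal_energy(square, alpha=1.0, beta=0.0)
stiff_only = internal_energy(square, alpha=0.0, beta=1.0)
```

Because the prior is limited to these two scalar weights, the energy can only favor short and smooth contours in general; it cannot express that a particular class of shapes is expected.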
To mitigate this problem, the Active Shape Model (ASM) models the deformation of a shape differently. Given a set of training shapes, Principal Component Analysis (PCA) is applied to the shape space. The deformation of the shape is constrained to the subspace spanned by the eigenvectors associated with the largest eigenvalues. The search space can be further restricted to a hypercube within that subspace. By adjusting the number of principal components retained, ASM can trade off the representation capability of the model against the strength of the constraints on the shape. If all principal components are used, ASM can represent any shape, but no prior knowledge of the shape is used. On the other hand, if too few principal components are retained, an input shape cannot be well represented by the subspace. Therefore, there is an upper bound on the detection accuracy for any given choice of parameters. Both ACM and ASM use only the image content around the shape boundaries, so they are more suitable for shapes with strong edges. The Active Appearance Model (AAM) is a natural extension of ASM in which the variation of the appearance is also constrained to a subspace.
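The subspace and hypercube constraints described above can be sketched as follows. This is a minimal illustration, assuming the training shapes are already aligned and flattened into vectors. The function names `fit_shape_model` and `constrain_shape`, and the choice of bounding each coefficient by a multiple `limit` of the standard deviation along its component, are assumptions for the example rather than details given in the text.

```python
import numpy as np

def fit_shape_model(shapes, k):
    """Fit a PCA shape model (illustrative sketch).

    shapes: (M, D) array, one aligned, flattened training shape per row.
    k:      number of principal components to retain.
    Returns the mean shape, a (D, k) orthonormal basis, and k eigenvalues.
    """
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = (s ** 2) / max(len(shapes) - 1, 1)
    return mean, vt[:k].T, eigvals[:k]

def constrain_shape(shape, mean, basis, eigvals, limit=3.0):
    """Project a shape onto the subspace and clamp it to a hypercube.

    limit is an assumed bound, in standard deviations per component.
    """
    b = basis.T @ (shape - mean)          # coefficients in the subspace
    bound = limit * np.sqrt(eigvals)
    b = np.clip(b, -bound, bound)         # hypercube restriction
    return mean + basis @ b
```

Retaining fewer components (smaller `k`) tightens the constraint but increases the residual between an input shape and its constrained reconstruction, which is the trade-off noted above.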
Shape detection can also be formulated as a classification problem: deciding whether a given image block contains the target shape. An exhaustive search over the similarity transformation space is often used to estimate the translation, rotation, and scale of the shape in an input image. For example, the AdaBoost algorithm can be used for face detection. Given a large pool of simple features, AdaBoost selects a small feature set and the corresponding weights for classification. The convolutional neural network (CNN) is another classification-based approach, which combines feature extraction, feature selection, and classifier training in a single framework. As a specially designed neural network, the CNN is especially effective for two-dimensional images. One drawback of these classification-based approaches is that only the similarity transformation of the shape can be estimated, so non-rigid deformation is not captured.
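How AdaBoost picks a small feature set with weights from a large pool can be sketched with a minimal discrete AdaBoost over single-feature threshold classifiers (decision stumps). The use of stumps as weak learners, the brute-force threshold search, and the names `train_adaboost` and `predict` are assumptions chosen to keep the example short; they are not taken from the text above, and practical detectors use far more efficient training.

```python
import numpy as np

def train_adaboost(features, labels, rounds):
    """Discrete AdaBoost with decision stumps (illustrative sketch).

    features: (N, F) pool of simple feature responses.
    labels:   (N,) array with values in {-1, +1}.
    Returns a list of (feature index, threshold, polarity, weight).
    """
    n, f = features.shape
    w = np.full(n, 1.0 / n)               # per-sample weights
    stumps = []
    for _ in range(rounds):
        best = None
        # Brute-force search for the stump with the lowest weighted error.
        for j in range(f):
            for thr in np.unique(features[:, j]):
                for pol in (1, -1):
                    pred = np.where(pol * (features[:, j] - thr) >= 0, 1, -1)
                    err = np.sum(w[pred != labels])
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)      # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)     # weight of this weak classifier
        w *= np.exp(-alpha * labels * pred)       # emphasize misclassified samples
        w /= w.sum()
        stumps.append((j, thr, pol, alpha))
    return stumps

def predict(stumps, features):
    """Sign of the weighted vote of the selected stumps."""
    score = np.zeros(len(features))
    for j, thr, pol, alpha in stumps:
        score += alpha * np.where(pol * (features[:, j] - thr) >= 0, 1, -1)
    return np.where(score >= 0, 1, -1)
```

Each round selects one feature from the pool, so the number of rounds bounds the size of the selected feature set. In a detector, the resulting classifier would be evaluated on image blocks sampled over translation, rotation, and scale.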
Since it is hard to handcraft the prior knowledge in a shape detection framework, a method that directly exploits expert annotations of the target shape in a large database is preferred. One known approach directly learns a regression function for the positions of the control points. Though this approach is simple and elegant, the regression output is a multi-dimensional vector (often on the order of 100 dimensions for shape detection, depending on the application). Since regression with a multi-dimensional output is hard, PCA is often used to restrict the shape deformation space. This approach therefore suffers from the same limitations as ASM and AAM. Another known approach uses a shape inference method to search for the most similar shape in the database. Specifically, the training set is grouped into several clusters in the shape space, and a set of image features is selected to maximize the Fisher separation criterion. During shape detection, the input and training images are compared in the feature space to select a similar example shape for the input. As a heuristic metric, the Fisher separation criterion is optimal only in very limited cases, such as Gaussian distributions that share the same covariance matrix. Both of the above approaches need a preprocessing step to estimate the rough position of a shape, which is often realized using a classification-based approach.
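The feature selection step of the shape inference approach can be illustrated with a simplified sketch that scores each feature by a one-dimensional Fisher ratio, the between-cluster scatter divided by the within-cluster scatter, and keeps the highest-scoring features. Scoring features independently is an assumption made for brevity, since the text does not specify how the criterion is maximized; the names `fisher_scores` and `select_features` are likewise hypothetical.

```python
import numpy as np

def fisher_scores(features, cluster_ids):
    """Per-feature Fisher ratio (illustrative sketch).

    features:    (N, F) array of image feature responses.
    cluster_ids: (N,) integer shape-cluster label for each training sample.
    """
    overall_mean = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in np.unique(cluster_ids):
        members = features[cluster_ids == c]
        mean_c = members.mean(axis=0)
        between += len(members) * (mean_c - overall_mean) ** 2
        within += np.sum((members - mean_c) ** 2, axis=0)
    # Small constant guards against zero within-cluster scatter.
    return between / (within + 1e-12)

def select_features(features, cluster_ids, k):
    """Indices of the k features with the largest Fisher ratio."""
    scores = fisher_scores(features, cluster_ids)
    return np.argsort(scores)[::-1][:k]
```

The ratio depends only on cluster means and variances, which is one way to see why the criterion is well matched to Gaussian clusters with a shared covariance and can be misleading for other distributions.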