An effective measure of the similarity between multi-modal medical images is important in many clinical applications, such as multi-modal image registration. Universal similarity metrics estimate the similarity between different unimodal image data sets from the statistics of the image intensity distribution. Examples include local cross-correlation (LCC), mutual information (MI), entropy correlation coefficient (ECC), cumulative residual entropy correlation coefficient (CRECC), and the Kullback-Leibler (KL) divergence between the observed joint image intensity distribution and a previously learned one. Universal similarity metrics have been used successfully for unimodal image analysis, where the image data sets are similar in both intensity and texture. However, universal similarity metrics are insufficient to describe the complex relationship between imaging modalities that have very different underlying imaging physics.
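As an illustration of one intensity-statistics metric named above, mutual information can be estimated from a joint histogram of corresponding pixel intensities. The following is a minimal sketch under stated assumptions, not the method of any particular system: the function name `mutual_information`, the bin count, the 8-bit intensity range, and the representation of each image as a flat list of pixel values are all chosen here for illustration.

```python
import math
from collections import Counter


def mutual_information(a, b, bins=8, max_value=256):
    """Estimate mutual information (in nats) between two images.

    `a` and `b` are flat sequences of pixel intensities in [0, max_value)
    with the same length; pixel k of `a` corresponds to pixel k of `b`.
    The bin count and intensity range are illustrative assumptions.
    """
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    if not a:
        raise ValueError("images must not be empty")
    n = len(a)

    # Quantize intensities into a small number of histogram bins.
    qa = [min(int(v * bins / max_value), bins - 1) for v in a]
    qb = [min(int(v * bins / max_value), bins - 1) for v in b]

    # Joint and marginal histograms (raw counts).
    joint = Counter(zip(qa, qb))
    count_a = Counter(qa)
    count_b = Counter(qb)

    # MI = sum over occupied bins of p(i, j) * log(p(i, j) / (p(i) * p(j))).
    mi = 0.0
    for (i, j), c in joint.items():
        p_ij = c / n
        mi += p_ij * math.log(c * n / (count_a[i] * count_b[j]))
    return mi


# Hypothetical toy data: an "image" and its intensity-inverted copy.
image = [0, 0, 255, 255, 0, 255, 0, 255]
inverted = [255 - v for v in image]
print(mutual_information(image, inverted))
```

For this toy pair the estimate equals ln 2 (about 0.693), the same value the image has with itself, even though every intensity is inverted. This shows that the metric responds to statistical dependence between intensities rather than to their agreement, and it also shows that the metric relies on intensity statistics alone.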
To overcome this insufficiency, supervised similarity metric learning was developed. In contrast to the universal similarity metrics discussed above, supervised learning optimizes a similarity metric, usually in a parametric form, using a set of training data, so that the metric is trained for a specific application. One approach uses a support vector machine (SVM)-based method and joint kernel maps to model nonlinear dependencies between image patches from different modalities. Another approach uses similarity-sensitive hashing to embed image data of different modalities into a common metric space, which is then used to parameterize a multimodal similarity metric.
Data representation is important to machine learning algorithms because different data representations capture very different factors that explain the variation in the image data. Hand-engineered image features are not guaranteed to work well for all image data. Therefore, learning-based methods have been developed to learn (shared) feature representations for unimodal data, for data from different imaging modalities, and for data from different sources (e.g., image and audio).