In the medical field, a doctor observes the position, state, temporal change, and the like of a morbid portion by interpreting a medical image obtained by imaging a patient using a medical image collection apparatus. Apparatuses for generating a medical image include a plain X-ray imaging apparatus, an X-ray computed tomography apparatus (X-ray CT), a magnetic resonance imaging apparatus (MRI), and an ultrasonic image diagnostic apparatus (US). These apparatuses have different characteristics, so the doctor generally selects a plurality of apparatuses suited to the region to be imaged, the disease, or the like. For example, an MRI image of a patient is captured, and an ultrasonic image is then captured while referring to the MRI image, thereby obtaining information effective for diagnosis, including the position and spread of a morbid portion.
When various apparatuses capture medical images at various times, diagnosis using these images can be made easier by making the images correspond to each other. As one such method, a technique for making multi-dimensional images correspond to each other is disclosed in Warren Cheung and Ghassan Hamarneh, “n-SIFT: n-Dimensional Scale Invariant Feature Transform”, IEEE Transactions on Image Processing, Vol. 18, No. 9, September 2009 (NPL 1). In this technique, an intensity gradient histogram in a local area of a multi-dimensional image is calculated as the feature value of the image in that local area. By comparing this feature value with a feature value obtained in the same way for another image, the images can be made to correspond to each other. As a result, corresponding portions in medical images captured by various imaging apparatuses can be observed in association with each other.
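The idea of using an intensity gradient histogram of a local area as a feature value, and of comparing such feature values between images, can be illustrated with a minimal sketch. The code below is not the method of NPL 1: it is a simplified two-dimensional example that omits keypoint detection and scale handling, and the function names (gradient_histogram, feature_distance, best_match), the square local area, the bin count, and the Euclidean distance are all assumptions made for illustration.

```python
import math


def gradient_histogram(image, cx, cy, radius=2, bins=8):
    """Return a normalized histogram of gradient orientations, weighted by
    gradient magnitude, over a square local area centred on (cx, cy).

    `image` is a list of rows of intensities. The local area must lie at
    least one pixel inside the image border (assumption of this sketch).
    """
    hist = [0.0] * bins
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            # Central differences approximate the intensity gradient.
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue  # flat pixels carry no orientation information
            angle = math.atan2(gy, gx) % (2 * math.pi)
            index = int(angle / (2 * math.pi) * bins) % bins
            hist[index] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def feature_distance(f1, f2):
    """Euclidean distance between two feature values (histograms)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


def best_match(feature, candidates):
    """Index of the candidate feature value closest to `feature`."""
    return min(range(len(candidates)),
               key=lambda i: feature_distance(feature, candidates[i]))


# Two synthetic 8x8 images: intensity rising along x, and along y.
horizontal_ramp = [[float(x) for x in range(8)] for _ in range(8)]
vertical_ramp = [[float(y) for _ in range(8)] for y in range(8)]

feature_a = gradient_histogram(horizontal_ramp, 4, 4)
feature_b = gradient_histogram(vertical_ramp, 4, 4)

# A brighter copy of the horizontal ramp has the same gradients, so its
# local area is matched to the horizontal ramp rather than the vertical one.
brighter = [[v + 10.0 for v in row] for row in horizontal_ramp]
query = gradient_histogram(brighter, 4, 4)
matched = best_match(query, [feature_b, feature_a])
```

Because the feature value depends only on intensity differences, a uniform brightness offset between the two images does not change it in this sketch. Real multi-modality images differ in far more than brightness, so this example shows only the comparison principle.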