1. Field of the Invention
The present invention provides a method for reducing, with a minimal loss of relevant information, the quantity of data in a data set in which a pattern is to be recognized.
2. Description of the Prior Art
A data set can contain an array of elements, such as the pixels (picture elements) of an image, each of which can adopt a number of values, here called information values or codes. Recognizing patterns in data sets is of great importance. If the data set contains, for instance, the pixels of a typed or handwritten text, the separate letters of this text can be recognized by pattern recognition. Even when noise is present in the image to be recognized, it is often still possible to recognize the original patterns. If the data set is, for instance, a medical photograph, cell abnormalities or cell tumours can be recognized at an early stage by pattern recognition.
Diverse methods for recognizing patterns in data sets are known in the prior art. There are statistical methods, which however do not process well the structural information contained in the links within complex patterns. There are also descriptive methods, in which an attempt is made to define the properties of the patterns to be recognized. These methods run into problems when the patterns to be recognized are complex. Neural networks can also be used to recognize patterns. However, the use of neural networks to recognize patterns in large data sets is limited by the capacity of the present-day computers on which the neural networks are computed.
The method according to the present invention extracts relevant information from the data set on the basis of its internal information content, which is estimated from the statistical properties of a training set of already known (a priori) patterns provided to the device of the invention during a training phase. Non-relevant or superfluous information is ignored. The size of the data set is thereby reduced with a minimal loss of relevant information.
The present invention relates to a method and device for reducing the quantity of digital information in a data set for the purpose of pattern recognition. The method comprises the following steps:
determining, during a training phase, digital a priori information values associated with at least one known pattern. These a priori information values form a training set which is used in a later step in the recognition of patterns.
determining digital information values of first elements associated with a pattern to be recognized. The first elements can for instance be pixels of an image in which a pattern must be recognized. The digital information values are then for instance the grey tone values or colour values of the pixels.
grouping two or more first elements;
pairing the grouped first elements into second elements, wherein the number of digital information values for each second element is at least doubled;
for each second element, on the basis of pattern information from the training set formed in the training phase, merging at least two digital information values, yielding a reduced second element with a reduced number of information values. The pattern information is calculated on the basis of a statistical estimate of the probability that a digital information value is associated with a particular pattern. This statistical estimate is calculated from the data of the training set.
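The grouping and pairing steps above can be illustrated with a minimal sketch. It assumes that the first elements are grouped as adjacent, non-overlapping pairs and that each element takes one of `n_codes` integer values; the function name `pair_elements` and the encoding `a * n_codes + b` are hypothetical choices made for illustration and are not prescribed by the method. With `n_codes` possible values per first element, a second element can adopt up to `n_codes * n_codes` values, which for two or more values per element is at least double.

```python
def pair_elements(values, n_codes):
    """Combine adjacent first elements into second elements.

    Each first element holds a code in range(n_codes). A pair (a, b) is
    mapped to the single code a * n_codes + b, so a second element can
    adopt up to n_codes * n_codes distinct values.
    (Hypothetical encoding, chosen only for illustration.)
    """
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of first elements")
    return [
        values[i] * n_codes + values[i + 1]
        for i in range(0, len(values), 2)
    ]


# Six binary pixels become three second elements with up to four values each.
row = [0, 1, 1, 1, 0, 0]
paired = pair_elements(row, n_codes=2)
print(paired)  # [1, 3, 0]
```

The enlarged number of values per second element is what the subsequent merging step reduces again.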
The merging of information values referred to in the final step takes place such that as little pattern information as possible is lost. On the basis of the a priori known possible patterns, a best estimate can be calculated of the probability that a particular information value occurs in combination with a particular pattern. On the basis of the estimated probabilities of all possible combinations of information values and patterns, a decision criterion is formulated which determines the combination of pattern and information value that yields a minimal loss of pattern information when merged.
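One possible form of such a decision criterion is sketched below. The sketch rests on assumptions that the text does not state: the probabilities are estimated as relative frequencies in the training set, the pattern information is measured as the mutual information between information value and pattern, and the pair of information values to merge is chosen by exhaustive comparison. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def joint_probabilities(samples):
    """Estimate P(code, pattern) as relative frequencies of (code, pattern) pairs."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def mutual_information(joint):
    """Mutual information, in bits, between code and pattern (assumed measure)."""
    p_code, p_pattern = Counter(), Counter()
    for (code, pattern), p in joint.items():
        p_code[code] += p
        p_pattern[pattern] += p
    return sum(
        p * log2(p / (p_code[code] * p_pattern[pattern]))
        for (code, pattern), p in joint.items()
        if p > 0
    )


def merge_codes(joint, a, b):
    """Return the joint distribution after relabelling code b as code a."""
    merged = Counter()
    for (code, pattern), p in joint.items():
        merged[(a if code == b else code, pattern)] += p
    return dict(merged)


def best_merge(joint):
    """Return (loss, a, b) for the pair of codes whose merging loses least."""
    codes = sorted({code for code, _ in joint})
    base = mutual_information(joint)
    return min(
        (base - mutual_information(merge_codes(joint, a, b)), a, b)
        for a, b in combinations(codes, 2)
    )


# Toy training set: codes 0 and 1 occur only with pattern "A",
# codes 2 and 3 only with pattern "B".
training = [(0, "A"), (1, "A"), (2, "B"), (3, "B")] * 2
joint = joint_probabilities(training)
loss, a, b = best_merge(joint)
print(loss, a, b)
```

In this toy training set, codes that always occur with the same pattern can be merged without losing any pattern information, whereas merging a code seen only with pattern "A" into a code seen only with pattern "B" would lose information. Repeating the selection until the desired number of information values remains would give a greedy reduction, which is a simplification and need not be optimal.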