Efforts to recognize patterns and assemble them for storage, preferably in a compact form, are ongoing. However, unless otherwise specified, it cannot be assumed that all patterns are evenly distributed across the data. Some patterns are more prominent than others and are therefore likely to occur more often, while other patterns may be very rare. In addition, some patterns may be correlated with each other and together form pattern combinations, which may themselves be very common. This poses a problem for applications of pattern-recognition systems. For example, to compute a similarity measurement between two content segments, it is not enough to count the patterns they have in common; the probability of occurrence of each pattern should be considered as well. Correlation between patterns should also be considered. For example, if two patterns always appear together, they contain, in essence, no more information than a single pattern.
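By way of illustration only, the point about occurrence probability can be sketched in a few lines of Python. The function names, the pattern labels, and the choice of weighting each pattern by -log2 of its observed frequency are assumptions made for this sketch and are not taken from the disclosure. The sketch does not address correlation between patterns.

```python
import math
from collections import Counter

def pattern_weights(segments):
    """Weight each pattern by -log2(p), where p is the fraction of
    segments containing it. Rare patterns receive larger weights."""
    total = len(segments)
    seen_in = Counter(p for seg in segments for p in set(seg))
    return {p: -math.log2(count / total) for p, count in seen_in.items()}

def weighted_similarity(a, b, weights):
    """Weighted overlap: weight of shared patterns divided by the
    weight of all patterns present in either segment."""
    a, b = set(a), set(b)
    union = sum(weights.get(p, 0.0) for p in a | b)
    if union == 0:
        return 0.0
    shared = sum(weights.get(p, 0.0) for p in a & b)
    return shared / union

# Hypothetical corpus: "p1" occurs in every segment, the others are rarer.
corpus = [{"p1", "p2"}, {"p1", "p3"}, {"p1", "p2", "p4"}, {"p1", "p5"}]
weights = pattern_weights(corpus)

# Both pairs share exactly one pattern, so a plain count rates them equally.
sim_common = weighted_similarity({"p1", "p3"}, {"p1", "p5"}, weights)
sim_rare = weighted_similarity({"p2", "p3"}, {"p2", "p5"}, weights)
print(sim_common, sim_rare)
```

In this sketch, the pair that shares only the ubiquitous pattern scores lower than the pair that shares a rarer one, even though a simple count of corresponding patterns cannot distinguish them.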
Such an effect, in turn, is detrimental to the scalability and the accuracy of a pattern-recognition system. That is, if the handling of different patterns is spread among multiple machines of the pattern-recognition system, then most machines, dealing with “less-popular” patterns, will remain largely inactive, whereas a few machines, processing “popular” patterns, will be overburdened with accesses. Nor is it possible to distribute the handling of patterns according to their a priori probabilities, because of correlations between patterns, about which no assumptions can be made. Furthermore, in general, to scale up a pattern-recognition system it would be preferable to avoid duplicating the pattern space and having to hold a copy of the patterns in each machine.
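The load imbalance described above can be illustrated with a minimal sketch. The access counts, the pattern labels, and the round-robin assignment of patterns to machines are hypothetical and are chosen only to make the effect visible.

```python
def machine_loads(access_counts, num_machines):
    """Assign patterns to machines round-robin (in name order) and
    return the total number of accesses each machine must serve."""
    loads = [0] * num_machines
    for index, (_pattern, count) in enumerate(sorted(access_counts.items())):
        loads[index % num_machines] += count
    return loads

# Hypothetical, heavily skewed access counts for eight patterns.
access_counts = {
    "p1": 1000, "p2": 500, "p3": 50, "p4": 20,
    "p5": 10, "p6": 10, "p7": 5, "p8": 5,
}

loads = machine_loads(access_counts, num_machines=4)
even_share = sum(access_counts.values()) / 4
print(loads, even_share)
```

Each machine holds the same number of patterns, yet under these assumed counts the first machine serves far more than an even share of the accesses while the last serves far less.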
Reduction of multiple symbols, such as a pattern, to a smaller number of manageable symbols that are easily recognizable is performed manually in certain cases. Consider, for example, a sequence of notes that are combined into a chord. A chord is a combination of two or more notes that are played, or otherwise heard as if being played, simultaneously. Chords are repetitive in nature; hence, in order to reduce the number of notes provided to a performer, the sequence of notes is reduced to a chord symbol, which represents the plurality of notes. For example, the chord marked as Am means that the performer is to play the root note A, the minor third C, and the perfect fifth E, so that they appear to be played simultaneously. A person can easily translate a chord symbol into the specific notes it represents. In such cases, the mapping between the two sets of symbols is created manually, based on specific rules, which may be added, deleted, or modified as necessary.
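As a non-limiting illustration, the translation of a chord symbol into its notes can be expressed as a small, hand-maintained rule table. The table below covers only three chord qualities, spells every accidental as a sharp, and uses names invented for this sketch.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Manually curated rules: chord quality -> semitone offsets from the root.
# Rules can be added, deleted, or modified by editing this table.
CHORD_INTERVALS = {
    "": (0, 4, 7),        # major triad
    "m": (0, 3, 7),       # minor triad
    "7": (0, 4, 7, 10),   # dominant seventh
}

def chord_notes(symbol):
    """Expand a chord symbol such as "Am" or "C7" into note names."""
    # The root is one letter, optionally followed by a sharp sign.
    root = symbol[:2] if len(symbol) > 1 and symbol[1] == "#" else symbol[:1]
    quality = symbol[len(root):]
    start = NOTE_NAMES.index(root)
    return [NOTE_NAMES[(start + step) % 12] for step in CHORD_INTERVALS[quality]]

print(chord_notes("Am"))  # root A, minor third C, perfect fifth E
print(chord_notes("C7"))  # B-flat is spelled A# in this sharps-only table
```

The table compresses many note combinations into a few symbols, but every rule in it has to be written and maintained by a person.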
It would be advantageous to provide an efficient solution for pattern recognition that overcomes the deficiencies of the prior art, particularly the requirement for human manual intervention in the recognition process.