1. Field of the Invention
This invention relates to neuromimetic homomorphic pattern recognition methods and apparatuses, and particularly to neuromimetic homomorphic pattern recognition methods and apparatuses that utilize time delay signal filtering and additive encoding and decoding.
2. Description of the Prior Art
Pattern recognition is defined as the automated identification of shapes. “Homomorphic” describes pattern recognition between sets of similar form but of different structure. “Neuromimetic” describes a device that functions in a biologically plausible way, like neurons, but that can be implemented electrically, i.e., a device that mimics the structure of neurons.
Many systems today simulate, or attempt to simulate, neural processors. For example, speech recognition is currently performed using digital signal processing, Fourier transforms, and Hidden Markov Model techniques.
Neurons comprise three parts: the dendrites, the soma and the axon. The dendrites are the input structure. The soma is the central processing unit of the cell. Individual neurons are essentially feed forward devices. Currents from synapses (inputs) at the distal ends of the dendrites are integrated and collected near the soma. When a certain threshold is met, the soma generates a spike and resets its input level. The output of the soma is carried on the axon. The axon itself is similar to a dendrite in that it has a reverse type tree structure. It is commonly accepted that the axon's sole purpose is to carry the output signal to other neurons, to which it connects through synapses. The most commonly accepted biological model for a neuron is called the “Integrate and Fire” neuron.
As described by neuroscientists, when the nerve impulse is transferred across a synapse into a dendrite it becomes a post-synaptic potential, either excitatory or inhibitory. Individual neurons typically have 4000-10000 input connections. These currents are integrated in the dendrite. When they reach a predetermined threshold, the soma produces a spike (pulse) onto the axon and resets the voltage on the dendrite to initial conditions. There is also a refractory period, which is the short time after a spike is generated in which a new spike cannot be generated (typically 1-2 ms). The time frame for generation of a spike is typically less than 20 ms (assuming that the appropriate number of input spikes is received to reach the threshold). Typically, about one percent of the input connections would need to receive a spike within the appropriate processing time frame in order for an output spike to be generated at all.
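By way of illustration only, the integrate, threshold, reset and refractory behavior described above can be sketched in a few lines of Python. The function name `integrate_and_fire`, the 1 ms time step, and the absence of any leakage or inhibitory input are assumptions of this sketch and are not taken from any particular prior art circuit. The default threshold of 40 corresponds to about one percent of 4000 input connections.

```python
def integrate_and_fire(input_spikes, threshold=40, refractory_steps=2):
    """Return the time steps at which a simple model neuron fires.

    input_spikes[t] is the number of excitatory input spikes arriving at
    step t. One step is assumed to be 1 ms (an assumption of this sketch).
    """
    potential = 0          # integrated input on the dendrite
    refractory_left = 0    # steps remaining in the refractory period
    output_times = []
    for t, count in enumerate(input_spikes):
        potential += count             # integrate the incoming spikes
        if refractory_left > 0:
            refractory_left -= 1       # no new spike may be generated yet
            continue
        if potential >= threshold:
            output_times.append(t)     # the soma produces a spike
            potential = 0              # reset to initial conditions
            refractory_left = refractory_steps
    return output_times


# Ten input spikes per millisecond for 20 ms.
print(integrate_and_fire([10] * 20))
```

In this sketch, inputs that arrive during the refractory period are still accumulated, and only the generation of an output spike is suppressed. A model that discards such inputs would be an equally plausible simplification.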
Models of neuron dynamics have been used for more than 100 years. Numerous examples of electrical circuits have been designed to replicate the “threshold and fire” action of a neuron.
A typical prior art speech recognition system uses digital signal processing (DSP) to compute the Fourier transform (FT) of a broadband input signal such as a speech utterance. The system then uses DSP and a further FT to compute the cepstral coefficients of the frequency spectrum. These features are then typically input into a classifier (for example, a neural network or a Hidden Markov Model) to identify the spoken phonemes in the input speech signal.
A cepstrum (pronounced /ˈkɛpstrəm/) is the result of taking the Fourier transform (FT) of the decibel spectrum as if it were a signal. Its name was derived by reversing the first four letters of “spectrum”. The cepstrum can be seen as information about the rate of change in the different spectrum bands. It was originally invented for characterizing the seismic echoes resulting from earthquakes and bomb explosions. It has also been used to analyze radar signal returns. It is now used as the primary feature vector for decoding the human voice and musical signals. For these applications, the spectrum is usually first transformed using the Mel frequency bands. The result is called the Mel frequency cepstral coefficients, or MFCCs. In the Mel frequency spectrum, the frequency bands are positioned logarithmically so as to more closely approximate the human auditory system. The cepstrum is used for voice identification, pitch detection and much more. It has recently also attracted attention from music information retrieval researchers. The cepstrum separates the energy resulting from vocal cord vibration from the “distorted” signal formed by the rest of the vocal tract. The cepstrum is also related to homomorphic sound theory.
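As a hedged illustration of the definition above, the following Python sketch computes the magnitude of the Fourier transform of the decibel spectrum and uses the largest cepstral peak to estimate pitch. The names `dft`, `cepstrum`, `harmonic_tone` and `estimate_pitch` are hypothetical, as are the 4000 Hz sample rate, the 60-400 Hz search range and the small floor added before taking the logarithm. A naive transform from the standard library stands in for the fast Fourier transform that a practical DSP system would use, and the test tone is chosen so that its harmonics fall exactly on transform bins. No Mel frequency warping is applied.

```python
import cmath
import math


def dft(values):
    """Naive O(N^2) discrete Fourier transform (standard library only)."""
    n = len(values)
    twiddle = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]
    return [sum(values[t] * twiddle[(k * t) % n] for t in range(n))
            for k in range(n)]


def cepstrum(signal, floor=1e-9):
    """Magnitude of the Fourier transform of the decibel spectrum."""
    spectrum = dft(signal)
    # The floor avoids log10(0) in bins that carry no energy.
    decibels = [20.0 * math.log10(abs(c) + floor) for c in spectrum]
    return [abs(c) for c in dft(decibels)]


def harmonic_tone(pitch_hz, sample_rate, n_samples, n_harmonics):
    """Sum of sine harmonics with amplitudes 1/m (an illustrative voice-like tone)."""
    return [sum(math.sin(2.0 * math.pi * m * pitch_hz * t / sample_rate) / m
                for m in range(1, n_harmonics + 1))
            for t in range(n_samples)]


def estimate_pitch(signal, sample_rate, fmin=60.0, fmax=400.0):
    """Pitch estimate from the largest cepstral peak in the search range."""
    ceps = cepstrum(signal)
    lo = int(sample_rate / fmax)   # shortest pitch period, in samples
    hi = int(sample_rate / fmin)   # longest pitch period, in samples
    peak = max(range(lo, hi + 1), key=lambda q: ceps[q])
    return sample_rate / peak


tone = harmonic_tone(100.0, sample_rate=4000, n_samples=400, n_harmonics=15)
print(estimate_pitch(tone, sample_rate=4000))
```

The regularly spaced harmonics make the decibel spectrum itself periodic, and the second transform converts that periodicity into a single peak at the pitch period. This is the sense in which the cepstrum separates the vocal cord vibration from the shaping applied by the rest of the vocal tract.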
As a simple example of how speech sounds are recognized, FIG. 1 illustrates the frequency and amplitude (spectrum) of the sound “Ah” articulated at a base frequency (pitch) of 100 Hz by a male. Notice that the only frequencies present in the spectrum are at the fundamental pitch (100 Hz) and at harmonics, which are integer multiples of the pitch. Note that FIGS. 1, 2, and 3 are reproduced from the book Fundamentals of Musical Acoustics, by Arthur Benade, New York, Oxford University Press, 1976. FIGS. 1 and 2 appear on page 371. FIG. 3 appears on page 373.
In FIG. 2, the same sound is being produced by a female at the base pitch of 220 Hz. The only frequency components present are again harmonics of the base. The spectral pattern for the sound “Ah” is the same whether spoken by the male speaker at 100 Hz or spoken by the female speaker at 220 Hz. The sound “Ah” is characterized by a similarity of form in each case but of different structure (homomorphic). If we heard both of these sounds being produced, we would agree that the same “Ah” vowel is being produced, even though the second speaker has a pitch more than twice as high and forms fewer (and different) overall frequency harmonics.
FIG. 3 is a graph of the spectral pattern of the vowel sound “Ah”. Note that the spectral pattern for the sound “Ah” is the same whether spoken by the male speaker at 100 Hz or spoken by the female speaker at 220 Hz.
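The relationship among FIGS. 1, 2 and 3 can be sketched numerically: a single spectral pattern is sampled at the harmonics of two different pitches. In the Python sketch below, the `envelope` function is a hypothetical stand-in built from two smooth bumps. Its shape and center frequencies are illustrative assumptions and are not read from FIG. 3. The function name `harmonic_spectrum` and the 2000 Hz upper limit are likewise assumptions.

```python
import math


def envelope(freq_hz):
    """Hypothetical smooth spectral pattern (not taken from FIG. 3)."""
    # Two Gaussian bumps stand in for vowel resonances; values are illustrative.
    return (math.exp(-((freq_hz - 700.0) / 150.0) ** 2)
            + 0.5 * math.exp(-((freq_hz - 1100.0) / 200.0) ** 2))


def harmonic_spectrum(pitch_hz, max_hz=2000.0):
    """Return (frequency, amplitude) pairs at integer multiples of the pitch."""
    count = int(max_hz // pitch_hz)
    return [(k * pitch_hz, envelope(k * pitch_hz)) for k in range(1, count + 1)]


male = harmonic_spectrum(100.0)     # harmonics at 100, 200, ..., 2000 Hz
female = harmonic_spectrum(220.0)   # harmonics at 220, 440, ..., 1980 Hz

print(len(male), len(female))
```

Both speakers produce amplitudes that lie on the same pattern, but the higher pitch samples that pattern at fewer and different frequencies. This is the similarity of form with different structure that the text calls homomorphic.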