This invention relates to fractal harmonic overtone mapping of speech and musical sounds for high-resolution, dynamic control of input sensitivity, adaptive control of output acoustics and phonology, and for information storage and pattern recognition.
Current strategies for computer speech recognition and voice analysis are generally based on processes that transform information derived from the frequency spectrum of sound. The primary tools in spectral analysis of sound are the Fourier transform and its many variants. A wide variety of mathematical techniques, such as inverse spectral (“cepstral”) and wavelet analyses, have also been applied to speech perception. Current strategies for speech processing reflect the theory that sound is perceived in the inner ear tonotopically, with location along the cochlea correlating with frequency.
A number of prior patents explain the current strategies for signal processing and their limitations. For example, U.S. Pat. No. 6,124,544 teaches that autocorrelation has proven unreliable. One reason that is mentioned is that the sample rate can introduce artifacts.
U.S. Pat. No. 6,701,291 supports advantageously adjusting, in a coordinated manner, a handful of parameters. U.S. Pat. No. 6,584,437 reviews coding methods that use a lattice to encode pitch periods and differences between pitch periods.
U.S. Pat. No. 6,658,383 explains how speech and musical signals are approached differently in the current art. A proposed solution is to encode signals with several modes, using different modes for musical signals and voiced speech signals. U.S. Pat. No. 6,658,383 does not, however, address unvoiced speech.
U.S. Pat. No. 6,725,190 discloses various approaches to coding speech including a proposal for phase-binned speech but requires separate accounting based on a “voicing decision.” U.S. Pat. No. 6,745,155 discusses input from a “basilar membrane model device”, with time delays or autocorrelation as a means for signal analysis.
U.S. Pat. No. 6,732,073 discloses a way of enhancing a frequency spectrum, using the history of sound signals a short interval before as well as information about sound signals a short interval afterward. The inclusion of information over time is a key aspect of many current approaches to signal analysis.
Cochlea, the Latin word for “chamber,” is pronounced either as “coke”-lee-uh or as in the phrase “the cockles of the heart” (from the Latin cochleae cordis, “chambers of the heart”). Like the heart, it has a spiral shape (a “cockleshell”), which acts somewhat like a prism to separate sound into its various component frequencies. Frequency information is processed in the inner ear, which consists of the cochlea, the cochlear nucleus, and a variety of brain centers. There are three problems with a psychoacoustic model that uses only tonotopic frequency information.
Critical bands, which limit our ability to hear frequencies that are too close together, indicate that there is a signal processing mechanism along the length of the cochlea that may provide contrast enhancement or automatic gain control. Experiments show that for typical tones, the fundamental and harmonic overtones 2 through 6 are perceived as distinct tones and higher harmonics are perceived as a fused “residue tone” or “residual tone.” Humans apparently can be consciously aware only of harmonic overtones that are far enough apart to fall into separate critical bands; overtones that are “too close together” are not heard individually. However, this does not preclude possible mechanisms that advantageously make use of information in higher harmonic overtones via unconscious processes. Signal processing via such hidden processes, as in “hidden Markov models,” is a common theme in neural network modeling.
“Active hearing” refers to recent advances in our understanding of the mechanism of hearing, including the function of the protein prestin and the presence of a spectrum of self-reinforcing vibrations in the inner ear. These reverberations are due to positive feedback loops across the width of the cochlea involving outer hair cells and their stereocilia. Stereocilia act as valves that control the flow of charged ions (like transistors, controlling the flow of more power than they absorb, according to C. D. Geisler, From Sound to Synapse, Oxford Univ Press, 1998). When movement of an outer hair cell's stereocilia changes its voltage, the protein prestin causes the cell to elongate or contract (D. Oliver et al., Science 292, 2340, 2001). This rocks the cochlear partition, which triggers the cell's stereocilia, causing the cycle to repeat. In effect, each segment of the cochlea is a regenerative receiver, the historical term for radio receivers that used positive feedback. Such receivers invariably had a regeneration control to vary the amount of positive feedback (Philip Hoff, Consumer Electronics for Engineers, Cambridge Univ Press, 1998).
According to active hearing, when a sound is initially perceived there may be a gesture-like shift in the reverberations in the cochlea. Hearing a sound may force the cochlea to “tune in.” This type of process would be analogous to “adaptive optics” and would require dynamic feedback with a time scale estimated to be on the order of 0.5 ms. Thus, the function of the cochlea is more than a prism-like separation of sound into its component frequencies.
Multiple maps of auditory space have been suggested by experiments involving researchers wearing distorting earpieces that disrupt their ability to judge whether sounds are “up” or “down” (P. M. Hofman, J. G. A. Van Riswick, A. J. Van Opstal, Nature Neuroscience 1(5), 417, 1998). Unlike experiments with distorting eyeglasses, which take time for readjustment afterwards, correct sound localization occurred immediately when the fake ears were removed. Thus, shifting between cortical representations is possible, raising the question of how frequency information distributed along the cochlea (a one-dimensional analog) could be sufficient to model the three-dimensional world. An additional problem is how the complexity of multiple maps would be managed.
Two solutions were developed by the author. The first is from the field of neural network signal processing and is the concept of “harmonic fields.” The second is from the field of optimization theory and is an extension of the mathematical concept of an adaptive walk on a virtual landscape, “fractal mapping.” If the virtual landscape is a map of the neuromuscular patterns for sound in the throat and also the sensorineural patterns for sound in the ear, combined with the neural feedback for dynamic control of active hearing in the cochlea, then optimization could occur across the multiple interacting streams of data, which apply to different size scales but have similar recursive possibilities. The result would be similarity and function across different size scales, leading the author to the concept of “a fractal map of harmonic overtone space.”
The invention was developed in the course of research for the paper, “Fractal harmonic reconstruction of ancient South Asian musical scales,” by Robert Patel Quinn, M. D. The invention is introduced as a method for analyzing harmonic overtones, which are higher-pitched sounds whose frequencies are exact integer multiples of the fundamental frequency. Although a frequency can be described both as a harmonic and as an overtone, the terminology employed in the paper distinguishes harmonics from overtones by using numbers for harmonics and letters for overtones, and uses the convention that harmonic 1 is the fundamental frequency of a tone. Musical notes are drawn as a column (a musical staff) with higher pitch harmonic overtones at the top and the fundamental at the bottom.
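The numbering convention above, in which harmonic 1 is the fundamental and harmonic n lies at n times the fundamental frequency, can be illustrated with a minimal sketch. The function name `harmonic_series` and the 110 Hz example tone are assumptions chosen for illustration and do not appear in the paper.

```python
def harmonic_series(fundamental_hz, count):
    """Return {harmonic_number: frequency_hz}, with harmonic 1 as the fundamental.

    Hypothetical helper: each harmonic n is the exact integer multiple
    n * fundamental_hz, following the convention described in the text.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return {n: n * fundamental_hz for n in range(1, count + 1)}


# Example tone (assumed): a 110 Hz fundamental and its first six harmonics.
series = harmonic_series(110.0, 6)
for number in sorted(series, reverse=True):
    # Printed top-down, mirroring the column drawing with the fundamental at the bottom.
    print(number, series[number])
```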
In contrast to neural network signal processing models of the sense of touch and vision, which involve “receptive fields” that are spatially contiguous, the olfactory system processes smells by “molecular receptive range.” (K. Mori, Y. Yoshihara, Progress in Neurobiology, Vol 45, 585, 1995). An analogous process in the ear could correlate sounds an octave apart, leading to harmonic fields.
Harmonic fields can be visualized (FIG. 3) as a connection (a neuron) linking two points in the cochlea; for example, those that correspond to harmonics 9 and 3. Another example of a harmonic field is shown by the neuron linking harmonics 3 and 1. Each neuron would also function as a “sensor” for coinciding harmonics 6 and 2 of other tones with different fundamentals, reinforcing the linking relationship; the harmonic fields are detectors of the ratio rather than of specific numbers. Higher order connections between these neurons (“neural networking”) and signals flowing toward the brain as well as “active hearing” signals flowing toward the cochlea are important components of the fractal harmonic overtone mapping model. The hypothesized harmonic fields are scanned and the results are integrated into a multi-dimensional map. The illustration shows that sound first enters the inner ear at the high-frequency end of the cochlea. Depending on the speed of sound in the fluid of the cochlea and the speed and course of neural signals, this may be a reason that harmonics are scanned from high to low frequencies, although the spiral design of the cochlea tends to ensure that harmonics are perceived roughly simultaneously.
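The idea that a harmonic field detects a ratio rather than specific harmonic numbers can be sketched as follows. This is only an illustration under stated assumptions: the function `ratio_field_pairs`, the 1% tolerance, and the example frequencies are hypothetical, and the sketch makes no claim about how neurons would actually implement such a detector.

```python
def ratio_field_pairs(active_frequencies, ratio, tolerance=0.01):
    """Return (low, high) frequency pairs whose quotient is close to `ratio`.

    Hypothetical model of one harmonic field: it responds to any two
    simultaneously active frequencies standing in the given ratio,
    regardless of which fundamental produced them.
    """
    freqs = sorted(active_frequencies)
    pairs = []
    for i, low in enumerate(freqs):
        for high in freqs[i + 1:]:
            if abs(high / low - ratio) <= tolerance * ratio:
                pairs.append((low, high))
    return pairs


# Harmonics 1, 3 and 9 of an assumed 100 Hz tone.
tone_a = [100.0, 300.0, 900.0]
# Harmonics 2 and 6 of an assumed 150 Hz tone land on the same two frequencies.
tone_b = [2 * 150.0, 6 * 150.0]

print(ratio_field_pairs(tone_a, 3))
print(ratio_field_pairs(tone_b, 3))
```

In this sketch the same 300 Hz to 900 Hz link responds to harmonics 3 and 9 of one tone and to harmonics 2 and 6 of another, which is the reinforcement the paragraph above describes.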
A more fundamental reason why high frequency harmonics would be expected to be perceived first is the fact that the higher sampling rates possible at high frequencies would allow the wavelength of sound to be identified faster.
Unevenly spaced “inharmonic fields” would not be expected to develop naturally in the nervous system, since reinforcement would not occur from inputs with a variety of fundamental frequencies if their harmonics were not appropriately spaced.
If the system is viewed as the product of a genetic algorithm approach, efficiency suggests that some harmonic fields are redundant. An evolutionary approach would tend to produce enough complexity to exploit information but not too much for processing. The paper proposes the assumption that “harmonic fields develop only for tones that provide new information (the prime factors 2, 3, 5, 7, and 11).” This is because scanning through these prime number ratio harmonic fields (looking for simultaneous or near-simultaneous sounds) and then using other neurons to scan for simultaneous or near-simultaneous “higher order” correlations of neural network signals would result in information that can be recorded in a consistent fashion on a five-dimensional fractal map. Information associated with ratios such as 4, 6, 8, 9, 10 or 12 would be included in the map, offset by an appropriate magnitude. It would be redundant to require separate dimensions to represent the same information. Only prime-numbered fields would carry new information.
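One way to read the five-dimensional map is that each harmonic number is located by the exponents of 2, 3, 5, 7 and 11 in its prime factorization, so that composite ratios such as 12 need no dimension of their own. The sketch below illustrates that reading only; the function name `prime_coordinates`, the interpretation of “offset by an appropriate magnitude” as an exponent along each axis, and the choice to return `None` for numbers with other prime factors (such as 13) are assumptions, not statements from the paper.

```python
PRIMES = (2, 3, 5, 7, 11)  # the five prime factors named in the text


def prime_coordinates(n):
    """Return the exponents of 2, 3, 5, 7, 11 in n as a 5-tuple.

    Hypothetical helper. Returns None when n has a prime factor outside
    the five listed primes, since the text does not say how such
    harmonics would be placed on the map.
    """
    if n < 1:
        raise ValueError("harmonic numbers start at 1")
    exponents = []
    for p in PRIMES:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        exponents.append(e)
    return tuple(exponents) if n == 1 else None


for harmonic in (4, 6, 8, 9, 10, 12):
    print(harmonic, prime_coordinates(harmonic))
```

Under this reading, harmonic 12 sits two steps along the axis for 2 and one step along the axis for 3, reusing existing dimensions instead of adding a new one.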
The information from harmonic fields would constitute parallel channels (streams) of information. Parallel processing would allow hidden Markov models to solve the problems of phonology and of segmenting the stream of speech. These problems are currently the major roadblock for strategies for computer speech recognition and voice analysis that do not perform signal processing in terms of categorical features.
The method section of the author's paper, “Fractal harmonic reconstruction of ancient South Asian musical scales,” opens with, “The basic idea of a fractal is that the same processes, or the same statistics or properties of a figure, are found at all size levels. In a fractal representation of multidimensional space each feature of the fractal represents a different axis and the range of values (magnitude) of each feature is plotted along that axis. Familiarity with the relationship between points on one or two axes gives familiarity with the relationships between points on all axes” (See “B. Levitan; santafe.edu\nk.html.”) “We can map out a rectangular array using the first two factors, then for the next factor we add another array displaced horizontally, followed by a copy of the arrays displaced vertically. By alternating these steps as we add successive factors, we develop the recursive property that gives the representation its fractal nature.” These steps establish that a multidimensional map can be graphically represented in two dimensions. It should be noted that the cited online article by Bennett Levitan was an explanation of how he and Simon Pariser could graphically display various nucleic acid base pairs and the way they mutated to become codons for other amino acids. Although this is in a different field, the pattern of iterative steps (first left to right, then top to bottom, then left to right, etc.) was followed in constructing the fractal harmonic overtone map in order to establish a consistent convention.
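The alternating layout described above (first left to right, then top to bottom, then left to right again) can be sketched as a mapping from a multidimensional index to a column and row on a flat grid. This is a minimal sketch under stated assumptions: the function name `fractal_position`, the per-axis sizes, and the choice to give each new axis a displacement equal to the width or height of everything laid out so far are illustrative and are not taken from the paper or from the cited article.

```python
def fractal_position(coords, sizes):
    """Map a multidimensional index to (column, row) by alternating axes.

    Hypothetical layout: axes 0, 2, 4, ... displace horizontally and
    axes 1, 3, 5, ... displace vertically. Each later axis moves by the
    full width (or height) of the block built from the earlier axes,
    which produces the nested, self-similar arrangement.
    """
    if len(coords) != len(sizes):
        raise ValueError("coords and sizes must have the same length")
    col = row = 0
    col_stride = row_stride = 1
    for axis, (c, size) in enumerate(zip(coords, sizes)):
        if not 0 <= c < size:
            raise ValueError("coordinate out of range on axis %d" % axis)
        if axis % 2 == 0:
            col += c * col_stride   # horizontal displacement
            col_stride *= size
        else:
            row += c * row_stride   # vertical displacement
            row_stride *= size
    return col, row


# Four axes with two values each (assumed sizes) fill a 4 x 4 grid.
sizes = (2, 2, 2, 2)
print(fractal_position((1, 0, 0, 0), sizes))  # one step along the first axis
print(fractal_position((0, 0, 1, 0), sizes))  # one step along the third axis
```

In this sketch a step along the third axis jumps a whole 2 x 2 block to the right, so the small array built from the first two axes reappears as a unit at the next scale.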