The present invention relates to signal processing. In particular, the present invention relates to transforming an input signal (e.g., a sonar echo) so that the useful information (e.g., a target signature) embedded therein can be discriminated, detected, and classified using aural classification skills of the listener.
The ability of echo-locating dolphins to discriminate shape, structure, and material of a submerged target has been demonstrated in the studies by Au and Hammer. See W. W. L. Au and C. E. Hammer, Target Recognition via Echolocation by Tursiops truncatus, in: Animal Sonar Systems, R. G. Busnel and J. F. Fish, ed, pg. 855-858, Plenum Press, New York, (1980) and C. E. Hammer, and W. W. L. Au, Porpoise Echo-Recognition: An Analysis of Controlling Target Characteristics, J. Acoust. Soc. Amer., 68, pg. 1285-1293 (1980). Human listeners have also been shown to possess this same ability in other studies. See J. F. Fish, C. S. Johnson and D. K. Ljungblad, Sonar Target Discrimination by Instrumented Human Divers, J. Acoust. Soc. Amer., 59, pg. 602-606 (1976); Advanced Resource Development Corporation, "The Use of Aural Processed Signals for Mine Detection and Classification," US Naval Air Command Report, Co. #N00019-85-C-0451, (1986); Advanced Resource Development Corporation, "An Interactive Neural Network System for Acoustic Signal Classification," Office of Naval Research Report, Co. #N00014-879-C-0237, (1990); and R. L. Thompson, "Aural Classification of Sonar Echoes," Electrical and Computer Engineering Report, U. T. at Austin Report, May 1995. In fact, human divers have been shown to equal or exceed the dolphins' abilities provided that certain operations are performed on the echo signals before they are presented to the diver. However, in previous efforts, the methods used to process echo signals have not been optimized for the natural listening processes of the human observer.
In the prior art, the techniques used to prepare echo signals for listener classification are (1) to time shift the echo signal into the audible band of the listener or (2) to heterodyne the echo signal up and then time shift the signal down into the audible band of the listener. These methods are limited to a narrow range of listening durations and are not designed to optimize the interface between the signal-producing system and the human auditory classification system.
The human auditory classification system is based on phonemes. Phonemes are specific observable patterns of human speech, e.g., temporal, spectral, and redundancy patterns, which are operated upon by the natural speech-signal processing capabilities of the brain. It is reasonable to assume that during the interactive co-evolution of the speech and auditory systems, considerable adaptation has taken place to optimize a portion of the human auditory system for the classification of human speech using phonemes. The existing neural structures and learned behaviors used on a daily basis are assets that can be employed readily to perform signal processing aurally. These existing assets and skills have evolved in humans specifically for the purpose of extracting the details of words from noise.
One result of this coupled evolution of the auditory and speech systems is that the human auditory classification process is particularly robust in the presence of noise when the signal is speech. Thus, the human auditory system could be a very effective tool in a signal classification system based on human speech phonemes. However, active sonar systems typically use frequencies in the ultrasonic regions (above audible frequencies). This creates a frequency mismatch between the needs of a sonar system and the capabilities of the auditory system.
In sonar processing, classical techniques for overcoming the frequency mismatch problem include time stretching, modulation, or a combination of both to shift the sonar echo into the audible frequency range. Time stretching does not disrupt the harmonic structure of the signal. However, time stretching compresses the signal bandwidth and can under-utilize the full bandwidth of the auditory system. On the other hand, modulation does not compress the signal bandwidth but does alter the harmonic structure.
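The trade-off between the two classical approaches can be illustrated with simple arithmetic. The following is a minimal sketch, not a description of any particular sonar system: it assumes an idealized time stretch by a factor k divides every frequency component by k, and an idealized heterodyne subtracts a fixed local-oscillator frequency from every component. The function names, the three example frequencies, the stretch factor of 10, and the 38 kHz oscillator are all hypothetical values chosen for illustration.

```python
def time_stretch(freqs_hz, factor):
    """Idealized time stretch: every frequency is divided by the stretch factor."""
    return [f / factor for f in freqs_hz]


def heterodyne_down(freqs_hz, lo_hz):
    """Idealized heterodyne: every frequency is shifted down by the oscillator frequency."""
    return [f - lo_hz for f in freqs_hz]


def bandwidth(freqs_hz):
    """Span between the lowest and highest component."""
    return max(freqs_hz) - min(freqs_hz)


def ratios(freqs_hz):
    """Each component relative to the lowest one (a crude view of harmonic structure)."""
    lowest = min(freqs_hz)
    return [f / lowest for f in freqs_hz]


# Hypothetical ultrasonic echo components (Hz), assumed for illustration only.
echo = [40000, 44000, 48000]

stretched = time_stretch(echo, 10)        # components move to 4000, 4400, 4800 Hz
shifted = heterodyne_down(echo, 38000)    # components move to 2000, 6000, 10000 Hz

print(bandwidth(echo), bandwidth(stretched), bandwidth(shifted))
print(ratios(echo), ratios(stretched), ratios(shifted))
```

Under these assumptions the stretched signal keeps the original frequency ratios but occupies one tenth of the original bandwidth, while the heterodyned signal keeps the full 8 kHz bandwidth but its frequency ratios change from 1 : 1.1 : 1.2 to 1 : 3 : 5.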
Regardless of which classical approach is used, another more subtle problem still exists with the interface between the sonar and human ear. Sonar echoes generally do not sound like the signals for which the auditory classification system has been optimized, i.e., speech signals or phonemes. Thus, existing sonar processing techniques do not account for the inherent natural abilities of the human auditory classification system.
Accordingly, it is an object of the present invention to provide a method of processing an input signal to provide an output signal which permits the application of the existing neural structures and learned behaviors of human listeners.
Another object of the present invention is to provide a method of processing input signals of short duration into a new signal with extended duration that allows the natural speech-signal processes of the human brain to be applied to extract details of a target signal from noise.
Yet another object of the present invention is to provide a method of transforming a non-speech input signal such as a sonar echo signal into a signal having the temporal, spectral, and redundancy patterns of human speech.
These and other objects and advantages of the invention will be set forth in part in the description which follows, and will be apparent from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.
In accordance with the present invention, a method is provided to transform non-speech input signals into temporal, spectral and redundancy patterns resembling those of human speech. A digital signal series x_ph(n) is generated as a function of a non-speech analog input signal in accordance with

$$x_{ph}(n) = \sum_{p=0}^{(N-1)/A} \; \sum_{m=0}^{L-1} x\!\left(\frac{pA + m}{f_s}\right) w(m)\, \delta(n - pL + m)$$

where n is the sample number of the next x_ph, N is the length of the input signal, L is the length of a windowed portion of the input signal, A is an offset between successively applied ones of the windowed portions, w(m) is a smoothing function simulating amplitude and structure of human speech phonemes, f_s is a rate at which the input signal is sampled, and δ(n − pL + m) is a delta function that is equal to 1 for (n − pL + m) = 0 and equal to 0 for (n − pL + m) ≠ 0. The resulting digital signal series has temporal, spectral and redundancy patterns resembling those of human speech. The digital signal series is output to a device that presents the temporal, spectral and redundancy patterns of the digital signal series.
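The double sum can be read as a windowing-and-concatenation procedure: a window of length L slides along the sampled input in steps of A, each windowed portion is weighted by w(m), and the weighted portions are laid end to end in the output. The following is a minimal sketch of that reading and not a definitive implementation. It rests on several assumptions that go beyond the text: the function name `phoneme_like_series` is hypothetical; the input is taken to be already sampled at f_s, so x((pA + m)/f_s) becomes the sample at index pA + m; samples beyond the end of the input are treated as zero; the upper limit (N − 1)/A is rounded down; a Hann window stands in for w(m), whose actual shape is not specified here; and the delta function is interpreted as placing sample m of portion p at output index n = pL + m, whereas the formula as printed would give n = pL − m.

```python
import numpy as np


def phoneme_like_series(x, L, A, w=None):
    """Concatenate overlapping, windowed portions of x (hypothetical helper).

    x : sampled input signal of length N
    L : length of each windowed portion
    A : offset between successive portions (A < L gives overlap, hence redundancy)
    w : smoothing window of length L; a Hann window is used as a placeholder
        when none is given (an assumption, not the window of the invention)
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    if w is None:
        w = np.hanning(L)  # placeholder for the phoneme-like smoothing function
    w = np.asarray(w, dtype=float)
    if len(w) != L:
        raise ValueError("window length must equal L")

    num_portions = (N - 1) // A + 1           # p = 0 .. floor((N - 1) / A)
    padded = np.concatenate([x, np.zeros(L)])  # assumed zero beyond the input
    out = np.zeros(num_portions * L)

    for p in range(num_portions):
        portion = padded[p * A : p * A + L]    # samples x[pA + m], m = 0 .. L-1
        out[p * L : (p + 1) * L] = portion * w  # assumed placement at n = pL + m
    return out


# Usage: an 8-sample ramp, portions of length 4 taken every 2 samples.
y = phoneme_like_series(np.arange(8.0), L=4, A=2, w=np.ones(4))
print(len(y))  # 16 output samples from 8 input samples
```

With A smaller than L, each input sample appears in more than one portion, so under these assumptions the output is both longer than the input (by roughly a factor of L/A) and redundant.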