Accurate and rapid bio-signal acquisition and estimation is critically important in a clinical environment. In thousands of hospitals across the U.S. and the world, surgeons, physicians, and other medical professionals make substantial health-related (and often life-and-death) decisions every day based on input they receive from diagnostic equipment measuring bio-signals. In many cases the application of today's medical technology is hampered because slow signal estimation allows only limited data collection, so the resulting clinical decision rests on scarce data simply because insufficient time is available for accurate bio-signal acquisition and assessment.
One example is universal neonatal hearing screening. As reported by the National Center for Hearing Assessment and Management, of the 4,000,000 infants born in the U.S. annually, 12,000 have permanent hearing loss, an average occurrence of three per 1,000. Hearing loss at birth has a higher rate of occurrence than any other birth defect. Without hearing screening, the average age at which hearing loss is detected in the U.S. is three years. When detected, the hearing loss can often be corrected through sound amplification using hearing aids. In the case of more severe disorders, the hearing loss can be corrected by using cochlear implants that provide direct neural stimulation of the auditory nerve. Unfortunately, when a hearing loss is not discovered until the age of three, even if it can be corrected, the formative years of language development have passed, and these children can never acquire normal speech ability. Hearing-impaired children are often mistakenly labeled as mentally deficient because their speech and listening comprehension skills are not developing at the normal rate. In addition to the emotional impact of late detection, the National Institutes of Health (NIH) estimates that late detection costs society on the order of $500,000 per child in special education costs and lost productivity, with an overall economic impact in the billions of dollars.
If the hearing loss is detected early, hearing aids can be fitted to an infant as young as four weeks old, and a cochlear implant can be surgically inserted at one year of age. When the child and the parents participate in an overall intervention program that includes hearing assistive devices, parent education, special speech training, and ongoing clinical supervision, these children can fully develop speech ability and listening comprehension by the time they are school-age. This has a profound impact on their quality of life, and a very large social and economic impact on society as a whole. Because of the remarkable difference that early detection of hearing loss makes, 32 states in the U.S. mandate hearing tests for all infants, and other states have legislation pending or in process.
TABLE 1 below lists the abbreviations used throughout this written description.

TABLE 1
ABR       auditory brainstem response
ADC       analog-to-digital converter
AWGN      additive white Gaussian noise
BP        bandpass
C(j,k)    wavelet transform coefficients, scale (j) and translation (k) indices
CRLB      Cramér-Rao lower bound
CSTD      cyclic-shift tree de-noising
DAC       digital-to-analog converter
DWT       discrete wavelet transform
E{}       expected value
FIR       finite impulse response (filter)
G, G      high pass QMF, decomposition and reconstruction (frequency domain)
g, g      high pass QMF, decomposition and reconstruction (time domain)
H, H      low pass QMF, decomposition and reconstruction (frequency domain)
h, h      low pass QMF, decomposition and reconstruction (time domain)
HP        highpass; HPF denotes highpass filter
inf       infimum (greatest lower bound of a set)
k         recombination level of cyclic shifting (number of sweeps = 2^k)
L²        set of square-summable sequences
LP        lowpass; LPF denotes lowpass filter
mod       modulo function (i.e., a mod b = c means a/b has a remainder of c)
MSE       mean square error
MVU       minimum variance unbiased (estimator)
N         final number of frames of ABR data
PDF       probability density function
PSD       power spectral density
QMF       quadrature mirror filter
RMS       root mean square
RMSE      root mean square error (defined around ABR peak V)
SNR       signal-to-noise ratio
sup       supremum (least upper bound of a set)
V         finite dimensional vector space
var{}     variance
x[n]      discrete signal indexed by n; the discrete function is also denoted by x
δ_k       de-noising threshold function dependent on cyclic shift level k
σ²        variance
{b_n}     sequence indexed by n
≡         congruence modulo (i.e., x ≡ y mod z → x mod z = y mod z)
To accurately and objectively measure hearing loss at birth, a cochlear acoustic test (otoacoustic emission, OAE) and an auditory brainstem response (ABR) test need to be performed at multiple frequencies and multiple stimulus levels. An otoacoustic emission test presents tones in the ear and measures the response to those tones coming from the cochlea. An auditory brainstem response test presents acoustic clicks in the ear and measures the evoked electrical potentials generated by the neurons triggered in response to the acoustic stimulus. Neither test requires a patient response, so both are considered objective tests of hearing ability. Signals from both tests are completely buried in noise, with signal-to-noise ratios (SNRs) well below 0 dB. All commercially available devices performing OAE and ABR tests use linear averaging to increase the SNR of the acquired signals to the level required for accurate identification of key signal features. For an OAE test, linear averaging produces a sufficient SNR in under a minute per ear, and OAE tests are commonly performed at four frequencies at a single stimulus level. For an ABR test, the focus of the preferred embodiment of the invention, the linear averaging process requires a large number of frames to be acquired and averaged, resulting in an overall test time of about 10 minutes for one stimulus level in one ear. As a result, the ABR test is not commonly performed on all infants being screened, but only on those who fail the OAE test. It is also important to note that an OAE test can produce a “pass” result when the neural portion of the auditory system is damaged or completely missing, because the signals measured by an OAE test are produced by the cochlea independently of the neural portion of the auditory system. Hence ABR testing is imperative to obtain an accurate representation of hearing ability.
For the infants who do receive the ABR test, only a single ABR test at a single stimulus level is commonly performed prior to hospital discharge. As a result of this technological limitation, the NIH was forced to issue only limited recommendations for nationwide hearing screening: cochlear testing at a single level at only a few frequencies (3–5), and only a single-level brainstem response measurement. See, National Institutes of Health, Early Identification of Hearing Impairment in Infants and Young Children, NIH Consens Statement, Mar 1–3;11(1) 1–24 (1993). Ideally, both the cochlear response and the brainstem response should be measured at 10 different frequencies, and at 10 sound levels at each frequency, in accordance with standard audiologic practice.
A substantial increase in the speed of bio-signal estimation is required to overcome this problem and accurately detect the presence of hearing loss. Testing at only a few levels causes high rates of false positive (Type I error) test results. Currently that rate is on the order of 3–15%, whereas only 0.3% of infants born actually have a hearing loss. At a cost of close to $1,000 per follow-up diagnostic test, on the order of $100 to $600 million will be wasted annually on incorrect referrals when the program is fully implemented. Today over 50% of the infants born in the U.S. are screened at birth, already wasting between $50 and $300 million per year. There is also a large emotional cost to the parents of infants incorrectly identified as having a hearing loss. These criticisms of neonatal hearing screening programs have been the main obstacle to their full nationwide implementation.
An additional reason why most children screened for hearing loss receive only the OAE test and not the ABR test is economic: the equipment to conduct the ABR test costs approximately $20,000, and it only provides hearing testing at one, or at most two, hearing levels. This leaves a substantial portion of hearing-impaired infants at risk of not being detected.
Many other examples exist of clinical measurements being limited by the small SNR of bio-signals. These include neuromonitoring during surgery to prevent nerve damage, depth-of-anesthesia monitoring, ototoxic drug administration, and so on, each suffering from unnecessary and potentially health-threatening time delays and from incomplete decision making based on scarce measurement data. The term weak bio-signal is defined to mean a signal acquired from a human body that has an SNR of preferably less than about 0 dB. Such weak signals may also be found in other applications, and the present invention may just as well be used to de-noise those other weak signals. However, for purposes of illustration, the inventor has chosen the bio-signal application to represent the application of the invention.
The key obstacle in weak bio-signal acquisition and estimation is the noise that corrupts the signal. The word “noise” is used herein to describe the cumulative effect of numerous sources of energy that are added to the energy of the information-bearing signal being measured. Examples of the noise types that corrupt bio-signals during their acquisition and processing may be found in the available literature, and include the following:
Physiological noise:
    electrical activity (action potentials in nerves, movement of ions in/out of cells)
    blood flow (mechanical movement of fluids causing noise)
    breathing (obstructed air flow noise)
    metabolic activity (chemical)
Environmental noise:
    interference from the power grid (50 Hz or 60 Hz depending on the country)
    acoustic noise (equipment cooling fans, beepers, other equipment, personnel)
    electromagnetic and radio-frequency interference from other equipment and broadcast media
Bio-signal acquisition and processing noise:
    transducer-to-body interface noise (movement noise, “electrode pops”)
    transducer internal noise (electrodes, microphones, temperature sensors)
    various types of electronics noise (thermal, shot, burst, avalanche noise)
    electromagnetic and RF interference from on-board digital circuitry
    arithmetic noise in digital processing (quantization, finite register length effects)
The cumulative effect of all the various noise sources often results in a combined noise magnitude significantly larger than that of the underlying signal. For present purposes, the cumulative effect of all the noise sources may be considered to be a single equivalent noise source. The noise from this equivalent source may be modeled as additive white Gaussian noise (AWGN), a stochastic process commonly defined in the stochastic signal processing literature. The term ‘additive’ means that the noise energy is added to the signal, as opposed to multiplying the signal. The term ‘white’ refers to the fact that the power spectral density is constant at all frequencies. The designation ‘Gaussian’ refers to the fact that the probability density function (PDF) of the noise is closely approximated by a Gaussian PDF. The noise is also assumed to be independent of the signal being processed. These assumptions are common in the biomedical engineering and clinical literature for evoked potential signals, including the ABR. These key characteristics have been tested with data collected from test subjects, and the results presented herein support the AWGN model.
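The AWGN model can be made concrete with a small simulation. The sketch below is a minimal illustration under stated assumptions: the 20 kHz sampling rate, the Gaussian-bump stand-in for the signal, and the helper names `template_frame`, `awgn_frame` and `snr_db` are hypothetical choices for this example, not parameters taken from the description.

```python
import math
import random
from typing import List


def template_frame(n_samples: int, fs: float) -> List[float]:
    """Hypothetical smooth stand-in signal: one Gaussian bump at 6 ms.

    This is NOT a real ABR waveform. The 6 ms latency and 0.5 ms width are
    illustrative assumptions that simply give a smooth, coherent signal.
    """
    center = 0.006 * fs  # sample index of the bump peak
    width = 0.0005 * fs  # standard deviation of the bump, in samples
    return [math.exp(-0.5 * ((n - center) / width) ** 2) for n in range(n_samples)]


def awgn_frame(signal: List[float], sigma: float, rng: random.Random) -> List[float]:
    """One acquired frame under the AWGN model: x[n] = s[n] + w[n].

    Every w[n] is an independent draw (white) from a zero-mean Gaussian with
    standard deviation sigma. It is added to s[n] and does not depend on s[n].
    """
    return [s + rng.gauss(0.0, sigma) for s in signal]


def snr_db(signal: List[float], noise_var: float) -> float:
    """SNR in dB, taken as the ratio of signal variance to noise variance."""
    mean = sum(signal) / len(signal)
    signal_var = sum((s - mean) ** 2 for s in signal) / len(signal)
    return 10.0 * math.log10(signal_var / noise_var)


# Example: one 15 ms frame at an assumed 20 kHz sampling rate (300 samples).
rng = random.Random(0)
clean = template_frame(300, 20_000.0)
frame = awgn_frame(clean, sigma=1.0, rng=rng)
print(f"per-frame SNR: {snr_db(clean, 1.0):.1f} dB")  # about -12.8 dB, well below 0 dB
```

With unit-amplitude signal and unit-variance noise, a single frame already sits well below 0 dB, which is the regime the rest of this description addresses.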
Bio-signals, as used in this application, are signals generated by biological activity in the human body, transduced into electrical signals, and then processed to arrive at clinically valid data used by medical professionals to make clinical decisions. Most medical devices that use bio-signals today are digital, so bio-signals are most commonly digitized and then processed digitally. Many different examples exist:
    electrocardiogram (EKG): electrical signals produced with the contractions of the heart muscle
    electroencephalography (EEG): electrical signals produced by neurons creating electrical potentials
    electronystagmography (ENG): electrical signals produced by vestibulo-ocular function
    respiratory flow measurement: air flow signals produced by the lungs
    blood oxygen level monitoring: optical signals produced by the transparency of the blood stream
In the preferred embodiment, signals generated by the brainstem in response to auditory stimuli are analyzed utilizing a novel wavelet based noise suppression algorithm. The auditory brainstem response was chosen because it is a good example of a weak bio-signal. It is a member of the EEG class of bio-signals, occurring several milliseconds after the onset of the stimulus. The key results derived by studying ABRs are directly applicable to many weak, repetitive bio-signals.
The field of ABR recording and processing is very extensive because ABRs have found very wide application in the clinical environment. ABRs are also known as brainstem auditory evoked potentials (BAEPs or AEPs), brainstem auditory evoked responses (BAERs), brainstem evoked responses (BSERs), and early or fast evoked EEG responses. A very large body of literature exists in the form of textbooks, handbooks and journal papers, in which many different uses, aspects and variations of ABRs are explored. In general, ABRs can be used for infant hearing screening, estimation of the auditory sensitivity of difficult-to-test or uncooperative patients, neurodiagnosis of eighth nerve or brainstem dysfunction, and monitoring of eighth nerve (i.e., acoustic nerve) and brainstem status during neurosurgery. The preferred embodiment focuses on ABRs for the detection of hearing ability.
Although no single official, detailed standard exists for ABR recordings, the key parameters, nomenclature, electrode placement, range of stimulus types and levels, gains, filter settings, etc. have come to be used consistently in the clinical environment. These conventions are widely followed in recent publications, in current guidelines published by the American Academy of Audiology (AAA) and the American Speech-Language-Hearing Association (ASHA), and in many U.S. states' clinical processes. Certain portions of signal generation and processing are embodied in U.S. and international standards, such as those of the American National Standards Institute (ANSI) and the International Electrotechnical Commission (IEC). See, IEC [International Electrotechnical Commission], “Auditory Test Signals of Short Duration for Audiometric and Neuro-otological Purposes,” International Standard IEC 645-3, 1st Ed., Geneva, Switzerland (1994); ANSI [American National Standards Institute], American National Standard Specification for Audiometers [ANSI S3.6-1996], Acoustical Society of America, New York, N.Y. (1996). Hall, in his “Handbook of Auditory Evoked Responses”, incorporated herein by reference, gives a thorough treatment of ABR history, current state, methods, and normative data. He also outlines the clinically accepted methods for ABR recording and for interpretation through normative data. Also, Hyde, in his 1998 paper “Objective detection and analysis of ABRs: An historical perspective”, incorporated herein by reference, gives an excellent background on the signal processing methods commonly used in ABR processing. These two works, along with numerous others, are in line with currently accepted clinical practice and are the basis of the operation of commercially available ABR testing equipment. The data collection and analysis methods presented herein are based on the same parameters as in these two works.
The first auditory-related neural recordings were reported as early as 1929 by Berger, and in 1930 by Wever and Bray. The history of ABR recordings is very rich, beginning with a thorough description and nomenclature by Jewett and Williston in their 1971 work. From there, a variety of directions were followed with various types of stimuli, electrode placement, electrode type, processing type, etc., but they eventually converged into a single, though not entirely specific, clinical standard with some variations. The ABR has been shown to be invariant under many different patient conditions (wake state, environment, time of day, etc.), and to have predictable response morphology across patients. It is an indicator of hearing ability that can be measured objectively, without a patient response. Its amplitude is on the order of microvolts, and it is commonly contaminated by noise with an amplitude on the order of millivolts. ABRs are commonly processed by linear averaging, and testing the response at a single stimulus level in a single ear takes approximately 5–10 minutes, depending on noise conditions, quality of electrode placement, and other factors. Hence, testing at multiple levels in both ears commonly takes over an hour in a standard clinical setting. Taking over an hour to test an infant is difficult in a universal neonatal hearing screening program, because infants are commonly in the hospital for only a short time. During that short time they have to be tested for a variety of other conditions, and their vital signs are continuously monitored. Another problem with long testing times arises in large metropolitan hospitals, which commonly have 6,000–10,000 births per year. On average they have more than 20 infants in a nursery at any one time, and taking an hour to test each infant would require dedicated staff and multiple test devices.
It would be a large and unacceptable economic burden on hospitals to require that each of these infants be screened with a test that takes over an hour. Also, under the standard of care for infant health screening, all other screening tests (PKU, sugar, etc.) take at most a few minutes. Thus, there is a long-felt need for a quick ABR test that produces meaningful results.
A typical ABR diagnostic system used for clinical and for research purposes is shown in FIG. 4. Auditory brainstem responses are evoked by presenting auditory clicks of very short duration in the ear canal. The ABR testing controller initiates click generation based on user input. Clicks are generated digitally and converted to analog voltage pulses of 100 μs duration by the digital-to-analog converter (DAC). This analog signal is fed into the speaker, which is inserted in the ear, and acoustic clicks are presented several thousand times. The click repetition rate is approximately 30–60 clicks/sec, and overall test duration is typically about 5–10 minutes.
The signal is commonly acquired using a set of three skin electrodes: one on the forehead and one on the mastoid process behind each ear. One of the mastoid electrodes is used as a reference, while the potential between the forehead electrode and the other mastoid electrode is amplified differentially by a factor of approximately 15,000. Electrical power supply lines operating at 50 or 60 Hz, depending on the country, produce a large interference signal in the ABR recording. To reduce the effects of this interference, the common mode output of the differential amplifier is inverted and fed back into the reference electrode. This creates a common mode active ground circuit, routinely used in EEG, ECG and ABR equipment. The amplified signal is converted to a digital signal by an analog-to-digital converter (ADC). The ADC contains an anti-aliasing low pass (LP) filter with a cutoff frequency of 3 kHz. The digital signal is then filtered by a digital linear phase BP filter with user-selectable settings. The most commonly used settings are about 30–100 Hz for the low frequency cutoff and about 1,500–3,000 Hz for the high frequency cutoff.
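A digital linear phase bandpass filter of the kind described can be sketched as a windowed-sinc FIR design. This is one common design method, assumed here for illustration; the description does not say how the filter is designed. The 20 kHz sampling rate, the 201-tap length and the Hamming window are assumptions, and the helper names are hypothetical. Symmetric taps are what give the filter its linear phase.

```python
import math
from typing import List


def windowed_sinc_lowpass(cutoff_hz: float, fs: float, num_taps: int) -> List[float]:
    """Linear-phase FIR lowpass: Hamming-windowed ideal sinc, scaled to unity DC gain."""
    if num_taps % 2 == 0:
        raise ValueError("use an odd tap count (symmetric, type-I linear phase)")
    mid = (num_taps - 1) // 2
    fc = cutoff_hz / fs  # cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2.0 * fc if t == 0 else math.sin(2.0 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    dc_gain = sum(taps)
    return [h / dc_gain for h in taps]


def bandpass_taps(low_hz: float, high_hz: float, fs: float, num_taps: int = 201) -> List[float]:
    """Bandpass as the difference of two lowpass filters with the same delay."""
    wide = windowed_sinc_lowpass(high_hz, fs, num_taps)
    narrow = windowed_sinc_lowpass(low_hz, fs, num_taps)
    return [a - b for a, b in zip(wide, narrow)]


def fir_filter(x: List[float], taps: List[float]) -> List[float]:
    """Direct-form FIR convolution with zero initial state; output length = input length."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(min(n + 1, len(taps))):
            acc += taps[k] * x[n - k]
        y.append(acc)
    return y


def gain_at(taps: List[float], freq_hz: float, fs: float) -> float:
    """Magnitude of the filter's frequency response at one frequency."""
    w = 2.0 * math.pi * freq_hz / fs
    re = sum(h * math.cos(w * n) for n, h in enumerate(taps))
    im = -sum(h * math.sin(w * n) for n, h in enumerate(taps))
    return math.hypot(re, im)


# Example: 100-3000 Hz passband at an assumed 20 kHz sampling rate.
bp = bandpass_taps(100.0, 3000.0, 20_000.0)
for f in (0.0, 1000.0, 9000.0):
    print(f, round(gain_at(bp, f, 20_000.0), 3))
```

Subtracting two unity-DC-gain lowpass filters of equal length makes the DC gain exactly zero, and because both share the same center tap, the result keeps a constant group delay of 100 samples.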
The start of data acquisition is synchronized to the onset of the clicks by the ABR testing controller, and acquisition continues for approximately 15 ms after each click. A single frame of data containing 15 ms of measured response thus corresponds to a particular click. Several thousand acquired signal frames are then typically linearly averaged to obtain a smooth estimate of the ABR response: each point in the final averaged frame is the linear average of the corresponding points in all N frames of data. The SNR is calculated as the ratio of signal variance to noise variance. The averaging continues until the SNR exceeds a preset amount and the system determines that a valid ABR response is present. Once a valid ABR waveform is obtained, a human expert trained in ABR interpretation (normally a state-certified audiologist) determines where the peaks are and what their latencies are, and then calculates the inter-peak latencies. The expert then determines whether the results are within the range of normative data for healthy subjects. If they are not, a pathology is declared to be present.
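The averaging loop and its SNR-based stopping rule can be sketched as follows. Because the true signal is unknown during a real test, this sketch estimates the per-frame noise variance from the spread across frames at each sample and divides it by the number of frames, which holds under the AWGN model; commercial equipment may estimate SNR differently. The class and function names, the Gaussian-bump test signal, and the default frame limits are hypothetical.

```python
import itertools
import math
import random
from typing import Iterable, List, Tuple


class RunningAverager:
    """Point-by-point linear average of stimulus-locked frames.

    Keeps per-sample running sums so each new frame costs O(frame length).
    """

    def __init__(self, length: int) -> None:
        self.n = 0
        self.s1 = [0.0] * length  # running sum of x[n]
        self.s2 = [0.0] * length  # running sum of x[n]**2

    def add(self, frame: List[float]) -> None:
        self.n += 1
        for i, x in enumerate(frame):
            self.s1[i] += x
            self.s2[i] += x * x

    def average(self) -> List[float]:
        if self.n == 0:
            raise ValueError("no frames added")
        return [s / self.n for s in self.s1]

    def snr_db(self) -> float:
        """Estimated SNR of the current average (signal variance / noise variance).

        Assumption: AWGN, independent across frames, so averaging n frames
        divides the per-frame noise variance by n.
        """
        n, length = self.n, len(self.s1)
        if n < 2:
            raise ValueError("need at least two frames")
        # Unbiased across-frame variance at each sample, pooled over the frame.
        noise_var_frame = sum((q - p * p / n) / (n - 1) for p, q in zip(self.s1, self.s2)) / length
        noise_var_avg = noise_var_frame / n
        avg = self.average()
        mean = sum(avg) / length
        total_var = sum((a - mean) ** 2 for a in avg) / length
        signal_var = max(total_var - noise_var_avg, 1e-30)  # remove residual noise power
        return 10.0 * math.log10(signal_var / noise_var_avg)


def average_until(frames: Iterable[List[float]], length: int, target_snr_db: float,
                  min_frames: int = 8, max_frames: int = 20_000) -> Tuple[List[float], int]:
    """Average frames until the estimated SNR reaches target_snr_db (or max_frames)."""
    acc = RunningAverager(length)
    for frame in frames:
        acc.add(frame)
        if acc.n >= min_frames and (acc.snr_db() >= target_snr_db or acc.n >= max_frames):
            break
    return acc.average(), acc.n


# Example with simulated frames: 300 samples (15 ms at an assumed 20 kHz), unit-variance AWGN.
rng = random.Random(0)
length = 300
template = [math.exp(-0.5 * ((n - 120) / 10.0) ** 2) for n in range(length)]
stream = ([t + rng.gauss(0.0, 1.0) for t in template] for _ in itertools.count())
estimate, used = average_until(stream, length, target_snr_db=3.0)
print(f"stopped after {used} frames")
```

Each doubling of the frame count buys only about 3 dB of SNR, which is why linear averaging of very weak responses needs thousands of frames.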
Over the past decade, wavelet-based signal processing has emerged as a new research area in the signal processing community. The most common fields of application of wavelets are noise suppression (commonly referred to in the wavelet literature as “de-noising”), data compression, digital communication, and system identification, with others still being added.
The wavelet transform, like the Fourier transform, decomposes a given signal into a set of basis functions; for the wavelet transform these are orthonormal basis functions called wavelets. The present invention utilizes finite-length, discrete signals, so only discrete signal transforms will be discussed.
In the traditional discrete Fourier transform (DFT), as commonly referred to in the signal processing field, the signal is decomposed using complex sinusoids as basis functions, producing a frequency domain representation of the signal. In contrast, the discrete wavelet transform (DWT) uses a family of specifically designed functions called wavelets (little waves) as basis functions. A family of wavelets is created by dilating (or “stretching”) and translating (shifting) an original wavelet function called the “mother wavelet”. A wavelet transform decomposes the signal in both time and frequency using different dilations of the mother wavelet. With the application of the DWT, the one-dimensional finite signal x[n] is represented in two-dimensional “wavelet coordinates”. Individual levels of signal decomposition, called scales, are created. At each scale, a set of coefficients is created by computing the inner product of the original signal x[n] with a scaled version of the mother wavelet. The mother wavelet function is designated by ψ, and its dilations at scale j by ψ_j. The position index of a wavelet at scale j is called a translation. Each wavelet in the family is completely described by the two-dimensional index (j,k) and is written ψ_{j,k}, where j is the scale (or stretch level) index and k is the translation (or position) index. We then define the DWT as follows:
$$C(j,k) = \sum_{n=0}^{N-1} x[n]\,\psi_{j,k}[n], \qquad \text{where} \quad \psi_{j,k}[n] = 2^{-j/2}\,\psi\!\left(2^{-j} n - k\right).$$
Here N denotes the number of samples in the signal frame. The coefficients C(j,k) are the wavelet coefficients at scale j and translation k, obtained as the inner product of the wavelet ψ_{j,k} with the original signal x[n]. In wavelet coordinates, information about both the frequency and the location (time) of the signal energy is preserved. In the traditional Fourier transform, which uses complex exponentials, time information is lost.
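To make the definition concrete, the sketch below uses the Haar wavelet as ψ (an assumption; the description does not fix a mother wavelet here) and computes C(j,k) two ways: literally from the sum above, and with the usual fast pyramid (filter-bank) algorithm. Both give the same coefficients, and because the Haar basis is orthonormal the inverse transform recovers x[n] and the coefficient energy equals the signal energy. The function names are hypothetical, and the frame length is assumed to be divisible by 2**levels.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_mother(t: float) -> float:
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def coefficient_direct(x: List[float], j: int, k: int) -> float:
    """C(j,k) evaluated literally: sum_n x[n] * 2**(-j/2) * psi(2**(-j) * n - k)."""
    return sum(xn * 2.0 ** (-j / 2) * haar_mother(2.0 ** (-j) * n - k) for n, xn in enumerate(x))


def haar_dwt(x: List[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Fast orthonormal Haar DWT (pyramid algorithm).

    details[j - 1][k] equals C(j, k); approx holds the coarsest scaling coefficients.
    """
    if len(x) % (2 ** levels):
        raise ValueError("len(x) must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) / SQRT2 for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / SQRT2 for i in pairs]
    return approx, details


def haar_idwt(approx: List[float], details: List[List[float]]) -> List[float]:
    """Inverse of haar_dwt: undo each sum/difference stage, coarsest scale first."""
    x = list(approx)
    for detail in reversed(details):
        x = [v for a, d in zip(x, detail) for v in ((a + d) / SQRT2, (a - d) / SQRT2)]
    return x


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_dwt(x, 3)
print(details[2][0], coefficient_direct(x, 3, 0))  # both approximately 2.828 (= 4/sqrt(2))
print(haar_idwt(approx, details))                  # recovers x up to rounding
```

The pyramid form costs O(N) operations per frame instead of evaluating every inner product separately, which matters when the transform is applied to many frames.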
Conventional wavelet de-noising is a process of noise suppression that utilizes assumptions about smoothness and coherence properties of both the underlying signal and the noise that contaminates it. Similar to filtering in the frequency domain, the wavelet coefficient thresholding algorithm (“wavelet shrinkage”) reduces sets of wavelet coefficients in the wavelet domain. This process is based on the assumption that the underlying signal is smooth and coherent, while the noise that is mixed with the signal is rough and incoherent. Smoothness of a signal is a property related to its bandwidth, and is defined in relation to how many times a signal can be differentiated. The degree of smoothness is equal to the number of continuous derivatives that can be calculated.
A signal is coherent if its energy is concentrated in both the time and frequency domains. An incoherent noise is “spread out” rather than concentrated. One measure of coherence is how many wavelet coefficients are required to represent 99% of the signal energy. A time-frequency signal space is completely spanned (covered) by the wavelet coefficients at all scales and translations. A well-concentrated signal decomposed in an appropriately selected wavelet basis will require very few coefficients to represent 99% of its energy. On the other hand, a completely incoherent noise spreads its energy across the entire space, so representing 99% of its energy requires most of the coefficients that span that space.
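The 99%-energy coherence measure can be computed directly: transform the signal, sort the squared coefficients, and count how many of the largest ones are needed to reach 99% of the total. The sketch below assumes a Haar basis, a full decomposition, and a power-of-two frame length; the Gaussian bump and white-noise inputs in the example are illustrative stand-ins, and the function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_dwt(x: List[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Orthonormal Haar DWT (pyramid algorithm)."""
    approx, details = list(x), []
    for _ in range(levels):
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) / SQRT2 for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / SQRT2 for i in pairs]
    return approx, details


def coeffs_for_energy(x: List[float], fraction: float = 0.99) -> int:
    """Smallest number of wavelet coefficients holding `fraction` of the energy of x."""
    levels = len(x).bit_length() - 1
    if 2 ** levels != len(x):
        raise ValueError("length must be a power of two")
    approx, details = haar_dwt(x, levels)  # full decomposition
    energies = sorted((c * c for c in approx + [c for d in details for c in d]), reverse=True)
    target = fraction * sum(energies)
    running = 0.0
    for count, e in enumerate(energies, start=1):  # largest coefficients first
        running += e
        if running >= target:
            return count
    return len(energies)


n = 256
bump = [math.exp(-0.5 * ((i - 128) / 20.0) ** 2) for i in range(n)]  # smooth, coherent
rng = random.Random(7)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]                     # white, incoherent
print("bump:", coeffs_for_energy(bump), "of", n)
print("noise:", coeffs_for_energy(noise), "of", n)
```

With these assumed inputs the smooth bump needs only a few dozen of its 256 coefficients, whereas the white noise needs most of them; that gap is what thresholding exploits.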
The conventional wavelet de-noising process is a three-step process:
1. Wavelet transform the signal to obtain wavelet coefficients at different scales
2. Threshold the coefficients, setting to zero any coefficient whose magnitude is smaller than a threshold δ
3. Perform the inverse wavelet transform to approximate the original signal
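The three steps map directly onto code. This sketch assumes a Haar wavelet and hard thresholding (a coefficient survives only if its magnitude is at least δ), and for simplicity it applies the same threshold to the approximation coefficients as to the detail coefficients; practical implementations often leave the coarsest approximation untouched. The function names are hypothetical.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_dwt(x: List[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Orthonormal Haar DWT (pyramid algorithm)."""
    approx, details = list(x), []
    for _ in range(levels):
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) / SQRT2 for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / SQRT2 for i in pairs]
    return approx, details


def haar_idwt(approx: List[float], details: List[List[float]]) -> List[float]:
    """Inverse of haar_dwt."""
    x = list(approx)
    for detail in reversed(details):
        x = [v for a, d in zip(x, detail) for v in ((a + d) / SQRT2, (a - d) / SQRT2)]
    return x


def hard_threshold(coeffs: List[float], delta: float) -> List[float]:
    """Step 2: keep a coefficient only if |c| >= delta, otherwise set it to zero."""
    return [c if abs(c) >= delta else 0.0 for c in coeffs]


def denoise(x: List[float], delta: float, levels: int) -> List[float]:
    """Conventional three-step wavelet de-noising of a single frame."""
    approx, details = haar_dwt(x, levels)                  # step 1: forward transform
    approx = hard_threshold(approx, delta)                 # step 2: threshold (approximation
    details = [hard_threshold(d, delta) for d in details]  #   included, a simplification)
    return haar_idwt(approx, details)                      # step 3: inverse transform


# A step signal with a small alternating wiggle; the wiggle lives only in the
# finest-scale details, so a threshold of 0.1 removes it.
clean = [1.0] * 8 + [-1.0] * 8
noisy = [c + w for c, w in zip(clean, [0.01, -0.01] * 8)]
print([round(v, 6) for v in denoise(noisy, 0.1, 4)])  # the clean step: eight 1.0, then eight -1.0
```

This works because the signal here is large relative to the perturbation; the same single-pass procedure behaves very differently when the noise dominates.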
In the de-noising process, the noise components of the signal are attenuated by selectively setting wavelet coefficients to zero. De-noising is thus a non-linear operation, because different coefficients are affected differently by the threshold function. There are many parameters to control in this algorithm: the level of wavelet decomposition, threshold selection, the use of different thresholds at different wavelet decomposition levels, scaling of the retained wavelet coefficients by a fixed amount, and so on. However, what is common to all these variations in the prior art is that the process is performed only once, on a single signal frame.
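The non-linearity comes entirely from the thresholding step: the wavelet transform and its inverse are linear, but thresholding a sum of coefficient sets is not the same as summing the thresholded sets. A minimal numeric check, using hypothetical coefficient values and threshold:

```python
from typing import List


def hard_threshold(coeffs: List[float], delta: float) -> List[float]:
    """Zero every coefficient whose magnitude is below delta."""
    return [c if abs(c) >= delta else 0.0 for c in coeffs]


delta = 5.0
a = [3.0, 1.0]
b = [3.0, -5.0]
of_sum = hard_threshold([p + q for p, q in zip(a, b)], delta)  # threshold of (a + b)
sum_of = [p + q for p, q in zip(hard_threshold(a, delta), hard_threshold(b, delta))]
print(of_sum, sum_of)  # [6.0, 0.0] [0.0, -5.0]
```

Because superposition fails, de-noising frames and then averaging them generally gives a different result from averaging first and de-noising afterwards.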
One of the assumptions made in conventional de-noising is that the SNR of the signal being de-noised is relatively high. The algorithm relies on the amplitude of the signal being substantially larger than the amplitude of the noise, so that the signal produces larger wavelet coefficients than the noise does. Hence, conventional de-noising as taught by the prior art fails when applied to low-SNR signals.
Conventional de-noising has been demonstrated in the literature as a fast estimator of signals corrupted by noise. It operates on a single frame of the signal, performing a single wavelet transform, setting selected coefficients to zero, and then performing an inverse wavelet transform. Because the algorithm takes a single frame of data as input, there are two ways to apply conventional de-noising to ABR signals. One way is to de-noise each frame individually and then average the results. When conventional de-noising is applied to a single, un-averaged frame, setting the wavelet coefficients with |C(j,k)| < δ to zero eliminates almost all of the wavelet coefficients, including the ones representing the signal. This approach fails completely, because most of the signal energy is lost when those coefficients are set to zero. The inverse wavelet transform of the de-noised single-frame wavelet coefficients produces a very low amplitude, noise-only signal.
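The failure on a single un-averaged frame can be illustrated with a simulation. Here the threshold is the Donoho-Johnstone universal threshold σ√(2 ln n), a standard choice substituted for illustration because the description does not specify one; the Haar wavelet, the Gaussian-bump stand-in for the ABR, and the noise level (σ = 10, an SNR of roughly -30 dB) are also assumptions. By the Cauchy-Schwarz inequality no coefficient of the clean signal can exceed the signal's norm (about 6 here), which is far below the threshold of about 33, so every signal coefficient is removed along with nearly all of the noise.

```python
import math
import random
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_dwt(x: List[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Orthonormal Haar DWT (pyramid algorithm)."""
    approx, details = list(x), []
    for _ in range(levels):
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) / SQRT2 for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / SQRT2 for i in pairs]
    return approx, details


def haar_idwt(approx: List[float], details: List[List[float]]) -> List[float]:
    """Inverse of haar_dwt."""
    x = list(approx)
    for detail in reversed(details):
        x = [v for a, d in zip(x, detail) for v in ((a + d) / SQRT2, (a - d) / SQRT2)]
    return x


def universal_threshold(sigma: float, n: int) -> float:
    """Donoho-Johnstone universal threshold sigma * sqrt(2 ln n) (a substituted choice)."""
    return sigma * math.sqrt(2.0 * math.log(n))


def denoise_single_frame(x: List[float], delta: float, levels: int) -> Tuple[List[float], int]:
    """One-pass hard-threshold de-noising; also returns how many coefficients survive."""
    approx, details = haar_dwt(x, levels)
    kept = sum(1 for c in approx + [c for d in details for c in d] if abs(c) >= delta)
    approx = [c if abs(c) >= delta else 0.0 for c in approx]
    details = [[c if abs(c) >= delta else 0.0 for c in d] for d in details]
    return haar_idwt(approx, details), kept


def denoise_each_then_average(frames: List[List[float]], delta: float, levels: int) -> List[float]:
    """First approach in the text: de-noise every raw frame, then average the results."""
    outs = [denoise_single_frame(f, delta, levels)[0] for f in frames]
    return [sum(col) / len(outs) for col in zip(*outs)]


rng = random.Random(0)
n, sigma = 256, 10.0
signal = [math.exp(-0.5 * ((i - 128) / 20.0) ** 2) for i in range(n)]
delta = universal_threshold(sigma, n)
frame = [s + rng.gauss(0.0, sigma) for s in signal]
_, kept = denoise_single_frame(frame, delta, 8)
print(f"threshold {delta:.1f}; coefficients kept: {kept} of {n}")
```

Averaging many such outputs cannot restore the signal, since the signal's coefficients were already discarded in every frame.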
The second way to apply conventional de-noising is to first average the individual ABR frames together to create a single averaged frame, and then de-noise that single averaged frame. This approach also performs substantially worse than linear averaging until a very large number of frames (several thousand) have been pre-averaged. Hence conventional de-noising does not work for de-noising ABR signals when compared to linear averaging.
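The second approach, averaging first and then de-noising, can be sketched in the same style. The threshold is again the universal threshold, here scaled to the noise left after averaging N frames (σ/√N under the AWGN model); this scaling, the Haar wavelet, the simulated signal, and the function names are assumptions for illustration. With the threshold scale set to zero the method reduces exactly to linear averaging, which gives a simple sanity check.

```python
import math
import random
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_dwt(x: List[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Orthonormal Haar DWT (pyramid algorithm)."""
    approx, details = list(x), []
    for _ in range(levels):
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) / SQRT2 for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / SQRT2 for i in pairs]
    return approx, details


def haar_idwt(approx: List[float], details: List[List[float]]) -> List[float]:
    """Inverse of haar_dwt."""
    x = list(approx)
    for detail in reversed(details):
        x = [v for a, d in zip(x, detail) for v in ((a + d) / SQRT2, (a - d) / SQRT2)]
    return x


def linear_average(frames: List[List[float]]) -> List[float]:
    """Point-by-point linear average of N frames."""
    return [sum(col) / len(frames) for col in zip(*frames)]


def average_then_denoise(frames: List[List[float]], sigma: float, levels: int,
                         scale: float = 1.0) -> List[float]:
    """Second approach in the text: pre-average all frames, then de-noise once."""
    avg = linear_average(frames)
    # Under AWGN the noise std of the average is sigma / sqrt(N); the universal
    # threshold is scaled to that level (scale = 0 disables thresholding).
    delta = scale * (sigma / math.sqrt(len(frames))) * math.sqrt(2.0 * math.log(len(avg)))
    approx, details = haar_dwt(avg, levels)
    approx = [c if abs(c) >= delta else 0.0 for c in approx]
    details = [[c if abs(c) >= delta else 0.0 for c in d] for d in details]
    return haar_idwt(approx, details)


def rmse(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


rng = random.Random(0)
length, sigma = 256, 10.0
signal = [math.exp(-0.5 * ((i - 128) / 20.0) ** 2) for i in range(length)]
for n_frames in (16, 64, 256):
    frames = [[s + rng.gauss(0.0, sigma) for s in signal] for _ in range(n_frames)]
    print(n_frames,
          round(rmse(linear_average(frames), signal), 3),
          round(rmse(average_then_denoise(frames, sigma, 8), signal), 3))
```

The printed RMSE values depend on the seed and on the assumed parameters; they are a way to explore the comparison, not results reported in this description.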
The present invention presents a new algorithm that may be conveniently implemented in a digital processor that utilizes information from all of the individual N frames of data and produces an estimator whose performance exceeds that of the linear averaging process. The new algorithm recombines the original low SNR data frames in a tree-like fashion, creating an array of new frames of size N*K, where K>>1. Two adjacent frames of original data are linearly averaged and de-noised, thus creating a new frame that is not a linear combination of the original two adjacent frames. A new level of frames is created and each new frame at that level is averaged and de-noised with a small threshold value. This is illustrated in FIG. 5.
The process of building an array of new frames is iterative. The new method first applies de-noising with a small threshold δ_k to each of the N original ABR data frames and then recombines the frames to obtain new single-frame sub-averages. De-noising is applied again, with a different threshold δ_{k+1}, to each of the new frames. This process continues until a set of N*K wavelet de-noised frames is obtained. When the last level of frames K is obtained, the frames at this level are linearly averaged to generate a single de-noised frame. The frame recombination preferably has K = log2(N) levels, and at each level a different wavelet coefficient threshold δ_k is preferably applied as a function of the level k. In its broadest sense, the novel algorithm has three main features, amongst others:
1. Each individual frame of the N original data frames is used to estimate the signal.
2. De-noising is performed in a step-by-step fashion, preferably using different threshold levels.
3. K*N new frames of data are preferably created from the original N frames.
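One possible reading of the recombination tree is sketched below; it is an illustration, not the patented implementation. Assumptions: N is a power of two; at level k each frame is de-noised with δ_k (Haar wavelet, hard threshold) and then averaged with the frame 2**k positions away, wrapping cyclically, so each level yields N new frames and K = log2(N) levels yield N*K frames; the threshold schedule in the hypothetical `example_thresholds` is also an assumption. With all thresholds set to zero, every final frame becomes the mean of all N original frames, so the output equals the linear average, a useful check of the bookkeeping.

```python
import math
import random
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def haar_dwt(x: List[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Orthonormal Haar DWT (pyramid algorithm); assumes len(x) divisible by 2**levels."""
    approx, details = list(x), []
    for _ in range(levels):
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) / SQRT2 for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / SQRT2 for i in pairs]
    return approx, details


def haar_idwt(approx: List[float], details: List[List[float]]) -> List[float]:
    x = list(approx)
    for detail in reversed(details):
        x = [v for a, d in zip(x, detail) for v in ((a + d) / SQRT2, (a - d) / SQRT2)]
    return x


def denoise(x: List[float], delta: float, levels: int) -> List[float]:
    """One pass of hard-threshold wavelet de-noising (|c| < delta -> 0)."""
    approx, details = haar_dwt(x, levels)
    approx = [c if abs(c) >= delta else 0.0 for c in approx]
    details = [[c if abs(c) >= delta else 0.0 for c in d] for d in details]
    return haar_idwt(approx, details)


def tree_denoise(frames: List[List[float]], thresholds: Sequence[float],
                 levels: int) -> Tuple[List[float], int]:
    """Hypothetical tree recombination: de-noise, pair-average, repeat for K levels.

    Returns (final de-noised frame, number of new frames created).
    """
    n = len(frames)
    K = n.bit_length() - 1
    if n < 2 or 2 ** K != n:
        raise ValueError("number of frames must be a power of two, at least 2")
    if len(thresholds) != K:
        raise ValueError("need one threshold delta_k per level k = 0..K-1")
    current = [list(f) for f in frames]
    created = 0
    for k in range(K):
        # De-noise every frame at this level with its own threshold delta_k ...
        denoised = [denoise(f, thresholds[k], levels) for f in current]
        # ... then average each frame with the one 2**k positions away (cyclically),
        # which yields N new frames at level k.
        shift = 2 ** k
        current = [
            [(a + b) / 2.0 for a, b in zip(denoised[i], denoised[(i + shift) % n])]
            for i in range(n)
        ]
        created += n
    # The frames of the last level are combined by plain linear averaging.
    final = [sum(col) / n for col in zip(*current)]
    return final, created


def example_thresholds(sigma: float, frame_len: int, K: int, c: float = 0.5) -> List[float]:
    """Hypothetical schedule: a fraction c of the universal threshold for the noise
    level expected after averaging 2**k frames, sigma / sqrt(2**k)."""
    base = math.sqrt(2.0 * math.log(frame_len))
    return [c * sigma / math.sqrt(2 ** k) * base for k in range(K)]


# Usage: 64 simulated frames (K = 6 levels, 384 new frames), AWGN with sigma = 10.
rng = random.Random(0)
length, sigma, n_frames = 256, 10.0, 64
signal = [math.exp(-0.5 * ((i - 128) / 20.0) ** 2) for i in range(length)]
frames = [[s + rng.gauss(0.0, sigma) for s in signal] for _ in range(n_frames)]
K = n_frames.bit_length() - 1
estimate, created = tree_denoise(frames, example_thresholds(sigma, length, K), 8)
print(f"created {created} frames")
```

Because a small threshold is applied at every level, each intermediate frame is a non-linear combination of its inputs, unlike the single averaged frame of linear averaging.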
Application of this algorithm increases the quality of the averaged signal such that a waveform can be reliably interpreted by a human expert after only a small number of ABR frames have been acquired. The novel algorithm may be compared to linear averaging. The performance may be tested against linear averaging using both simulated data and human subjects, to demonstrate that the novel algorithm produces a faster estimate of key features of the underlying low SNR signal.
While some of the advantages and features of the present invention have been explained above, a fuller understanding may be gained by referring to the drawings and detailed description of the preferred embodiment that follow.