The invention relates to methods for the spectral analysis of time-sampled signals. More particularly, the invention relates to methods for producing spectrograms of human speech or other time-varying signals.
It is useful, in many fields of technology, to determine the changing frequency content of time-dependent signals. For example, the spectral analysis of speech is useful both for automatic speech recognition and for speech coding. As a further example, the spectral analysis of marine sounds is useful for acoustically aided undersea navigation.
When an acoustic signal, or other signal of interest, is sampled at discrete intervals, a time series is produced. A time series is said to be stationary if its statistical properties are invariant under displacements of the series in time. Although few of the signals of interest are truly stationary, many change slowly enough that, for purposes of spectral analysis, they can be treated as locally stationary over a limited time interval.
The spectral analysis of stationary time series has been a subject of research for one hundred years. The earliest attempts to obtain a representation, or periodogram, of the power spectral density of the time series x(0), x(1), . . . , x(n), . . . , x(Nxe2x88x921) involved summing N terms of the form x(n)xc3x97einxcfx89 and then taking the squared magnitude of the result. (The symbol xcfx89 represents frequency in radians per second. The symbol ƒ, used below, represents frequency in cycles per second. Thus, xcfx89=2xcfx80ƒ.) This operation was performed for each of N/2+1 discrete frequencies ƒ. This was unsatisfactory for several reasons. One reason is that the result is not statistically consistent. That is, the variance of the resulting periodogram does not decrease as the sample size N is increased. A second reason is that the result can be severely biased by truncation effects, leading to inaccurate representation of processes having continuous spectra.
An improved spectrum estimate (it is an estimate because it is derived from a finite sample of the original signal) is obtained from the following method, which is conveniently described in two steps:
First, form the spectrum estimate {tilde over (S)}D(xcfx89) using a data window D0, D1, . . . , Dn, . . . , DNxe2x88x921 to taper the sampled data sequence, according to:                                                         S              ~                        D                    ⁡                      (            ω            )                          =                                            "LeftBracketingBar"                                                ∑                                      n                    =                    0                                                        N                    -                    1                                                  ⁢                                                      x                    ⁡                                          (                      n                      )                                                        ⁢                                      D                    n                                    ⁢                                      ⅇ                                                                  -                        ⅈ                                            ⁢                                              xe2x80x83                                            ⁢                      ω                      ⁢                                              xe2x80x83                                            ⁢                      n                                                                                  "RightBracketingBar"                        2                    .                                    (        1        )            
The primary purpose of the data window is to control bias. That is, by tapering the sampled sequence, it is possible to mitigate the tendency of the frequency components where the power is highest to dominate the spectrum estimate.
Then, smooth the estimate {tilde over (S)}D(xcfx89) by convolving it with a spectral window G(xcfx89) to form the smoothed spectrum estimate {tilde over (S)}(xcfx89) according to {tilde over (S)}(xcfx89)={tilde over (S)}D(xcfx89)*G(xcfx89),
where * represents the convolution operation. The primary purpose of the spectral window is to make the spectrum estimate consistent. The spectral window is generally pulse-shaped in frequency space, and the width of this pulse is approximately the bandwidth of the spectrum estimate. Increasing the bandwidth decreases the variance of the resulting estimate, but it also reduces the frequency resolution of the estimate.
Although useful, the smoothed spectrum estimate {tilde over (S)}(xcfx89) as described above has several drawbacks. The smoothing operation may obscure the presence of spectral lines. Moreover, the data window tends to give different weights to equally valid data points. The data window also tends to reduce statistical efficiency. That is, the amount of data needed to obtain a reliable estimate may exceed the theoretical ideal by a factor of two or more.
Recently, a new spectrum estimate having improved properties was proposed. This estimate is described, e.g., in D. J. Thomson, xe2x80x9cSpectrum Estimation and Harmonic Analysis,xe2x80x9d Proc. IEEE 70 (September 1982) 1055-1096 (hereafter, xe2x80x9cThomson (1982)xe2x80x9d). This estimate is computed using a sequence of window functions referred to as Slepian functions when expressed as functions of frequency, and as Slepian sequences when expressed as sequences in the time domain. Slepian functions are related to Slepian sequences through the Fourier transform. Because multiple window functions are used, such an estimate is referred to as a multitaper spectrum estimate, or occasionally as a multiple-window spectrum estimate.
The properties of Slepian functions and Slepian sequences are described in Thomson (1982), cited above, and in D. Slepian, xe2x80x9cProlate Spheroidal Wave Functions, Fourier Analysis, and Uncertaintyxe2x80x94V: The Discrete Case,xe2x80x9d Bell System Tech. J. 57 (1978) 1371-1430, hereafter referred to as Slepian (1978). Briefly, the Slepian sequences depend parametrically on the size N of the data sample and on the chosen bandwidth W. (From practical considerations, the bandwidth is generally chosen to lie between 1/N and 20/N, and at least as a starting value it is typically about 5/N.) It should be noted that throughout this discussion, the well-known convention is used wherein all frequencies are normalized such that the Nyquist frequency equals 0.5.
Given values for these parameters, each Slepian sequence v(k)(N,W) is a k""th solution to a matrix eigenvalue equation Mv=xcexkv, where the element in the n""th row and m""th column of the matrix is given by:             sin      ⁢              xe2x80x83            ⁢      2      ⁢              xe2x80x83            ⁢      π      ⁢              xe2x80x83            ⁢              W        ⁢                  (                      n            -            m                    )                            π      ⁢              (                  n          -          m                )              ,
n=1, 2, . . . , N, m=1, 2, . . . , N.
If the eigenvalues xcexk of this equation are arranged in descending order, approximately the first K of them are very close to (but less than) unity. K is the greatest integer less than or equal to 2NW. At least for moderate values of N, the solutions are readily computed using standard techniques. (For such purpose, it is advantageous to use an alternative representation of these sequences which uses a matrix in tridiagonal form. For further information, see Slepian (1978), which is hereby incorporated by reference.)
The Slepian functions Uk(N,W;ƒ) are computed from corresponding Slepian sequences through the formula                                                         U              k                        ⁡                          (                              N                ,                                  W                  ;                  f                                            )                                =                                    ϵ              k                        ⁢                                          ∑                                  n                  =                  0                                                  N                  -                  1                                            ⁢                                                                    v                    n                                          (                      k                      )                                                        ⁡                                      (                                          N                      ,                      W                                        )                                                  ⁢                                  ⅇ                                      ⅈ                    ⁢                                          xe2x80x83                                        ⁢                    2                    ⁢                                          xe2x80x83                                        ⁢                    π                    ⁢                                          xe2x80x83                                        ⁢                                          f                      ⁡                                              [                                                  n                          -                                                                                    N                              -                              1                                                        2                                                                          ]                                                                                                                                ,                            (        2        )            
where xcex5 is 1 when k is even, and i when k is odd.
Of any function which is the Fourier transform of an index limited sequence, the k=0 Slepian function has the greatest fractional energy concentration within the frequency range between xe2x88x92W and W. More generally, the k""th eigenvalue xcexk expresses the fraction of energy retained within this frequency range by the corresponding Slepian function. As noted, this fraction is very close to unity for the first K Slepian functions.
The spectrum estimate of Thomson (1982) is computed from K eigencoefficients y0(ƒ), Y1(ƒ) , . . . , yKxe2x88x921(ƒ), wherein the k""th such eigencoefficient is computed through the formula,                                           y            k                    ⁡                      (            f            )                          =                              ∑                          n              =              0                                      N              -              1                                ⁢                                    x              ⁡                              (                n                )                                      ⁢                                                            v                  n                                      (                    k                    )                                                  ⁡                                  (                                      N                    ,                    W                                    )                                                            ϵ                k                                      ⁢                                          ⅇ                                                      -                    ⅈ                                    ⁢                                      xe2x80x83                                    ⁢                  2                  ⁢                  π                  ⁢                                      xe2x80x83                                    ⁢                                      f                    ⁡                                          (                                              n                        -                                                                              N                            -                            1                                                    2                                                                    )                                                                                  .                                                          (        3        )            
At a given frequency ƒ=ƒ0, the spectrum estimate, denoted {overscore (S)}(ƒ), is band limited to a frequency range of xc2x1W about ƒ0. The spectrum estimate is computed from the eigencoefficients according to,                                           S            _                    ⁡                      (            f            )                          =                              1                          2              ⁢              NW                                ⁢                                    ∑                              k                =                0                                            K                -                1                                      ⁢                                          1                                                      λ                    k                                    ⁡                                      (                                          N                      ,                      W                                        )                                                              ⁢                                                                    "LeftBracketingBar"                                                                  y                        k                                            ⁡                                              (                        f                        )                                                              "RightBracketingBar"                                    2                                .                                                                        (        4        )            
It will be appreciated that each term in this summation is individually a spectrum estimate of the usual kind, as represented, e.g., by Equation (1), in which a respective Slepian sequence is the data window. In fact, the k=0 term is the optimal spectrum estimate of that kind, but even so, it must be smoothed in order to make it statistically consistent. Smoothing, however, tends to increase the effective bandwidth to several times W, and it concomitantly increases the bias of the estimate. On the other hand, when the rest of the eigencoefficients are included (up to the k=Kxe2x88x921 term), consistency and good variance efficiency are achieved without decreasing the spectral resolution.
Multiple window spectrum estimates are discussed further in D. J. Thomson, xe2x80x9cTime Series Analysis of Holocene Climate Data,xe2x80x9d Phil. Trans. R. Soc. Lond. A 330 (1990) 601-616 (hereafter, xe2x80x9cThomson (1990)xe2x80x9d). That article introduces a slightly different definition of the Slepian function, which uses a more common definition of the Fourier transform than the one used, e.g., in Slepian (1978). The Slepian function Vk(ƒ) of Thomson (1990) may be computed by Fourier transforming the corresponding Slepian sequence according to:                                           V            k                    ⁡                      (                          N              ,                              W                ;                f                                      )                          =                              ∑                          n              =              0                                      N              -              1                                ⁢                                                    v                n                                  (                  k                  )                                            ⁡                              (                                  N                  ,                  W                                )                                      ⁢                                          ⅇ                                                      -                    ⅈ                                    ⁢                                      xe2x80x83                                    ⁢                  2                  ⁢                  π                  ⁢                                      xe2x80x83                                    ⁢                  fn                                            .                                                          (        5        )            
This form of the Slepian function is related to Uk(N,W;ƒ) by the expression:                                           V            k                    ⁡                      (                          N              ,                              W                ;                f                                      )                          =                              (                          1                              ϵ                k                                      )                    ⁢                      ⅇ                                          -                ⅈ                            ⁢                              xe2x80x83                            ⁢              π              ⁢                              xe2x80x83                            ⁢                              f                ⁡                                  (                                      N                    -                    1                                    )                                                              ⁢                                                    U                k                            ⁡                              (                                  N                  ,                                      W                    ;                                          -                      f                                                                      )                                      .                                              (        6        )            
The same article also introduces an alternate form xk (ƒ) for the eigencoefficients, given by                                           x            k                    ⁡                      (            f            )                          =                              ∑                          n              =              0                                      N              -              1                                ⁢                                    ⅇ                                                -                  ⅈ                                ⁢                                  xe2x80x83                                ⁢                2                ⁢                π                ⁢                                  xe2x80x83                                ⁢                fn                                      ⁢                                                            v                  n                                      (                    k                    )                                                  ⁡                                  (                                      N                    ,                    W                                    )                                            ·                                                x                  ⁡                                      (                    n                    )                                                  .                                                                        (        7        )            
The same article also describes a multiple-window spectrum estimate {overscore (S)}(ƒ) computed by summing the squared magnitudes of the eigencoefficients xk(ƒ), each weighted by an appropriately chosen weight coefficient wk:                                           S            _                    ⁡                      (            f            )                          =                              1            K                    ⁢                                    ∑                              k                =                0                                            K                -                1                                      ⁢                                          w                k                            ⁢                                                                    "LeftBracketingBar"                                                                  x                        k                                            ⁡                                              (                        f                        )                                                              "RightBracketingBar"                                    2                                .                                                                        (        8        )            
Thomson (1990) also describes a procedure for subdividing the data sequence into overlapping blocks, the base time of each block advanced by some offset from the base time of the preceding block, and computing the multiple-window spectrum estimate on each block.
It should be noted that each of the preceding spectrum estimates implicitly assumes stationarity. That is, each assumes that {overscore (S)}(ƒ) does not involve time, except for the implicit time dependence that comes from defining the sample on the discretized time block spanning the interval (0, Nxe2x88x921). On the other hand, spectrograms dealing explicitly with nonstationary processes have been used for many years. An early paper describing such techniques is W. Koenig et al., xe2x80x9cThe Sound Spectrograph,xe2x80x9d J. Acoust. Soc. Am. 18:19 (1946). In essence, these techniques involve estimates of the kind expressed by Equation (1), above, with the further property that the sample is stepped along in time. Thus, such an estimate might be wiritten as                                                                         S                ~                            D                        ⁡                          (                              b                ,                f                            )                                =                                    "LeftBracketingBar"                                                ∑                                      n                    =                    0                                                        N                    -                    1                                                  ⁢                                                      x                    ⁡                                          (                                              b                        +                        n                                            )                                                        ⁢                                      D                    n                                    ⁢                                      ⅇ                                                                  -                        ⅈ                                            ⁢                                              xe2x80x83                                            ⁢                      2                      ⁢                      π                      ⁢                                              xe2x80x83                                            ⁢                      f                      ⁢                                              xe2x80x83                                            ⁢                      t                                                                                  "RightBracketingBar"                        2                          ,                            (        9        )            
where b now represents the base time, that is, the time (measured from a fixed origin) at the beginning of a given sample block, and n represents relative (discrete) time within the block. Thomson (1990) updated this idea by replacing {tilde over (S)}D(ƒ) with {overscore (S)}(ƒ) as in Equation (8), above.
Significantly, the bandwidth-limited signal in the frequency band (ƒxe2x88x92W,ƒ+W) can be expanded in the time block [0, Nxe2x88x921] as                                           X            ⁡                          (                              t                ,                f                            )                                =                                    ∑                              k                =                0                                            K                -                1                                      ⁢                                                            x                  k                                ⁡                                  (                  f                  )                                            ⁢                                                v                  t                                      (                    k                    )                                                  ⁡                                  (                                      N                    ,                    W                                    )                                                                    ,                            (        10        )            
where xk(ƒ) is defined as in Equation (7), above. This observation is made, e.g., in D. J. Thomson, xe2x80x9cMulti-Window Bispectrum Estimates,xe2x80x9d Proc. Workshop on Higher-Order Spectral Analysis, Vail, Colo. (Jun. 28-30, 1989). However, it has not been appreciated, until now, that such an expansion may be useful for formulating an improved spectrum estimate.
I have found an improved spectrum estimate that is based on the expansion described by Equation (10), above. Because this spectrum estimate depends explicitly on both time and frequency, I refer to it as a spectrogram. The time resolution of this spectrogram is approximately xc2xdW. Because in typical applications the product 2NW is equal to the number K of Slepian sequences, an alternately formulated estimate for this bandwidth is N/K. By contrast, the time resolution of conventional spectrograms is typically roughly equal to the block size, N. Thus, my improved spectrogram is a high-resolution spectrogram.
In a broad aspect, my invention involves a method for processing a time-varying signal to produce a spectrogram. The method includes sampling the signal at intervals, thereby to produce a time series x(n), wherein x represents sampled signal values and n represents discretized time. The method further includes obtaining plural blocks of data x0, x1, . . . , xNxe2x88x921 from the time series, wherein each block contains signal values x(n) taken at an integer number N of successive sampling intervals.
The method further includes calculating an integer number K of eigencoefficients xk(ƒ) on each said block, wherein each said eigencoefficient is dependent on frequency ƒ and has a respective index k, k=0, 1, . . . , Kxe2x88x921. The method further includes, for each said block, forming a time- and frequency-dependent expansion X(t,ƒ) from the eigencoefficients, wherein t represents time.
The method further includes taking a squared magnitude of the expansion, and outputting a spectrogram derived at least in part from the resulting squared magnitude. Significantly, each eigencoefficient represents signal information projected onto a local frequency domain using a respective one of K Slepian sequences or Slepian functions. Moreover, each expansion X(t,ƒ) is a sum of terms, each term containing the product of an eigencoefficient and a corresponding Slepian sequence.