The present technology relates to a signal processing apparatus, a signal processing method, and a program, and particularly to a signal processing apparatus, a signal processing method, and a program in which an audio signal is decompressed or compressed through a time axis domain process.
As a time axis domain decompression and compression algorithm for an audio signal, Pointer Interval Controlled OverLap and Add (PICOLA), which is a simple process that yields a processing result of high sound quality, is well known and used (e.g., see Morita Naotaka, Itakura Fumitada, “Audio Decompression and Compression in Time Axis Using Pointer Interval Controlled OverLap and Add (PICOLA) based on Pointer Movement Amount Control, and Evaluation Thereof,” Proceedings of the Acoustical Society of Japan, issued October 1986, p. 149-150).
FIG. 1 is a block diagram showing an example of a configuration of a playback speed conversion apparatus for compressing an audio signal through a time axis domain process according to a PICOLA algorithm.
A playback speed conversion apparatus 10 of FIG. 1 includes a recording unit 11, a processing buffer unit 12, a pitch calculation unit 13, an operation unit 14, a processing control unit 15, and an accumulation unit 16. A playback speed of an audio signal is multiplied by R (R>1).
The recording unit 11 of the playback speed conversion apparatus 10 records an audio signal that is a Pulse Code Modulation (PCM) signal in time series. The recording unit 11 transfers via Direct Memory Access (DMA) the recorded audio signal to the processing buffer unit 12 in recording order.
The processing buffer unit 12 temporarily stores the audio signal DMA-transferred from the recording unit 11 in reception order. Further, based on a start position P supplied from the processing control unit 15 and a pitch cycle T0 supplied from the pitch calculation unit 13, the processing buffer unit 12 reads the audio signal of 2×T0 samples, that is, two pitch cycles, starting from the sample at the start position P.
The start position P is a sample number of a sample in a compression start position, and the sample number is a number given, in order, to each sample of the audio signal in time series stored in the processing buffer unit 12. The pitch cycle T0 is the number of samples in a pitch cycle of the audio signal.
The processing buffer unit 12 supplies the read audio signal as an arithmetic processing signal to the operation unit 14. Further, based on the start position P and the pitch cycle T0, the processing buffer unit 12 determines a position P+T0, which is the sample number of the T0-th sample from the sample at the start position P. The processing buffer unit 12 overwrites the stored audio signal of T0 samples starting from the sample at the position P+T0 with the compressed arithmetic processing signal supplied from the operation unit 14.
Further, the processing buffer unit 12 obtains a playback signal length L indicating the number of samples of an audio signal after playback speed conversion using the following Equation (1) based on a playback speed conversion ratio R input from the outside and the pitch cycle T0 supplied from the pitch calculation unit 13.
The playback speed conversion ratio R is the ratio of the length of the audio signal before playback speed conversion, recorded in the recording unit 11, to the length of the audio signal after playback speed conversion, recorded in the accumulation unit 16, which is consistent with the playback speed being multiplied by R (R>1). The playback speed conversion ratio R is input to the processing buffer unit 12 and the processing control unit 15, for example, by a user manipulating an input unit which is not shown.
$$L = T_0 \times \frac{1}{R - 1} \tag{1}$$
The processing buffer unit 12 DMA-transfers, to the accumulation unit 16, the audio signal of L samples (the playback signal length) starting from the sample at the position P+T0, which includes the portion overwritten with the compressed arithmetic processing signal. This signal serves as the audio signal after playback speed conversion for the audio signal from the sample at the start position P to the sample at the next start position P. When the processing buffer unit 12 does not yet store all L samples starting from the sample at the position P+T0, the processing buffer unit 12 DMA-transfers only the portion already stored to the accumulation unit 16. The processing buffer unit 12 then requests the recording unit 11 to DMA-transfer the remaining audio signal, temporarily stores the audio signal DMA-transferred according to the request, and DMA-transfers that audio signal to the accumulation unit 16.
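As an illustration of Equation (1), the following minimal sketch computes the playback signal length L from the pitch cycle T0 and the playback speed conversion ratio R. The function name `playback_signal_length` and the rounding to a whole number of samples are assumptions made for this example; the text above does not specify how a non-integer result is handled.

```python
def playback_signal_length(t0: int, r: float) -> int:
    """Return L = T0 * 1 / (R - 1) from Equation (1), in samples.

    Rounding to the nearest integer is an assumption of this sketch;
    the description does not state how fractional lengths are treated.
    """
    if r <= 1.0:
        raise ValueError("compression requires R > 1")
    return round(t0 / (r - 1.0))


# Hypothetical values: a pitch cycle of 100 samples at 1.5x speed.
length = playback_signal_length(100, 1.5)
print(length)
```

With R=2, L equals T0, so only the T0 overwritten samples are transferred per cycle; as R approaches 1, L grows and a longer stretch of the signal is passed through after the overwritten portion.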
The pitch calculation unit 13 calculates the pitch cycle T0 of the audio signal by referring to the audio signal of 2×Tmax samples starting from the sample at the start position P, which is stored in the processing buffer unit 12. Here, the maximum pitch cycle Tmax is a preset maximum value of the number of samples in a pitch cycle. Specifically, the pitch calculation unit 13 calculates, as the pitch cycle T0, the period T that minimizes an average distortion d(T) defined, for example, by the following Equation (2), based on the audio signal of the 2×Tmax samples starting from the sample at the start position P. The pitch calculation unit 13 supplies the calculated pitch cycle T0 to the processing buffer unit 12 and the processing control unit 15.
$$d(T) = \frac{1}{T} \sum_{i=0}^{T-1} \{x(i) - x(i+T)\}^2, \quad T_{\min} \le T \le T_{\max} \tag{2}$$
In Equation (2), x(i) denotes the i-th sample of the audio signal of 2×Tmax samples starting from the sample at the start position P. Further, Tmin denotes a minimum pitch cycle, which is a preset minimum value of the number of samples in a pitch cycle.
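The search described by Equation (2) can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names `average_distortion` and `estimate_pitch_cycle` are hypothetical, the exhaustive search over every T is one straightforward way to find the minimum, and the choice of the smallest T when several periods tie is an assumption of this sketch.

```python
from typing import Sequence


def average_distortion(x: Sequence[float], t: int) -> float:
    """d(T) of Equation (2): mean squared difference between the first
    T samples of x and the T samples that follow them."""
    return sum((x[i] - x[i + t]) ** 2 for i in range(t)) / t


def estimate_pitch_cycle(x: Sequence[float], t_min: int, t_max: int) -> int:
    """Return the period T in [t_min, t_max] that minimizes d(T).

    x is assumed to hold at least 2 * t_max samples starting at the
    start position P. On ties, min() keeps the smallest T (assumption).
    """
    if len(x) < 2 * t_max:
        raise ValueError("need at least 2 * t_max samples")
    return min(range(t_min, t_max + 1), key=lambda t: average_distortion(x, t))


# Hypothetical test signal: a sawtooth that repeats every 50 samples.
signal = [float(i % 50) for i in range(160)]
print(estimate_pitch_cycle(signal, t_min=20, t_max=80))
```

For the sawtooth above, d(50) is exactly zero and every other candidate in the search range gives a positive distortion, so the sketch selects T=50.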
Within the arithmetic processing signal supplied from the processing buffer unit 12, the operation unit 14 performs weighted addition of the audio signal of T0 samples starting from the sample at the start position P and the audio signal of T0 samples starting from the sample at the position P+T0. The operation unit 14 supplies the resultant audio signal of T0 samples, as a compressed arithmetic processing signal, to the processing buffer unit 12.
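The weighted addition can be sketched as below. The description does not give the weights, so this example assumes a linear cross-fade in which the weight of the first pitch cycle falls from 1 toward 0 while the weight of the second rises correspondingly, as is commonly done in overlap-add processing. The function name `overlap_add_compress` is hypothetical.

```python
from typing import List, Sequence


def overlap_add_compress(signal: Sequence[float], p: int, t0: int) -> List[float]:
    """Merge two consecutive pitch cycles starting at sample p into one.

    Assumed weighting (not specified in the description): sample i of the
    first cycle is weighted by (t0 - i) / t0 and sample i of the second
    cycle by i / t0, i.e., a linear cross-fade.
    """
    first = signal[p:p + t0]
    second = signal[p + t0:p + 2 * t0]
    if len(second) < t0:
        raise ValueError("need 2 * t0 samples starting at p")
    return [((t0 - i) * first[i] + i * second[i]) / t0 for i in range(t0)]


# Hypothetical input: two 4-sample "pitch cycles" with constant levels.
buffer = [1.0] * 4 + [3.0] * 4
merged = overlap_add_compress(buffer, p=0, t0=4)
print(merged)  # [1.0, 1.5, 2.0, 2.5]

# Overwrite the T0 samples starting at P+T0 with the merged cycle.
buffer[4:8] = merged
```

The final assignment mirrors the overwrite performed by the processing buffer unit 12 at the position P+T0.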
The processing control unit 15 sets an initial start position P to a predetermined value (for example, 0). Further, the processing control unit 15 sequentially updates the start position P using the following Equations (3) and (4) based on the pitch cycle T0 supplied from the pitch calculation unit 13 and the playback speed conversion ratio R input from the outside. The processing control unit 15 supplies the start position P to the processing buffer unit 12.
$$P = P + \Delta P \tag{3}$$
$$\Delta P = T_0 \times \frac{R}{R - 1} \tag{4}$$
Since a storage capacity of the processing buffer unit 12 is finite, the audio signal stored in the processing buffer unit 12 is updated at an appropriate timing. Accordingly, when the processing buffer unit 12 is a ring buffer, the processing control unit 15 updates the start position P using a modulo operation based on a length of the processing buffer unit 12. When the processing buffer unit 12 is not a ring buffer, the processing control unit 15 updates the start position P to be a sufficiently small value (for example, 0).
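The update of the start position P by Equations (3) and (4), including the modulo operation for the ring buffer case, can be sketched as follows. The function name `next_start_position`, the optional `buffer_len` parameter, and the rounding of ΔP to a whole number of samples are assumptions of this example.

```python
from typing import Optional


def next_start_position(p: int, t0: int, r: float,
                        buffer_len: Optional[int] = None) -> int:
    """Advance P by delta_p = T0 * R / (R - 1), per Equations (3) and (4).

    If buffer_len is given, the buffer is treated as a ring buffer and P
    wraps around via a modulo operation. Rounding delta_p to an integer
    is an assumption of this sketch.
    """
    if r <= 1.0:
        raise ValueError("compression requires R > 1")
    delta_p = round(t0 * r / (r - 1.0))
    p_next = p + delta_p
    if buffer_len is not None:
        p_next %= buffer_len
    return p_next


# Hypothetical values: T0 = 100 samples, R = 2, ring buffer of 1024 samples.
print(next_start_position(0, 100, 2.0))          # 200
print(next_start_position(900, 100, 2.0, 1024))  # 76
```

Because R/(R−1) = 1 + 1/(R−1), ΔP equals T0 plus the playback signal length L of Equation (1): each cycle consumes T0+L input samples and produces L output samples.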
The accumulation unit 16 accumulates the audio signal of samples in the playback signal length L from the sample in the position P+T0, which is DMA-transferred from the processing buffer unit 12.
Meanwhile, arithmetic operations and DMA transfers performed by a Central Processing Unit (CPU), a Digital Signal Processor (DSP), or the like impose constraints on the arrangement of the data to be processed. For example, assume that the data amount of an audio signal of one sample is 32 bits (4 bytes). In this case, in order to process the audio signal of 4 samples in parallel, it may be necessary for the audio signal to be aligned to 16 bytes, which is the data amount of 4 samples. Further, in the DMA transfer, it may be necessary for a start position of a data transfer source or a transfer destination to be aligned to a predetermined number of bytes, such as a power of 2.
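The alignment constraint can be illustrated with a short check of whether a given sample number falls on an aligned byte boundary. This is a sketch under the assumptions stated above (4 bytes per sample, 16-byte alignment) plus the assumption that sample number 0 sits at an aligned address; the function names and the sample numbers used are hypothetical.

```python
def sample_byte_offset(sample_index: int, bytes_per_sample: int = 4) -> int:
    """Byte offset of a sample from the start of the buffer, assuming the
    buffer itself begins at an aligned address."""
    return sample_index * bytes_per_sample


def is_aligned(byte_offset: int, alignment: int = 16) -> bool:
    """True if byte_offset is a multiple of the required alignment."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return byte_offset % alignment == 0


# Sample 8 starts at byte 32, which is a multiple of 16.
print(is_aligned(sample_byte_offset(8)))   # True
# Sample 50 starts at byte 200, which is not a multiple of 16.
print(is_aligned(sample_byte_offset(50)))  # False
```

With 4-byte samples and 16-byte alignment, only sample numbers that are multiples of 4 satisfy the constraint.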