1. Field of the Invention
The present invention relates to a sound processing device, a sound processing method, and a program, and more particularly, to a sound processing device, a sound processing method, and a program that perform sound separation and noise elimination by using independent component analysis (ICA).
2. Description of the Related Art
Recently, technologies have been developed that separate a signal originating from one or more sound sources out of a mixed sound containing sounds from a plurality of sound sources, by using a BSS (Blind Source Separation) method based on ICA (Independent Component Analysis). For example, in order to reduce residual noise that is difficult to eliminate by sound source separation using the ICA alone, a technology that applies a nonlinear process after the sound source separation using the ICA is disclosed (for example, Japanese Unexamined Patent Application Publication No. 2006-154314).
However, performing the nonlinear process after the ICA process presupposes that the separation process using the ICA at the preceding stage performs well. Accordingly, when the separation process using the ICA fails to achieve a sufficient degree of sound source separation, there is a problem in that the nonlinear process at the subsequent stage cannot be expected to improve performance sufficiently.
Thus, a technology of performing a nonlinear process at a stage prior to the sound source separation using the ICA is disclosed (for example, Japanese Patent No. 3,949,150). According to Japanese Patent No. 3,949,150, even in a case where the number N of signal sources and the number M of sensors are in a relationship of N>M, mixed signals can be separated with high quality. In the sound source separation using the ICA, in order to extract each signal with high precision, it is necessary that M≧N. Thus, in Japanese Patent No. 3,949,150, on the assumption that the N sound sources do not all exist simultaneously, time-frequency components that include only V (V≦M) sound sources are extracted, by binary masking or the like, from an observed signal in which the N sound sources are mixed. Then, by applying the ICA or the like to the limited time-frequency components, each sound source can be extracted.
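The binary masking step described above may be illustrated by the following minimal sketch. It is not the method of Japanese Patent No. 3,949,150 itself. It assumes, purely for illustration, that the per-source power of each time-frequency bin is already known (an oracle input that a real system would have to estimate), and the function names, the threshold, and the toy data are all hypothetical. The sketch keeps only those bins in which at most V sources are active, so that a subsequent ICA stage would see no more sources than sensors.

```python
import numpy as np

def limited_source_mask(source_power, threshold, max_active):
    """Boolean mask of time-frequency bins with at most `max_active` active sources.

    source_power: array of shape (N, F, T) holding the power of each of the
    N sources in each bin (hypothetical oracle input, illustration only).
    """
    active = source_power > threshold       # (N, F, T): is each source active in the bin?
    active_count = active.sum(axis=0)       # (F, T): number of active sources per bin
    return active_count <= max_active

def apply_mask(observed, mask):
    """Zero out the bins of the M observed spectrograms that the mask rejects.

    observed: complex array of shape (M, F, T); mask: boolean array of shape (F, T).
    """
    return observed * mask[np.newaxis, :, :]

# Toy data: N = 3 sources, M = 2 sensors, F = 2 frequency bins, T = 2 frames.
source_power = np.array([
    [[1.0, 0.0], [1.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [0.0, 0.0]],
])
observed = np.ones((2, 2, 2), dtype=complex)

# Keep only bins with at most V = 2 active sources (V <= M).
mask = limited_source_mask(source_power, threshold=0.5, max_active=2)
masked = apply_mask(observed, mask)
```

In this toy data, only the bin at frequency index 0 and frame index 0 has all three sources active, so it is the only bin removed. The remaining bins satisfy V≦M and could then be passed to the ICA or a similar separation process.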