According to a first conventional technology, pitch cycle extraction is performed on an input sound signal (a mixed sound) and, when no pitch cycle is extracted, the signal is determined to be noise (see Patent Reference 1, for example). In the first conventional technology, sound recognition is then performed on the input sound that has been determined to be a sound candidate.
FIG. 1 is a block diagram showing a configuration of a noise elimination device related to the first conventional technology described in Patent Reference 1.
This noise elimination device includes a recognition unit 2501, a pitch extraction unit 2502, a determination unit 2503, and a cycle duration storage unit 2504.
The recognition unit 2501 is a processing unit which outputs sound recognition candidates for a signal segment presumed to be a sound part (a to-be-extracted sound) of an input sound signal (a mixed sound). The pitch extraction unit 2502 is a processing unit which extracts a pitch cycle from the input sound signal. The determination unit 2503 is a processing unit which outputs a sound recognition result based on: the sound recognition candidates of the signal segment given by the recognition unit 2501; and the result of the pitch extraction performed on the signal segment by the pitch extraction unit 2502. The cycle duration storage unit 2504 is a storage device which stores the cycle duration of the pitch cycle extracted by the pitch extraction unit 2502. Using this noise elimination device, when the pitch cycle of the present signal segment is within a predetermined range set with respect to the stored pitch cycle, the signal of the present signal segment is determined to be a sound candidate. Meanwhile, when the pitch cycle is outside that predetermined range, the signal is determined to be noise.
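The determination described above can be illustrated with a minimal Python sketch. It is not the implementation of Patent Reference 1: the normalized-autocorrelation pitch extractor, the function names `extract_pitch_cycle` and `classify_segment`, the `min_strength` threshold, and the reading of the "predetermined cycle" as a tolerance around a stored cycle duration are all assumptions made for illustration.

```python
import math
import random


def extract_pitch_cycle(frame, min_lag, max_lag, min_strength=0.5):
    """Return the lag (in samples) with the strongest normalized
    autocorrelation in [min_lag, max_lag], or None when no lag is
    periodic enough. The 0.5 threshold is an assumed value."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return None
    best_lag, best_score = None, min_strength
    last_lag = min(max_lag, len(frame) - 1)
    for lag in range(min_lag, last_lag + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        score = corr / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def classify_segment(pitch_cycle, stored_cycle, tolerance):
    """Label a segment as 'sound candidate' or 'noise'.
    Assumption: the predetermined range is stored_cycle +/- tolerance."""
    if pitch_cycle is None:
        return "noise"  # no pitch cycle extracted
    if abs(pitch_cycle - stored_cycle) <= tolerance:
        return "sound candidate"
    return "noise"  # pitch cycle outside the predetermined range


# A periodic frame (period of 40 samples) and a white-noise frame.
voiced = [math.sin(2 * math.pi * i / 40) for i in range(400)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(400)]

voiced_lag = extract_pitch_cycle(voiced, min_lag=20, max_lag=100)
noise_lag = extract_pitch_cycle(noise, min_lag=20, max_lag=100)
voiced_label = classify_segment(voiced_lag, stored_cycle=42, tolerance=5)
noise_label = classify_segment(noise_lag, stored_cycle=42, tolerance=5)
```

In this sketch the periodic frame yields a pitch cycle close to the stored duration and is kept as a sound candidate, while the white-noise frame yields no pitch cycle and is treated as noise.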
According to a second conventional technology, the presence or absence of an input of a human voice is finally determined on the basis of determination results given by three determination units (see Patent Reference 2, for example). A first determination unit determines that a human voice (a to-be-extracted sound) is received when a signal component having a harmonic structure is detected in an input signal (a mixed sound). A second determination unit determines that a human voice is received when a centroid frequency of the input signal is within a predetermined frequency range. A third determination unit determines that a human voice is received when a power ratio of the input signal with respect to a noise level stored in a noise level storage unit exceeds a predetermined threshold value.
Patent Reference 1: Japanese Unexamined Patent Application Publication No. 05-210397 (claim 2, FIG. 1)
Patent Reference 2: Japanese Unexamined Patent Application Publication No. 2006-194959 (claim 1)
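The three determinations of the second conventional technology can be sketched in Python as follows. This is an illustration under stated assumptions, not the method of Patent Reference 2: the harmonic test (energy at twice and three times the strongest frequency bin), the default ranges and thresholds, the function names, and the rule for combining the three results (all three must agree) are hypothetical, since the text above does not specify them.

```python
import cmath
import math


def magnitude_spectrum(frame, sample_rate):
    """Naive DFT; returns (frequencies in Hz, magnitudes) up to Nyquist."""
    n = len(frame)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        freqs.append(k * sample_rate / n)
        mags.append(abs(acc))
    return freqs, mags


def has_harmonic_structure(mags, rel_threshold=0.2):
    """Assumed test: the 2nd and 3rd multiples of the strongest bin
    each carry at least rel_threshold of the peak magnitude."""
    peak = max(range(1, len(mags)), key=lambda k: mags[k])
    if 3 * peak >= len(mags) or mags[peak] == 0.0:
        return False
    return all(mags[h] >= rel_threshold * mags[peak]
               for h in (2 * peak, 3 * peak))


def centroid_frequency(freqs, mags):
    """Magnitude-weighted mean frequency of the spectrum."""
    total = sum(mags)
    if total == 0.0:
        return 0.0
    return sum(f * m for f, m in zip(freqs, mags)) / total


def voice_input_present(frame, sample_rate, noise_level,
                        centroid_range=(100.0, 1000.0),
                        power_ratio_threshold=4.0):
    """Return (harmonic, centroid_ok, power_ok, final).
    Assumption: the final result requires all three determinations."""
    freqs, mags = magnitude_spectrum(frame, sample_rate)
    harmonic = has_harmonic_structure(mags)
    centroid = centroid_frequency(freqs, mags)
    centroid_ok = centroid_range[0] <= centroid <= centroid_range[1]
    power = sum(x * x for x in frame) / len(frame)
    power_ok = power / noise_level > power_ratio_threshold
    return harmonic, centroid_ok, power_ok, harmonic and centroid_ok and power_ok


def harmonic_frame(scale=1.0, sample_rate=8000, n=400):
    """Test signal with components at 200, 400, and 600 Hz."""
    return [scale * (math.sin(2 * math.pi * 200 * t / sample_rate)
                     + 0.6 * math.sin(2 * math.pi * 400 * t / sample_rate)
                     + 0.4 * math.sin(2 * math.pi * 600 * t / sample_rate))
            for t in range(n)]


result = voice_input_present(harmonic_frame(), 8000, noise_level=0.01)
```

With these assumed settings, the test signal passes all three determinations, whereas the same signal scaled down below the stored noise level fails the power-ratio determination and is rejected by the combined decision.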