The present invention relates to a technique for processing the speech of a particular speaker. Specifically, the present invention relates to a technique for enhancing, reducing, or cancelling the speech of a particular speaker.
In everyday life, people sometimes do not want to hear only the speech of a particular speaker. In the following cases, for example, such filtering of the speech of a particular speaker is desirable:
the voice of a loud person in a public transport, such as a train, a bus, or an airplane;
the voice of a loud person in a hotel, a museum, or an aquarium; and
the voice of a person in a loudspeaker car or a campaign car.
An example of a mechanism for cancelling ambient sounds (also referred to as environmental sounds) includes electronic equipment with a noise canceller, such as a headphone or portable music player with a noise canceller. The electronic equipment with a noise canceller collects ambient sounds with a built-in microphone, mixes a signal in opposite phase thereto with an audio signal, and outputs it to reduce the ambient sounds that enter the electronic equipment from the outside.
Other methods for cancelling ambient sounds include a method for blocking sounds with earplugs and a method for cancelling the noise by listening to a large volume of music with a headphone or earphone.
Japanese Unexamined Patent Application Publication No. 2007-187748 discloses a sound selection and processing unit that selectively removes sounds that a user feels uncomfortable from a sound mixture generated around the user. The unit includes sound separation means for separating the sound mixture into sounds from individual sound sources, uncomfortable-feeling detection means for detecting that the user is in an uncomfortable state, and candidate-sound selection and determination means that operates, when it is determined by the uncomfortable-feeling detection means that the user is in an uncomfortable-feeling state, to evaluate the relationship between the separated sounds and estimating a separated sound candidate to be processed on the basis of the evaluation result. The sound selection and processing unit further comprises candidate-sound presentation and identifying means for presenting the estimated separated sound candidate to be processed to the user, receiving the selection, and identifying the selected separated sound. Moreover, the sound selection and processing unit further comprises sound processing means for processing the identified separated sound to reconstruct a sound mixture.
Japanese Unexamined Patent Application Publication No. 2008-87140 discloses a speech recognition robot capable of responding to a speaker in a state in which it faces the speaker all the time and a method for controlling the speech recognition robot.
Japanese Unexamined Patent Application Publication No. 2004-133403 discloses a speech-signal processing unit that extracts effective speech in which conversation is held under an environment in which a plurality of speech signals from a plurality of speech sources are mixedly input.
Japanese Unexamined Patent Application Publication (Translation of PCT Application) No. 10-512686 discloses a speaker adaptive speech recognition system including feature extraction means for converting speech signals from a speaker to a feature-vector dataset.
Japanese Unexamined Patent Application Publication No. 2003-198719 discloses a headset capable of selectively changing the ratio of an external direct sound to a sound transmitted via a communication system to smooth speech communications and speech commands using a short-range wireless communication headset, as well as a communication system using the same.
Japanese Unexamined Patent Application Publication No. 8-163255 discloses speech recognition based on a speaker adaptive system without a burden on a speaker in a telephone-answering system.
Japanese Unexamined Patent Application Publication No. 2012-98483 discloses a speech-data generation unit that includes input means for inputting the speech of a speaker and conversion means for converting the speech input from the input means to text data, and that generates speech data about the speech of the speaker for masking the speech.
Japanese Unexamined Patent Application Publication No. 2005-215888 discloses a display unit for text sentences, such as a character string and a comment for communications and transmissions, capable of conveying the content, feeling, or mood more deeply.