The present invention relates to a personalized system for providing a service for improving understandability of received speech in accordance with user-specific needs. The said system is online and used by a plurality of users, and it addresses a user's inability to understand speech.
The existing solutions all take the form of equipment or a device that can be used by only one person. The problem with such individual-use devices is that it is neither feasible nor practical for each device to stay continuously upgraded with the latest advancements in technology, or to customize itself dynamically to changes in the user's acoustic profile, usage environment and conversation context. There are multiple reasons for this. It is not always possible to customize off-the-shelf equipment for an individual's disability and needs. The latest technological advancements and algorithms are also likely to be expensive to incorporate in an individual device, thereby limiting its quality of service. A device like this usually has to be used for a long period of time, in some cases for the lifetime of the individual. It is not easy for a device to adjust and customize dynamically to changes in an individual's disability over a period of time without requiring a repurchase. It is also not possible to make use of the specific conversation context or environment to achieve better results. For example, during a single day the user could be using the device in a plurality of business contexts, in a social setting or at home. It is not easy to customize an individual's device at such a fine level of granularity.
Some systems have been proposed that address other aspects of speech understanding. For example, U.S. Pat. No. 6,036,496 describes an apparatus and method for screening an individual's ability to process acoustic events. The invention provides sequences (or trials) of acoustically processed target and distracter phonemes to a subject for identification. The acoustic processing includes amplitude emphasis of selected frequency envelopes, stretching (in the time domain) of selected portions of phonemes, and phase adjustment of selected portions of phonemes relative to a base frequency. After a number of trials, the invention develops a profile for an individual that indicates whether the individual's ability to process acoustic events is within a normal range and, if not, what processing can provide the individual with optimal hearing. The invention thus provides a method to determine an individual's acoustic profile. This is better than the typical hearing tests, which determine whether an individual can hear particular frequencies at particular amplitudes. The invention also mentions that the individual's profile can then be used by a listening or processing device to particularly emphasize, stretch, or otherwise manipulate an audio stream to provide the individual with an optimal chance of distinguishing between similar acoustic events.
Another patent, U.S. Pat. No. 6,071,123, proposes a method and a system that enable individuals with speech, language and reading based communication disabilities, due to a temporal processing problem, to improve their temporal processing abilities as well as their communication abilities. The method and system include provisions to elongate portions of phonemes that have brief and/or rapidly changing acoustic spectra, such as occur in the stop consonants b and d in the phonemes /ba/ and /da/, as well as to reduce the duration of the steady state portion of the syllable. In addition, some emphasis is added to the rapidly changing segments of these phonemes. Additionally, the disclosure includes a method and computer software to modify fluent speech to make the modified speech better recognizable by communicatively impaired individuals. The proposed apparatus is a device or piece of equipment to be used by an individual.
U.S. Pat. No. 6,109,107 provides an improved method and apparatus for the identification and treatment of language perception problems in specific language impaired (SLI) individuals. The invention provides a method and apparatus for screening individuals for SLI and training individuals who suffer from SLI to remediate the effects of the impairment by using the spectral content of interfering sound stimuli and the temporal ordering or direction of the interference between the stimuli. The emphasis in this invention is on screening and training individuals and not on providing a device or a service to address the disability.
U.S. Pat. No. 5,839,109 also describes a speech recognition apparatus that includes a sound pickup, a standard feature storage device, a comparing device, a display pattern storing device, and a display. The apparatus can display non-speech sounds either as a message or as an image, and is especially useful for hearing-impaired individuals. For example, if a fire engine siren is detected, the display can show a picture of a fire engine, or can display the message "siren is sounding".
All of the above solutions are limited to addressing hearing disabilities and are not directed at improving the understandability of speech, which is an issue that could occur even with individuals without hearing disabilities. For example, aspects relating to spoken accent or, as an extreme case, a different language are not addressed by any of the above solutions.
In addition, even for cases where physical disability is involved, none of the above solutions addresses those situations where extreme disabilities occur, for example, complete loss of hearing or complete loss of hearing coupled with blindness.
The existing solutions are also non-adaptive, as they do not automatically adjust to dynamically varying individual requirements, e.g. ambient noise levels, changes in hearing patterns etc., nor are they capable of automatically adapting to different user profiles. As a result, it is not feasible for multiple users to use the same system.
The object of this invention is to obviate the above drawbacks and to provide personalized improved understandability of speech based on an individual's needs.
The second object of this invention is to display the speech as text or as graphics on a display panel on the phone device instead of playing it as audio through the phone speaker.
Another object of this invention is to provide data processing functionality as a third party service to a plurality of users, over a network, such as an Intranet, an Extranet or an Internet.
Yet another object of this invention is to provide a self-learning system using artificial intelligence and expert system techniques.
Another object of this invention is to provide a speech-enabled WAP (Wireless Application Protocol) system for hearing or speech.
To achieve the said objective this invention provides a personalized system for providing a service for improving understandability of received speech in accordance with user specific needs characterized in that it includes:
input interface means for capturing received speech signals connected to a speech recognition or speech signal analysis means for identifying the contents of the received speech connected to one input of a data processing means for performing improvement in understandability,
a user profile storage means connected to another input of said data processing means for providing user specific improvement data, and
an output generation means connected to the output of said data processing means to produce personalized output based on an individual's needs.
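By way of illustration only, the arrangement of means recited above can be sketched as a chain of functions. Every name below (`UserProfile`, `recognize`, `improve`, `generate_output`, `process`) is hypothetical, the recognizer is a placeholder that treats its input as already-transcribed text, and the word substitution merely stands in for whatever user-specific improvement data the profile storage actually holds.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UserProfile:
    """Hypothetical stand-in for one record in the user profile storage means."""
    user_id: str
    output_mode: str = "text"  # assumed modes: "text", "speech", "tactile"
    substitutions: Dict[str, str] = field(default_factory=dict)


def recognize(signal: str) -> str:
    """Placeholder for speech recognition: input is assumed to be transcribed text."""
    return signal.strip().lower()


def improve(text: str, profile: UserProfile) -> str:
    """Data processing step: apply the user's substitutions word by word."""
    return " ".join(profile.substitutions.get(word, word) for word in text.split())


def generate_output(text: str, profile: UserProfile) -> str:
    """Output generation step: tag the result with the user's preferred mode."""
    return f"[{profile.output_mode}] {text}"


def process(signal: str, profile: UserProfile) -> str:
    """Chain the stages: capture -> recognition -> improvement -> output."""
    return generate_output(improve(recognize(signal), profile), profile)


if __name__ == "__main__":
    profile = UserProfile("u1", "text", {"lift": "elevator"})
    print(process("  Take the LIFT  ", profile))
```

In this sketch the profile feeds both the improvement step and the output step, mirroring the two places where the text makes the result depend on the individual user.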
The said personalized system is online.
The said speech recognition means is any known speech recognition means.
The said data processing means is a computing system.
The said data processing means is a server system in a client server environment.
The said data processing means is a self-learning system using artificial intelligence or expert system techniques, which improves its performance based on feedback from the users over a period of time and also dynamically updates the user's current profiles.
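As a deliberately simple stand-in for the artificial intelligence or expert system techniques mentioned above, the following sketch nudges numeric profile parameters toward values indicated by user feedback. The function name `update_profile`, the parameter names and the `learning_rate` value are assumptions made for illustration and are not taken from the specification.

```python
from typing import Dict


def update_profile(profile: Dict[str, float],
                   feedback: Dict[str, float],
                   learning_rate: float = 0.2) -> Dict[str, float]:
    """Return a new profile moved part of the way toward the feedback values.

    `feedback` maps a parameter name to the value the user indicated worked
    better. Parameters not yet in the profile are adopted as given.
    """
    updated = dict(profile)  # leave the stored profile untouched
    for key, target in feedback.items():
        current = updated.get(key, target)
        updated[key] = current + learning_rate * (target - current)
    return updated


if __name__ == "__main__":
    stored = {"gain_db": 10.0}
    print(update_profile(stored, {"gain_db": 20.0}, learning_rate=0.5))
```

Repeated calls move the profile gradually, so a single atypical session does not overwrite what has been learned over a longer period.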
The said speech recognition means, speech signal analysis means, data processing means and output generation means individually or collectively improve performance automatically with time, use, improvement in technology, enhancement in design or changes in user profile, and provide the improved service without the need to make any changes to the user equipment.
The said output generation means is a means for generating speech from the electrical signal received from said data processing means.
The said output generation means is a display means for generating visual output for the user.
The said output generation means is a vibro-tactile device for generating output for the user in tactile form.
The above system further includes means for the user to register with said system.
The said data processing means includes means to perform the understandability improvement with reference to the context of the received speech.
The said data processing means includes means to translate the received speech from one language to another.
The said data processing means includes means for computing the data partially on the client and partially on the server.
The said data processing means includes the means for the user to specify or modify the stored individual profile.
The user identifies himself by a userid at the beginning of each transaction.
The said data processing means includes a default profile means in the absence of specific user profiles.
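The lookup of a stored profile by userid, with a default profile used when none is stored, might be sketched as follows. The dictionary-based store, the field names and the choice to fill missing fields from the default are all assumptions for illustration.

```python
from typing import Dict, Optional

# Hypothetical default profile used when a user has no stored profile.
DEFAULT_PROFILE: Dict[str, object] = {"gain_db": 0.0, "output_mode": "speech"}


def load_profile(store: Dict[str, Dict[str, object]],
                 user_id: Optional[str]) -> Dict[str, object]:
    """Return the user's profile, falling back to the default profile.

    Fields missing from a stored profile are filled in from the default,
    which is an assumption of this sketch.
    """
    profile = dict(DEFAULT_PROFILE)  # copy so callers cannot alter the default
    stored = store.get(user_id) if user_id is not None else None
    if stored is not None:
        profile.update(stored)
    return profile


if __name__ == "__main__":
    store = {"alice": {"output_mode": "text"}}
    print(load_profile(store, "alice"))
    print(load_profile(store, "bob"))
```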
The system allows the user to specify a usage environment or conversation context at the beginning of each transaction.
The data processing means includes use of a specified context to limit the vocabulary for speech recognition and enhance system performance.
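One way to picture the use of a specified context to limit the vocabulary is to snap each recognized word to the closest entry of a small, context-specific word list. The sketch below uses approximate string matching from the Python standard library (`difflib.get_close_matches`) as a stand-in for a real recognizer's vocabulary constraint; the contexts, word lists and the `cutoff` value are invented for illustration.

```python
import difflib
from typing import Dict, List

# Hypothetical per-context vocabularies; a deployed service would hold far larger lists.
CONTEXT_VOCABULARIES: Dict[str, List[str]] = {
    "banking": ["account", "balance", "deposit", "transfer", "loan"],
    "medical": ["dosage", "tablet", "allergy", "appointment", "symptom"],
}


def constrain_to_context(hypotheses: List[str], context: str,
                         cutoff: float = 0.6) -> List[str]:
    """Replace each hypothesized word with its closest in-context word, if any.

    Words with no sufficiently similar vocabulary entry are kept unchanged,
    as are all words when the context is unknown.
    """
    vocabulary = CONTEXT_VOCABULARIES.get(context)
    if vocabulary is None:
        return list(hypotheses)
    result = []
    for word in hypotheses:
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        result.append(matches[0] if matches else word)
    return result


if __name__ == "__main__":
    print(constrain_to_context(["ballance", "tranfer", "please"], "banking"))
```

A smaller candidate set gives the matcher fewer plausible alternatives to confuse, which is the intuition behind the performance gain described above.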
The data processing means includes means for sending advertisements to the user in between or after the outputs.
The said input interface means and/or output generation means are speech enabled wireless application protocol devices.
The said output generation means supports a graphical display interface.
The said input interface is a microphone of a regular telephone device, land line or mobile, and the output generation means is a speaker of said phone device; the speaker is meant for a single user only and the microphone is meant for the user's surroundings.
The said output generation means is a speaker of a telephone device, which could be plugged in the user's ears using a wire or a wireless medium, namely Bluetooth.
The said output generation means is a display panel on a watch strap connected to the phone device through a wire or wireless medium.
The said input interface means captures the speech from the user's environment and provides feedback to the user after improving understandability.
The said input interface means is a microphone of a regular telephone device, land line or mobile.
The said output generation means automatically tracks the conversational context using already known techniques and multimedia devices.
The input interface receives speech input from more than one source and provides improved understandability for all the received speech signals in accordance with the user profile.
The above system further comprises a pricing mechanism, which is based on the quality of service and on a fixed amount per unit time of use, or a variable amount per time of use, or a down payment for a certain period of use, or a combination of down payment and pay per use, or a combination of down payment and unit time of use, including a period for free use.
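The pricing alternatives listed above can be illustrated with a small calculation that combines a down payment, a per-minute rate, a free period and a quality-of-service multiplier. The field names, the use of minutes and cents as units, and the percentage multiplier are assumptions of this sketch; the specification does not fix any particular formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricingPlan:
    """Hypothetical plan; amounts are integer cents to avoid rounding drift."""
    down_payment_cents: int = 0
    rate_cents_per_minute: int = 0
    free_minutes: int = 0
    quality_multiplier_pct: int = 100  # e.g. 150 for a premium quality tier


def charge_cents(plan: PricingPlan, minutes_used: int) -> int:
    """Down payment plus usage beyond the free period, scaled by quality tier."""
    if minutes_used < 0:
        raise ValueError("minutes_used must be non-negative")
    billable_minutes = max(0, minutes_used - plan.free_minutes)
    usage = billable_minutes * plan.rate_cents_per_minute
    return plan.down_payment_cents + usage * plan.quality_multiplier_pct // 100


if __name__ == "__main__":
    pay_per_use = PricingPlan(rate_cents_per_minute=5)
    hybrid = PricingPlan(down_payment_cents=1000, rate_cents_per_minute=2,
                         free_minutes=60, quality_multiplier_pct=150)
    print(charge_cents(pay_per_use, 30))
    print(charge_cents(hybrid, 100))
```

Setting the rate to zero models a pure down payment for a period of use, while setting the down payment to zero models pure pay per use.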
The present invention further provides a personalized method for providing a service for improving understandability of received speech in accordance with user specific needs characterized in that it includes:
capturing received speech signals,
identifying the contents of said received speech through speech recognition or speech signal analysis,
processing the data for performing improvement in understandability,
providing user specific improvement data by a user profile storage, and
generating personalized output based on an individual's needs.
The said method is executed online.
The speech recognition is by any known speech recognition methods.
The said processing of data is done by computation.
The said processing of data is done by a server in a client server environment.
The said processing of data is done by a self-learning method using artificial intelligence or expert system techniques, which improves its performance based on feedback from the users over a period of time and also dynamically updates the user's current profiles.
The said speech recognition, speech signal analysis, data processing and output generation individually or collectively improve performance automatically with time, use, improvement in technology, enhancement in design or changes in user profile, and provide the improved service without the need to make any changes to the user equipment.
The said generation of personalized output is by generating speech from the electrical signal received from said processing of data.
The said generation of personalized output is by display, generating visual output for the user.
The said generation of personalized output is in vibro-tactile form, generating output for the user in tactile form.
The above method further includes registering of the user with said method.
The said processing of data includes performing the understandability improvement with reference to the context of the received speech.
The said processing of data includes translation of the received speech from one language to another.
The said processing of data includes computing the data partially on the client and partially on the server.
The said processing of data includes specifying or modifying the stored individual profile for the user.
The user identifies himself by a userid at the beginning of each transaction.
The said processing of data includes a default profile in the absence of specific user profiles.
The method allows the user to specify a usage environment or conversation context at the beginning of each transaction.
The said processing of data includes use of a specified context to limit the vocabulary for speech recognition and enhance system performance.
The said processing of data includes sending advertisements to the user in between or after the outputs.
The said capturing of received speech signals and/or generation of personalized output is by use of speech enabled wireless application protocol methods.
The said generation of personalized output supports a graphical display interface.
The received speech signals are captured through a microphone of a regular telephone device, land line or mobile, and the output is generated through a speaker of said phone device; the speaker is meant for a single user only and the microphone is meant for the user's surroundings.
The said generation of personalized output is through a speaker of a telephone device, which could be plugged in the user's ears using a wire or a wireless medium, namely Bluetooth.
The said generation of personalized output is through a display panel on a watch strap connected to the phone device through a wire or wireless medium.
The above method further includes capturing the speech from the user's environment and providing feedback to the user after improving understandability.
The said generation of personalized output includes automatic tracking of the conversational context using already known techniques and multimedia devices.
The speech input is received from more than one source and improved understandability for all the received speech signals is provided in accordance with the user profile.
The above method further comprises pricing, which is based on the quality of service and on a fixed amount per unit time of use, or a variable amount per time of use, or a down payment for a certain period of use, or a combination of down payment and pay per use, or a combination of down payment and unit time of use, including a period for free use.
The instant invention further provides a personalized computer program product comprising computer readable program code stored on computer readable storage medium embodied therein for providing a service for improving understandability of received speech in accordance with user specific needs comprising:
computer readable program code means configured for capturing received speech signals,
computer readable program code means configured for identifying the contents of said received speech through speech recognition or speech signal analysis,
computer readable program code means configured for processing the data for performing improvement in understandability,
computer readable program code means configured for providing user specific improvement data by a user profile storage, and
computer readable program code means configured for generating personalized output based on an individual's needs.
The said personalized computer program product is online.
The speech recognition is performed by computer readable program code devices using any known speech recognition techniques.
The said computer readable program code means configured for processing of data is a computing system.
The said computer readable program code means configured for processing of data is a server system in a client server environment.
The said computer readable program code means configured for processing of data is a self-learning system using artificial intelligence or expert system techniques, which improves its performance based on feedback from the users over a period of time and also dynamically updates the user's current profiles.
The said computer readable program code means configured for speech recognition, speech signal analysis, data processing and output generation individually or collectively improve performance automatically with time, use, improvement in technology, enhancement in design or changes in user profile, and provide the improved service without the need to make any changes to the user equipment.
The said computer readable program code means for generating output is configured to generate personalized output for the user in display form.
The said computer readable program code means configured for generating output is configured for generating personalized output for the user in vibro-tactile form.
The above computer program product further includes computer readable program code means configured for the user to register with said computer program product.
The said computer readable program code means configured for processing of data performs the understandability improvement with reference to the context of the received speech.
The said computer readable program code means configured for processing of data translates the received speech from one language to another.
The said computer readable program code means configured for processing of data computes the data partially on the client and partially on the server.
The said computer readable program code means configured for processing of data specifies or modifies the stored individual profile for the user.
The user identifies himself by a userid at the beginning of each transaction.
The said computer readable program code means configured for processing of data includes a default profile in the absence of specific user profiles.
The computer program product allows the user to specify a usage environment or conversation context at the beginning of each transaction.
The said computer readable program code means configured for processing of data uses a specified context to limit the vocabulary for speech recognition and enhance system performance.
The said computer readable program code means configured for processing of data sends advertisements to the user in between or after the outputs.
The said computer readable program code means configured for capturing received speech signals and/or generation of personalized output is by use of speech enabled wireless application protocol methods.
The said computer readable program code means configured for generating personalized output supports a graphical display interface.
The said computer readable program code means configured for capturing received speech signals is a microphone of a regular telephone device, land line or mobile, and the computer readable program code means configured for generating output is a speaker of said phone device; the speaker is meant for a single user only and the microphone is meant for the user's surroundings.
The said computer readable program code means configured for generating personalized output is through a speaker of a telephone device, which could be plugged in the user's ears using a wire or a wireless medium, namely Bluetooth.
The said computer readable program code means configured for generating personalized output is through a display panel on a watch strap connected to the phone device through a wire or wireless medium.
The said computer readable program code means configured for generating personalized output includes tracking conversational context automatically using already known techniques and multimedia devices.
The computer readable program code means configured for capturing received speech signals receives speech input from more than one source and provides improved understandability for all the received speech signals in accordance with the user profile.
The above computer program product further comprises computer readable program code means configured for pricing, which is based on the quality of service and on a fixed amount per unit time of use, or a variable amount per time of use, or a down payment for a certain period of use, or a combination of down payment and pay per use, or a combination of down payment and unit time of use, including a period for free use.