The present invention relates to apparatus and methods for adding emotive background sound to a text sequence which can be derived from speech or text containing words in a digital data file or in a data stream.
During reading of text or listening to a conversation, it can be pleasurable to listen to a background sound such as music or environment sounds such as sounds heard at a beach or in a forest. Background sound should match the emotive content of the text read or speech heard.
It is well known to convert text to speech. U.S. Pat. No. 4,429,182 discloses a numeric calculator that generates synthetic speech as numbers are entered and displays entered data. U.S. Pat. No. 4,701,862 discloses an alarm system having speech output in response to digitized time signals. The clock will display a given time with an associated audio recording. U.S. Pat. No. 4,829,580 discloses the use of alpha-numeric character data to create sound output based on look-up tables. The tables modify phonic data to assign pause, stress, duration, pitch and intensity to create improved synthetic speech. Intoned speech from text is not an emotive background sound.
Music can comprise a melody and lyrics. The melody provides an emotive background for the words of the lyrics. The melody, which is a sound pattern, and the lyrics, which can be tonal (sung) lyrics or atonal speech (rap) paced by the melody, can be stored as a combined digital record. The digital record links the text to the sound sequence. The lyrics associated with the music are chosen by an author to match the melody in a specific sequence. Music can be re-pitched or the pace changed; however, the coupling between the lyrics and melody is fixed by an author.
Music can be stored digitally in such formats as the Level 1 General MIDI. In the MIDI format, musical data is digitally represented as multiple sound tracks, each track capable of voicing an assigned instrument. In addition, music in a MIDI file can be modified with parameters such as expression, sustain and pitch bend. U.S. Pat. No. 5,792,972 discloses user control during the playback of a MIDI file. Voices and pitch can be changed in response to user input. Pacing between the words and melody must be maintained when playing music.
Combinations of text and music are used in karaoke systems. Music is generated from a source and images are displayed in response to the music. The text of the lyrics is displayed in step with the progress of the melody so that a singer can read the words associated with the music. U.S. Pat. No. 5,410,097 discloses apparatus playing music and text and having control means to move between passages of multimedia segments. Presentation of text and melody is fixed by the author. The system always maintains an association between words and melody.
Hypermedia consists of coupled data files that can incorporate still images, video images, text and music. A hypermedia file can consist of a set of text including “metadata” that associates other digital data with the text. Metadata can include parameters describing attributes of display of characters, such as “bold”, “underline”, etc. Metadata can also incorporate vectors associating audio, text and/or video records. The association between various media must be created by an author. U.S. Pat. No. 5,596,695 discloses a plurality of data types that are coupled together as metadata. Prior art in the field of hypermedia does not disclose methods of generating an emotive background sound from a text.
Prior art discloses books with touch or pointer responsive noise generation associated with portions of text. The noises can have emotive content. U.S. Pat. No. 5,645,432 discloses a book having touch-activated areas for generating sound within the body of text. U.S. Pat. No. 3,724,100 discloses such a book having text, and a positional transducer for producing sound recorded on a page associated with said text. An author is required to generate an association between the sound and text. A reader must manually cue the associated sound environment. It would be useful to have an emotive background sound generated automatically in response to key words in the text.
Apparatus has been disclosed that generates an acoustic background while reading text to improve reading. U.S. Pat. No. 5,033,966 provides a cyclic stereophonic sound pattern that is heard while reading text. The side-to-side change in apparent sound direction encourages a rapid sweeping of eye focus during text reading. The background is not responsive to the emotive content of the text. U.S. Pat. No. 5,061,185 delivers audio signals to each ear, a first audio signal having subliminal messages and an audio signal exclusive of the subliminal messages. The background is not coupled to the textual semantic or emotive content.
Prior art discloses means for displaying combinations of text, images and sound. Presentation of text and images is paced by a melody. Hypertext structures require user interaction to initiate processing of media files. Other means have been disclosed which require a user to selectively trigger media events that can incorporate video, audio, still image and text presentation. Crafting interactions between the various media structures has been done manually by an author. It would be advantageous for a person to be reading or hearing words and have an emotive background sound automatically playing in response to key words in the text at a user selected pace. It would be advantageous for such responsive background sounds to occur when reading or speaking or coupled with transmitted speech.
Music is a pre-scripted coupling of non-vocal sound and lyrics. The separate elements must be pre-scripted by an author. There is no way to decouple the music from a word stream. Simply playing music or an environmental recording while reading or listening to speech provides environmental sound; however, the environmental sound is not responsive to the emotive content of the text.
It would be advantageous to have an emotive background sound provided with key words in a text.
The present invention is directed to a system for providing emotive background sound to a given text comprising: a source of text that is to be provided with an emotive background and that includes key words that serve to indicate the background that is appropriate; a store of such key words including parameters to provide an emotive background sound appropriate to the key words; and a process to recognize key words within said text using said store to provide an emotive background sound appropriate to the key words. The emotive background environment is generated from text that can be displayed on a monitor or from spoken text.
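By way of a non-limiting illustration, the store of key words and the recognition process described above can be sketched in Python. The store contents, the parameter names (`sound`, `intensity`) and the function name `find_emotive_sounds` are hypothetical and are chosen only for the example; they are not taken from the disclosure.

```python
import re

# Hypothetical store of key words. Each key word carries parameters for an
# emotive background sound; the names and values are illustrative only.
KEY_WORD_STORE = {
    "storm": {"sound": "thunder", "intensity": 0.9},
    "beach": {"sound": "surf", "intensity": 0.4},
    "forest": {"sound": "birdsong", "intensity": 0.3},
}


def find_emotive_sounds(text, store=KEY_WORD_STORE):
    """Return (key_word, parameters) pairs for each key word, in text order."""
    matches = []
    # Split the text into lower-case words so that matching ignores case.
    for word in re.findall(r"[a-z']+", text.lower()):
        params = store.get(word)
        if params is not None:
            matches.append((word, params))
    return matches


# Example use: each recognized key word yields its sound parameters.
for key_word, params in find_emotive_sounds("The storm broke over the beach."):
    print(key_word, params["sound"], params["intensity"])
```

A dictionary keyed on the key word is only one possible form of the store; it is used here because it keeps the comparison step to a single lookup per word.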
Viewed from a first apparatus aspect, the present invention is directed to apparatus comprising memory apparatus and a processor. The memory apparatus has stored therein key words and/or key nontextual indicia and emotive sounds. Each key word and/or key nontextual indicia is associated with one of the emotive sounds. The processor is adapted to detect words and nontextual indicia received thereby and to compare the received words and nontextual indicia to key words and key nontextual indicia stored in the memory apparatus. The processor is further adapted to generate an output signal representing an emotive sound associated with a key word and/or key nontextual indicia if there is a match with a received word and/or nontextual indicia.
Viewed from a second apparatus aspect, the present invention is directed to a system for providing an emotive background sound to a text that includes key words and/or key nontextual indicia. The system comprises first and second memories, and a microprocessor. The first memory has stored therein key words and/or key nontextual indicia. The second memory has stored therein emotive sounds with each key word and each key nontextual indicia having associated therewith an emotive sound. The microprocessor is in communication with the memories and is adapted to compare received words and nontextual indicia of the text with the key words and key nontextual indicia stored in the first memory, and if there is a match with one or more of the key words and/or the key nontextual indicia, to generate at an output thereof a signal representative of the emotive sound associated with the matched key word or key nontextual indicia.
Viewed from a third apparatus aspect, the present invention is directed to a system for providing an emotive background sound to a text that includes key words and/or nontextual indicia. The system comprises means for recognizing words and/or nontextual indicia in the text, a first file containing key words and key nontextual indicia, a second file containing emotive sounds with an emotive sound being associated with a key word and/or key nontextual indicia, and means for comparing the words and/or nontextual indicia with those contained in the first file, and if there is found to be a match, for causing an emotive sound to be generated which corresponds to the matched key word and/or key nontextual indicia.
Viewed from a fourth apparatus aspect, the present invention is directed to a system for providing an emotive background sound to a text that includes key words and/or key nontextual indicia. The system comprises a processor system that includes a processor, first and second memory sections, and sound generating circuitry. The first memory section stores key words and key nontextual indicia. The second memory section stores emotive sounds with each emotive sound corresponding to a key word or key nontextual indicia stored in the first memory section. The processor is adapted to sense words and nontextual indicia in the text and to compare same to the key words and the key nontextual indicia stored in the first memory section and to cause the sound generating circuitry to generate an emotive sound if there is a match of a key word and/or key nontextual indicia of the text with one in the first memory section.
Viewed from a fifth apparatus aspect, the present invention is directed to a system for providing an emotive background sound to a text that includes key words and/or key nontextual indicia. The system comprises a store of key words and key nontextual indicia, and a store of background sounds that is controlled by the key words or key nontextual indicia, whereby as key words and/or key nontextual indicia of the text are recognized, the store of background sounds provides an emotive background sound appropriate to the key word or key nontextual indicia.
Viewed from a sixth apparatus aspect, the present invention is directed to a system which generates a background emotive sound in response to speech which contains words and sounds. The system comprises a microphone; an analog-to-digital converter; a microprocessor in communication with the analog-to-digital converter; a first memory having stored therein key words and key sounds; and a second memory having stored therein emotive sounds with each key word and each key sound having associated therewith an emotive sound. The microphone is adapted to be in communication with the speech and generates an analog electrical signal representative of the speech in communication therewith. The analog-to-digital converter is in communication with the microphone and converts the electrical signal representative of the speech into a digital format. The microprocessor is in communication with the memories and is adapted to compare received words and sounds of the speech with the key words and key sounds stored in the first memory, and if there is a match with one or more of the key words and/or the key sounds, to generate at an output thereof a signal representative of the emotive sound associated with the matched key word or key sound.
Viewed from a seventh apparatus aspect, the present invention is directed to a system which, in response to digitized words and sounds, generates speech and a background emotive sound. The system comprises first and second microprocessors, a first memory having stored therein key words and key sounds, a second memory having stored therein emotive sounds with each key word and each key sound having associated therewith an emotive sound, first and second sound drivers, a sound mixer, and a speaker. The first microprocessor is adapted to receive the digitized words and sounds, is in communication with the memories, and is adapted to compare the digitized words and sounds with the key words and key sounds stored in the first memory, and if there is a match with one or more of the key words and/or the key sounds, to generate at an output thereof a signal representative of the emotive sound associated with the matched key word or key sound. The second microprocessor is adapted to receive the digitized words and sounds and to generate at an output thereof a signal representative of the words and sounds. The first sound driver is in communication with the first microprocessor. The second sound driver is in communication with the second microprocessor. The sound mixer is in communication with the first and second sound drivers. The speaker is in communication with the sound mixer.
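As a minimal sketch of the mixing stage only, the following Python function combines a speech signal and an emotive background signal sample by sample. It assumes, purely for illustration, that both signals are lists of floating-point samples in the range -1.0 to 1.0 at the same sample rate; the function name `mix` and the `background_gain` parameter are hypothetical and do not appear in the disclosure.

```python
def mix(speech, background, background_gain=0.3):
    """Mix two sample sequences, scaling the background and clipping the sum.

    Assumes samples are floats in [-1.0, 1.0]; the shorter input is
    treated as silence once it runs out.
    """
    length = max(len(speech), len(background))
    mixed = []
    for i in range(length):
        s = speech[i] if i < len(speech) else 0.0
        b = background[i] if i < len(background) else 0.0
        value = s + background_gain * b
        # Clip so the combined signal stays within the valid sample range.
        mixed.append(max(-1.0, min(1.0, value)))
    return mixed
```

Scaling the background below the speech level is a design choice made for the example so that the emotive sound remains a background to the spoken words.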
Viewed from an eighth apparatus aspect, the present invention is directed to an emotive dictionary useful for providing emotive background sound to either a text in digitized form or speech comprising a memory having stored therein key words and/or equivalent and associated data corresponding to an emotive sound.
Viewed from a ninth process aspect, the present invention is directed to a method of providing an emotive background sound to a given text which includes at least one key word and/or key nontextual indicia. The method comprises the steps of: providing a source of key words and/or key nontextual indicia with an emotive sound appropriate to each key word or key nontextual indicia being associated with each key word or nontextual indicia; sensing the text to determine if a key word or key nontextual indicia is present therein; and generating an emotive sound associated with a key word and/or key nontextual indicia found in the text.
Viewed from a tenth process aspect, the present invention is directed to a method of providing an emotive background sound to speech which includes at least one key word and/or key sound. The method comprises the steps of: sensing the words and sounds contained within the speech; comparing the words and sounds contained within the speech to key words and key sounds stored within a file with each key word or key sound contained within the file having an emotive sound appropriate thereto associated therewith; and generating an emotive sound appropriate to a key word and/or key sound found within the speech such that the emotive sound occurs during the speech.
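The steps of this method can be sketched, under stated assumptions, as a Python generator that processes recognized speech incrementally so that each emotive sound can be started while the speech is still in progress. The sketch assumes that an upstream speech recognizer, not shown, supplies pairs of a time stamp in seconds and a recognized word or sound label; the store contents and the name `emotive_events` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical file of key words and key sounds mapped to emotive sounds.
KEY_SOUNDS: Dict[str, str] = {
    "thunder": "rumble_loop",
    "laugh": "light_music",
}


def emotive_events(
    recognized: Iterable[Tuple[float, str]],
    store: Dict[str, str] = KEY_SOUNDS,
) -> Iterator[Tuple[float, str]]:
    """Yield (time stamp, emotive sound) as each key word or sound arrives."""
    for timestamp, token in recognized:
        sound = store.get(token.lower())
        if sound is not None:
            # Emit immediately so playback can begin during the speech.
            yield (timestamp, sound)
```

Because the generator consumes its input lazily, it can be driven by a live stream of recognized words as readily as by a stored list.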
Viewed from an eleventh process aspect, the present invention is directed to a method of providing an emotive background sound to a given text which includes at least one key word and/or key nontextual indicia. The method comprises the steps of: providing a source of key words and/or key nontextual indicia with an emotive sound appropriate to each key word or key nontextual indicia being associated with each key word or nontextual indicia; sensing the text to determine if a key word or key nontextual indicia is present therein; and generating a file containing said text and emotive sound parameters associated with a key word and/or key nontextual indicia found in the text.
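One possible form of the generated file is illustrated by the following Python sketch, which records the text together with a list of emotive sound parameters and the character offset of each matched key word. The use of JSON, the field names (`text`, `cues`, `offset`, `key_word`) and the store contents are assumptions made for the example and are not specified by the disclosure.

```python
import json
import re

# Hypothetical source of key words with associated emotive sound parameters.
STORE = {"storm": {"sound": "thunder", "volume": 0.8}}


def annotate(text, store=STORE):
    """Build a record holding the text and a cue for each key word found."""
    cues = []
    for match in re.finditer(r"[A-Za-z']+", text):
        key_word = match.group().lower()
        params = store.get(key_word)
        if params is not None:
            # The offset lets a player start the sound at the key word.
            cues.append({"offset": match.start(), "key_word": key_word, **params})
    return {"text": text, "cues": cues}


def write_annotated_file(text, path, store=STORE):
    """Write the text and its emotive sound parameters to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(annotate(text, store), handle, indent=2)
```

Storing the parameters alongside the text, rather than the sounds themselves, keeps the file small and leaves the choice of actual audio to the playback apparatus.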
The invention will be more readily understood from the following detailed description taken with the accompanying drawings and claims.