Need for Information
In our society, people have a growing need for information and services. Most spectators attending a live event (e.g., as auditors of a conference given by a speaker), or auditors of a live radio or television broadcast program, want to have access to complementary information. This complementary information may consist of the biography of a player in a football match, the historical background of events cited in a news program, or athletic records during the transmission of an Olympic competition.
In fact, today people are looking for more information about what they are hearing or listening to, locally as spectators of live events, or remotely as auditors of live broadcast programs:
Consumers want to have access to special services associated with advertised products.
Media providers expect new sources of profit by extending the quantity and quality of services and information provided to consumers, and more particularly to auditors of live television or radio programs.
Advertisers are looking for new and more effective forms of advertisement.
On-Line Services on the Web
Independently of the massive development of radio and television, on-line services such as those provided on the World Wide Web (i.e., the Web) have rapidly emerged in our society and are now widely available. Such on-line services, based on Internet technology, provide access to a huge amount of information on an interactive basis. The Internet is a global network of computers. The Internet connects computers based on a variety of different operating systems or languages using a common protocol suite referred to as TCP/IP (“Transmission Control Protocol/Internet Protocol”). With the increasing size and complexity of the Internet, tools have been developed to help users find the information they need on the network. These tools are often called “navigators” or “navigation systems”. The World Wide Web (“WWW” or “the Web”) is a recent superior navigation system. The Web is:
an Internet-based navigation system,
an information distribution and management system for the Internet, and
a dynamic format for communicating on the Web.
The Internet and the Web are transforming our society by offering millions of users the opportunity to access and exchange information and to communicate with each other. By integrating images, text, audio and video, a user on the Web using a graphical user interface can today transparently communicate with different computers on the system, different system applications, and different information formats for files and documents including, for example, text, sound and graphics. Currently, on-line systems on the Web offer a variety of different services to users, for instance, private message services, electronic commerce, news, real-time games, access to electronic databases, electronic newsletters, business-to-business transactions, or job placement services.
But even if such on-line services are now available, searching for and finding relevant information on the Web remains an arduous task, sometimes taking hours, even for experienced users. Since the Web is essentially an open, multi-point to multi-point network, each user can select and retrieve different information from many different servers. In fact, today, most on-line interactions with the Web occur merely through textual inputs, for instance by entering URLs (Uniform Resource Locators), by entering keywords in search tools, or by activating textual hyperlinks in HTML (Hypertext Markup Language) documents. Even if, in the near future, the development of audiovisual interfaces (e.g., human speech interfaces, Web-phone integration) renders textual inputs less and less dominant in on-line environments, there is a good chance that the Web will remain user unfriendly due to its massiveness, its lack of organization, and its randomness. Simply stated, in the Web, there is no order or direction. Information remains hard to find most of the time and, even worse, for the foreseeable future it will remain a difficult task to find the required information in the desired context.
On-Line Services from Live Speech
Unlike the multi-point to multi-point Web network, a live speech to an audience (the audience being in the same location as the speaker or the audience being remotely located, i.e., accessed through a radio or television broadcast station) is primarily a communication from a single emitter to multiple receivers. Every auditor receives the same content, locally from the speaker, or remotely through the broadcasting station.
Thus, to provide on-line services similar to those that can be accessed on the Web, a first problem in a live speech is that the information flows continuously in the same direction, from a single source to multiple receivers, from a provider to multiple auditors. The communication flow is limited to one direction without any exchange of information with the auditors. People cannot directly interact with the oral information that is received to access additional information or services.
Moreover, when people hear a live speech, a problem for the auditors is to select topics of interest and then to identify the network addresses (i.e., URLs) needed to access (e.g., from the Web) the multimedia information or services related to the selected topics. To date, this problem has been only partially solved.
To provide web-like capabilities to the oral or radio information, a solution is to embed information (e.g., URLs) into the transmitted broadcast audio signals or on separate channels (simulcast). Examples of such systems are described in the following patents:
U.S. Pat. No. 6,125,172 entitled “Apparatus and method for initiating a transaction having acoustic data receiver that filters human voice”,
U.S. Pat. No. 6,098,106 entitled “Method for controlling a computer with an audio signal”,
U.S. Pat. No. 5,841,978 entitled “Network linking method using steganographically embedded data objects”,
U.S. Pat. No. 5,832,223 entitled “System, method and device for automatic capture of Internet access information in a broadcast signal for use by an Internet access device”,
U.S. Pat. No. 5,761,606 entitled “Media online services access via address embedded in video or audio program”,
U.S. Pat. No. 5,189,630 entitled “Method for encoding and broadcasting information about live events using computer pattern matching techniques”,
U.S. Pat. No. 5,119,507 entitled “Receiver apparatus and methods for identifying broadcast audio program selections in a radio broadcast system”, and
U.S. Pat. No. 6,061,719 entitled “Synchronized presentation of television programming and web content”.
The systems and methods described in these patents require the transmission of complementary information (e.g., URLs) encoded, embedded or modulated on the same audio or video signal, or transmitted on a separate channel, concurrently with the transmission of the main program. Radio or television stations must comprise means for encoding, modulating and transmitting this complementary information along with the audio signal. The radio auditors or television viewers must be equipped with special receivers and decoder circuits for recovering this information.
Independently of the arrangements discussed above, systems have been developed to enable auditors to “pre-select” topics of interest (i.e., keywords or sentences) and to associate these topics with pre-specified network addresses (i.e., URLs). These pre-specified network addresses are used to access multimedia information or services related to the pre-selected topics. In general terms, all these systems are based on speech recognition techniques. These techniques are used to identify keywords (i.e., selected words or sentences) for performing specific actions in response to the recognition of specific sounds. Examples of these systems can be found in the following patents:
U.S. Pat. No. 5,946,050 entitled “Keyword listening device” discloses a method and a system for monitoring the audio portion of a broadcast signal by means of a keyword listening device, where a relatively limited set of keywords is stored. The keyword listening device monitors the broadcast signal for any of these keywords. Upon recognition of any one or more of the keywords, the broadcast audio signal is recorded for a period of time and then fully analyzed. After analysis, and in dependence upon the recorded and analyzed broadcast audio signal, a number of different functions, such as connection to an external network at a specified address, or control of a video cassette recorder, may be performed.
U.S. Pat. No. 6,011,854 entitled “Automatic recognition of audio information in a broadcast program” discloses an audio processing system to search for information reports or updates (such as traffic, weather, time, sports, news, and the like) broadcast over one or several radio stations. The search is based on at least one keyword (such as “traffic”, “weather”, “time”, “sports”, “news” depending on the desired report) pre-selected by the user and entered into the audio processing system. While speech recognition software used by the audio processing system scans the radio station for the requested information report, the user may listen to other audio sources (a CD, a tape, another radio station broadcast, etc.) without being required to monitor (that is, listen to) the information content from those audio sources. Once the requested information report is detected based on the entered keyword used in the radio broadcast, the audio processing system switches its audio output to the radio station transmitting the desired broadcast, so that the user can timely and conveniently listen to the traffic, weather, time, sports, and/or news reports or updates.
U.S. Pat. No. 6,332,120 entitled “Broadcast speech recognition system for keyword monitoring” discloses a system where broadcast audio is automatically monitored for information of interest. This system comprises a computer processor with a memory for storing a vocabulary of keywords of interest, an audio receiver for receiving an audio broadcast, and a speech recognition system, associated with the audio receiver and the computer processor, for detecting when one of the keywords appears in a received audio segment. A report generator, associated with the computer processor and responsive to the detection of a keyword, generates a report with details related to the detected keyword and its context.
Even if the systems previously mentioned do not require the transmission of complementary information embedded with the audio signal (or on a secondary signal concurrently with the transmission of the main program), the auditors must be equipped with receivers with speech recognition capabilities to detect the occurrence of hyperlinked terms in the data stream.
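The association between pre-selected keywords and pre-specified network addresses described above can be illustrated with a minimal sketch. It assumes that a speech recognizer on the receiver side has already produced a sequence of recognized words; the table contents, the example.com addresses and the function name spot_keywords are hypothetical and are not taken from any of the cited patents.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical table of pre-selected keywords and pre-specified addresses.
KEYWORD_URLS: Dict[str, str] = {
    "traffic": "http://example.com/traffic",
    "weather": "http://example.com/weather",
}


def spot_keywords(
    recognized_words: Iterable[str], table: Dict[str, str]
) -> Iterator[Tuple[int, str, str]]:
    """Yield (position, keyword, url) for each recognized word found in the table."""
    for position, word in enumerate(recognized_words):
        # Normalize case and strip surrounding punctuation before the lookup.
        key = word.lower().strip(".,;:!?\"'")
        if key in table:
            yield position, key, table[key]


# Stand-in for recognizer output (assumption: one token per recognized word).
stream = "Now the Weather report, followed by traffic.".split()
hits = list(spot_keywords(stream, KEYWORD_URLS))
for position, keyword, url in hits:
    print(position, keyword, url)
```

In this sketch the lookup itself is trivial; the difficulty discussed in this section lies in obtaining a reliable stream of recognized words from an arbitrary speaker in the first place.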
In the field of speech processing, the ability to identify occurrences of words or sentences in a stream of voice data is commonly called “word spotting”. The goal of audio word spotting is to identify the boundaries of a search term within a digitized continuous speech stream without prior manual transcription. Searching and indexing a live speech that may be pronounced by any speaker is particularly problematic. This is due in large part to the limited capabilities of existing automatic speech recognition technology. It is important to note that in the systems discussed above, the word spotting task is done on the auditor side, in a speaker independent manner, with unrestricted vocabulary, and employs speech models trained using voice data other than the data to be recognized.
In fact, a fundamental problem with all these systems is the unreliable behavior of state-of-the-art speech recognition technology when performing “word spotting” (i.e., identification of pre-specified keywords or terms) in a continuous manner, independently of the speaker, based on unknown or generic speaking styles, vocabularies, noise levels and language models.
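The notion of identifying the boundaries of a search term can be illustrated with a simplified sketch. Real word spotting operates on the acoustic signal; the example below instead assumes a recognizer that already outputs words with start and end times, which sidesteps the hard part of the problem. The TimedWord type, the find_term function and the sample timings are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class TimedWord(NamedTuple):
    text: str
    start: float  # seconds from the beginning of the stream
    end: float


def find_term(words: List[TimedWord], term: str) -> List[Tuple[float, float]]:
    """Return (start, end) time boundaries of every occurrence of `term`."""
    target = term.lower().split()
    n = len(target)
    if n == 0:
        return []
    spans = []
    # Slide a window of n words over the stream and compare it with the term.
    for i in range(len(words) - n + 1):
        window = [w.text.lower() for w in words[i:i + n]]
        if window == target:
            spans.append((words[i].start, words[i + n - 1].end))
    return spans


# Hypothetical recognizer output with made-up timings.
transcript = [
    TimedWord("the", 0.0, 0.2),
    TimedWord("olympic", 0.2, 0.7),
    TimedWord("record", 0.7, 1.1),
    TimedWord("was", 1.1, 1.3),
    TimedWord("broken", 1.3, 1.8),
]
spans = find_term(transcript, "Olympic record")
print(spans)
```

The sketch matches exact word sequences only; it says nothing about recognition errors, which are the main source of unreliability in speaker independent, unrestricted vocabulary settings.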
As shown in the foregoing discussion, even if in recent years interactive systems have been developed for increasing and improving the level of interaction with users and for providing more information and more learning or entertainment opportunities (e.g., interactive television, WebTV), important sources of information, such as those that can be found on the Web, still remain inaccessible to auditors of a live speech (e.g., a live conference or a live interview received from a radio or television broadcast).
Therefore, today there is a need to provide a convenient, universal, and easy mechanism for enabling people attending a live speech (or receiving a live broadcast program) to select and access complementary information.
There is also a need for speakers and producers of live broadcast programs to create hyperlinks from selected terms (generally selected spoken utterances, words or sentences) intended to be pronounced during a speech (e.g., in the course of a conference or during a live radio or television program) to relevant data on the Web, without embedding these hyperlinks in conventional one-way broadcast signals, and more generally without physically transmitting these hyperlinks and without modifying conventional transmitters or receivers.