This patent application is related to commonly-assigned U.S. patent application Ser. No. 09/627,555, filed Jul. 28, 2000, to Bolle et al., entitled “Apparatus, System and Method for Augmenting Video Information Streams with Relevant Information”, the disclosure of which is incorporated by reference herein in its entirety.
This invention relates generally to knowledge management methods and apparatus and, more specifically, to knowledge management of information streams to determine knowledge concepts present in the content of an information stream and to determine additional or collateral information that is related to the content of the information stream.
An information stream is a source of information where the information has a time-based component, and where the information “flows” from a source to a destination. The most common example of an information stream is spoken discourse (i.e., speech). The speaker is the information source, the listener is the destination, the content of the speech (the actual words) contains or represents the information, and the audible sound pressure wave produced by the speaker's mouth transmits the information from the speaker to the listener. The sound wave travels over time and must be processed in real-time (i.e., heard) by the listener. If the listener does not process the sound wave as it is received, the speech will be lost and the listener will not receive the information.
Other kinds of information streams include, for example, television broadcasts, telephone conversations, and computer network-based communications. An important feature of an information stream is that the information is transmitted over time and must be processed in real-time as it is received. Of course, this processing may include capture of the information (e.g., into a computer file) for further processing off-line at a later date.
Information streams are a valuable resource in the practice of knowledge management. Knowledge management is an activity that includes processes and technologies for capturing intellectual capital and making it easily accessible for reuse and exploitation (see, for example, Davenport and Prusak, “Working Knowledge”, Harvard Business School Press, Boston, 1998).
Many knowledge management tools exist that operate on textual information, or documents. The most basic operation is to index and search the documents using a text retrieval system (see, for example, Baeza-Yates and Ribeiro-Neto, “Modern Information Retrieval”, ACM Press, New York, 1999). More advanced operations on documents include automatic clustering, automatic classification, and automatic extraction of concepts and named entities from documents. One product that provides tools to perform all of these tasks on a collection of documents is the IBM Intelligent Miner for Text (see U.S. Pat. No. 5,832,480).
All of these previously described document processing tasks may be further refined with user profiles. A user profile describes a particular interest or set of interests on behalf of the user. The profile is used to filter or modify the various document processing tasks so that the results more closely match the interests of the end user.
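By way of illustration only, profile-based filtering of this general kind might be sketched as follows. The function name `filter_by_profile`, the `topics` field on each document, and the ordering by number of shared topics are assumptions made for this example and are not taken from any of the cited tools.

```python
from typing import Dict, List, Set


def filter_by_profile(documents: List[Dict], profile_topics: Set[str]) -> List[Dict]:
    """Keep documents that share at least one topic with the user profile.

    Hypothetical sketch: each document is a dict with a "topics" list, and the
    profile is a plain set of topic labels. Results are ordered by how many
    profile topics each document matches (most matches first).
    """
    scored = []
    for doc in documents:
        overlap = len(profile_topics & set(doc["topics"]))
        if overlap:  # drop documents with no topic in common with the profile
            scored.append((overlap, doc))
    # Sort on the overlap count only; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]
```

A profile of `{"sports", "soccer"}` applied to a mixed set of news items would, under these assumptions, retain only the sports-related items and place those matching both topics first.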
The convergence of information streams and knowledge management occurs naturally in two important contexts: meetings and data broadcasting. Meetings have a variety of incarnations, with the most common being a face-to-face meeting between two or more individuals. The meeting will minimally include a spoken discourse information stream, and may additionally include other documents, such as an agenda, a visual presentation, and notes (i.e., meeting minutes). Other incarnations of meetings include sales presentations, teleconferences, video conferences, email exchanges, chat sessions, and help desk call sessions. For prior art related to meetings, see U.S. Pat. Nos. 5,890,131, 5,786,814, 6,018,346 and 5,465,370.
Data broadcasting is the process of encoding data in a television broadcast signal (in addition to the traditional video and audio signals). Both analog and newer digital television channels have unused bandwidth that can be used to transmit arbitrary data. This data may or may not be related to the accompanying audio/video broadcast. With the incorporation of data broadcasting, a television broadcast signal becomes a very rich information stream comprising audio, video, and data. For prior art related to data broadcasting, see U.S. Pat. Nos. 5,887,062 and 6,031,578.
The emergence of the World Wide Web (WWW or simply Web) as an information and entertainment medium is generating many changes in the more traditional medium of broadcast television. In particular, broadcasters have begun to link these two media together to create a much richer television viewing experience. For example, television programs may display URLs that point to Web sites related to the program. A next phase of linkage will enable set top boxes and TV tuner computer cards to become more prevalent. Such devices will allow broadcasters to send Web content with the television broadcast and display the audio/video program in an integrated fashion with the Web content.
This tighter integration of broadcast television and the Web presents a number of challenges, with one of the more difficult challenges being how to identify the information that should be broadcast with the television program. Currently, program producers manually identify the information to be broadcast. This process may be supported by software that aids in scheduling the data broadcast, or software that automatically accesses databases to obtain, for example, stock quotes. Nevertheless, the overall information seeking and selection process is manual.
This approach has several disadvantages. First, it is slow and expensive. Second, there is no mechanism to tie additional information into a live broadcast, where the time at which a particular topic is discussed is not known beforehand. Currently, if a significant event (e.g., a natural disaster) occurs during a broadcast of the daily news, the producers have a difficult time just reporting the event, and in general may have no time to find background information. Third, with the advent of set top boxes, users may wish to customize the information displayed on their TV set. For example, one person may wish to see only sports-related information, while another may wish to choose news that is related to a specific geographic location.
One problem of particular interest to the teachings of this invention is most closely related to efforts in Topic Detection and Tracking (TDT). Reference in this regard can be had to J. Allan, J. Carbonell, G. Doddington, J. Yamron, and Y. Yang, “Topic Detection and Tracking Pilot Study: Final Report”, Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, pp. 194-218. In TDT, the goal is to analyze news broadcasts (text articles or text transcripts generated automatically from audio and video) and to identify previously unseen news events, or topics. Topics are then tracked by identifying subsequent news stories covering the same event. This is accomplished using a variety of off-line text processing, language modeling, and machine learning algorithms. However, TDT is not a real-time system, so it cannot annotate a live broadcast with collateral information, and furthermore is basically limited to topic detection.
As was stated above, one information retrieval and text analysis technique includes the IBM Intelligent Miner for Text, “www-4.ibm.com/software/data/iminer/fortext/”. Reference may also be had to C. D. Manning and H. Schutze, “Foundations of Statistical Natural Language Processing”, MIT Press, 1999. However, neither of these approaches is specifically adapted to support on-line processing of streaming text data.
A number of commercial systems exist that support the manual addition of data to a broadcast signal (see, for example, Wave Systems Corporation and SkyStream Networks). These systems allow program producers to select, format, and schedule the delivery of data with the broadcast. However, these systems require the manual identification of collateral data.
An important problem that has not heretofore been adequately addressed relates to the identification of collateral information in real time based on the words spoken during a broadcast (or any other spoken discourse).
There are several challenges in this area. Although voice recognition has improved tremendously over the last few years, it cannot be expected that a voice recognition system will deliver a perfect transcript. Transcript quality is by far the best when the voice recognition system is trained with the voice of the speaker and the recording is made in a quiet environment with appropriate microphones. Unfortunately, in a broadcast setting (and many other similar settings) such optimal circumstances are not available. Instead, there may be many speakers, with some recording from a studio and others from the field. Furthermore, background noise and sub-optimal microphones contribute to the deterioration of the transcript quality.
The quality of the transcript has tremendous implications for the methods that can be applied to analyze it. The effectiveness of traditional text analysis tools decreases as the quality of the transcript decreases. Some of the issues that arise include lack of punctuation, lack of grammatical structure, and mis-recognized words (e.g., wrong words added as well as correct words missing). Sentences are “constructed” from the continuous stream of spoken words by setting a pause threshold between words. This and the erroneous recognition of words often lead to sentences that are grammatically incorrect. Hence, methods that rely on analyzing the structure of a sentence alone rarely provide satisfactory results. Erroneous word recognition has a detrimental effect on word statistics, such that relying on these statistics may lead to unintended or unexpected results. Adding to these difficulties is the need to process the text in real-time.
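The pause-threshold construction of sentences mentioned above can be illustrated with a minimal sketch. It assumes that the speech recognizer supplies a start and end time for each recognized word; the `TimedWord` type, the `split_on_pauses` function, and the 0.5 second default threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimedWord:
    text: str
    start: float  # time the word begins, in seconds (assumed recognizer output)
    end: float    # time the word ends, in seconds


def split_on_pauses(words: List[TimedWord], pause_threshold: float = 0.5) -> List[List[str]]:
    """Group recognized words into "sentences" wherever the silence between
    two consecutive words exceeds pause_threshold seconds."""
    sentences: List[List[str]] = []
    current: List[str] = []
    prev_end: Optional[float] = None
    for word in words:
        gap_too_long = prev_end is not None and (word.start - prev_end) > pause_threshold
        if gap_too_long and current:
            sentences.append(current)  # close the sentence at the pause
            current = []
        current.append(word.text)
        prev_end = word.end
    if current:
        sentences.append(current)  # flush the final, possibly partial, sentence
    return sentences
```

Because the boundaries depend only on timing, a speaker who pauses in mid-phrase or runs two sentences together will produce exactly the kind of ungrammatical units described above.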
The foregoing and other problems are overcome by methods and apparatus in accordance with embodiments of this invention.
This invention grows at least partially out of a realization by the inventors that even when reading a poor transcript a person can usually describe the essence of the discourse. It is thus desirable to provide an automatic system that is capable of capturing this “gist” of a transcript. Once captured, this “gist” constitutes metadata for the discourse, which can be stored with the discourse and can thus provide value in its own right. The processing of this metadata is thus an important aspect of this invention.
Using the automatically created metadata, a method performs concept searches that produce the desired collateral information, and a novel ranking algorithm sorts the results of the concept searches. The ranking algorithm is not limited to use in only the disclosed applications and embodiments, and may also prove to be quite appropriate when performing traditional text searches.
Disclosed herein are methods and apparatus for locating, in real time or substantially real time, collateral information pertinent to a live television broadcast (or any other discourse or information stream that contains speech).
As employed herein, a broadcast can be any signal that conveys information, such as a news broadcast or live or recorded coverage of a meeting or an assembly. The signal can be sent through any suitable medium, including the airwaves, through a coaxial cable and/or through an optical fiber. The signal can be sent as packets through a data communications network, such as the Internet, or as a normal or a high definition television signal. In the presently preferred embodiment the signal includes an audio component, preferably conveying speech (e.g., a news broadcast). However, and as will be made apparent below, it is not required that there be an audio component, as a closed captioning signal can be used, as can text appearing as part of the video signal, as well as sub-titles appearing in a foreign language program. Certain features appearing in one or more video frames can also be used as recognizable entities, such as a number of human faces appearing in a video frame, and possibly a recognition of the person whose face appears.
In the exemplary network broadcast embodiment the inventive technique begins with a text transcript of the broadcast generated by an automatic speech recognition system. Given the fact that speaker independent speech recognition technology, even if tailored for a specific broadcast scenario, generally produces transcripts with relatively low accuracy, algorithms are provided for determining the essence of the broadcast from the transcripts. Specifically, the inventive technique extracts named entities, topics, and sentence types from the transcript and uses the extracted information to automatically generate both structured and unstructured search queries. An aspect of these teachings is a distance-ranking algorithm that is used to select relevant information from the search results. The entire process may be performed on-line and in real time or substantially real time, and selected query results (i.e., the collateral information) can be added to, inserted within or otherwise included with (referred to herein generally as multiplexed with) the broadcast stream.
The teachings of this invention address the foregoing problems by providing a Watson Automatic Stream Analysis for Broadcast Information system (or WASABI), which takes speech audio as input, converts the audio stream into text using a speech recognition system, applies a variety of analyzers to the text stream to identify information elements, automatically generates queries from these information elements, and extracts data from the search results that is relevant to a current program. The resultant data may be inserted or multiplexed into a broadcast signal and transmitted along with the original audio/video program. The system is fully automatic and operates on-line, allowing broadcasters to add relevant collateral information to live programming in real time.
Given the goal of finding collateral information for a live broadcast in real time, the various component parts of the most preferred embodiment of the system of this invention operate in real time or substantially real time.
The teachings of this invention provide a method, a system and a computer executable program stored on a computer-readable medium for providing collateral information for inclusion with an information stream. The method includes steps of (a) examining the information stream to recognize a presence of events that occur in the information stream; (b) automatically generating database queries from recognized events; and (c) analyzing database query results so as to rank and select database query results to be inserted into the information stream as collateral information.
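Steps (a) through (c) can be pictured with the following minimal sketch, offered as an illustration and not as the disclosed implementation. All function names are hypothetical. Event recognition is reduced here to matching a supplied list of entity names, each query is a quoted phrase, and ranking simply sorts on a relevance score assumed to be returned by the search backend; the ranking algorithm of this invention is not reproduced.

```python
from typing import Callable, Iterable, List, Tuple

Result = Tuple[str, float]  # (result text, relevance score from an assumed backend)


def recognize_events(stream_text: str, known_entities: Iterable[str]) -> List[str]:
    """Step (a): naive stand-in for event recognition, using case-insensitive
    substring matching against a supplied list of entity names."""
    lowered = stream_text.lower()
    return [entity for entity in known_entities if entity.lower() in lowered]


def generate_queries(events: List[str]) -> List[str]:
    """Step (b): build one quoted-phrase query per recognized event."""
    return [f'"{event}"' for event in events]


def rank_and_select(results: List[Result], top_k: int = 2) -> List[str]:
    """Step (c): placeholder ranking that sorts by score and keeps the top_k."""
    ranked = sorted(results, key=lambda result: result[1], reverse=True)
    return [text for text, _ in ranked[:top_k]]


def collateral_for(
    stream_text: str,
    known_entities: Iterable[str],
    search: Callable[[str], List[Result]],
    top_k: int = 2,
) -> List[str]:
    """Chain steps (a)-(c); `search` is any caller-supplied query function."""
    events = recognize_events(stream_text, known_entities)
    queries = generate_queries(events)
    results = [result for query in queries for result in search(query)]
    return rank_and_select(results, top_k)
```

Passing the search function as a parameter keeps the sketch independent of any particular database, so that each stage can be replaced separately.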