The present invention relates to systems and methods for multimedia processing. For example, the present invention provides systems and methods for receiving spoken audio, converting the spoken audio to text, and transferring the text to a user. As desired, the speech or text can be translated into one or more different languages. Systems and methods for real-time conversion and transmission of speech and text are provided.
The Internet has revolutionized the way that information is delivered and business is done. In June of 1999, Nielsen/NetRatings reported that there were a total of 63.4 million active Internet users in the United States, and 105.4 million total Internet users with Internet access. The average user spent 7 hours, 38 minutes online that month. Furthermore, the year-to-year growth rate in users is expected to be in the range of 15% to 25%. Worldwide, there are expected to be more than 250 million residential users and more than 200 million corporate users by the year 2005.
In the last few years, improvements in software and hardware have allowed the Internet to be used on a large scale for the transmission of audio and video. Such improvements include the availability of real-time streaming audio and video. Numerous media events are now "broadcast" live over the Internet, allowing users to see and hear speeches, music events, and other artistic performances. With further increases in speed, the Internet promises to become the primary method for transmitting and receiving multimedia information. Present real-time applications, however, are limited in their flexibility and usefulness. For example, many real-time audio and video applications do not permit users to edit or otherwise manipulate the content. The art is in need of new systems and methods for expanding the usefulness and flexibility of multimedia information flow over electronic communication systems.
For example, the present invention provides Web-enabled systems comprising audio-to-text captioning capabilities, audio conference bridging, text-to-speech conversion, foreign language translation, web media streaming, and voice-over-IP integrated with processing and software capabilities that provide streaming text and multimedia information to viewers in a number of formats including interactive formats.
The present invention also provides foreign-language translation systems and methods that provide end-to-end audio transcription and language translation of live events (i.e., from audio source to intended viewer), streamed over an electronic communication network. Such systems and methods include streaming text of the spoken word, a complete cumulative transcript, the ability to convert text back into audio in any desired language, and handling of comments and questions submitted by viewers of the multimedia information (e.g., returned to each viewer in their selected language). In some embodiments, text streaming occurs through independently encoded media streams (e.g., on separate IP ports). The information is provided in any desired format (e.g., MICROSOFT, REAL, QUICKTIME, etc.). In some embodiments, real-time translations are provided in multiple languages simultaneously or concurrently (e.g., each viewer selects and/or changes their preferred language during the event).
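Per-viewer language selection implies that, for each transcript segment, the system needs translations only for languages currently chosen by at least one viewer, and must route each viewer the version in their language. The following is a minimal sketch of that routing logic only; the names `CaptionFanout`, `join`, `change_language`, and `deliver` are hypothetical, and the translation itself is assumed to happen elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class Viewer:
    viewer_id: str
    language: str  # preferred language code; may change during the event


@dataclass
class CaptionFanout:
    """Route each translated caption segment to viewers by their chosen language."""
    viewers: dict[str, Viewer] = field(default_factory=dict)

    def join(self, viewer_id: str, language: str) -> None:
        self.viewers[viewer_id] = Viewer(viewer_id, language)

    def change_language(self, viewer_id: str, language: str) -> None:
        self.viewers[viewer_id].language = language

    def languages_needed(self) -> set[str]:
        # Only languages that someone is currently watching need translating.
        return {v.language for v in self.viewers.values()}

    def deliver(self, translations: dict[str, str]) -> dict[str, str]:
        # translations maps language code -> caption text for one segment.
        return {vid: translations[v.language]
                for vid, v in self.viewers.items()
                if v.language in translations}
```

Asking `languages_needed()` before each segment keeps translation work proportional to the number of distinct languages in use rather than the number of viewers.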
The present invention also provides audio-to-text conversion with high accuracy in short periods of time. For example, the present invention provides systems and methods for transcribing live events with 95-98% accuracy, and for transcribing any event with 100% accuracy within a few hours of event completion.
The systems and methods of the present invention may be applied to interactive formats, including talk-show formats. For example, as described in more detail below, in some embodiments, the systems and methods of the present invention provide an electronic re-creation of the television talk-show model over the web without requiring the participants to use or own any technology beyond a telephone and a web-connected device (e.g., a PC). Talk-show participation by invited guests or debaters may be conducted through the web. In some embodiments, the systems and methods employ web-based moderator and participant controls and/or web-based call-in "screener" controls. In some embodiments, viewer interaction is handled via email, a comment/question queue maintained by a database, and/or phone call-ins. In some preferred embodiments of the present invention, real-time language translation in multiple languages is applied to allow individuals to participate regardless of the language they use. Streaming multimedia information provided in the interactive format includes, as desired, graphical or video slides, images, and/or video.
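The database-maintained comment/question queue and the call-in screener role can be sketched together: viewers submit questions, a screener approves or rejects them, and the moderator pulls approved questions in arrival order. This sketch uses Python's built-in `sqlite3` module with an in-memory database; the schema, the status values, and the class name `QuestionQueue` are assumptions for illustration.

```python
import sqlite3
from typing import Optional


class QuestionQueue:
    """Viewer questions held in a database; a screener approves them before the moderator sees them."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS questions ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " viewer TEXT NOT NULL, language TEXT NOT NULL, body TEXT NOT NULL,"
            " status TEXT NOT NULL DEFAULT 'pending')")

    def submit(self, viewer: str, language: str, body: str) -> int:
        cur = self.db.execute(
            "INSERT INTO questions (viewer, language, body) VALUES (?, ?, ?)",
            (viewer, language, body))
        self.db.commit()
        return cur.lastrowid

    def screen(self, question_id: int, approve: bool) -> None:
        # Screener decision: approved questions become visible to the moderator.
        self.db.execute("UPDATE questions SET status = ? WHERE id = ?",
                        ("approved" if approve else "rejected", question_id))
        self.db.commit()

    def next_for_moderator(self) -> Optional[tuple]:
        # Oldest approved question first; mark it answered once handed over.
        row = self.db.execute(
            "SELECT id, viewer, language, body FROM questions"
            " WHERE status = 'approved' ORDER BY id LIMIT 1").fetchone()
        if row is not None:
            self.db.execute("UPDATE questions SET status = 'answered' WHERE id = ?",
                            (row[0],))
            self.db.commit()
        return row
```

Storing each question's language alongside it is what would let an answer be returned to the viewer in their selected language.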
The present invention further provides systems and methods for complete re-creation of the classroom teaching model, including live lectures (audio and video), presentation slides, slide notes, comments/questions (via email, chat, and/or live call-ins), streaming transcript/foreign translations, complete lecture transcript, streaming videos, and streaming PC screen capture demos with audio voice-over.
For use in such applications, the present invention provides a system comprising a processor, said processor configured to receive multimedia information and encode a plurality of information streams comprising a separately encoded first information stream and a separately encoded second information stream from the multimedia information, said first information stream comprising audio information and said second information stream comprising text information (e.g., text transcript information generated from the audio information). The present invention is not limited by the nature of the multimedia information. Multimedia information includes, but is not limited to, live event audio, televised audio, speech audio, and motion picture audio. In some embodiments, the multimedia information comprises information from a plurality of distinct locations (e.g., distinct geographic locations).
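The separately encoded first (audio) and second (text) streams can be pictured as two packet sequences that share a common clock. The sketch below is illustrative only: the 13-byte header layout (stream id, sequence number, millisecond timestamp) and the names `encode_streams` and `decode_packet` are hypothetical, and a deployed system would more likely carry media over an established transport such as RTP.

```python
import struct

AUDIO_STREAM, TEXT_STREAM = 1, 2
# Hypothetical header: stream id (1 byte), sequence number (4), timestamp in ms (8).
HEADER = struct.Struct("!BIQ")


def encode_packet(stream_id: int, seq: int, timestamp_ms: int, payload: bytes) -> bytes:
    return HEADER.pack(stream_id, seq, timestamp_ms) + payload


def decode_packet(packet: bytes) -> tuple[int, int, int, bytes]:
    stream_id, seq, ts = HEADER.unpack_from(packet)
    return stream_id, seq, ts, packet[HEADER.size:]


def encode_streams(audio_chunks: list[tuple[int, bytes]],
                   transcript: list[tuple[int, str]]) -> tuple[list[bytes], list[bytes]]:
    """Produce two independently encoded streams from timestamped audio and text."""
    audio = [encode_packet(AUDIO_STREAM, i, ts, pcm)
             for i, (ts, pcm) in enumerate(audio_chunks)]
    text = [encode_packet(TEXT_STREAM, i, ts, s.encode("utf-8"))
            for i, (ts, s) in enumerate(transcript)]
    return audio, text
```

Carrying the timestamp in both streams is what would later let a viewer's player align text with audio even though the two streams travel independently.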
In some embodiments, the system further comprises a speech to text converter, wherein the speech to text converter is configured to produce text from the multimedia information and to provide the text to the processor. The present invention is not limited by the nature of the speech to text converter. In some embodiments, the speech to text converter comprises a stenograph (e.g., operated by a stenographer). In other embodiments, the speech to text converter comprises voice recognition software. In preferred embodiments, the speech to text converter comprises an error corrector configured to confirm text accuracy prior to providing the text to the processor.
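The error corrector sits between the recognizer (or stenograph) and the processor and releases text only after it has been checked. A minimal sketch, assuming the recognizer reports a per-segment confidence score, that a table of known corrections exists, and that a human `review` callback handles low-confidence segments; these names and the confidence threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class Segment:
    start_ms: int
    text: str
    confidence: float  # 0.0-1.0, assumed to be reported by the recognizer


def error_corrector(segments: Iterable[Segment],
                    corrections: dict[str, str],
                    min_confidence: float,
                    review: Callable[[Segment], Optional[str]]) -> Iterator[tuple[int, str]]:
    """Yield (start_ms, text) only for confirmed text."""
    for seg in segments:
        # Apply known word-level fixes (e.g. recurring misrecognitions).
        text = " ".join(corrections.get(w, w) for w in seg.text.split())
        if seg.confidence < min_confidence:
            # Low confidence: a reviewer returns corrected text, or None to drop it.
            reviewed = review(Segment(seg.start_ms, text, seg.confidence))
            if reviewed is None:
                continue
            text = reviewed
        yield seg.start_ms, text
```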
In some embodiments, the processor further comprises a security protocol. In some preferred embodiments, the security protocol is configured to restrict participants and viewers from controlling the processor (e.g., a password protected processor). In other embodiments, the system further comprises a resource manager (e.g., configured to monitor and maintain efficiency of the system).
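One simple way to realize a password-protected processor is to store only a salted hash of the operator password and check each control request against it. A minimal sketch using Python's standard `hashlib.pbkdf2_hmac` and `hmac.compare_digest`; the class name `ControlGate` and the default iteration count are assumptions, not details from the source.

```python
import hashlib
import hmac
import os


class ControlGate:
    """Only holders of the operator password may issue control commands."""

    def __init__(self, password: str, iterations: int = 200_000) -> None:
        self.salt = os.urandom(16)          # random per-installation salt
        self.iterations = iterations
        self.digest = self._hash(password)  # the plain password is not kept

    def _hash(self, password: str) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                   self.salt, self.iterations)

    def authorize(self, password: str) -> bool:
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(self.digest, self._hash(password))
```

Participants and viewers who lack the password would then be limited to viewing and submitting feedback.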
In some embodiments, the system further comprises a conference bridge configured to receive the multimedia information, wherein the conference bridge is configured to provide the multimedia information to the processor. In some embodiments, the conference bridge is configured to receive multimedia information from a plurality of sources (e.g., sources located in different geographical regions). In other embodiments, the conference bridge is further configured to allow the multimedia information to be viewed (e.g., is configured to allow one or more viewers to have access to the systems of the present invention).
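At its core, a conference bridge receiving audio from several sources combines their frames into one signal. A minimal sketch, assuming each source delivers equal-length frames of signed 16-bit PCM; the function name `mix_frames` is hypothetical. Production bridges commonly send each participant a mix that excludes their own voice and may normalize levels instead of hard-clipping; those refinements are omitted here.

```python
from array import array


def mix_frames(frames: list[array]) -> array:
    """Sum equal-length int16 PCM frames from each source, clipping to the int16 range."""
    if not frames:
        raise ValueError("no input frames")
    n = len(frames[0])
    if any(len(f) != n for f in frames):
        raise ValueError("frames must have equal length")
    mixed = array("h", [0] * n)
    for i in range(n):
        total = sum(f[i] for f in frames)
        mixed[i] = max(-32768, min(32767, total))  # hard clip to avoid overflow
    return mixed
```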
In some embodiments, the system further comprises a delay component configured to receive the multimedia information, delay at least a portion of the multimedia information, and send the delayed portion of the multimedia information to the processor.
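A delay component can be modeled as a first-in, first-out buffer that releases each chunk only after a fixed delay, giving the speech-to-text and error-correction stages time to produce matching text. A minimal sketch; the class name `DelayLine` and the 5-second delay used below are hypothetical, since the source does not specify a duration.

```python
from collections import deque
from typing import Any


class DelayLine:
    """Hold back each media chunk by a fixed delay before passing it on."""

    def __init__(self, delay_ms: int) -> None:
        self.delay_ms = delay_ms
        self.buffer: deque = deque()  # (arrival_ms, chunk) in arrival order

    def push(self, arrival_ms: int, chunk: Any) -> None:
        self.buffer.append((arrival_ms, chunk))

    def pop_ready(self, now_ms: int) -> list:
        """Return every chunk whose delay has elapsed, oldest first."""
        ready = []
        while self.buffer and now_ms - self.buffer[0][0] >= self.delay_ms:
            ready.append(self.buffer.popleft()[1])
        return ready
```

For example, with `DelayLine(5000)`, a chunk pushed at 0 ms is released by `pop_ready(5000)` but not by `pop_ready(4999)`.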
In some embodiments, the system further comprises a text to speech converter configured to convert at least a portion of the text information to audio.
In still other embodiments, the system further comprises a language translator configured to receive the text information and convert the text information from a first language into one or more other languages.
In some embodiments, the processor is further configured to transmit a viewer output signal comprising the second information stream (e.g., transmit information to one or more viewers). In some embodiments, the viewer output signal further comprises the first information stream. In preferred embodiments, the viewer output signal is compatible with a multimedia software application (e.g., a multimedia software application on a computer of a viewer).
In some embodiments, the system further comprises a software application configured to display the first and/or the second information streams (e.g., allowing a viewer to listen to audio, view video, and view text). In some preferred embodiments, the software application is configured to display the text information in a distinct viewing field. In some embodiments, the software application comprises a text viewer. In other embodiments, the software application comprises a multimedia player embedded into a text viewer. In some preferred embodiments, the software application is configured to allow the text information to be printed.
The present invention further provides a system for interactive electronic communications comprising a processor, wherein the processor is configured to receive multimedia information, encode an information stream comprising text information, send the information stream to a viewer, wherein the text information is synchronized with an audio or video file, and receive feedback information from the viewer.
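Synchronizing text with an audio or video file can be reduced to a lookup from the current playback position to the transcript segment active at that moment. A minimal sketch, assuming each segment carries a start time in milliseconds on the same timeline as the media; the class name `SyncedTranscript` is hypothetical.

```python
import bisect
from typing import Optional


class SyncedTranscript:
    """Text segments keyed by start time, so a player can show the line for any position."""

    def __init__(self, segments: list[tuple[int, str]]) -> None:
        segments = sorted(segments)
        self.starts = [start for start, _ in segments]
        self.texts = [text for _, text in segments]

    def text_at(self, position_ms: int) -> Optional[str]:
        # Last segment whose start time is <= the playback position.
        i = bisect.bisect_right(self.starts, position_ms) - 1
        return self.texts[i] if i >= 0 else None
```

Binary search keeps each lookup logarithmic in transcript length, which matters when a viewer seeks around a long recording.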
The present invention also provides methods of using any of the systems disclosed herein. For example, the present invention provides a method for providing streaming text information, the method comprising providing a processor and multimedia information comprising audio information; and processing the multimedia information with the processor to generate a first information stream and a second information stream, said first information stream comprising the audio information and said second information stream comprising text information, said text information corresponding to the audio information.
In some embodiments, the method further comprises the step of converting the text information into audio. In other embodiments, the method further comprises the step of translating the text information into one or more different languages. In still other embodiments, the method further comprises the step of transmitting the second information stream to a computer of a viewer. In other embodiments, the method further comprises the step of receiving feedback information (e.g., questions or comments) from a viewer.
The present invention further provides systems and methods for providing translations for motion pictures, television shows, or any other serially encoded medium. For example, the present invention provides methods for translating audio dialogue into another language that is presented in a form similar to subtitles. The method allows synchronization of the subtitles with the original audio. The method also provides a hardcopy or electronic translation of the dialogue in scripted form. The systems and methods of the present invention may be used to transmit and receive synchronized audio, video, timecode, and text over a communication network. In some embodiments, the information is encrypted and decrypted to protect the material against piracy or theft. Using the methods of the present invention, a dramatic reduction (e.g., 50% or more) in the time between a domestic motion picture release and foreign releases is achieved.
In some such embodiments, the present invention provides methods for providing a motion picture translation comprising, providing: motion picture audio information, a translation system that generates a text translation of the audio; and a processor that encodes text and audio information; processing the motion picture audio information with the translation system to generate a text translation of the audio; processing the text translation with the processor to generate encoded text information; processing the motion picture audio information with the processor to generate encoded audio information; and synchronizing the encoded text information and the encoded audio information. Such methods find use, for example, in reducing the cost and process delay of motion picture translations by 50% or more (e.g., 50%, 51%, . . . , 90%, . . . ).
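The synchronizing step can be illustrated by emitting the translated text as timed subtitle cues. The sketch below writes the widely used SubRip (.srt) layout: cue number, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the text, and a blank line. The choice of SRT, the function names, and the assumption that cue times come from the original audio timecode are illustrative, not taken from the source.

```python
def srt_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues: list[tuple[int, int, str]]) -> str:
    """cues: (start_ms, end_ms, translated text), timed against the original audio."""
    blocks = []
    for n, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{n}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)  # a blank line separates consecutive cues
```

Because the translated cues reuse the source timecode, the same file can also be rendered as the scripted hardcopy translation by dropping the time ranges.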
The present invention also provides a system comprising a processor configured to receive text information from a speech-to-text converter, receive multimedia information from a conference bridge, encode text information into an information stream, encode multimedia information into an information stream, and send and receive information from a language translator. In some embodiments, the processor further comprises a resource manager configured to allow said processor to continuously process 10 or more (e.g., 11, 12, . . . , 100, . . . , 1000, . . . ) information streams simultaneously.
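A resource manager that lets the processor handle 10 or more streams at once can be sketched with cooperative concurrency: a semaphore caps how many streams run simultaneously, and counters record the load. This is a hypothetical design using Python's standard `asyncio`; the per-chunk work is a placeholder for real encoding or captioning, and the class and function names are assumptions.

```python
import asyncio


class ResourceManager:
    """Admit at most max_concurrent streams at once and record the peak load."""

    def __init__(self, max_concurrent: int) -> None:
        self.sem = asyncio.Semaphore(max_concurrent)
        self.active = 0
        self.peak = 0

    async def run(self, stream_id: int, chunks: list[str]) -> list[str]:
        async with self.sem:
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                out = []
                for chunk in chunks:
                    await asyncio.sleep(0)     # yield so other streams progress
                    out.append(chunk.upper())  # stand-in for encoding/captioning
                return out
            finally:
                self.active -= 1


async def serve(n_streams: int, limit: int) -> tuple[list[list[str]], int]:
    # Create the manager inside the running loop so the semaphore belongs to it.
    rm = ResourceManager(limit)
    results = await asyncio.gather(*(rm.run(i, ["a", "b"]) for i in range(n_streams)))
    return list(results), rm.peak


results, peak = asyncio.run(serve(n_streams=12, limit=10))
```

Here 12 streams are submitted with a limit of 10, so 10 run concurrently and the remaining 2 wait for a free slot.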
The present invention further provides systems and methods for two-way real time conversational language translation. For example, the present invention provides methods comprising, providing: a conference bridge configured to receive a plurality of audio information inputs, a speech-to-text converter, a text-to-speech converter, and a language translator; inputting audio from a first user to said conference bridge to provide first audio information; converting the first audio information into text information using the speech-to-text converter; translating the text information into a different language using the language translator to generate translated text information; converting the translated text information into translated audio using the text-to-speech converter; and providing the translated audio to a second (or other) user(s).
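The conversational method chains three converters for each direction of the call. The sketch below shows one leg of that chain with typed interfaces; `SpeechToText`, `Translator`, `TextToSpeech`, and `relay` are hypothetical names, and the toy stand-in classes exist only so the example runs. Real speech recognition, translation, and synthesis engines would replace them.

```python
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes, language: str) -> str: ...


class Translator(Protocol):
    def translate(self, text: str, source: str, target: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str, language: str) -> bytes: ...


def relay(audio: bytes, speaker_lang: str, listener_lang: str,
          stt: SpeechToText, mt: Translator, tts: TextToSpeech) -> bytes:
    """One leg of a two-way call: speaker audio in, listener-language audio out."""
    text = stt.transcribe(audio, speaker_lang)
    translated = mt.translate(text, speaker_lang, listener_lang)
    return tts.synthesize(translated, listener_lang)


# Toy stand-ins so the sketch runs; they do not perform real speech processing.
class FakeSTT:
    def transcribe(self, audio: bytes, language: str) -> str:
        return audio.decode("utf-8")


class PhraseTranslator:
    def __init__(self, table: dict[tuple[str, str], dict[str, str]]) -> None:
        self.table = table

    def translate(self, text: str, source: str, target: str) -> str:
        return self.table[(source, target)].get(text, text)


class FakeTTS:
    def synthesize(self, text: str, language: str) -> bytes:
        return f"[{language}] {text}".encode("utf-8")
```

Running `relay` once per direction, with the speaker and listener languages swapped, gives the two-way conversation; the conference bridge would supply the audio for each leg.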