1. Field of the Invention
The present invention relates to the manipulation, navigation, and assemblage of multimedia using voice commands to allow for remote assembly and display of images during projected presentations, by combining pre-existing computer programs with original computer programs and electronic circuitry. In particular, a method and apparatus as further described allows voice commands to operate impromptu juxtapositioning and displaying of various forms of data onto a presentation screen. The display can either be immediately recorded for a subsequent presentation or incorporated into an active presentation of movies, still images, text, and sound. Submitted utterances go through a series of conversional and identification filters (based on pre-set user preferences), which automatically search, scrutinize, and capture from pre-selected databases (local or remote), on-line commercial media vendors, and/or the World Wide Web (WWW). The speaker then sees instantaneous results and can either submit those results to a large display, modify the search, or juxtapose the results to fit desired projected output.
2. Description of the Related Art
Known in the art is the use of voice commands for voice-to-text conversion of language utterances with preferred words stored in a modifiable word bank. A voice recognition module working in conjunction with a computer and utilizing a microphone as an input device allows for the display of translated language on the computer screen. See U.S. Pat. No. 4,984,177 (Rondel et al.).
More recently, the use of voice recognition has been implemented for the navigation of application programs being utilized by a single operating system. As seen, for example, in U.S. Pat. No. 5,890,122 to Van Kleeck et al., a method and system is described for an application program to receive voice commands as an input facility for the navigation and display of drop-down menus listing available commands for the application. The available commands can be modified as preferred by the user allowing the list to be made variable.
The navigation through applications utilizing "windows", or graphical interfaces of a portion of a screen that can contain its own document, is also demonstrated by U.S. Pat. No. 5,974,384 to Yasuda. Again, these systems demonstrate the use of the voice recognition module accompanying a computer system employing a particular operating system. The software or hardware works in close relationship with the operating system, allowing the voice recognition process in the system to provide a signal resulting from the executed voice input. What is desired, however, is not just a means of navigating by voice command through a single application on a computer, but a unit that allows access to separate databases to provide hands-free navigation through variable output facilities.
The navigation of displays outside the field of text and graphical user interfaces using voice technology is evident in its implementation for the World Wide Web (WWW). In U.S. Pat. No. 5,890,123 to Brown et al., a system and method is disclosed for a voice controlled video screen display wherein the data links of a web page are utilized by speech recognition, thereby allowing the navigation of the WWW by voice command. The software program in this application is a web browser. Though a web browser may be utilized by a variety of operating systems, the retrievable displays are made accessible only through entry into a global network, and the "hands-free" navigation can be accomplished only through the displayed links particular to the web page. The present invention demonstrates the assembly, manipulation, and navigation of digital displays beyond those simply comprising displays produced by Hypertext Markup Language (HTML) on the World Wide Web. This system and method also teach not only the navigation of text and graphical interfaces, but also the manipulation and assembly of various types of on-screen digital displays and multimedia, retrievable and searchable from variable databases.
Multimedia as it is used for presentation purposes covers a wide range of displays, both audio and visual. A cohesive organization of these displays is paramount when presenting the images on a screen. There are currently graphics and recording programs that can provide voice-command manipulation of images. However, there is no graphics program that allows easy and precise manipulation by non-graphics experts and voice-functioning systems, to the benefit of the product as a whole, without re-structuring it to accommodate the varying degrees of inputs and outputs. The art of "hands-free" manipulation of digital images, such as still pictures or movies, and sound objects is limited. U.S. Pat. No. 5,933,807 to Fukuzawa shows how a displayed picture can be manipulated by a generated sound signal. A major limitation of that prior art is that the arrangement and display of the images occur within a single screen.
One example of the need for the present system and method is a doctor in an operating room who, while conducting a complex procedure, requires a recent X-ray that can be called up immediately. Another is an auto mechanic who, while following a procedure from images in an on-line manual, requires an immediate visual comparison to an older part and needs to perform this action without taking his or her hands off the tool being held in place. Lastly, there may be envisioned a speaker who, during a business presentation, impresses the clientele by visually addressing tough questions through a simple vocal query of a pre-constructed local database.
Thus, there is a need for a system that provides "hands-free" navigation, manipulation, and assembly of a variety of multimedia, which is accomplished remotely for the purposes of presentation on various screens. The present invention can assemble searched text and images from variable databases, and allow a user to record, juxtapose, and manipulate image displays either impromptu or pre-planned.
Combined with external and internal computer components, and internal software programs, this system comprises a unit that enables any user to vocally assemble and display (individually or in a series) still images, movie-clips, feature-length movies, feature-length audio presentations, short audio-clips, text, or any combination of the aforementioned without regard to media type. The system can also, through verbal command, "free-float" the placement of any visual image, including any non-rectangular forms, transparent images, or text, onto a background image or similar image that concurrently becomes the background image. Lengthy presentations (still-image, movie, or audio, or a combination) can also be automatically re-configured to fit into pre-assigned time frames that previously would have had to be disrupted due to sporadic pauses caused by typical external human intervention. The option also exists to record (with or without the original verbatim search query) into video/audio playback and/or text in computer-readable language for future reference.
An inputted voice utterance is converted by voice recognition means into computer-readable text in the form of a command. The commands are categorized as "search", "manipulation", and "navigation", each of which comprises other commands that are triggered in succession by directionals, which are a means for triggering the commands separately or simultaneously.
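By way of illustration only, the categorization of a converted utterance may be sketched as follows. The sketch assumes the category is selected by the first recognized word; the trigger vocabulary, the `Command` type, and the `parse_command` function are hypothetical and are not taken from the specification, which names only the three categories.

```python
from dataclasses import dataclass

# Hypothetical trigger words (an assumption): the text names the three
# categories but not the vocabulary that selects them.
CATEGORY_KEYWORDS = {
    "search": {"find", "search", "show"},
    "manipulation": {"move", "resize", "overlay"},
    "navigation": {"next", "previous", "back"},
}


@dataclass
class Command:
    category: str  # "search", "manipulation", or "navigation"
    argument: str  # remainder of the utterance after the trigger word


def parse_command(text: str) -> Command:
    """Map recognized text to a command category using its first word."""
    words = text.strip().lower().split()
    if not words:
        raise ValueError("empty utterance")
    head, rest = words[0], " ".join(words[1:])
    for category, keywords in CATEGORY_KEYWORDS.items():
        if head in keywords:
            return Command(category, rest)
    raise ValueError(f"unrecognized command word: {head!r}")
```

In this sketch an utterance such as "Find chest x-ray" yields a "search" command whose argument is "chest x-ray"; the succession of sub-commands triggered by directionals is not modeled.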
The means necessary for performing and directing the commands include a series of EDSP filters, which are search and image capturing commands that contain therein a series of "media reader" directionals. The EDSP filters are used as a means for taking the converted search command, identifying relevant database(s), committing the search, and retrieving results by conducting multiple-page "plane" searches for compatible media. The filtering means then transfers matching data into a means for juxtapositioning or displaying search results, termed herein as the "image negotiator".
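The filtering sequence described above (take the search command, consult the selected databases, keep only compatible media, and hand the matches onward) may be sketched, purely as an assumption-laden illustration, as follows. The `MediaItem` type, the `run_search` function, the in-memory dictionary standing in for databases, and the title-matching rule are all hypothetical; the actual EDSP filters and media reader directionals are not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaItem:
    title: str
    media_type: str  # e.g. "image", "movie", "audio", "text"


def run_search(query, databases, accepted_types):
    """Search each selected database and keep only compatible media.

    `databases` maps a database name to a list of MediaItem objects
    (a stand-in for local or remote sources). A match requires every
    query word to appear in the title; this rule is an assumption.
    """
    terms = query.lower().split()
    results = []
    for name, items in databases.items():
        for item in items:
            title = item.title.lower()
            if item.media_type in accepted_types and all(t in title for t in terms):
                results.append((name, item))
    return results
```

The returned list of (database name, item) pairs stands in for the matching data that would be transferred to the image negotiator.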
The juxtapositioning means, or image negotiator, which includes an "un-latcher", a "bordershop", and a "cornershop", prepares the image for presentation by way of the "image griddle". The image griddle is a platform wherein a means is provided for the desired media to be manipulated and organized for display. The media may be organized as a table within the image negotiator wherefrom a hanging command issued to the image griddle is provided as a means for allowing the desired images to be placed and grouped according to user preferences into the image griddle. A "shrouder" is used as a command means for overlaying images.
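As a rough illustration of placing and overlaying media on the image griddle, the following sketch models the griddle as a grid of cells, each holding a stack of image names. This representation is an assumption made for illustration: the `ImageGriddle` class and its `hang`, `shroud`, and `top` methods are hypothetical names loosely borrowed from the terms above, and the un-latcher, bordershop, and cornershop are not modeled.

```python
class ImageGriddle:
    """Toy grid of display cells; each cell holds a stack of image names."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.cells = {}  # (row, col) -> list of names, bottom to top

    def _check(self, row, col):
        if not (0 <= row < self.rows and 0 <= col < self.cols):
            raise IndexError(f"cell ({row}, {col}) is outside the griddle")

    def hang(self, name, row, col):
        """Place an image in a cell, replacing whatever was there."""
        self._check(row, col)
        self.cells[(row, col)] = [name]

    def shroud(self, name, row, col):
        """Overlay an image on top of the cell's current contents."""
        self._check(row, col)
        self.cells.setdefault((row, col), []).append(name)

    def top(self, row, col):
        """Return the visible (topmost) image in a cell, or None if empty."""
        stack = self.cells.get((row, col))
        return stack[-1] if stack else None
```

For example, hanging an X-ray in the upper-left cell and then shrouding it with a label leaves the label as the visible image while the X-ray remains beneath it.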
It is the objective of the present invention to allow voice commands to operate impromptu juxtapositioning and display of various forms of data onto presentation screens, thereby allowing for an active and remotely assembled concise visual and audio presentation.
It is further an objective of the present invention to allow for the on-screen manipulation of images using voice-command, whereby images, tables, and other multimedia clips are maneuvered and managed freely, within active borders and corners.
It is further an objective of the present invention to allow an image or text to free-float onto another background image, thereby motivating any change in graphic conversion and presentation.
It is further an objective of the present invention to provide the option to record video/audio playback and/or stored search queries.