This invention relates to devices such as document scanners, digital cameras, personal digital assistants, laptop computers, and any other device that stores data and uploads or copies the data to a host computer. More particularly, the invention relates to using voice and speech recognition for performing commands on the data in a multiprocessing environment.
Many devices, such as digital cameras, personal digital assistants, laptop computers, and hand held document scanners, have the ability to collect many different images or documents from a variety of sources. In many cases the user may want to accomplish different tasks with each image or document captured. Some may be faxed or e-mailed to business associates or friends. Others may become part of a word processing document. Still others may need to be stored in a specific location or immediately printed. Normally, such devices are periodically connected to a host computer, and the collected image data files or document data files are copied to the host computer from the device, either as a group or individually. In either case, the user must look at each specific image data file or document data file after copying and take action to have that image or document processed or sent to the right place: save image A in this format here; save document B in that format there; attach image C to an e-mail message; fax document D to a business associate, etc. This can be a very time consuming process, especially if a large number of image data files and document data files have been captured. It also becomes time consuming if the device has to be watched and continuously monitored. The time problem is compounded if commands must be repeated because a failure goes undiscovered until the operation attempts to execute and cannot, for any of a number of reasons. In addition, if the images and documents are collected over an extended period of time, such as on a business trip, by the time the individual copies them to the host computer for processing and routing, the individual may have difficulty remembering exactly what action was intended for each one. The best time, in most cases, for an individual to determine the disposition of a captured image or document is at the time of capture.
It is thus apparent that there is a need in the art for an improved method or apparatus that operates as a background process in a multitasking fashion and allows commands to be taught and executed. Such a method or apparatus should enable a user to annotate a captured image or document, at the time of capture, with speech disposition commands for processing and disposing of the image or document, so that the image or document will automatically be processed and disposed of according to the speech disposition commands stored in a voice annotation file or a speech disposition command file. These commands are executed by the device or the host computer upon copying, or uploading, the image data file or document data file, together with the voice annotation file or speech disposition command file, to a host computer. The present invention meets these and other needs in the art.
It is an aspect of the present invention to use a voice pickup component integrated into a device to enable disposition commands to be made by voice and stored in a voice annotation file or a speech disposition command file for each image or document captured.
Another aspect of the invention is to operate in a multitasking mode that simultaneously executes and learns commands associated with the speech disposition commands in the speech disposition command file, allowing for hands free operation.
A further aspect of the invention is to store all the commands issued by a user that will enable the device to make suggestions to the user based on the user's past profile.
The above and other aspects of the invention are accomplished in devices that capture images or documents and store them as image data files or document data files in an internal memory. Prior to capturing the image or document, the device can execute speech control commands and speech disposition commands, such as a “memorize” speech disposition command or a simple “email” or “fax” message. These commands are used to create new commands or subcommands. Also, at the time the image or document is captured, the devices can receive speech disposition commands from the user that will govern the processing and disposition of the image data files or document data files after copying or uploading them to a host computer. Voice input is ideal for small devices which may not have enough space to provide any other type of user interface. Voice input is also ideal where the user does not want to use buttons or a mouse, or otherwise deal with user interfaces, but would rather work in a hands free environment, or where the device supports multitasking, meaning that tasks are executed in parallel and in the background.
For example, after scanning a document with a portable hand held document scanner, the user may make a first speech disposition command, such as “fax” or “e-mail” or “print and save”, and then make a second speech disposition command, such as “memorize Fran Bisco's fax 777-444-4444” by speaking into a voice pickup component, typically a microphone, in the portable scanner. The voice is converted into a recognition pattern, which is then compared to a predetermined set of recognition patterns stored in internal memory. If there is no match, then the device outputs a message to the user that the speech disposition command is not valid.
If there is a partial match, then the device outputs a different message to the user indicating that the speech disposition command needs to be modified, for example because the command is missing a parameter or because the parameter does not make sense for this specific command. The device may offer some suggestions based on past commands executed.
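The three outcomes described above (no match, partial match, and full match) can be illustrated with a minimal sketch. It assumes the utterance has already been recognized and is available as a list of words; the names `Match`, `Result`, `REQUIRED_PARAMS`, and `check_command`, as well as the parameter counts, are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Match(Enum):
    NONE = "none"        # no stored pattern matches
    PARTIAL = "partial"  # command recognized, but a parameter is missing
    FULL = "full"        # command and its parameters are acceptable


@dataclass
class Result:
    match: Match
    message: str


# Hypothetical table: command word -> number of required parameters.
REQUIRED_PARAMS = {"fax": 1, "e-mail": 1, "print": 0}


def check_command(words: List[str]) -> Result:
    """Classify an already-recognized utterance given as a list of words."""
    if not words or words[0] not in REQUIRED_PARAMS:
        return Result(Match.NONE, "speech disposition command is not valid")
    needed = REQUIRED_PARAMS[words[0]]
    supplied = len(words) - 1
    if supplied < needed:
        return Result(Match.PARTIAL, f"'{words[0]}' is missing a parameter")
    return Result(Match.FULL, f"'{words[0]}' accepted")
```

In this sketch, `check_command(["fax"])` yields a partial match because the fax number is absent, while an unknown word yields no match. A real device would compare acoustic recognition patterns rather than text.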
There are various levels of sophistication inherent in different embodiments of the invention. In one embodiment, when the file transfer software or the device processes a speech disposition command such as “e-mail”, the user may designate the e-mail address based on an earlier speech disposition command or, if the address is omitted, the user may be prompted to provide the e-mail address to which the image data file or document data file is to be sent. When the e-mail command is complete, the file transfer software then accesses the e-mail utility in the host computer or the device accesses its e-mail utility, and the document data file associated with the speech disposition command is e-mailed. Once all the commands in the voice annotation file or speech disposition command file are executed, the file is normally deleted.
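The processing flow of this embodiment might be sketched as follows. The sketch assumes, purely for illustration, that the speech disposition command file has already been reduced to text with one command per line; the function name `process_command_file`, the `handlers` mapping, and the `prompt` callback are hypothetical and are not defined by the specification.

```python
import os
from typing import Callable, Dict, List, Tuple


def process_command_file(
    path: str,
    handlers: Dict[str, Callable[[str], None]],
    prompt: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Execute every command in the file, then delete the file.

    Assumed format (not from the source): one command per line,
    e.g. "e-mail fran@example.com" or just "e-mail".
    """
    with open(path, encoding="utf-8") as f:
        lines = [line.strip() for line in f if line.strip()]

    executed = []
    for line in lines:
        name, _, argument = line.partition(" ")
        handler = handlers[name]  # a KeyError here leaves the file in place
        if not argument:
            # Parameter omitted: ask the user for it.
            argument = prompt(name)
        handler(argument)
        executed.append((name, argument))

    os.remove(path)  # every command ran, so the file is no longer needed
    return executed
```

The file is removed only after the loop completes, so a command that fails leaves the file available for a later attempt.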
In another embodiment of the invention a device is trained to recognize the user's spoken commands through speech and voice analysis software. In training mode, the voice analysis component of the software is accessed. The speech and voice analysis software may be located within the device, or located on a host computer system and accessed by the device while tethered to the host computer system.
For example, if using the speech and voice analysis software in the training mode, the user would access a predetermined list of the functions that can be executed by the file transfer software or the device with a speech disposition command. Command one, for example, may represent a set of instructions for performing a print function of an image data file or document data file. The syntax could be “print x copies on printername”. In selecting command one for training and analysis, the user would be prompted by the speech and voice analysis software to choose a word that the user wants to use to invoke the set of instructions for the print function of command one. The user may be prompted to make printername the default printer. The user may also be prompted to repeat the chosen word a number of times. A logical choice would be to choose the word “print”, but any word chosen by the user not already being used for a function could be employed. Each repetition of the word “print” is picked up by the device and analyzed by the speech and voice analysis software to develop a recognition pattern to encompass the variations and inflections in the user's voice in speaking the word “print” for the print command. The recognition patterns in the function recognition table have command numbers or command text that are linked to the predetermined sets of instructions for the various functions, which are also stored in memory in the host computer or the device. This embodiment would enable foreign languages to be utilized for the speech disposition command words, since the set of instructions for a function is tied to the command number or command text and to the user's word choice, together with the subsequent training and voice analysis of that word choice.
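The indirection described above, in which a user-chosen word maps to a command number and the command number maps to a predetermined set of instructions, can be sketched as follows. This is an illustration under stated assumptions: a text string stands in for the trained recognition pattern, the instruction sets are stand-in functions, and the names `INSTRUCTIONS` and `FunctionRecognitionTable` are hypothetical.

```python
from typing import Callable, Dict

# Command numbers -> predetermined sets of instructions (stand-in callables).
INSTRUCTIONS: Dict[int, Callable[[str], str]] = {
    1: lambda filename: f"print {filename}",
    2: lambda filename: f"fax {filename}",
}


class FunctionRecognitionTable:
    """Links user-chosen words to command numbers (text stands in for audio)."""

    def __init__(self) -> None:
        self._word_to_command: Dict[str, int] = {}

    def train(self, word: str, command_number: int) -> None:
        if command_number not in INSTRUCTIONS:
            raise ValueError(f"unknown command number {command_number}")
        if word in self._word_to_command:
            # A word already used for a function cannot be reused.
            raise ValueError(f"'{word}' is already used for a function")
        self._word_to_command[word] = command_number

    def execute(self, word: str, filename: str) -> str:
        command_number = self._word_to_command[word]
        return INSTRUCTIONS[command_number](filename)
```

Because the instructions are keyed by command number rather than by the spoken word, training the Spanish word "imprimir" for command one invokes the same print instructions that "print" would.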
In still another embodiment of the invention the recognition patterns for all the commands issued are stored in a memory database that is accessed when a recognition pattern associated with a speech disposition command, voice control command or voice annotation command does not match a recognition pattern in the function recognition table.
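A minimal sketch of this two-tier lookup follows. It assumes that recognition patterns can be represented as dictionary keys and that the memory database associates each previously issued pattern with the command text it produced; the specification does not state what is done with a match found in the database, so returning the stored command is an assumption. The names `record` and `resolve` are hypothetical.

```python
from typing import Dict, Optional, Tuple


def record(history: Dict[str, str], pattern: str, command_text: str) -> None:
    """Store an issued command in the memory database of past commands."""
    history[pattern] = command_text


def resolve(
    pattern: str,
    function_table: Dict[str, int],
    history: Dict[str, str],
) -> Tuple[str, Optional[object]]:
    """Look in the function recognition table first, then in the history."""
    if pattern in function_table:
        return ("function_table", function_table[pattern])
    if pattern in history:
        # Fallback: the pattern was issued before, so reuse what is stored.
        return ("history", history[pattern])
    return ("unrecognized", None)
```

The first element of the returned pair identifies which store produced the match, which lets the caller decide whether to execute the command directly or to present it to the user as a suggestion.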