The present invention relates generally to signal processing systems and methods, and more particularly to systems that process speech and handwritten or hand-drawn gesture input.
The Provisional United States Patent Application entitled "APPARATUS AND METHOD FOR PROCESSING HAND-MARKED INPUT AND SPEECH INPUT", Serial No. 60/086,346, filed May 20, 1998, is herein incorporated by reference in its entirety.
Since the early 1980s, personal computers have become increasingly powerful, able to store large amounts of information and to create complex text and multimedia documents that now include color animation, 3D effects, and sound. In addition, these devices are able to communicate over telephone lines or local area networks with other computers directly or through the Internet. These computers are able to draw on large databases stored on large capacity hard drives in the personal computers. PCs can also tap into remote databases through their networking and communications capabilities.
Although the human interface to these computers has evolved to a certain extent from the early 1980s, in many ways the major element of this interface, the keyboard, is still very similar to that of a manual typewriter whose origins date to the late part of the 19th century. For most computers, even in the mid-1990s, the 100-key keyboard with alphanumeric and function keys still forms the basic input means for accessing and creating information on personal and other computers. Ironically, the keyboard that is in common use has its basic layout designed to slow typists down. This design dates from the days of mechanical typewriters whose keys jammed when typists became too proficient. Although many people using computers have learned to type very rapidly, for many who do not learn to type well or who do not know how to type, the keyboard interface to the computer represents a barrier to its use. In addition, many people who do learn to type well can develop a repetitive stress disorder, an inflammation of the wrists which can result in the complete inability to type and therefore loss of productivity on the computer.
In the late 1980s a pointing device, called a mouse, was developed for computer input which allows the user to move a cursor or indicator within the computer output display screen. By pointing and clicking a mouse, certain words or areas on the screen may be chosen by the user. In this way, navigation of the display screen and command of computer operations may be controlled by pointing to various items or words or icons on the screen. The pointing device may be a mouse, which indirectly points to items on the screen, or a pen-type device applied directly to the screen or even a finger with a special touch screen.
Other operations are possible using these devices, such as highlighting a word and then using other switches on the pointing device to issue an additional command to delete the word or change its appearance. The development of graphical user interfaces (GUIs) has greatly enhanced the use of pointing devices for the human interface to the computer. Although these pointing devices may substitute for a series of keystrokes for moving a pointer around on the screen or carrying out various operations, mouse operations are basically complementary to those provided by the keyboard. However, it is also difficult to operate a mouse and keyboard at the same time. In addition, it is not practical to use mouse input to create text or to input arbitrary commands to the computer.
Since the early 1990s, the use of automatic speech recognition for voice input to the computer has become increasingly practical. Voice input devices in a computer require significant computing power for their operation. Early speech recognition devices could be trained by an individual to respond to a small number of command words, effectively substituting for command keys on the keyboard or a limited number of mouse clicks in a Windows interface. As computers have become more powerful in their computing speed and memory capacity, automatic speech recognition systems for computer input have become more capable. It is possible on personal computers to use voice input commands to activate any Windows command that appears in the menu structure using discrete or continuous speech recognition without requiring navigation through several layers of menus. Speech recognition systems are an especially powerful substitute for the keyboard for the input of individual words of text to create documents or for discrete commands. Such systems, however, are not a good substitute for the ease and speed of display screen navigation or other drawing operations (for example, circling a block of text and moving it by dragging it to a new place on the screen), which can easily be provided by a mouse or other pointing device. Moreover, such speech recognition systems have difficulty determining whether the received speech is a command or text.
Although the promise of automatic speech recognition systems for text creation using computers is great because they are rapid and easy to use, these systems suffer from some significant limitations which have impeded their general use in computers. The accuracy of speech recognition systems, even those well trained to the voice of a single user, is limited to approximately 95%, and may be significantly lower with respect to proper names and words outside of the vocabulary, which may occur quite often in many business and technical uses of computers. Speech recognition systems are also not very effective for various editing and formatting tasks, for example, the insertion of punctuation marks. In addition, voice input is not a good mechanism for navigating the display screen of a computer and carrying out the functions of a mouse or other pointing device which allow operations such as "drag-and-drop," highlighting words, moving blocks of text, manipulating and creating graphics, or indicating a text insertion point.
The physical size of computers has limited their utility in certain applications. Like many electronic devices, computers have grown dramatically smaller as they have evolved. In recent years, laptop and even palmtop computers the size of small books have become popular. A computer the size of a book, which may be carried anywhere, or a small pocket-sized device has no room for a keyboard large enough to accommodate the hands of most adults. In addition, if a computer is to be used in the field as a palmtop device or even in an airplane seat, the use of a mouse-type pointing device that requires an external pad is impractical. A pointing device such as a pen for use on even a small computer display surface is extremely useful.
A number of devices without keyboards have been proposed that use pens and have handwriting recognition as input and/or receive mouse-type input. Those introduced have had limited ability to recognize even fairly clear handwriting. Although handwriting recognition by pen input devices has significantly improved in the last few years, like speech recognition, it still remains a challenging technical problem. For example, pen input in currently available systems is tiring and impractical when entering large amounts of text. Developing even smaller personal computing devices with the complete text input and computing capability of larger sized personal computers remains a major goal and interest of the computing public and the computing industry.
There is, therefore, a need for a computer system that departs from conventional methods and achieves increased performance by integrating speech recognition with recognition of handwritten and hand-drawn input (e.g., pen or gesture input) to overcome the disadvantages of either mode of recognition used alone or in an unintegrated combination.
Methods and apparatus consistent with this invention process handwritten or hand-drawn input and speech input. Method steps include recognizing received handwritten or hand-drawn input, recognizing received speech input, and creating or modifying an electronic document according to the speech or handwritten or hand-drawn input.
An apparatus includes structure for recognizing handwritten or hand-drawn input, structure for recognizing speech input, and structure for activating modes for processing the handwritten or hand-drawn input and the speech input responsive to handwritten or hand-drawn input or the speech input.
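By way of illustration only, the mode activation described above can be sketched in Python. This is a minimal sketch under stated assumptions and not the claimed implementation: the names RecognizedInput, MODE_TRIGGERS, and Session, the mode names "dictation" and "command", and the trigger words and gesture labels are all hypothetical, and each event is assumed to be the already-recognized output of a separate speech or pen recognizer.

```python
from dataclasses import dataclass, field


@dataclass
class RecognizedInput:
    source: str  # "speech" or "pen" (hypothetical labels)
    text: str    # output of an assumed upstream recognizer


# Hypothetical triggers: a spoken word or a pen gesture may activate a mode.
MODE_TRIGGERS = {
    ("speech", "dictate"): "dictation",
    ("speech", "edit"): "command",
    ("pen", "<circle>"): "command",
    ("pen", "<caret>"): "dictation",
}


@dataclass
class Session:
    mode: str = "dictation"
    document: list = field(default_factory=list)

    def handle(self, event: RecognizedInput) -> None:
        # Either input channel may switch the processing mode.
        new_mode = MODE_TRIGGERS.get((event.source, event.text))
        if new_mode is not None:
            self.mode = new_mode
            return
        if self.mode == "dictation":
            # In dictation mode, recognized input becomes document text.
            self.document.append(event.text)
        elif self.mode == "command" and event.text == "delete" and self.document:
            # In command mode, "delete" removes the most recent word.
            self.document.pop()
```

In this sketch a pen gesture can change how subsequent speech is interpreted and vice versa, so the same recognized word "delete" would be inserted as text in dictation mode but executed as an edit in command mode.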
Both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.