Recent advances in pattern classification have enabled the development of sophisticated software systems that can recognize natural language data (i.e. natural language user input) such as speech (see for example L. Rabiner and B. Juang, “Fundamentals of Speech Recognition”, Prentice Hall, Englewood Cliffs, N.J., 1993) or handwriting (see for example G. Lorette, “Handwriting Recognition or Reading? Situation At The Dawn of the 3rd Millennium”, Advances In Handwriting Recognition, Series in Machine Perception and Artificial Intelligence, Vol. 34, pp. 3-15, World Scientific Publishing Co. 1999).
These applications allow users to communicate with a computerised system in a natural and convenient way, and permit the automation of tasks that previously required human input. Some examples of such applications include interactive voice response (IVR) systems, automated cheque-processing systems and automated form data-entry systems.
In addition, the growth of networked computing and the Internet has enabled the development of complex distributed systems, and the existence of open, standardized protocols has allowed the integration of end-user devices, centralized servers, and applications. An example of a three-tiered distributed system architecture is depicted in FIG. 1 (prior art), illustrating a system 100 which includes a client layer 110, network layer 120 and application layer 130. Client device 140 communicates with one or more servers 150 which in turn communicate with one or more applications 160. The combination of distributed computing and pattern recognition techniques has made possible the development of systems such as Netpage™ by Silverbrook Research Pty Ltd, an interactive paper-based interface to online information. Systems such as this give users the ability to interact with information from any location that provides network connectivity (including wireless network access) using familiar human-communication techniques such as handwriting or speech.
The basic processing steps of presently known pattern recognition systems are depicted in FIG. 2 (prior art). Processing begins when an input device 210 generates a signal 220 that is to be recognized by the system 100 (that is, to be classified as belonging to a specific class or sequence of class elements). Usually, one or more pre-processing procedures 230 are applied to remove noise and produce a normalized signal 240, which is then segmented 250 to produce a stream of primitive elements 260 required for a classification procedure 270. Note that often this segmentation 250 is “soft”, meaning that a number of potential segmentation points are located, and the final segmentation points are resolved during classification 270 or context processing 290.
The segmented signal 260 is then passed to a classifier 270 where a representative set of features is extracted from the signal and used in combination with a pre-defined model 275 of the input signal to produce a set of symbol hypotheses 280. These hypotheses 280 give an indication of the probability that a sequence of segments within the signal represent a basic symbolic element (e.g. letter, word, phoneme, etc.). After classification 270, the context-processing module 290 uses the symbol hypotheses 280 generated by the classifier 270 to decode the signal according to a specified context model 295 (such as a dictionary or character grammar). The result 297 produced by the context processing 290 is passed to the application 299 for interpretation and further processing.
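By way of illustration only, the processing steps of FIG. 2 can be sketched as a chain of small functions. The sketch below is a toy under stated assumptions and not the method of any particular prior-art system: the signal is a list of numbers, segmentation is "hard" with a fixed width rather than "soft", the model 275 is a hypothetical table holding one prototype value per symbol, and the context model 295 is a plain word list. All function and variable names are invented for this example.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the pre-defined model 275: one prototype per symbol.
MODEL: Dict[str, float] = {"a": 0.0, "b": 1.0}


def preprocess(signal: List[float]) -> List[float]:
    """Pre-processing 230: rescale the raw signal 220 to the range [0, 1]."""
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat signal
    return [(x - lo) / span for x in signal]


def segment(signal: List[float], width: int) -> List[List[float]]:
    """Segmentation 250: cut the normalized signal 240 into fixed-width pieces."""
    return [signal[i:i + width] for i in range(0, len(signal), width)]


def classify(segments: List[List[float]]) -> List[Dict[str, float]]:
    """Classification 270: score each segment against every symbol in MODEL."""
    hypotheses = []
    for seg in segments:
        feature = sum(seg) / len(seg)  # a single feature: the segment mean
        raw = {sym: max(0.0, 1.0 - abs(feature - proto))
               for sym, proto in MODEL.items()}
        total = sum(raw.values()) or 1.0
        hypotheses.append({sym: score / total for sym, score in raw.items()})
    return hypotheses


def decode(hypotheses: List[Dict[str, float]],
           dictionary: List[str]) -> Optional[str]:
    """Context processing 290: pick the most probable dictionary word."""
    best_word, best_score = None, -1.0
    for word in dictionary:
        if len(word) != len(hypotheses):
            continue  # the word cannot explain this number of segments
        score = 1.0
        for ch, hyp in zip(word, hypotheses):
            score *= hyp.get(ch, 0.0)
        if score > best_score:
            best_word, best_score = word, score
    return best_word


def recognize(signal: List[float], width: int,
              dictionary: List[str]) -> Optional[str]:
    """Run the full chain: pre-process, segment, classify, decode."""
    return decode(classify(segment(preprocess(signal), width)), dictionary)
```

For the toy signal `[0, 0, 4, 4, 10, 10]` with a segment width of 2, the classifier alone prefers the symbol sequence "aab", whereas `recognize` with the dictionary `["abb", "bab"]` returns "abb", because the context model excludes the classifier's first choice.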
Natural language input is inconsistent, noisy, and ambiguous, leading to potential recognition and decoding errors. However, high recognition accuracy is required for pattern recognition applications to operate successfully, since mistakes can be expensive and frustrating to users. As a result, recognition systems should make use of as much contextual information as possible to increase the possibility of correctly recognizing the natural language input. For example, when recognizing a signal that must represent a country name, the recognition system can use a pre-defined list of valid country names to guide the recognition procedure. Similarly, when recognizing a phone number, a limited symbol set (i.e. digits) can be used to constrain the recognition results. The problem domain for many pattern recognition systems is inherently ambiguous (i.e. many of the input patterns encountered during processing cannot be accurately classified without further information from a different source).
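The effect of a limited symbol set, as in the phone number example above, can be sketched as follows. This is a minimal illustration that assumes the classifier supplies, for each position, a mapping from candidate symbols to probabilities; the function names and the sample probabilities are hypothetical and are not measured results.

```python
from typing import Dict, List, Set

DIGITS: Set[str] = set("0123456789")


def constrain(hypotheses: List[Dict[str, float]],
              allowed: Set[str]) -> List[Dict[str, float]]:
    """Drop symbols outside the allowed set and renormalize what remains."""
    constrained = []
    for hyp in hypotheses:
        kept = {sym: p for sym, p in hyp.items() if sym in allowed}
        total = sum(kept.values())
        if total == 0:
            raise ValueError("no allowed symbol among the hypotheses")
        constrained.append({sym: p / total for sym, p in kept.items()})
    return constrained


def best_string(hypotheses: List[Dict[str, float]]) -> str:
    """Take the most probable symbol at each position."""
    return "".join(max(hyp, key=hyp.get) for hyp in hypotheses)


# Invented hypotheses for three handwritten symbols that resemble "015".
example = [
    {"O": 0.6, "0": 0.4},
    {"l": 0.5, "1": 0.3, "7": 0.2},
    {"5": 0.7, "S": 0.3},
]
unconstrained = best_string(example)
digits_only = best_string(constrain(example, DIGITS))
```

With these invented values, `unconstrained` is "Ol5" and `digits_only` is "015": restricting the symbol set removes the letter hypotheses that would otherwise be ranked first.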
The following discussion refers to handwriting by way of background information; however, the present invention should not be considered to be limited to handwriting as the form of natural language data input.
Digital ink is a digital representation of the information generated by a pen-based input device. Generally, digital ink is structured as a sequence of strokes, each of which begins when the pen-based input device makes contact with a drawing surface and ends when the pen-based input device is lifted. Each stroke comprises a set of sampled coordinates that define the movement of the pen-based input device whilst the pen-based input device is in contact with the drawing surface.
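One possible in-memory representation of digital ink, following the description above, is sketched below. The event names ("down", "move", "up") and the `Stroke` type are assumptions made for this example only; actual pen-based input devices and ink formats differ in detail and may also record pressure or timing information.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Stroke:
    """Sampled coordinates recorded between pen-down and pen-up."""
    points: List[Point] = field(default_factory=list)


def strokes_from_events(
        events: Iterable[Tuple[str, float, float]]) -> List[Stroke]:
    """Group (kind, x, y) pen events into a time-ordered list of strokes."""
    strokes: List[Stroke] = []
    current: Optional[Stroke] = None
    for kind, x, y in events:
        if kind == "down":
            current = Stroke([(x, y)])        # pen touches the surface
        elif kind == "move" and current is not None:
            current.points.append((x, y))     # sample taken while in contact
        elif kind == "up" and current is not None:
            current.points.append((x, y))     # pen lifted: the stroke ends
            strokes.append(current)
            current = None
        # samples reported while the pen is lifted are ignored
    return strokes


# A hypothetical event stream: a short line, a stray hover sample, then a dot.
ink = strokes_from_events([
    ("down", 0.0, 0.0), ("move", 1.0, 0.5), ("up", 2.0, 1.0),
    ("move", 5.0, 5.0),
    ("down", 3.0, 0.0), ("up", 3.0, 0.0),
])
```

Here `ink` holds two strokes, the first with three sampled points and the second with two; the sample reported while the pen was lifted is discarded.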
As an example, one of the major issues faced in the development of highly accurate handwriting recognition systems is the inherent ambiguity of handwriting (e.g. the letters ‘u’ and ‘v’, ‘t’ and ‘f’, and ‘g’ and ‘y’ are often written with a very similar appearance and are thus easily confused). Human readers rely on contextual knowledge to correctly decode handwritten text, and as a result a large amount of research has been directed at applying syntactic and linguistic constraints to handwritten text recognition (see for example: H. Beigi and T. Fujisaki, “A Character Level Predictive Language Model and Its Application to Handwriting Recognition”, Proceedings of the Canadian Conference on Electrical and Computer Engineering, Toronto, Canada, Sep. 13-16, 1992; U. Marti and H. Bunke, “Handwritten Sentence Recognition”, Proceedings of the 15th International Conference on Pattern Recognition, Barcelona, Spain, Volume 3, pp. 467-470, 2000; D. Bouchaffra, V. Govindaraju, and S. Srihari, “Postprocessing of Recognized Strings Using Nonstationary Markovian Models”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(10), pp. 990-999, October 1999; J. Pitrelli and E. Ratzlaff, “Quantifying the Contribution of Language Modeling to Writer-Independent On-line Handwriting Recognition”, Proceedings of the Seventh International Workshop on Frontiers in Handwriting Recognition, Amsterdam, Sep. 11-13, 2000; R. Srihari, “Use of Lexical and Syntactic Techniques in Recognizing Handwritten Text”, ARPA Workshop on Human Language Technology, Princeton, N.J., March 1994; and L. Yaeger, B. Webb, and R. Lyon, “Combining Neural Networks and Context-Driven Search for On-Line, Printed Handwriting Recognition in the Newton”, AI Magazine, Volume 19, No. 1, pp. 73-89, AAAI 1998).
The increasing use of pen-based computing and the emergence of paper-based interfaces to networked computing resources (see for example: Anoto, “Anoto, Ericsson, and Time Manager Take Pen and Paper into the Digital Age with the Anoto Technology”, Press Release, 6th Apr., 2000; and Y. Chans, Z. Lei, D. Lopresti, and S. Kung, “A Feature Based Approach For Image Retrieval by Sketch”, Proceedings of SPIE Volume 3229: Multimedia Storage and Archiving Systems II, 1997) has highlighted the need for techniques to interpret digital ink. Pen-based computing allows users to interact with applications.
As a result of the progress in pen-based interface research, handwritten digital ink documents, represented by time-ordered sequences of sampled pen strokes, are becoming increasingly popular (J. Subrahmonia and T. Zimmerman: Pen Computing: Challenges and Applications. Proceedings of the ICPR, 2000, pp. 2060-2066). Handwriting typically involves writing in a mixture of writing styles (e.g. cursive, discrete, run-on etc.), a variety of fonts and scripts and different layouts (e.g. mixing drawings with text, various text line orientations etc.).
Presently, handwriting recognition accuracy remains relatively low, and the number of errors introduced by recognition (both for the database entries and for the handwritten query) means that present techniques do not work well. The process of converting handwriting into text results in the loss of a significant amount of information regarding the general shape and dynamic properties of the ink. In many handwriting styles (particularly cursive writing), the identification of individual characters is highly ambiguous.
Similar work has been performed in the fields of speech recognition, natural language processing, and machine translation.
Some natural language recognition systems are already known. ParaGraph, Inc. offers a network-based distributed handwriting recognition system called “NetCalif” (ParaGraph, Handwriting Recognition for Internet Connected Device, November 1999) that is based on its Calligraphy handwriting recognition software. The user's natural handwriting (cursive, print, or a combination of both) is captured by client software, then transmitted from an Internet-connected device to the NetCalif servers, where it is converted and returned as typewritten text to the client device.
Philips has developed “SpeechMagic”, a client/server-based, professional speech recognition software package (Philips, SpeechMagic 4.0, 2000). This system supports specialized vocabularies (called ConTexts), and dictation, recognition, and correction can be performed across a LAN, WAN, or the Internet, independently of location.
In a networked information or data communications system, a user has access to one or more terminals which are capable of requesting and/or receiving information or data from local or remote information sources. The information source, in the present context, may be a database associated with an application. In such a communications system, a terminal may be a type of processing system, computer or computerised device, personal computer (PC), mobile, cellular or satellite telephone, mobile data terminal, portable computer, Personal Digital Assistant (PDA), pager, thin client, or any other similar type of digital electronic device. The capability of such a terminal to request and/or receive information or data can be provided by software, hardware and/or firmware. A terminal may include or be associated with other devices, for example a pen-based input device for handwriting input or a microphone for speech input.
An information source can include a server, or any type of terminal, that may be associated with one or more storage devices that are able to store information or data, such as digital ink, for example in one or more databases residing on a storage device. The exchange of information (i.e., the request and/or receipt of information or data) between a terminal and an information source, or other terminal(s), is facilitated by a communication means. The communication means can be realised by physical cables, for example a metallic cable such as a telephone line; semi-conducting cables; electromagnetic signals, for example radio-frequency signals or infra-red signals; optical fibre cables; satellite links; or any other such medium or combination thereof connected to a network infrastructure.
The reference to any prior art in this specification is not, and should not be taken as, an acknowledgment or any form of suggestion that such prior art forms part of the common general knowledge.