The present invention pertains to a method and an apparatus for separating processing for language-understanding from an application and its functionality, the application containing functionality within a provided domain.
The present invention pertains to a method and a system for separating processing for language-understanding from an application and its functionality, said application containing functionality within a provided domain.
Conventional speech recognition application programming interfaces (APIs), such as Microsoft Speech API™ and Java Speech API™, take input in the form of a grammar and a lexicon, with little other information on the context or application domain in which the language interface is to operate. The output of such APIs is typically a stream of words, and an application designer must build a substantial amount of custom code to interpret the words and make appropriate application calls.
As illustrated in FIG. 1 of the attached drawings, the conventional speech recognizer with its API is, so to speak, glued to the application itself with custom code. The custom code provides the “intelligence” in translating a stream of words received from the speech recognizer to appropriate application calls. Any translation to actual application objects, methods, etc. has to be done on a per-case basis in the custom code.
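The per-case custom code described above can be pictured with a minimal sketch. The `Radio` class and the `glue` function are hypothetical names introduced only for illustration; they do not belong to any of the speech APIs mentioned.

```python
class Radio:
    """Hypothetical application object; not part of any real speech API."""

    def __init__(self):
        self.volume = 5

    def volume_up(self):
        self.volume += 1

    def volume_down(self):
        self.volume -= 1


def glue(words, radio):
    """Hand-written, per-case mapping from recognized words to application calls."""
    if words == ["radio", "louder"]:
        radio.volume_up()
        return True
    if words == ["radio", "quieter"]:
        radio.volume_down()
        return True
    # Any phrasing the developer did not anticipate is simply not understood.
    return False
```

Every new phrase or application method requires another hand-written branch, which is the burden the paragraph above attributes to conventional designs.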
Other speech APIs aim at reducing the amount of custom code by allowing the use of modal dialogs. For example, the Philips SpeechMania© 99 product has been demonstrated with a pizza ordering application, where a user goes through dialog modes involving, for instance, selecting pizza toppings. A disadvantage of this type of technology is that the system will only understand the utterances expected in the given mode. If the user changes a drink order while expected to select pizza toppings, the system may fail to understand this. The degree to which the system ‘understands’ the utterances in this kind of interaction is limited; each mode and the utterances valid therein must be anticipated by the developers, and directly related to the action the system takes as a response to the user input. This also means that it requires a substantial amount of interface design work, with extensive studies (such as “wizard of oz”-type settings) to determine every possible phrase a user might come up with in a given situation.
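The limitation of modal dialogs can be sketched as follows. The modes, the accepted utterances, and the `ModalDialog` class are assumptions made for illustration and are not taken from the product mentioned above.

```python
# Hypothetical modes and the utterances each one anticipates.
MODES = {
    "toppings": {"ham", "mushrooms", "extra cheese"},
    "drinks": {"cola", "water"},
}


class ModalDialog:
    """Accepts only the utterances anticipated for the current mode."""

    def __init__(self, mode):
        self.mode = mode
        self.order = []

    def hear(self, utterance):
        if utterance in MODES[self.mode]:
            self.order.append((self.mode, utterance))
            return True
        # Valid in another mode, but not understood in this one.
        return False
```

In this sketch, a drink name spoken while the dialog is in the toppings mode is rejected even though the system knows the word in another mode.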
A widely distributed application of speech recognition and language-understanding today is different forms of telephony services. These systems are typically built with a central server, which accepts incoming voice calls over standard telephone lines. The users are presented with an interactive voice-based interface, and can make choices, navigate through menus, etc. by uttering voice commands. The complete set of software, ranging from the speech recognition, through language-understanding, to application calls, database searches, and audio feedback, resides on the central server. This puts high demands on the central server hardware and software, which also must support a large number of simultaneous interactive voice sessions. Typical applications for this type of system are ticket booking, general information services, banking systems, etc. An example of such a system is the “SJ Passenger traffic timetable information system”, in use by the Swedish Railway.
Many speech- and language-enabled applications do not use speech recognizer APIs (see the discussion of “conventional speech recognition APIs” above). Instead, they implement the whole range of technologies required, from speech recognition through syntactic and semantic (linguistic) processing to the actual application calls and effects. Such designs are called monolithic, since they do not make use of specified APIs to distinguish between different interchangeable modules of the language interaction system, but rather put all components in “one design”. An example of such a design is disclosed by Bertenstam J. et al., “The Waxholm Application Data-Base”, Proc. of Eurospeech '95, Vol. 1, pp. 833-836, Madrid, 1995. The “Waxholm system” is a speech-controlled system for search and retrieval of information on boat timetables and services in the Stockholm archipelago. The system implements all relevant linguistic components, such as speech recognition, lexicon, grammar, semantics and application functionality, internally.
The field of distributed systems in general deals with the distribution of databases, object repositories, etc. over computer networks. The general intent is to provide unified high-level platforms to be used by computer applications that require runtime data to be presented and distributed over a network. One effort to provide a standardized framework for the design of distributed systems is the Common Object Request Broker Architecture (CORBA), proposed by the Object Management Group (OMG). The CORBA architecture is centered around the Object Request Broker (ORB), which handles application (client) calls to a distributed object by providing object stubs (or proxies) on the client side, on which remote procedure calls are made and transferred to the actual object implementation (server) over the network.
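The stub (proxy) pattern described above can be illustrated with a minimal in-process sketch. This is not CORBA and uses no CORBA library; the `Broker`, `Stub`, and `TimetableServant` names are hypothetical, and the network transfer is replaced by a direct method lookup.

```python
class TimetableServant:
    """Hypothetical server-side object implementation."""

    def next_departure(self, station):
        return f"next boat from {station}: 10:15"


class Broker:
    """Stands in for the ORB: routes a request to a registered implementation."""

    def __init__(self):
        self._servants = {}

    def register(self, name, servant):
        self._servants[name] = servant

    def invoke(self, name, method, args):
        # A real ORB would marshal this request and send it over the network.
        return getattr(self._servants[name], method)(*args)


class Stub:
    """Client-side proxy: looks like the object but forwards every call."""

    def __init__(self, broker, name):
        self._broker = broker
        self._name = name

    def __getattr__(self, method):
        # Called only for attributes not found on the stub itself.
        def remote_call(*args):
            return self._broker.invoke(self._name, method, args)

        return remote_call
```

A client holding `Stub(broker, "timetable")` calls `next_departure("Vaxholm")` as if the object were local, while the broker locates the actual implementation.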
The present invention addresses some fundamental problems that currently arise when language-based interaction is to be performed with multiple application entities present. These can be summarized in three main issues:
1) The lack of a consistent natural language interaction model for different application entities. This means that a multitude of different applications exist with different and mutually inconsistent linguistic interfaces. The interpretation of the recognized strings of words received from the speech recognizers is done by custom code (see the discussion of “conventional speech recognition APIs” above), or even with the complete speech recognition and linguistic processing as an integral part of the application (see the discussion of “monolithic applications with language-based interaction” above), and thus with application-specific solutions. This means that the ways users speak to machines vary and are inconsistent.
2) The lack of transparent interaction using natural language with multiple application entities. Given multiple natural language-enabled applications, there is a lack of unifying methods to bring the language interfaces together so as to make them accessible at once by the user. Application-specific solutions to distinguish between different sub-functionalities of a system exist (such as prefixing an utterance by “telephone, ...” or “calendar, ...” to indicate the context of a command), but this is still limited to customized solutions of particular application designs, and the parsing and linguistic processing is still left to each particular application once the destination of an utterance is determined. Thus, there exists a lack of “unification of linguistic processing and execution”, given different accessible applications. As an example of where this type of interaction is problematic, consider a situation in which a user wants to control different electronic systems integrated in a car, such as a stereo and a climate control system. Rather than prefixing each utterance with a destination (by saying things such as “radio, louder” or “climate, cooler”), the system should be able to resolve sentences in the context of both applications simultaneously and understand that the verb “louder” is addressed to the radio, and “cooler” is addressed to the climate control system, something that currently can only be achieved by building the two applications as one single application unit.
3) The requirement to build natural language processing into all entities. Since there are no methods of unifying the linguistic processing of disparate applications in one design (see the two previous points), the full linguistic processing must, with conventional techniques, be built into each application. This is generally a problem when it comes to efficient resource usage (with respect to memory and processing power, as well as to the manpower required to develop a working system). While less problematic in centralized designs (such as exemplified in the discussion of “conventional telephony systems” above), this problem becomes severe in the case of built-in systems, portable designs, etc., since such implementations are extremely sensitive to the amount of processing hardware required for a particular application.
The present invention relates to a method and an apparatus for separating processing for language-understanding from an application and its functionality, the application containing functionality within a provided domain. It is intended to solve problems relating to prior systems and specifically to provide a general means for controlling application means, such as a radio, an air conditioning system, and other electrically controlled appliances, as well as software applications on a computer.
In order to achieve its aims, the present invention sets forth a method of organizing linguistic data describing linguistic interaction with an application-specific linguistic logic and a general linguistic understanding logic. The method includes the steps of: separating the application logic from the general logic, the application logic containing functionality within a predetermined application domain, the functionality being provided through a data model; reflecting the functionality to the general logic for use in linguistic interaction, by providing that the application exports information about words and senses to the general logic; and providing a distributed, consistent linguistic interaction model for different applications, using the same general logic to interpret applications with different functionality.
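As a rough illustration of this separation, the following minimal sketch lets two applications export their words to one shared general logic, which then resolves an utterance without a destination prefix. All names (`GeneralEngine`, `register`, `interpret`, `Radio`, `Climate`) are hypothetical, a "sense" is reduced to a plain callable, and ambiguity between applications is not handled; none of this is taken from the claimed method itself.

```python
class GeneralEngine:
    """Application-independent logic; holds no domain words of its own."""

    def __init__(self):
        self._senses = {}  # word -> list of (application name, action)

    def register(self, app_name, exported):
        """Store the words and senses that an application exports."""
        for word, action in exported.items():
            self._senses.setdefault(word, []).append((app_name, action))

    def interpret(self, utterance):
        """Run the first exported sense that matches; return the application addressed."""
        for word in utterance.lower().split():
            for app_name, action in self._senses.get(word, []):
                action()
                return app_name
        return None


class Radio:
    def __init__(self):
        self.volume = 5

    def louder(self):
        self.volume += 1

    def exported(self):
        return {"louder": self.louder}


class Climate:
    def __init__(self):
        self.temperature = 21

    def cooler(self):
        self.temperature -= 1

    def exported(self):
        return {"cooler": self.cooler}
```

With both applications registered, `interpret("louder")` reaches the radio and `interpret("a bit cooler")` reaches the climate control, using the same engine for both.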
In another embodiment information about words comprises objects, attributes, and classes from the object oriented model.
A further embodiment finds that the objects are nouns, the attributes are adjectives and the classes are verbs.
A still further embodiment sets forth that grammars are provided by the application for specific unusual expressions.
Another embodiment of the present invention provides that the general linguistic understanding logic belongs to speech-recognition.
Yet another embodiment provides that the general linguistic understanding logic belongs to text.
In yet another embodiment, standard grammars for utterances and phrases in various languages, which are independent of the domain, are built into the general language-understanding linguistic logic.
A further embodiment encompasses that closed word classes and some very common words in each known language are built into the general language-understanding linguistic logic.
Further, one embodiment provides that a transfer of words is considered as a two-step process including an establishment of an on-demand connection or presence to determine the need of transfer of the application structure to the general linguistic-understanding logic and the provision of application-specific linguistic data from the application to the general linguistic-understanding logic.
Another embodiment of the invention comprises that the second step is accomplished by direct transfer, or by providing access through a distributed object system.
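The two-step transfer can be sketched as follows, using the direct-transfer variant. The `Application` and `Engine` classes, the use of an application identifier to decide whether a transfer is needed, and the `transfers` counter are assumptions made purely for illustration.

```python
class Application:
    """Hypothetical application holding its own linguistic data."""

    def __init__(self, app_id, words):
        self.app_id = app_id
        self._words = words
        self.transfers = 0  # counts how often the data was actually sent

    def linguistic_data(self):
        self.transfers += 1
        return dict(self._words)


class Engine:
    """Hypothetical general linguistic-understanding logic."""

    def __init__(self):
        self._data = {}

    def needs_transfer(self, app):
        """Step 1: on connection or presence, decide whether a transfer is needed."""
        return app.app_id not in self._data

    def connect(self, app):
        if self.needs_transfer(app):
            # Step 2: the application provides its application-specific data.
            self._data[app.app_id] = app.linguistic_data()
        return self._data[app.app_id]
```

In this sketch, connecting the same application a second time does not repeat the transfer, since the first step finds the data already present.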
A still further embodiment provides that a wireless network is used as an interface between the general logic and the application specific logic. In one embodiment the wireless network is operating in accordance with the Bluetooth standard.
The present invention also sets forth a system of organizing linguistic data describing linguistic interaction with an application means for specific linguistic logic and a general linguistic understanding logic engine means containing an application-independent grammar description. The system includes: means for separating the specific logic means from the engine means, the specific logic means containing functionality within a predetermined application domain, the functionality being provided through a data model; means for reflecting the functionality to the logic engine for use in linguistic interaction, by providing that the specific logic means exports information about words and senses to the engine means; and means for providing a distributed, consistent linguistic interaction for different applications, using the same general logic engine means to interpret applications with different functionality.
The system according to the present invention is also able to set forth the above method embodiments as disclosed in the attached dependent system claims.