A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
1. Field of the Invention
The present invention relates to user interfaces for computer systems and more particularly to voice activated, agent-based user interface systems.
2. Description of the Related Art
Computer systems are routinely used to interface with the content of various computer-realized documents, i.e., documents stored in or displayed on those systems. Such systems are often difficult to use, requiring extensive training and user experience before one can enjoy anything approximating ease of use. Historically, interfaces have required user input in the form of typing at a keyboard or pointing and clicking using a mouse or similar pointing device. Recently, software agents, i.e., software programs that present a more accessible, easy-to-operate interface to the user, have become available. Software agents are executable programs that interface with other programs and with the user. Agents are typically used to provide enhanced user-computer interaction and a more “user friendly” computing experience. Agents generally respond to user commands to invoke other programs or program functions. Agents also translate user input into the form required by other programs. Smart agents, in particular, are agents that are capable of making decisions based on a pattern of user inputs, rather than basing their decisions on explicit user commands.
Agents are more fully described in Fah-chun Cheong, Internet Agents (1996) (hereinafter Cheong), incorporated herein in its entirety by reference. Additional information may be found in “ActiveX™ Technology for Interactive Software Agents,” furnished as Appendix B to this disclosure. Additional documentation on the Microsoft® ActiveX technology may be found on the Internet at:
www.Microsoft.com/workshop/imedia/agent/documentation.asp
Such agents are typically implemented in software to run on user command. User input comes from a keyboard or a pointing device, such as a mouse or trackball, well known in the art. Some computer systems can also accept limited voice input, which is then passed as software messages to the agent program. Such systems are referred to as “voice-enabled.”
The advantage of agent technology, in theory, is that it enables an easy user interface without extensive user skill. To date, however, its application has been limited because the agent software must be written with a specific set of potential action choices or commands in mind.
Programming an agent-based application typically requires careful construction of a program architecture, logical flowchart, and highly detailed code modules. Extensive testing and debugging are often necessitated by the complexity of the programming language used. Even the so-called high level languages such as C and C++ require extensive programmer training and a high level of skill in order to successfully program agent behaviors and functionality. The C language is described in “The C Programming Language” by Brian W. Kernighan and Dennis M. Ritchie (Prentice Hall, Englewood Cliffs, N.J. 1988 (2d Ed.)), incorporated herein by reference in its entirety. The C++ language is described in “Programming in C++” by Stephen C. Dewhurst and Kathy T. Stark (Prentice Hall, Englewood Cliffs, N.J. 1989), also incorporated herein by reference in its entirety.
Thus a shortcoming of the current state of the art is the difficulty in programming the software to control the behavior of an agent. In typical systems, detailed programming in a specialized computer language is required, such as C or object oriented variants thereof or even assembly code. Using these programming languages requires a high level of skill and training on the part of the programmer. Intricate and precisely detailed procedures for creating the code, compiling it, and preparing a machine readable version are also required. In order to make an agent-based system more readily useable and adaptable, it is necessary to be able to create control software with a minimal amount of programmer sophistication. Prior art software programming techniques additionally require a great deal of time to produce working, operational executable code from the programmer’s typed inputs.
As a further drawback, prior art agents can only execute commands available within the application running the agent; they cannot perform actions, such as following a hyperlink, that are dynamically created by or embedded in an arbitrary document. This latter shortcoming is becoming especially acute as the popularity of the Internet and the World Wide Web (web) continues to increase. Web pages, written in, for example, Hypertext Markup Language (HTML) or a variant thereof, in particular present dynamic links to other pages and new types of content. HTML is described in Ian S. Graham, “The HTML Sourcebook” (1995), incorporated herein by reference in its entirety.
Prior art agent-based interface technology is unable to deal with the dynamic content of web pages and web links (hyperlinks) embedded in other documents because such interface software is unable to understand the contents of the documents. This has the effect of making web navigation and control substantially more frustrating for the inexperienced user, even with a prior art agent-based interface.
The above shortcomings in the prior art are amplified in the case of sightless or seeing-impaired users. While computer systems exist to read plain text aloud, such systems are ineffective on web pages containing objects other than text. One particularly troublesome non-text object is an image map. An image map typically consists of a graphic image that contains regions that are themselves links to different web pages, rather than explicit links. Image maps may consist of icons alone, photographs, or a mixture of text captions and graphic images. However, image maps are downloaded to the browser client in the form of a single graphic image, rather than computer-readable text. As such, text-to-speech programs are unable to sound (read) them to the user.
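By way of illustration only, the structure of an image map described above can be sketched as a list of regions, each pairing an area of the graphic with a link target. The following Python sketch is not taken from this disclosure; the names Region, link_at, and REGIONS, the rectangular region shape, and the example URLs are all assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    """A rectangular hot spot in an image map (hypothetical illustration type)."""
    left: int
    top: int
    right: int
    bottom: int
    label: str   # human-readable name of the link, e.g. a caption
    url: str     # page the region links to

    def contains(self, x: int, y: int) -> bool:
        # Bounds are treated as inclusive on all four sides.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def link_at(regions: List[Region], x: int, y: int) -> Optional[Region]:
    """Return the first region containing pixel (x, y), or None if none does."""
    for region in regions:
        if region.contains(x, y):
            return region
    return None


# Assumed example data: two side-by-side regions in one 200 x 50 graphic.
REGIONS = [
    Region(0, 0, 99, 49, "Home", "http://example.com/home.html"),
    Region(100, 0, 199, 49, "Products", "http://example.com/products.html"),
]
```

In this sketch, a click at pixel (150, 10) falls in the second region, while a click outside both rectangles yields no link. The point of the illustration is that the link information resides in the region list, not in the pixels of the image, which is why a program that reads only displayed text cannot sound such links.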
What is needed is a method of interfacing with an application on a computer system that is friendly and easy to use. What is also needed is a simple and fast method of customizing the behavior of the user interface, to enable rapid prototyping and development. Furthermore, an interface is needed that provides an audible means for interacting with the content of an arbitrary document, such as a web page. The interface needs to provide the ability to navigate hyperlinks embedded in such documents using the user’s voice in addition to standard input means well known in the art.
Presently disclosed is a method of operation of a computer system for processing a document to provide additional information to a user not immediately available from the raw, unprocessed document. This additional information allows a user to navigate and control the content of an arbitrary computer document such as a World Wide Web page using (in one embodiment) voice commands. In one embodiment of the present invention, an agent-based user interface provides both a visual display of previously invisible links embedded in a web page image and an audible indication of the presence of such links. The present invention also provides the ability, in one embodiment, to read the contents of an arbitrary web page document aloud, including the names of embedded hypertext links, thus facilitating web surfing by the sightless.
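As a minimal sketch of how the names of embedded links might be gathered from a web page so that they can be read aloud, the following Python example collects anchor text and the alternate text of image map areas using the standard html.parser module. It is an illustration under stated assumptions and not the implementation of the disclosed embodiments: the names LinkNameCollector and spoken_link_list are hypothetical, the page is assumed to declare its image map regions as area elements in the markup, and the hand-off to a text-to-speech engine is omitted.

```python
from html.parser import HTMLParser


class LinkNameCollector(HTMLParser):
    """Collects (name, href) pairs for <a> and <area> links (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []      # (name, href) pairs in document order
        self._href = None    # href of the <a> element currently open, if any
        self._text = []      # text fragments seen inside that <a> element

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self._href = href
            self._text = []
        elif tag == "area" and href:
            # An image map region has no visible text; fall back to its alt
            # attribute, or to the target address if alt is missing.
            name = dict(attrs).get("alt") or href
            self.links.append((name, href))

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the anchor text.
            name = " ".join("".join(self._text).split()) or self._href
            self.links.append((name, self._href))
            self._href = None


def spoken_link_list(html):
    """Return one phrase per link, suitable for passing to a speech engine."""
    collector = LinkNameCollector()
    collector.feed(html)
    collector.close()
    return ["Link: " + name for name, _ in collector.links]


# Assumed example page: one text link and one image map region.
PAGE = (
    '<p>See <a href="a.html">the  manual</a>.</p>'
    '<map name="m"><area shape="rect" coords="0,0,9,9" '
    'href="b.html" alt="Home"></map>'
)
```

Calling spoken_link_list(PAGE) on the assumed example page produces one phrase for the text link and one for the image map region. A voice-enabled interface could announce these phrases and then match a spoken link name against the collected pairs to select the address to follow.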
Furthermore, the present disclosure includes, in one embodiment, a scripting language that allows the rapid creation of executable computer instructions (scripts) for controlling the behaviors and functionality of an agent-based interface. These scripts can be prepared by one of minimal skill in programming and without reference to complex languages such as C or C++, thereby aiding the creation of a friendlier, easy to use interface to the navigation and control features of the present invention.