Individuals interact with computers through a user interface. The user interface enables a user to provide input to and receive output from the computer. The output provided can take on many forms and often includes presenting a variety of user-interface elements, sometimes referred to as “controls.” Exemplary user-interface elements include toolbars, windows, buttons, scrollbars, icons, selectable options, graphics that compose controls (such as images, text, etc.) and the like. Virtually anything that can be clicked on or given the focus falls within the scope of “element” as used herein. Information related to user-interface elements is often requested by assistive-technology products so that the products can enhance a user's computing experience.
Assistive-technology products are computer programs specially designed to accommodate an individual's disability or disabilities. These products are developed to work with a computer's operating system and other software. Some people with disabilities desire assistive-technology products to use computers more effectively.
Individuals with visual or hearing impairments may desire accessibility features that can enhance a user interface. For example, individuals with hearing impairments may use voice-recognition products that are adapted to convert speech to sign language. Screen-review utilities convert text that appears on screen into a computer voice, making on-screen information available as synthesized speech. Some such utilities pair the speech with visual representations of words in a format that assists persons with language impairments; for example, words can be highlighted as they are electronically read.
Assistive-technology applications that provide supportive features to persons who desire to use them do not have access to the same code that native applications are able to use. This is because an assistive-technology application works on behalf of a user, whereas in a native application the user works directly with the user interface. For instance, if a word-processing application wishes to display text to a user, it can easily do so because the word-processing application knows what program modules to call to display the text as desired. But a screen reader (an application that finds text and audibly recites the text to a user) is unaware of much of a target application's programmatic code. The screen reader must independently gather the data needed to identify text, receive it, and translate it into audio.
Assistive-technology applications work under a variety of constraints. To illustrate some of these constraints, consider an application that needs to display the contents of a listbox. This would be an easy task for a native application because it would know where the relevant list-box values are stored and could simply retrieve them for display. But an assistive-technology application does not know where the values are stored. It must seek the values itself and be provided with the necessary information to display the values. Thus, assistive-technology applications must function with limited knowledge of an application's user interface.
The difficulties associated with an assistive-technology application performing certain functions on all types of user-interface elements are somewhat akin to the difficulties that would be faced by a person asked to program any type of VCR clock simply by being given access to the VCR clock. Unlike the VCR owner who is familiar with his VCR's clock and has the VCR manual, the fictitious person here has no foreknowledge of what type of VCR he may come across, what actions are necessary to program the clock, whether it will be a brand ever seen before, or the means of accessing its settings, which may be different from every other VCR previously encountered. Moreover, expecting the person to know about every type of VCR is an unrealistic proposition. As applied to the relevant art, it is likewise unrealistic to expect every requesting component to know about every type of listbox that it might encounter. Programming such a requesting component would be an expensive and resource-intensive process.
One way a user interface may provide this information is by using logical hierarchal structures. A significant problem in the art, however, is that logical hierarchal structures provided by a user interface often do not have the requisite level of granularity needed by an assistive-technology application. Without the benefit of an adequate description of a UI or knowing the contents of certain data elements (such as listboxes, combo boxes, and many others), assistive-technology applications must request this information from the user interface to be able to manipulate or otherwise make use of the data.
Although requesting components such as assistive-technology applications can provide various user-interface customizations if they can receive accurate data regarding the user-interface elements, providing accurate information regarding user-interface elements has proven difficult. This difficulty stems from the fact that no single entity knows all the relevant information about any particular piece of a user interface. For example, although a list-box component may itself know the individual list-box items contained within it, only the name of the listbox may be known by its parent dialog window. Although a user interface or portion of a user interface may be depicted as a hierarchal structure such as a tree, a single tree may only provide limited information, which can prevent an assistive-technology application from functioning properly.
A user interface is typically composed of elements from various platforms in various processes, complicating interaction with the UI. A platform is a suite of APIs, libraries, and/or components that make up building blocks of an operating system. A first exemplary platform is the “WIN32” platform, which uses HWNDs as a basic element type. A second illustrative platform is HTML, which uses HTML elements to compose a user interface. Other illustrative platforms include those used to develop a Linux or Macintosh® user interface. These platforms often have incompatible APIs. For example, HTML uses a first platform to build its user interface, but controls in a WIN32 environment use another platform to build their UI. These disparate UI platforms live as a collection of disjointed trees, a scheme that is difficult for client applications (or requesting applications) to interact with.
The UI of an application can be illustrated as a set of UI elements that are arranged in a hierarchy that typically indicates containment (although HTML allows child elements to be positioned on the screen outside of the bounds of parent elements). For example, a desktop may contain multiple application windows, one of which may contain a title bar, scrollbars, and controls, which may include a list control, which may in turn contain list items, which may still further contain text and images. We note that the term “desktop” is commonly associated with an aspect of the Windows® operating system produced by Microsoft Corporation of Redmond, Wash., but we do not mean to attach such a narrow definition to the term as used herein. Rather, we will often use “desktop” to refer to the highest level of a hierarchal tree. Other operating systems (such as Linux; the Mac OS™ offered by Apple Computer, Inc. of Cupertino, Calif.; and the Solaris™ Operating System offered by Sun Microsystems, Inc. of Santa Clara, Calif.) have work spaces that represent the top-most level of a user interface. It is that upper-most level of interest, which may not necessarily be the top level, that we intend to describe as the term “desktop” is used throughout this disclosure.
As previously mentioned, the system that manages a particular set of elements is referred to as a platform. Exemplary functions performed by platforms include allocating and subdividing screen real estate (for example, deciding where a list box should be placed and ensuring that its drawing does not interfere with other elements); routing input (such as mouse clicks and keyboard presses) to correct elements; and managing basic UI-related state for an element (such as focus, enabled, location, and the like).
Also, any control that manages screen real estate and/or input can be regarded as a platform. For example, a list box is limited in functionality, but it does manage the location of its list items, and it also manages input on their behalf. Accordingly, such an item falls within the meaning of “platform” as used herein.
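The three platform responsibilities named above (subdividing screen real estate, routing input, and managing basic element state) can be illustrated with a list box acting as a very small platform for its own items. The following is only a sketch under stated assumptions: the names `ListBox`, `ListItem`, `layout`, and `route_click` are hypothetical and do not correspond to any particular platform's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ListItem:
    text: str
    top: int = 0            # y-coordinate assigned by the owning list box
    selected: bool = False  # basic UI state managed on the item's behalf


@dataclass
class ListBox:
    """Hypothetical list box that acts as a small platform for its items."""
    top: int
    item_height: int
    items: List[ListItem] = field(default_factory=list)

    def layout(self) -> None:
        # Subdivide the list box's screen real estate among its items.
        for index, item in enumerate(self.items):
            item.top = self.top + index * self.item_height

    def route_click(self, y: int) -> Optional[ListItem]:
        # Route input to the item whose row contains the click, if any.
        index = (y - self.top) // self.item_height
        if y < self.top or index >= len(self.items):
            return None
        clicked = self.items[index]
        # Manage state for the items: at most one item is selected.
        for item in self.items:
            item.selected = item is clicked
        return clicked


box = ListBox(top=100, item_height=20,
              items=[ListItem("a"), ListItem("b"), ListItem("c")])
box.layout()
hit = box.route_click(125)  # falls in the second row (120 to 139)
```

The items themselves never see raw coordinates or clicks; the list box decides where they sit and which of them receives input, which is why such a control falls within the meaning of "platform" used here.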
Because the different platforms all use different interfaces to obtain information about their underlying elements, they are generally incompatible. That is, code written to retrieve information associated with a child of a node in a first platform would be different from code that retrieves information from a topologically similar node in a different platform. Developers often use different platforms for different reasons, because some platforms are better suited to carry out certain functions than are other platforms. When multiple platforms are used within an application, it is often the case that the platforms are not explicitly aware of how they are connected. For example, a list box (a WIN32 element) within a table in a Web page (HTML elements) has no knowledge that it is within the table.
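The incompatibility can be made concrete with two stand-in element types whose child-access interfaces differ. This is an illustrative sketch only: `HwndLikeElement`, `HtmlLikeElement`, `enumerate_children`, and `child_nodes` are hypothetical names chosen for the example and are not the actual WIN32 or HTML interfaces.

```python
class HwndLikeElement:
    """Stand-in for a window-handle-based element (hypothetical API)."""

    def __init__(self, title, children=()):
        self.title = title
        self._children = list(children)

    def enumerate_children(self):
        # Children are reachable only through an enumeration call.
        return iter(self._children)


class HtmlLikeElement:
    """Stand-in for a markup-based element (hypothetical API)."""

    def __init__(self, tag, child_nodes=()):
        self.tag = tag
        self.child_nodes = list(child_nodes)  # children exposed as a list


def child_labels(element):
    # The requesting component needs one branch per platform it knows about.
    if isinstance(element, HwndLikeElement):
        return [child.title for child in element.enumerate_children()]
    if isinstance(element, HtmlLikeElement):
        return [child.tag for child in element.child_nodes]
    raise TypeError(f"unknown platform: {type(element).__name__}")


# A list box hosted inside a table cell: each platform describes only its
# own elements, and nothing in either tree records the connection.
list_box = HwndLikeElement("ListBox", [HwndLikeElement("Item 1")])
cell = HtmlLikeElement("td")
table = HtmlLikeElement("table", [HtmlLikeElement("tr", [cell])])
```

In this sketch the cell has no children and the list box has no reference to the cell, so a requesting component walking either tree cannot discover that the list box is displayed inside the table.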
Still further compounding the problem associated with a requesting component interacting with various UI elements is the fact that platforms typically store information within the process that is displaying the UI. For example, in a calculator application, the element tree structure may be contained entirely within the calculator process. As will be explained in greater detail below, crossing process boundaries can negatively impact system performance. As previously mentioned, tools, applications, and other requesting components that wish to access a UI to obtain information about it or to interact with it have historically had to deal with at least the following exemplary problems: maintaining awareness of multiple incompatible platforms, crossing process boundaries to retrieve information about different user interfaces, and being aware of transitions from one platform to another in order to navigate between user interfaces that are composed of multiple disjoint subtrees. A developer addressing such problems faced a formidable task in developing a requesting component that could richly interact with UI elements of various user interfaces.
Another significant shortcoming of the prior art is the lack of flexibility that a client application or other requesting component has with respect to viewing a tree that represents elements of a user interface. A tree that represents all elements of a user interface may be referred to as a raw tree. This raw tree, according to the present invention described below, can include levels of granularity never before possible. But a requesting client may not need such a level of granularity. For instance, a client may only be interested in receiving information associated with UI elements that can receive user input. Or perhaps a requesting component desires to navigate to some next node that satisfies a condition, such as having a specific name. The prior art does not allow for the submission of any such condition to a platform. Absent the present invention, a requesting client application is limited to receiving uncustomized views of representations of user-interface elements.
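The two kinds of request described above (a view limited to elements that can receive user input, and navigation to the next node satisfying a condition) can be sketched as follows. This is a minimal illustration under stated assumptions, not the mechanism of the invention: `Node`, `walk`, `filtered_view`, and `find_next` are hypothetical names, "next" is taken to mean next in depth-first pre-order, and the filtered view is flattened into a list for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional


@dataclass
class Node:
    name: str
    accepts_input: bool = False
    children: List["Node"] = field(default_factory=list)


Condition = Callable[[Node], bool]


def walk(root: Node) -> Iterator[Node]:
    """Yield every node of the raw tree in depth-first pre-order."""
    yield root
    for child in root.children:
        yield from walk(child)


def filtered_view(root: Node, condition: Condition) -> List[Node]:
    """Return the nodes that satisfy the condition, in traversal order."""
    return [node for node in walk(root) if condition(node)]


def find_next(root: Node, start: Node, condition: Condition) -> Optional[Node]:
    """Return the first node after `start` that satisfies the condition."""
    seen_start = False
    for node in walk(root):
        if seen_start and condition(node):
            return node
        if node is start:
            seen_start = True
    return None


ok_button = Node("OK", accepts_input=True)
name_field = Node("Name", accepts_input=True)
raw = Node("Dialog", children=[Node("Label"), name_field,
                               Node("Panel", children=[ok_button])])

inputs = filtered_view(raw, lambda n: n.accepts_input)
target = find_next(raw, name_field, lambda n: n.name == "OK")
```

Here the condition is an ordinary callable supplied by the requesting component, so the same traversal serves both the "input elements only" view and the search by name.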
There is a need in the art to free such requesting components from having to deal with the aforementioned issues and thereby enable them to carry out their domain-specific work. That is, the current state of the art could be improved by presenting to requesting components a uniform view of a user interface that includes a single unified tree of elements and allows clients to access the structure of a desired user interface. Elements in such a structure can then provide access to information about the UI corresponding to a node of interest (properties), access to functionality of the user interface (patterns), and/or access to notifications concerning changes in that user interface (events). Moreover, the present state of the art could be improved by providing a set of predefined views of a raw tree, for example by filtering a raw tree according to a set of conditions provided by a requesting component.
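The three kinds of access named above (properties, patterns, and events) can be pictured on a single element of a unified tree. The sketch below is purely illustrative: the `Element` class and its method names are assumptions made for this example and are not taken from any existing interface.

```python
from typing import Any, Callable, Dict, List


class Element:
    """Hypothetical element of a unified tree."""

    def __init__(self, **properties: Any) -> None:
        self._properties: Dict[str, Any] = dict(properties)
        self._patterns: Dict[str, Callable[[], None]] = {}
        self._listeners: List[Callable[[str, Any], None]] = []

    # Properties: information about the UI at this node.
    def get_property(self, key: str) -> Any:
        return self._properties.get(key)

    def set_property(self, key: str, value: Any) -> None:
        self._properties[key] = value
        # Events: notify subscribers that the element changed.
        for listener in self._listeners:
            listener(key, value)

    # Patterns: functionality the element exposes to clients.
    def add_pattern(self, key: str, action: Callable[[], None]) -> None:
        self._patterns[key] = action

    def invoke(self, key: str) -> None:
        self._patterns[key]()

    def subscribe(self, listener: Callable[[str, Any], None]) -> None:
        self._listeners.append(listener)


checkbox = Element(name="Remember me", checked=False)


def toggle() -> None:
    checkbox.set_property("checked", not checkbox.get_property("checked"))


checkbox.add_pattern("toggle", toggle)

changes = []
checkbox.subscribe(lambda key, value: changes.append((key, value)))
checkbox.invoke("toggle")
```

A requesting component written against such an interface would read the name through a property, operate the control through a pattern, and learn of the resulting state change through an event, without knowing which platform drew the checkbox.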
A shortcoming exists in the current state of the art whereby providing information described by two or more logical trees is either impossible or inordinately difficult. For example, a user interface may have three windows, with a button and a listbox in one of the three windows. Information about the user interface (the three windows, for example) may be contained in a first tree, while information about the contents of the listbox may be described by a second tree. In such a situation, an assistive-technology application that requires knowledge of both trees must try to derive this information itself, which is difficult. There is no efficient way to represent the two or more trees to the requesting component as a single hierarchal representation. Accordingly, there is a need for a method and system for providing accurate, comprehensive hierarchal-structure information about user-interface components described in two or more logical trees to a requesting application. Moreover, there is a need to provide the information in a format that is easy for the requesting application to process.
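The three-window example can be used to show what a single hierarchal representation of two logical trees might look like. The following sketch makes several assumptions that are not in the text: the names `TreeNode`, `stitch`, and `outline` are hypothetical, and the two trees are joined simply by matching node names, which a real system could not rely on.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TreeNode:
    name: str
    children: List["TreeNode"] = field(default_factory=list)


def stitch(host: TreeNode, fragments: Dict[str, TreeNode]) -> TreeNode:
    """Return a copy of `host` in which every node named in `fragments`
    also carries the children of the matching fragment root."""
    merged = [stitch(child, fragments) for child in host.children]
    fragment = fragments.get(host.name)
    if fragment is not None:
        # Fragment children are attached as-is, without further stitching.
        merged.extend(fragment.children)
    return TreeNode(host.name, merged)


def outline(node: TreeNode, depth: int = 0) -> List[str]:
    """Render a tree as indented lines, two spaces per level."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# First tree: the windows and their controls.
window_tree = TreeNode("Desktop", [
    TreeNode("Window 1"),
    TreeNode("Window 2", [TreeNode("Button"), TreeNode("ListBox")]),
    TreeNode("Window 3"),
])
# Second tree: the contents of the listbox.
list_tree = TreeNode("ListBox", [TreeNode("Item A"), TreeNode("Item B")])

unified = stitch(window_tree, {"ListBox": list_tree})
print("\n".join(outline(unified)))
```

The requesting application then traverses `unified` as one tree and reaches the list items beneath the listbox, while the two source trees remain unchanged.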