I. Graphical User Interface (GUI)
On many modern computer systems, users interact with software programs through a graphical user interface (GUI). Basically, a GUI is an interface between computer and user that uses pictures rather than just words to solicit user input and present the output of a program. The typical GUI is made up of user interface elements (UI elements), which are those aspects of a computer system or program that are seen, heard, or otherwise perceived or interacted with by a user. For example, UI elements include items such as icons, pushbuttons, radio buttons, dialog boxes, edit boxes, list boxes, combo boxes, scroll bars, pick lists, and various components of World Wide Web pages (e.g., hyper-links and images). In a typical computer program it is common to encounter literally thousands of UI elements.
Although an individual element of a GUI may appear to the user as a single item, it may actually consist of a number of separate items or sub-elements that have been combined. For example, a toolbar item may consist of a list element, a combo box element, a scroll bar element, etc. Furthermore, each of these sub-elements may itself be composed of other sub-elements. In this manner, UI elements can serve as building blocks for other, more complex, UI elements. Such an approach is useful because the software managing the user interface can re-use the definitions of certain common elements when assembling them into composite elements.
Many UI elements in a GUI environment represent features of a program and are displayed on the computer screen so users can interact with the program by selecting, highlighting, accessing, and operating them. This user interaction is done by maneuvering a pointer on the screen (typically controlled by a mouse or keyboard) and pressing or clicking buttons while the pointer is pointing to a UI element. For example, in a word processor, a user can maneuver the mouse to select an item on the program's menu bar, to click an icon from the tool bar, or to highlight blocks of text in the viewer window. Similarly, a user can use keyboard input to interact with a computer application. For instance, in the word processing program, a user can press “ALT-F,” “CTRL-B,” or other predefined keystroke combinations to access program features. Based on the input from the mouse or keyboard, the computer adds, changes, and manipulates what is displayed on the screen. GUI technologies provide convenient, user-friendly environments for users to interact with computer systems.
II. UI Automation
UI Automation (UIA) is an accessibility framework for Microsoft Windows intended to address the needs of assistive technology products and automated test frameworks by providing programmatic access to information about the user interface (UI). For example, UIA allows a screen reader program to access information about the user interface of a word processor, which gives the reader program the information it needs to provide audible cues to a visually impaired user. Through a set of application programming interface (API) methods, UIA provides a well-structured mechanism for creating and interacting with a UI. Control and application developers use the UIA API set to make their products more accessible to different users by allowing existing or new software (potentially written by other people) to access program menus and other UI elements. For example, braille screens, screen readers, narrators, and other software in Microsoft Windows can use UIA to facilitate computer use for users who otherwise may not have access.
In practice, UI Automation uses a hierarchy of UI elements arranged in a tree structure to provide reliable UI information to the operating system and computer applications. FIG. 1 illustrates one such tree structure representation 100 of a typical GUI on a computer. More particularly, FIG. 1 illustrates how elements of a GUI can be shown as nested within each other in order to accurately describe their organization. At the very top of the tree structure 100 is a desktop element 101 that is representative of the GUI's “desktop” or default background area. The desktop element 101 has within it several application elements (e.g., 105A, 105B and 105C) for application programs that have been invoked and are ready to execute according to a user's instructions (e.g., a typical Microsoft Windows desktop may have several instances of applications such as Microsoft Word, Microsoft Excel, etc. loaded and ready to execute). At a lower level in the tree structure hierarchy are several frames (e.g., 110A, 110B and 110C) associated with an application 105B (e.g., a word processor application may have several frames visible to a user at any given time). Within each of the frames (e.g., 110B) may be several documents (e.g., 115A and 115B), each document containing within it several UI elements. Document B 115B contains controls 120A, 120B and 120C (buttons, listboxes, etc.). UI elements (e.g., 120A, 120B and 120C) may themselves be composites of other UI elements. For example, the element 120B (such as a dialog box, or a combo box) in turn may contain other UI elements such as a button control at 125. Furthermore, the button element 125 itself may contain other UI elements such as control 130. Such nesting can be arbitrarily deep and can include an arbitrary number of branches, depending on the user interface and its component elements.
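By way of illustration only, the nesting described for FIG. 1 can be sketched as a small tree data structure. The class UIElement, its fields, and the function build_fig1_tree below are hypothetical names chosen for this sketch and are not part of the UIA API; the labels are simply the reference numerals used above, and the control types assigned to elements 120A-120C and 130 are placeholders.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class UIElement:
    """Hypothetical tree node; not a real UIA type."""
    label: str          # reference numeral from FIG. 1, e.g. "120B"
    control_type: str   # illustrative type name, e.g. "Frame"
    children: List["UIElement"] = field(default_factory=list)
    parent: Optional["UIElement"] = field(default=None, repr=False, compare=False)

    def add(self, child: "UIElement") -> "UIElement":
        """Attach a child element and return it so calls can be chained."""
        child.parent = self
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "UIElement"]]:
        """Yield (depth, element) pairs in depth-first order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


def build_fig1_tree() -> UIElement:
    """Build the hierarchy described for FIG. 1, rooted at desktop 101."""
    desktop = UIElement("101", "Desktop")
    desktop.add(UIElement("105A", "Application"))
    app_b = desktop.add(UIElement("105B", "Application"))
    desktop.add(UIElement("105C", "Application"))
    app_b.add(UIElement("110A", "Frame"))
    frame_b = app_b.add(UIElement("110B", "Frame"))
    app_b.add(UIElement("110C", "Frame"))
    frame_b.add(UIElement("115A", "Document"))
    doc_b = frame_b.add(UIElement("115B", "Document"))
    doc_b.add(UIElement("120A", "Control"))
    ctrl_b = doc_b.add(UIElement("120B", "Control"))
    doc_b.add(UIElement("120C", "Control"))
    button = ctrl_b.add(UIElement("125", "Button"))
    button.add(UIElement("130", "Control"))
    return desktop


if __name__ == "__main__":
    # Print the tree with two spaces of indentation per nesting level.
    for depth, element in build_fig1_tree().walk():
        print("  " * depth + f"{element.control_type} {element.label}")
```

In this sketch, element 130 sits six levels below the desktop root, which corresponds to the chain 101, 105B, 110B, 115B, 120B, 125, 130 described above.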
For some operating system platforms, an instance of a UI element is assigned an identifier to help distinguish that particular UI element from other UI elements. For example, in a Microsoft Windows based operating system, applications are associated with module identifiers that identify applications within a given desktop context. Also, some user interface platforms (e.g., Microsoft Windows, Swing for Java) use a numeric identifier (control ID) for certain UI elements. In some computing environments, such as a Microsoft Windows environment, UI elements are often associated with a class name that identifies the control class to which they belong. For instance, in a Microsoft Windows based system, common UI elements such as combo box, list box, and button are associated with class names such as ComboBox class, ListBox class, and Button class, respectively. Similarly, other UI frameworks may have names for their respective classes of UI elements.
Notably, these techniques identify a UI element's object class or type, but do not by themselves provide a strong identifier that uniquely identifies a UI element across a reboot of the computer running the program, across a different build of the program when still in development, across the opening of another instance of the same program, or across the opening of the same program on another computer.
UIA overcomes these deficiencies by generating a composite ID that uniquely identifies a UI element in a GUI tree. UIA generates the composite identifier by adding identification information (e.g., control name or control type) that is directly associated with a UI element to hierarchical identification information (e.g., parent control, child control, and/or sibling controls) and control pattern-specific information (e.g., depth of the UI element in the tree). For example, in FIG. 1, an identifier for target UI element 130 may be generated by collecting identifying information related to parent UI elements (e.g., 101, 105B, 110B, 115B, 120B, and 125) that describe the hierarchical arrangement between a target leaf UI element (130) and the root element at desktop 101. Through the concept of a path, the related identifiers for a UI element's unique hierarchy and parentage can be leveraged to identify it uniquely and persistently.
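By way of illustration only, the path concept can be sketched as follows. The actual format of a UIA composite identifier is not specified here, so the string layout, the class Node, and the functions composite_id and build_target are assumptions made for this sketch. Each path segment combines an element's own information (control type and name) with hierarchical information (its position among same-typed siblings), and the depth of the target is prepended.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    """Hypothetical UI tree node; not a real UIA type."""
    name: str
    control_type: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)

    def add(self, name: str, control_type: str) -> "Node":
        child = Node(name, control_type, parent=self)
        self.children.append(child)
        return child


def composite_id(target: Node) -> str:
    """Build an illustrative path-style identifier from root to target."""
    segments = []
    node: Optional[Node] = target
    while node is not None:
        if node.parent is None:
            index = 0
        else:
            # Position among siblings that share the same control type.
            same_type = [c for c in node.parent.children
                         if c.control_type == node.control_type]
            index = same_type.index(node)
        segments.append(f"{node.control_type}[{index}]:{node.name}")
        node = node.parent
    depth = len(segments) - 1  # the root is at depth 0
    return f"depth={depth}|" + "/".join(reversed(segments))


def build_target() -> Node:
    """Recreate part of the FIG. 1 hierarchy and return element 130."""
    desktop = Node("101", "Desktop")
    desktop.add("105A", "Application")
    app = desktop.add("105B", "Application")
    frame = app.add("110B", "Frame")
    doc = frame.add("115B", "Document")
    doc.add("120A", "Control")
    ctrl = doc.add("120B", "Control")
    button = ctrl.add("125", "Button")
    return button.add("130", "Control")


if __name__ == "__main__":
    print(composite_id(build_target()))
```

Because the identifier in this sketch is derived only from names, types, and positions in the hierarchy, rebuilding the same tree in a later session yields the same string, which is the persistence property described above.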
The unique identifier (persistent ID) provides easy access to individual UI elements so that the functionality of a program hosting UI elements can be programmed and tested, and so that a particular UI element can be identified to other program modules.
For additional information about UIA, see, for example, the documentation available through the Microsoft Developer Network.
III. UI Recording and Playback
The ability to record and play back a user's interaction with computer applications in a GUI environment has the potential to benefit multiple parties, including users, software developers, and computer support personnel. For example, users benefit by creating macros or scripts that combine a series of inputted actions into a single-step playback action. Software developers can potentially use the ability to record user actions to help generate test cases for software under development. Computer support personnel can record user actions to discover the reason for computer crashes or hangs, or to help users understand how to use software. Conventional UI recorders, such as a macro recorder, have attempted to provide some of these benefits to users. However, current recording tools have drawbacks that limit their usefulness.
A main drawback to many conventional macro programs (which are different from a UI recorder that actually records user actions) is that the scripts they generate do not represent an actual step-by-step readable recording of UI activity (e.g., user actions against a UI-based application). Furthermore, these generated scripts often miss some important input steps such as the expansion of selected pop-up menus. The generated scripts show the internal commands and actions taken by the operating system or application to perform a certain function, but they do not show in a meaningful way actions actually performed by the user. Moreover, in many instances, users must independently develop scripts based on a set of scripting commands and complex programming constructs. Thus users have to understand programming logic and, to some extent, the underlying logic of the programs being controlled to create and use a macro. For example, AutoMate, a macro program, uses a drag-and-drop task builder to create a script by dragging and dropping specific steps into the order they should be executed. As another example, Macro Scheduler is a macro creation program that allows a user to write a macro script using its more than 200 script commands and programming constructs (not including actual declared variables and other user-defined structures). The complexity required to create and edit the scripts generated by these macro programs and the fact that they do not show actual user input lessen those macro programs' usefulness, particularly to novice users and to computer support personnel and software developers attempting to troubleshoot problems.
Other macro creating programs, such as Borland's Superkey, let users create keyboard macros, rearrange the keyboard, and encrypt data and programs. However, they focus just on keyboard input, rather than recording user activity and representing it in a meaningful way.
Many conventional UI recorders have similar drawbacks to those described above, in that they use complex scripting commands and programming constructs to represent data. On the other hand, one drawback to conventional UI recorders that attempt to record actual UI activity is that they record extraneous data that makes output hard to read. Basically, conventional recording tools record everything that happens on the computer, including activity that is irrelevant to the particular task that is the subject of the recording. For example, this irrelevant activity may include Windows messages, unrelated API calls, and extraneous mouse movements. Recording all this extra data floods the recording pipe with actions and messages that are unnecessary for playback and, ultimately, makes the recorder output difficult to read and understand.
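By way of illustration only, the following sketch shows how extraneous entries can dominate a raw recording and how little remains once only task-relevant actions are kept. The event kinds, field names, and the function filter_events are hypothetical and do not correspond to the output format of any particular recorder.

```python
from typing import Dict, List

# Hypothetical set of event kinds that matter for replaying the task.
RELEVANT_KINDS = {"click", "key_press", "select_item"}


def filter_events(raw_events: List[Dict]) -> List[Dict]:
    """Keep only events whose kind is relevant, preserving their order."""
    return [event for event in raw_events if event["kind"] in RELEVANT_KINDS]


# A made-up raw recording of "open a file": three user actions buried
# among mouse movements, a window message, and an unrelated API call.
raw_recording = [
    {"kind": "mouse_move", "x": 10, "y": 12},
    {"kind": "window_message", "name": "WM_PAINT"},
    {"kind": "click", "target": "File menu"},
    {"kind": "mouse_move", "x": 40, "y": 80},
    {"kind": "api_call", "name": "unrelated_call"},
    {"kind": "click", "target": "Open..."},
    {"kind": "key_press", "keys": "report.doc"},
]

if __name__ == "__main__":
    for event in filter_events(raw_recording):
        print(event)
```

In this made-up recording, four of the seven entries are noise; a conventional recorder that keeps all seven produces output that is both harder to read and more fragile on playback.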
Another drawback to conventional recorders relates to problems with the context for playback of recorded data, and these problems are exacerbated by the presence of irrelevant data as described above. A conventional playback component simply plays back the recorded data exactly as the data were recorded. This makes playback inefficient and, in many cases, causes playback to simply fail. Part of the reason for failure is that conventional playback is very dependent on the recording computer's pre-existing conditions. For example, playback may depend on a certain hardware configuration, software installation, and/or the dynamic state of the runtime environment (such as the availability or location of a UI element for a particular recorded interaction). Using a conventional playback tool, any changes to those pre-existing conditions may cause playback to fail. For instance, suppose a user records UI activity at a low monitor/desktop resolution. Later, the user changes to a higher monitor/desktop resolution. In this case, playback would most likely fail because the screen positions of all the UI elements have changed. Similarly, playback may fail because a call to an unrelated API was recorded and on playback the API does not exist. Hence, as the computer environment changes more, or as larger volumes of irrelevant data are recorded, playback becomes increasingly unreliable.
As another example, suppose user actions are recorded for the following simple activities: the user mouse-clicks a drop down button of a combo box to show a list box with names of states, scrolls through the list box by moving the thumb wheel of a mouse, and mouse-clicks on an item such as the state “Washington.” When the steps are played back exactly as they were recorded, there may not be the same number of items in the list box, so moving the thumb wheel the same way as recorded may result in selection of a different item. The recorded wheel movement may not even make the item that needs to be selected visible. Or, the item may not even exist in the list box (e.g., if the combo box has an edit control). Finally, if filling the list box takes time, then a synchronization problem may arise. Recording conditions are critical to reliable playback and any changes in those conditions can cause playback to fail.
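By way of illustration only, the list box scenario can be simulated with plain lists. The functions replay_by_position and replay_by_name and the sample state lists are hypothetical; replaying the recorded scroll distance is modeled simply as picking the item at the recorded position.

```python
from typing import List, Optional


def replay_by_position(items: List[str], recorded_index: int) -> Optional[str]:
    """Naive replay: repeat the recorded scroll and take whatever is there."""
    if 0 <= recorded_index < len(items):
        return items[recorded_index]
    return None  # the recorded position no longer exists


def replay_by_name(items: List[str], recorded_text: str) -> Optional[str]:
    """Replay by item text; returns None if the item is not in the list."""
    return recorded_text if recorded_text in items else None


# List contents at recording time; the user selected "Washington".
recorded_items = ["Oregon", "Texas", "Utah", "Washington"]
recorded_index = recorded_items.index("Washington")

# List contents at playback time; two more states have been added.
playback_items = ["Oregon", "Texas", "Utah", "Vermont", "Virginia", "Washington"]

if __name__ == "__main__":
    print(replay_by_position(playback_items, recorded_index))  # Vermont
    print(replay_by_name(playback_items, "Washington"))        # Washington
```

Replaying the recorded position selects the wrong state once the list has changed, whereas looking the item up by its text either finds the intended item or reports that it is absent.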
As noted above, conventional UI recorders frequently do not support synchronization mechanisms. Depending on the workload of the computer being recorded, the workload of the playback computer, and other factors, this lack of synchronization can cause playback to fail. For example, suppose a user records the necessary steps to launch an application. On playback, the tool automatically tries to run each recorded instruction within a certain amount of time based on the recorded timeframe. If a step fails to finish within the predetermined timeframe, it can cause an entire chain of steps to fail when subsequent steps rely on predecessor steps.
For instance, in a Microsoft Windows 3.1 environment, suppose a recording tool records the series of steps necessary to launch a word processor and open a file in it. On subsequent playback, if the application takes longer than expected to launch, a conventional playback tool would try to open a file before the application is ready, and playback would fail. For additional information about macro recording in Microsoft Windows 3.1, see, for example, the reference entitled, User's Guide for Microsoft Windows for Workgroups, at page 137.
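By way of illustration only, the difference between timing-based playback and synchronized playback can be sketched with a polling wait. The function wait_until and the class FakeApp are hypothetical stand-ins; FakeApp merely simulates an application that reports readiness only after it has been polled a given number of times.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout: float = 5.0,
               interval: float = 0.05) -> bool:
    """Poll condition until it is true or timeout seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeApp:
    """Simulated application that is not ready for its first few polls."""

    def __init__(self, polls_until_ready: int) -> None:
        self._remaining = polls_until_ready

    def is_ready(self) -> bool:
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True


if __name__ == "__main__":
    app = FakeApp(polls_until_ready=3)
    # Timing-based playback would issue "open file" immediately and fail.
    # Synchronized playback waits for readiness before the next step.
    if wait_until(app.is_ready, timeout=1.0, interval=0.01):
        print("application ready; safe to open the file")
    else:
        print("application did not become ready in time")
```

The next step runs as soon as the application is ready instead of after a fixed delay, and a timeout still bounds the wait if the application never becomes ready.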
In conclusion, these prior recording and playback systems provide unreliable techniques for recording and playing back user actions on a computer. Thus, there is a need in the art for tools that record user actions performed against UI-based applications and selectively filter and adjust data to provide more reliable playback. There is also a need in the art for tools that provide more readable output so recorded data can be read, interpreted, and edited according to user needs. Further, there is a need in the art for tools that address the synchronization problems of past recording technologies. These and other advantages may be achieved by the tools and techniques described herein.