I. Graphical User Interface
On many modern computer systems, users interact with software programs through a graphical user interface (“GUI”). Basically, a GUI is an interface between computer and user that uses pictures rather than just words to solicit user input and present the output of a program. The typical GUI is made up of user interface elements (“UI elements”), which are those aspects of a computer system or program that are seen, heard, or otherwise perceived or interacted with by a user. For example, UI elements include items such as icons, buttons, dialog boxes, edit boxes, list boxes, combo boxes, scroll bars, pick lists, pushbuttons, radio buttons and various components of World Wide Web pages (e.g., hyper-links and images). In a typical computer program it is common to encounter literally thousands of UI elements.
Although an individual element of a GUI may appear to the user as a single item, it may actually consist of a number of separate items or sub-elements that have been combined. For example, a toolbar item may consist of a list element, a combo box element, a scroll bar element, etc. Furthermore, each of these sub-elements may itself be composed of other sub-elements. In this manner, UI elements can serve as building blocks for other, more complex UI elements. Such an approach is useful because the software managing the user interface can re-use the definitions of certain common elements when assembling them into composite elements.
Many UI elements in a GUI environment represent features of a program and are displayed on the computer screen so users can interact with the program by selecting, highlighting, accessing, and operating them. This user interaction is done by maneuvering a pointer on the screen (typically controlled by a mouse or keyboard) and pressing or clicking buttons while the pointer is pointing to a UI element. For example, in a word processor, a user can maneuver the mouse to select an item on the program's menu bar, to click an icon from the tool bar, or to highlight blocks of text in the viewer window. Similarly, a user can use keyboard input to interact with a computer application. For instance, in the word processing program, a user can press “ALT-F,” “CTRL-B,” or other predefined keystroke combinations to access program features. Based on the input from the mouse or keyboard, the computer adds, changes, and manipulates what is displayed on the screen. GUI technologies provide convenient, user-friendly environments for users to interact with computer systems.
II. UI Automation
UI Automation (“UIA”) is an accessibility framework for Microsoft Windows intended to address the needs of assistive technology products and automated test frameworks by providing programmatic access to information about the user interface (“UI”). For example, UIA allows a screen reader program to access information about the UI of a word processor, providing the reader program with the information it needs to provide audible cues to a visually impaired user. Through a set of application programming interface (“API”) methods, UIA provides a well-structured mechanism for creating and interacting with a UI. Control and application developers use the UIA API set to make their products more accessible to different users by allowing existing or new software (potentially written by other people) to access program menus and other UI elements. For example, braille screens, screen readers (narrators), magnifiers, and other software in Microsoft Windows can use UIA to facilitate computer use for users who otherwise may not have access.
In practice, UI Automation uses a hierarchy of UI elements arranged in a tree structure to provide reliable UI information to the operating system and computer applications. Elements of a GUI can be considered as nested within each other in order to accurately describe their organization. For example, at the very top of a tree structure is a desktop element that is representative of the GUI's “desktop” or default background area. The desktop element has within it several application elements for application programs that have been invoked and that are ready to execute according to a user's instructions (e.g., a typical Microsoft Windows desktop may have several instances of applications such as Microsoft Word, Microsoft Excel, etc. loaded and ready to execute). At a lower level in the tree structure hierarchy are frames associated with an application (e.g., a word processor application may have several frames visible to a user at any given time). Within each of the frames may be several documents, each document containing within it several UI elements (e.g., buttons, listboxes, etc.). UI elements may themselves be composites of other UI elements. For example, a dialog box or a combo box contains other UI elements such as a button control. Furthermore, the button element may contain yet other UI elements. Such nesting can be arbitrarily deep and can include an arbitrary number of branches, depending on the user interface and its component elements.
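The nesting described above can be pictured as an ordinary tree. The following Python sketch is illustrative only: the `UIElement` class, its fields, the `walk` helper, and the sample element names are assumptions made for this example and are not part of the UIA API.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class UIElement:
    """Hypothetical stand-in for one node in a GUI element tree."""
    control_type: str
    name: str = ""
    children: List["UIElement"] = field(default_factory=list)


def walk(element: UIElement, depth: int = 0) -> Iterator[Tuple[int, UIElement]]:
    """Yield (depth, element) pairs depth-first, each parent before its children."""
    yield depth, element
    for child in element.children:
        yield from walk(child, depth + 1)


# Desktop -> application -> frame -> document -> controls, as in the text.
desktop = UIElement("Desktop", "Desktop", [
    UIElement("Application", "Word Processor", [
        UIElement("Frame", "Main Window", [
            UIElement("Document", "Letter", [
                UIElement("ComboBox", "Font", [
                    UIElement("Button", "Drop Down"),  # a composite's sub-element
                ]),
                UIElement("Button", "Bold"),
            ]),
        ]),
    ]),
])

# An indented outline of the tree, plus the deepest nesting level reached.
outline = [f"{'  ' * depth}{el.control_type}: {el.name}" for depth, el in walk(desktop)]
max_depth = max(depth for depth, _ in walk(desktop))

if __name__ == "__main__":
    print("\n".join(outline))
    print("deepest level:", max_depth)
```

Because `walk` is recursive, the same code handles a tree of any depth or branching factor, which mirrors the observation that nesting can be arbitrarily deep.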
For some operating system platforms, an instance of a UI element is assigned an identifier to help distinguish that particular UI element from other UI elements. For example, in a Microsoft Windows based operating system, applications are associated with module identifiers that identify applications within a given desktop context. Also, some user interface platforms (e.g., Microsoft Windows, Swing for Java) use a numeric identifier (control ID) for certain UI elements. In some computing environments, such as a Microsoft Windows environment, UI elements are often associated with a class name associated with the control class to which they belong. For instance, in a Microsoft Windows based system, common UI elements such as combo box, list box, and button are associated with class names such as ComboBox class, ListBox class, and Button class, respectively. Similarly, other UI frameworks may have names for their respective classes of UI elements.
Notably, these techniques identify a UI element's object class or type, but none of them alone provides a strong identifier that uniquely identifies a UI element across a reboot of the computer running the program, across a different build of the program while it is still in development, across the opening of another instance of the same program, or across the opening of the same program on another computer.
UIA overcomes these deficiencies by generating a composite ID that uniquely identifies a UI element in a GUI tree. UIA generates the composite identifier by adding identification information (e.g., control name or control type) that is directly associated with a UI element to hierarchical identification information (e.g., parent control, child control, and/or sibling controls) and control pattern-specific information (e.g., depth of the UI element in the tree). For example, an identifier for a target UI element may be generated by collecting identifying information related to the parent UI elements that describe the hierarchical arrangement between a target leaf UI element and the root element at the desktop. Through the concept of a path, the related identifiers for a UI element's unique hierarchy and parentage can be leveraged to identify it uniquely and persistently.
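The path concept can be sketched as follows. This is a minimal illustration under stated assumptions, not the identifier format that UIA actually produces: the `UIElement` class, the `segment` and `composite_id` helpers, and the `Type[name]#index` notation are all hypothetical. Each path segment combines an element's own information (control type and name) with sibling information (its position among like-named siblings), and the segments are joined from the desktop root down to the target.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UIElement:
    """Hypothetical tree node that also remembers its parent."""
    control_type: str
    name: str = ""
    children: List["UIElement"] = field(default_factory=list)
    parent: Optional["UIElement"] = field(default=None, repr=False, compare=False)

    def add(self, child: "UIElement") -> "UIElement":
        """Attach a child and return it, so trees can be built step by step."""
        child.parent = self
        self.children.append(child)
        return child


def segment(element: UIElement) -> str:
    """Describe one element by type, name, and index among identical siblings."""
    index = 0
    if element.parent is not None:
        twins = [c for c in element.parent.children
                 if c.control_type == element.control_type and c.name == element.name]
        index = next(i for i, c in enumerate(twins) if c is element)
    return f"{element.control_type}[{element.name}]#{index}"


def composite_id(element: UIElement) -> str:
    """Join the segments of every ancestor, root first, into one path string."""
    parts = []
    node: Optional[UIElement] = element
    while node is not None:
        parts.append(segment(node))
        node = node.parent
    return "/".join(reversed(parts))


desktop = UIElement("Desktop", "Desktop")
app = desktop.add(UIElement("Application", "Word Processor"))
dialog = app.add(UIElement("Dialog", "Print"))
ok_first = dialog.add(UIElement("Button", "OK"))
ok_second = dialog.add(UIElement("Button", "OK"))  # same class and name as ok_first

if __name__ == "__main__":
    print(composite_id(ok_first))
    print(composite_id(ok_second))
```

The two buttons share a class name and a label, so neither property alone distinguishes them, yet their path strings differ in the final segment. The number of segments minus one gives the element's depth in the tree. How well such a path persists across builds or machines depends on which properties are chosen for each segment; this sketch does not address that choice.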
The unique identifier (persistent ID) provides easy access to individual UI elements so that the functionality of a program hosting UI elements can be programmed and tested, and so that a particular UI element can be identified to other program modules. For additional information about UIA, see, for example, the documentation available through the Microsoft Developer Network.
III. Macro Builders and UI Recorders
The ability to record and play back a user's interaction with a computer in a GUI environment has the potential to benefit multiple parties, including businesses (or other large organizations), users, software developers, testers, and computer support personnel. For example, business organizations can streamline a business process, such as use of software for supply-chain management, by automating much of the process. Users benefit by creating macros or scripts that combine a series of inputted actions into a playback action triggered in a single step. Software developers can use the ability to record user actions to help generate test cases for software under development. Testers can use record and playback tools to build tests to perform automated regression testing. Computer support personnel can record user actions to discover the reason for computer crashes or hangs, or to help users understand how to use software.
A. Commercial Macro Builders
Many conventional macro builder programs generate scripts that show the internal commands and actions taken by the computer or application to perform a function. However, in many instances, users must independently develop scripts based on a set of scripting commands and complex programming constructs. Thus, users have to understand programming logic and, to some extent, the underlying logic of the programs being controlled to create and use a macro. For example, AutoMate, a macro program, uses a drag-and-drop task builder to create a macro script by dragging and dropping specific steps into the order they should be executed, which means the AutoMate user has to understand how and in what order commands should be issued. Macro Scheduler is a macro creation program that allows a user to write a macro script using more than 200 script commands and programming constructs (not including actual declared variables and other user-defined structures). The complexity required to create and edit the scripts generated by these macro programs and the fact that the scripts generated by these macro builders do not represent an actual step-by-step readable recording of UI activity lessen those macro programs' usefulness, particularly to novice users and to computer support personnel and software developers attempting to troubleshoot problems.
B. Internal Macro Languages
As an alternative to commercial macro building software, many applications have the built-in ability to record and play back macros using their own special-purpose application control language (“macro language”). An application's macro language is typically unique to the application and is generally based on the application's internal object model (“IOM”). While it is possible to build a macro language without object-oriented techniques, most internal models use an object-oriented representation of the structure of the program. The IOM provides an accessible outline or model of the classes, attributes, operations, parameters, relationships, and associations of the underlying objects for the program. Macro languages access their application's IOM and hook into its communications mechanisms (such as event calls) to access and call features within the application. For example, a user of Microsoft Word can record a macro to automatically format text. A main drawback of most macro languages, however, is that they are application specific. A macro recorded by one application generally is not supported by other applications, particularly if the two applications were developed by competing software companies. In some cases, a macro recorded for one version of an application is not supported by later versions of the application.
C. Dependence on Macros
Developing macros can be difficult, and many users and businesses are reluctant to do anything that might break their existing macros. Businesses in particular are reluctant to do anything that might negatively affect their business processes. In fact, many users and businesses are reluctant to upgrade software or even install patches for fear of “breaking” something. For example, suppose a business employs an automated business process (e.g., in the form of a macro) that scans received faxes, automatically performs optical character recognition (“OCR”) on the fax to produce a text file version of the fax, compares the data in the text file to entries in a spreadsheet to verify account information, and finally sends a confirmation email to the sender of the fax. This business process most likely uses a combined set of potentially complicated macros and a variety of software packages to function properly (e.g., a scanner program, an OCR program, a spreadsheet program, a text file program, etc.). Businesses and users are often apprehensive about upgrading macros or software unless they have assurances that current investments into their automated processes will remain intact.
D. Other UI Recorders
Many conventional UI recorders have similar drawbacks to those of the macro tools described above in that they use complex scripting commands and programming constructs to represent data. Another drawback is that conventional playback is very dependent on the recording computer's pre-existing conditions. For example, playback may depend on a certain hardware configuration, software installation, and/or the dynamic state of the runtime environment (such as the availability or location of a UI element for a particular recorded interaction). Using a conventional playback tool, any changes to those pre-existing conditions can cause playback to fail.
As an example, suppose a user reconfigures an application user interface. Some GUI-based applications allow a user to move buttons, reconfigure menus, add or remove other UI elements for program features, add links to macros on a menu bar, or perform other UI alterations. Although the ability to personalize menus is a useful feature (e.g., enabling users to customize a UI to their specific needs), it may cause many recording and playback tools to fail because UI elements are not in their expected locations.
For example, the Microsoft Windows 3.1 Macro Recorder attempted to replicate user actions in a GUI by recording mouse movements and the mouse coordinates at which a mouse button was clicked. On playback, if a UI element was not where it was expected to be, playback failed. For additional information about macro recording in Microsoft Windows 3.1, see, for example, the reference entitled User's Guide for Microsoft Windows for Workgroups, at page 137. As another example, suppose a user records UI activity at a low monitor/desktop resolution. Later, the user changes to a higher monitor/desktop resolution. In this case, as above, playback would most likely fail because the screen positions of the UI elements have changed. Hence, as the computer environment changes, playback becomes increasingly unreliable.
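The resolution problem can be illustrated with a small simulation. The sketch below is not a description of how the Windows 3.1 Macro Recorder was implemented. The `Rect` class, the `layout` and `hit_test` functions, the dialog dimensions, and the two resolutions are all assumptions chosen for the example. It assumes a dialog that is centered on the screen, so its OK button moves when the resolution changes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Rect:
    """Screen rectangle in pixels, with the origin at the top-left corner."""
    left: int
    top: int
    width: int
    height: int

    def contains(self, x: int, y: int) -> bool:
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


def layout(screen_width: int, screen_height: int) -> Dict[str, Rect]:
    """Hypothetical layout: a 400x200 dialog centered on the screen,
    with an 80x30 OK button near the dialog's bottom-right corner."""
    dialog_w, dialog_h = 400, 200
    left = (screen_width - dialog_w) // 2
    top = (screen_height - dialog_h) // 2
    return {"OK": Rect(left + dialog_w - 90, top + dialog_h - 40, 80, 30)}


def hit_test(elements: Dict[str, Rect], x: int, y: int) -> Optional[str]:
    """Return the name of the element under (x, y), or None if nothing is there."""
    for name, rect in elements.items():
        if rect.contains(x, y):
            return name
    return None


# Recording: the user clicks the center of the OK button at 640x480.
recorded_click = (470, 315)
hit_at_record = hit_test(layout(640, 480), *recorded_click)

# Playback: the same raw coordinates are replayed at 1024x768.
hit_at_playback = hit_test(layout(1024, 768), *recorded_click)

if __name__ == "__main__":
    print("recording hit:", hit_at_record)
    print("playback hit:", hit_at_playback)
```

In this sketch the recorded click lands on the OK button at the recording resolution but on empty screen at the playback resolution, because only the coordinates were stored and not the identity of the element that was clicked.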
In conclusion, there is a need for simple, system-wide macro and UI recording tools that are compatible with existing macro languages and application-specific macro recorders. At the same time, there is a need for simple, system-wide macro and UI recording tools that work with dynamic user interfaces.