Related fields include electronic digital data processing and, in particular, control of a slave computer by a master computer, remote data access, and remote detection and correction of software faults.
Occasions to control one microprocessor-containing device from another are plentiful and increasing. Enterprise information technology (IT) groups in any industry can greatly increase their efficiency and reduce their companies' overhead if they can install, update, and troubleshoot software on servers without needing to physically visit each server. Technical support for field-installed software and hardware becomes more cost-effective when problems can be diagnosed and fixed remotely, computer-to-computer. Insufficient network connectivity is no longer a primary obstacle to efficient control of remote devices, because both public and private networks have become increasingly ubiquitous, especially in urban areas. Instead, the diversity of operating systems and device technology has become a dominant obstacle.
One approach to the device-diversity problem is to provide some mechanism to configure the remote devices. Such mechanisms have included standardized network protocols such as NETCONF®, software development kits for the various platforms such as Amazon's AWS SDK®, and graphical desktop-sharing systems such as VNC® that enable control of a computer running one OS by another computer running a different OS. Enterprise IT departments typically exercise central control over remote devices through platform-specific application programming interfaces (APIs) provided by the device vendors. Each new device platform, needing to be addressed through its own API, adds cost to the central management system. As new devices and platforms enter the market at an increasing rate, the problem is exacerbated. Because of the labor and risk involved in adding another platform that IT will need to support, many enterprises are reluctant to adopt new devices or programs that could potentially boost productivity. Instead, they strive to limit the diversity of hardware and software in use.
Software bugs cost the U.S. economy $60 billion per year, according to a 2003 study by the U.S. National Institute of Standards and Technology (NIST), despite software companies' typically allocating 50% of development costs to testing. The most common approach to software testing is to run a large number of focused test cases. These test cases could be executed faster and more consistently, and cover more of the program under test, if they were automated. Often, however, the automation itself takes a prohibitively long time: time to create tests that thoroughly cover the scope of operation, and time to maintain the tests as new versions of the software are created.
As diverse as the underlying programs for different device operating systems may be, the graphical user interfaces (GUIs) have been converging on at least some degree of similarity. Manufacturers have found that many customers are more willing to use a new device if the interface looks and feels at least somewhat familiar. Among computers, and increasingly among smaller devices as their processors and screens have become able to support it, the “desktop” GUI has gained wide acceptance. The device's display simulates the surface of a desk with various objects on it: folders, books, calendars, clocks, and graphic icons identifying programs, functions, or peripheral devices such as connected printers. By entering an input (typing, clicking, touching, gesturing, speaking, etc.) directed at one of these objects, a “folder” can be opened to show its contents; an appointment can be entered on the “calendar”; a calculation can be done on the “calculator”; a program can be started by selecting its icon; and so on. Among simpler devices such as microprocessor-enhanced household or public appliances, “control-board” GUIs are still seen; the user “presses” text-labeled or pictorial buttons to perform an action or expose a new selection of options. In some cases, simple text or numeric values can be entered. As the GUI becomes more ubiquitous, software manufacturers take pains to ensure that their new devices support a GUI, and further that the GUI is not too dissimilar from established GUIs, so that users adopt the new devices easily.
This increasing ubiquity of the GUI as the device interface offers an opportunity to control remote devices without the costly traditional API programming approach: by interacting with devices the way a human user would, analyzing the on-screen display and interacting with the visual entities shown. If the complexity and consequent slowness of analyzing and manipulating the image data could be overcome, automated operations such as control of remote devices and software testing could be carried out on multiple platforms with identical, or nearly identical, scripts. Therefore, a need exists to interact with different display-based user interfaces (graphical user interfaces, text “console” interfaces, etc.) in a unified way, largely agnostic to the operating system or other programmatic nuances of the target device being controlled.
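The screen-analysis approach described above can be illustrated with a minimal sketch. The names used here (`find_template`, `click_center`) are hypothetical illustrations, not part of any system described in this document; a practical implementation would use tolerant image matching rather than the exact pixel comparison shown, and would dispatch a real input event at the computed coordinates.

```python
# Hypothetical sketch: locate a visual element (e.g., a button) in a screenshot
# by template matching, then compute where a "click" would be directed.
# Screens and templates are modeled as 2D grids of pixel values; because the
# logic operates only on pixels, it is independent of the target device's OS.

def find_template(screen, template):
    """Return (row, col) of the top-left corner where template matches screen,
    or None if no exact match exists. Naive exhaustive search for clarity."""
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    for r in range(sh - th + 1):
        for c in range(sw - tw + 1):
            if all(screen[r + i][c + j] == template[i][j]
                   for i in range(th) for j in range(tw)):
                return (r, c)
    return None

def click_center(screen, template):
    """Compute the coordinates at the center of the matched on-screen element,
    where an input event (click, touch) would be delivered."""
    pos = find_template(screen, template)
    if pos is None:
        return None
    r, c = pos
    return (r + len(template) // 2, c + len(template[0]) // 2)
```

The same matching logic would run unchanged against a screenshot from any platform, which is the core of the OS-agnostic control idea: the script keys on what is displayed, not on a platform-specific API.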