1. The Field of the Invention
The present invention relates to tracing and profiling distributed applications. More specifically, the present invention relates to systems, methods, and computer-program products for including tracing and/or profiling information along with distributed application data in messages that are utilized by distributed applications during normal operation.
2. Background and Relevant Art
Computer systems and related technology affect many aspects of society. Indeed, the computer system's ability to process information has transformed the way we live and work. Computer systems now commonly perform a host of tasks (e.g. information management, scheduling, and word processing) that prior to the advent of the computer system were typically performed manually. More recently, computer systems have been coupled to one another to form computer networks over which computer systems may transfer data electronically.
Initially, a significant portion of data transfer on computer networks was performed using specific applications (e.g. electronic mail applications) to transfer data files from one computer system to another computer system. For example, a first user at a first networked computer system could electronically mail a word processing document to a second user at a second networked computer system. However, program execution (e.g. running the electronic mail application) and data access (e.g. attaching the word processing document to an electronic mail message) were performed essentially entirely at a single computer system (e.g. the first computer system). That is, a computer system would execute programs and access data from storage locations contained in the computer system. Thus, being coupled to a network would not inherently give one networked computer system the ability to access data from another networked computer system. Only after a user actively sent data to a computer system could the computer system access the data.
More recently, however, as the availability of higher-speed networks has increased, many computer networks have shifted towards a distributed architecture. Such networks are frequently referred to as distributed systems. Distributed systems function to “distribute” program execution and data access across the modules of a number of different computer systems coupled to a network.
In a distributed system, modules connected to a common network interoperate and communicate with one another in a manner that may be transparent to a user. For example, a user of a client computer system may select an application program icon from a user-interface, thereby causing an application program stored at a server computer system to execute. The user-interface may indicate to the user that the application program has executed, but the user may be unaware, and in fact may not care, that the application program was executed at the server computer system. The client computer system and the server computer system may communicate in the background to transfer the user's commands, program responses, and data between the client computer system and the server computer system.
Often, a distributed system includes a substantial number of client computer systems and server computer systems. In many cases, computer systems of a distributed system may function both as client computer systems and server computer systems, providing data and resources to some computer systems and receiving data and resources from other computer systems. Each computer system of a distributed system may include a different configuration of hardware and software modules. For example, computer systems may have different types and quantities of processors, different operating systems, different application programs, and different peripherals. Additionally, the communications path between computer systems of a distributed system may include a number of networking components, such as, for example, firewalls, routers, proxies and gateways. Each networking component may include one or more software or hardware modules that condition and/or format portions of data so as to make them accessible to other modules in the distributed system.
In some cases, “distributed applications” are specifically designed for execution in a distributed system. Since many distributed systems include a substantial number of modules, designing and configuring distributed applications is significantly more complex than designing and configuring applications for execution at a single computer system. Each portion of a distributed application, in addition to being configured for proper operation in a stand-alone mode, must also be configured to appropriately communicate with other portions of the distributed application, as well as with other modules in associated distributed systems. Due to this complexity, communication between portions of distributed applications (even those that are properly configured) may operate in an undesirable manner from time to time. As such, it is often desirable to gather information from intermediary modules of a distributed system that facilitate communication between portions of a distributed application. Gathering such information is frequently referred to as “tracing” or “profiling” (hereinafter referred to jointly as “profiling”).
One common profiling technique used on distributed systems is to attach, or “glue on,” a profiling process to a portion of a distributed application and monitor communication to and from the portion of the distributed application. As communication occurs, the profiling process records communication data to a log file. In some cases, profiling processes are attached to a number of different portions of a distributed application and each profiling process records data to a separate log file. The separate log files are then pulled together and correlated to give some indication of what may be causing undesirable communication between portions of a distributed application.
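The log correlation step described above can be illustrated with a minimal sketch. It assumes that each attached profiling process writes records sorted by timestamp, that clocks are synchronized across computer systems, and that every record carries a message identifier; the record layout and the names `LogRecord`, `merge_logs`, `correlate`, and `transit_times` are hypothetical and are not drawn from any particular profiling tool.

```python
import heapq
from collections import defaultdict
from typing import NamedTuple


class LogRecord(NamedTuple):
    """One entry written by an attached profiling process (hypothetical layout)."""
    timestamp: float  # seconds; assumed comparable across all logs
    portion: str      # portion of the distributed application being monitored
    message_id: str   # assumed identifier shared by sender and receiver
    event: str        # "send" or "receive"


def merge_logs(*logs):
    """Merge separate logs, each already sorted by timestamp, into one timeline."""
    return list(heapq.merge(*logs, key=lambda record: record.timestamp))


def correlate(timeline):
    """Group the merged timeline by message identifier."""
    by_message = defaultdict(list)
    for record in timeline:
        by_message[record.message_id].append(record)
    return dict(by_message)


def transit_times(by_message):
    """Return elapsed time from first send to last receive for each message.

    Messages that lack either a send or a receive record are omitted; such
    gaps may themselves hint at where communication went wrong.
    """
    result = {}
    for message_id, records in by_message.items():
        sends = [r.timestamp for r in records if r.event == "send"]
        receives = [r.timestamp for r in records if r.event == "receive"]
        if sends and receives:
            result[message_id] = max(receives) - min(sends)
    return result
```

In such a sketch, a message that appears in one log but never in another, or a message with an unusually long transit time, would be a candidate for closer inspection. Every log must be collected in full before correlation can begin, which is part of what makes this approach data-heavy.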
Attached profiling processes offer little control over profiling functions that are performed and the amount of data that is returned when communication to and from a portion of a distributed application is being profiled. These processes may have standardized profiling operations with limited ability to configure the operations for specific distributed systems. This may result in too much data, some of which may not even be useful for profiling a particular portion of a distributed application, being returned by profiling processes. Lack of control over the amount of data that is returned may result in a “probe effect,” where the amount of data returned is so great that performance of a distributed system is impacted.
Some profiling mechanisms require specialized profiling code to operate. Specialized code may cause a distributed application to report information from different modules of a distributed system back to a profiler. Thus, a profiler may have a better indication of what is causing undesirable behavior. However, the use of specialized profiling code has at least one inherent problem: specialized profiling code is often self-contained and will not interact with other profiling programs. Since specialized profiling code is often incompatible with other profiling programs, different versions of specialized profiling code must be individually developed for different distributed applications. This is time-consuming and may require substantial technical expertise on the part of a programmer.
Another difficulty in using specialized profiling code is a decreased ability to profile timing interactions between different portions of a distributed application. Only in a deployed system can the vast majority of timing interdependencies be profiled. In most profiling mechanisms that utilize specialized profiling code, it is not practical to profile all possible timing combinations that may occur.
Further, modules of a distributed system are often protected by security mechanisms, such as, for example, firewalls that block some types of communication. That is, security mechanisms may be configured so that communications between portions of a distributed application are allowed, but other communications that may be seen as a security risk are blocked by the security mechanisms. Since profiling operations may interact with modules in ways that could be destructive, security mechanisms frequently interpret requests to profile a portion of a distributed application as potentially harmful communications and thus block the communications.
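The blocking behavior described above can be sketched as a simple allow-list filter. This is an illustration only, under the assumption that a security mechanism classifies each communication by a single label; the `Message` type, the label values, and the `security_filter` function are hypothetical and do not model any specific firewall product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    kind: str     # hypothetical label, e.g. "application" or "profiling-request"
    payload: str


# Assumed policy: only ordinary distributed application traffic is allowed.
ALLOWED_KINDS = frozenset({"application"})


def security_filter(messages, allowed_kinds=ALLOWED_KINDS):
    """Split messages into those passed and those blocked by the policy."""
    passed, blocked = [], []
    for message in messages:
        if message.kind in allowed_kinds:
            passed.append(message)
        else:
            blocked.append(message)
    return passed, blocked
```

Under this assumed policy, a stand-alone request to profile a portion of a distributed application is blocked, while a message of the kind the distributed application exchanges during normal operation passes through.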
Therefore, what are desired are systems, methods, and computer program products for more efficiently and accurately profiling distributed applications.