In the field of computer science, distributed systems are used to execute program code faster and more efficiently when that code would prove overly cumbersome or computationally complex for a single stand-alone system to process effectively. A distributed system can refer to a computing model in which multiple networked computers “work together” by communicating and coordinating their actions to achieve a single result. Distributed systems can be bus-based, or each individual computing node can be networked to the other computing nodes in the distributed system. In a bus-based system, the components send messages to one another by broadcasting the messages on the bus, such that every node of the system attached to the bus receives each message. The multiple computers of a distributed system can thus work together to execute a single program, spreading the computational burden so that no single computer is overly burdened.
The multiple computing resources organized in a distributed system can communicate and coordinate their actions by passing messages to one another. In an example where multiple computers work together to execute a single program, each computer can perform one or more tasks associated with execution of the program and can pass messages to another computer in the distributed system, wherein each message can contain information required by the receiver to execute its task within the program.
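The message passing described above can be illustrated with a minimal sketch. It is not drawn from any particular system: two threads stand in for two networked computers, a `queue.Queue` stands in for the network link, and the names `node_a_task`, `node_b_task`, and `run_program` are hypothetical.

```python
import queue
import threading


def node_a_task(outbox: queue.Queue) -> None:
    """First task: compute a partial result and send it to the next node."""
    partial = sum(range(1, 6))  # 1 + 2 + 3 + 4 + 5
    # The message carries the information the receiver needs for its task.
    outbox.put({"sender": "node_a", "payload": partial})


def node_b_task(inbox: queue.Queue, results: list) -> None:
    """Second task: wait for the message, then finish the computation."""
    message = inbox.get()  # blocks until node_a has sent its message
    results.append(message["payload"] * 2)


def run_program() -> int:
    """Run both tasks concurrently and return the combined result."""
    channel = queue.Queue()  # assumption: stands in for the network link
    results = []
    workers = [
        threading.Thread(target=node_a_task, args=(channel,)),
        threading.Thread(target=node_b_task, args=(channel, results)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results[0]
```

Neither task can produce the final result alone: the second task depends entirely on the content of the message sent by the first.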
While distributed systems allow for faster computing speeds by breaking a program down into parts and spreading the computational burden across multiple computers, the process of developing distributed software applications can be difficult. If there is an error in the code, the source of the error may be difficult to ascertain because multiple machines are each running different portions of the overall program, and access to the code that each machine is running individually may not be possible, or that code may be cumbersome to debug.
Programs used to debug distributed software often attempt to identify errors in the source code of the software run by each distributed component by employing a sequential debugger for the software in each component. Other distributed system software debuggers focus instead on the communications between components in the distributed system. These debugging programs, known as replay debuggers, can examine the communication events between components of the distributed system to detect unintended conditions among the messages or various faults, each of which can provide clues as to the source of the program code error.
Replay debuggers can be characterized as belonging to one of two categories: replay debuggers that replay the execution of the distributed code in its entirety, and replay debuggers in which only the messages communicated between components of the distributed system are replayed.
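The second category, message-only replay, can be sketched as follows. This is an illustrative assumption rather than the design of any actual replay debugger: the `Message` record, its `seq` field, the `replay` function, and the sample log are all hypothetical. The sketch assumes messages were recorded with a sequence number during the original run and are later re-delivered in that order without re-executing the distributed code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Message:
    """One recorded communication event (hypothetical record layout)."""
    seq: int        # position of the message in the original run
    sender: str
    receiver: str
    payload: dict


def replay(log: Iterable[Message], deliver: Callable[[Message], None]) -> int:
    """Re-deliver recorded messages in their original order.

    Only the messages are replayed; the code that produced them is not run.
    Returns the number of messages delivered.
    """
    count = 0
    for message in sorted(log, key=lambda m: m.seq):
        deliver(message)
        count += 1
    return count


# Hypothetical log, stored out of order to show that replay restores order.
RECORDED_LOG = [
    Message(2, "node_b", "node_c", {"total": 30}),
    Message(1, "node_a", "node_b", {"partial": 15}),
]


def replay_order(log: Iterable[Message]) -> List[int]:
    """Collect the sequence numbers in the order replay delivers them."""
    seen: List[int] = []
    replay(log, lambda m: seen.append(m.seq))
    return seen
```

In this sketch the `deliver` callback is where a developer would inspect each message, for example by printing it or checking it for an unintended condition.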
In replay debuggers in which only the messages communicated between components of the distributed system are replayed, there has been a long-felt need among programmers for the ability to focus the replay debugging on a subset of messages, selected either manually or through programmable constraints. Because the execution of a single distributed software program can generate numerous messages between components, giving the developer the ability to focus on only a subset of the messages can be a valuable resource in debugging code.
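One way to picture selecting a subset of messages through programmable constraints is to treat each constraint as a predicate function and keep only the messages that satisfy all of them. The sketch below is a hypothetical illustration under that assumption; the message fields (`seq`, `sender`, `receiver`, `status`) and the helpers `select_messages`, `from_sender`, and `with_status` are invented for the example.

```python
from typing import Callable, Dict, Iterable, List

Message = Dict[str, object]
Constraint = Callable[[Message], bool]


def select_messages(log: Iterable[Message],
                    constraints: Iterable[Constraint]) -> List[Message]:
    """Return only the messages that satisfy every constraint."""
    checks = list(constraints)
    return [m for m in log if all(check(m) for check in checks)]


def from_sender(name: str) -> Constraint:
    """Constraint: the message was sent by the named component."""
    return lambda m: m["sender"] == name


def with_status(status: str) -> Constraint:
    """Constraint: the message carries the given status value."""
    return lambda m: m["status"] == status


# Hypothetical recorded messages between three components.
SAMPLE_LOG: List[Message] = [
    {"seq": 1, "sender": "node_a", "receiver": "node_b", "status": "ok"},
    {"seq": 2, "sender": "node_b", "receiver": "node_c", "status": "timeout"},
    {"seq": 3, "sender": "node_a", "receiver": "node_c", "status": "timeout"},
]

# Focus on timeouts that originated at node_a.
matches = select_messages(
    SAMPLE_LOG, [from_sender("node_a"), with_status("timeout")]
)
```

Expressing constraints as small composable functions lets a developer combine conditions without inspecting each message by hand.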
The execution of a distributed software program may generate many thousands of messages between components of a distributed system. Thus, if a programmer or developer were seeking to determine when a particular condition in a message occurred, they would ordinarily be required to sift through each and every message generated during execution of the replay debugger to see whether the condition occurred. This process can prove extremely labor-intensive, further adding to the time and complexity required to perform replay debugging.