In the field of computer science, distributed systems have been used to execute program code faster and more efficiently when that code would be too cumbersome or computationally complex for a single stand-alone system to process effectively. A distributed system can refer to a computing mode in which multiple networked computers “work together” by communicating and coordinating their actions to achieve a single result. For example, the multiple computers of a distributed system can work together to execute a single program, thereby spreading the computational burden across the computers so that no single computer is overly burdened.
The multiple computing resources organized in a distributed system can communicate and coordinate their actions by passing messages to one another. In an example where multiple computers work together to execute a single program, each computer can perform one or more tasks associated with execution of the program and can pass messages to another computer in the distributed system, wherein each message can contain information required by the receiver to execute its task within the program.
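The message passing described above can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not a description of any particular system: two threads stand in for two networked computers, an in-process queue stands in for the network link, and the names `stage_one`, `stage_two`, and `run_pipeline` are hypothetical.

```python
import queue
import threading


def stage_one(numbers, outbox):
    # First hypothetical component: performs its share of the work, then
    # sends the receiver the information it needs to carry out its own task.
    partial = sum(n * n for n in numbers)
    outbox.put({"sender": "node-1", "sum_of_squares": partial, "count": len(numbers)})


def stage_two(inbox, results):
    # Second hypothetical component: blocks until the message arrives,
    # then completes its task using only the message contents.
    message = inbox.get()
    results["mean_square"] = message["sum_of_squares"] / message["count"]


def run_pipeline(numbers):
    channel = queue.Queue()  # assumption: stands in for the network link
    results = {}
    receiver = threading.Thread(target=stage_two, args=(channel, results))
    sender = threading.Thread(target=stage_one, args=(numbers, channel))
    receiver.start()
    sender.start()
    sender.join()
    receiver.join()
    return results["mean_square"]
```

In this sketch, neither component can produce the final result on its own. The second component depends entirely on the message it receives, which is why the messages themselves can be informative when a fault occurs.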
While distributed systems allow for faster computing speeds by breaking a program down into parts and spreading the computational burden across multiple computers, developing distributed software applications can be difficult. If there is an error in the code, the source of the error may be difficult to ascertain because multiple machines are each running different portions of the overall program. Access to the code that each individual machine is running may not be possible, or that code may be cumbersome to debug.
Debugging programs for distributed software often attempt to identify errors in the source code of the software run by each distributed component by employing a sequential debugger for the software in each component. Some distributed system software debuggers focus on the communications between components in the distributed system. These debugging programs, known as replay debuggers, can focus on the communication events between components of the distributed system to detect unintended conditions among the messages or various faults, each of which can provide clues as to the source of the program code error.
Replay debuggers can be characterized as belonging to one of two categories: replay debuggers that replay the execution of the distributed code in its entirety; and replay debuggers in which only the messages communicated between components of the distributed system are replayed.
In replay debuggers in which only the messages communicated between components of the distributed system are replayed, there has been a long-felt need by programmers to have the ability to focus the replay debugging on a subset of messages, either manually or through programmable constraints. Since the execution of a single distributed software program can generate numerous messages between components, providing the developer the ability to focus only on a subset of the messages can be a valuable resource in debugging code.
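As a rough sketch of what focusing a message-only replay on a subset of messages might look like, the following example filters a recorded message log with a programmable constraint. The `Message` record, its fields, and the `replay` function are hypothetical names chosen for illustration. They do not describe any existing replay debugger.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Message:
    # Hypothetical record of one communication event between components.
    seq: int        # position in the recorded order
    sender: str
    receiver: str
    payload: str


def replay(log: Iterable[Message],
           constraint: Callable[[Message], bool] = lambda m: True) -> List[Message]:
    # Return the recorded messages in their original order, keeping only
    # those that satisfy the developer-supplied constraint.
    return [m for m in sorted(log, key=lambda m: m.seq) if constraint(m)]


# Illustrative recorded log (invented data, deliberately out of order).
recorded = [
    Message(3, "worker-2", "coordinator", "result: error"),
    Message(1, "coordinator", "worker-1", "task: parse"),
    Message(2, "coordinator", "worker-2", "task: index"),
    Message(4, "worker-1", "coordinator", "result: ok"),
]

# Programmable constraint: only messages sent or received by worker-2.
worker_2_only = replay(recorded, lambda m: "worker-2" in (m.sender, m.receiver))
```

Expressing the constraint as an ordinary predicate lets the developer narrow the replay by sender, receiver, payload contents, or any combination of these without changing the replay logic itself.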