The software development process leaves behind a vast trail of data documenting the online activity of developers, testers, project managers, and other participants. This store of historical artifacts, such as old emails, bug reports, work items, check-in messages, specs, and the like, provides a valuable but mostly untapped resource for answering crucial questions about the project, such as: Why was this code written this way? Are there known problems in this code? Why did the build break? Why did this bug reach our customers?
Investigating questions like these is sometimes called sensemaking, which may be defined as “the process of searching for a representation and encoding data in that representation to answer task-specific questions”. Sometimes the answer to a question can be found in a single artifact. When this is the case, a good search tool over the relevant store can help the user find the crucial artifact. Often, however, the answer is spread across many artifacts in the data trail.
When the answer is not found in a single artifact, the user must explore many artifacts, understand the relationships among them, and piece together the answer from multiple bits of evidence. This suggests that a simple search tool is not enough. What is needed is a sophisticated interface for exploring a collection of related artifacts.
One activity that relies heavily on this kind of in-depth exploration of software artifacts is root-cause analysis, or RCA, which is the process of finding the reasons for failures in the software development process. In general, an RCA culminates in a report documenting the chain of events that contributed to the failure and suggesting solutions to prevent such failures in the future.
Normally, the framework used for the sensemaking process is a word-processor document containing a chronological list of artifacts related to the failure. In some cases, the document also includes text snippets from the artifacts and annotations interpreting the artifacts. A typical RCA investigation chronology may contain hundreds of entries.
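To make the shape of such a chronology concrete, the following is a minimal sketch of one way it could be represented as data rather than as free-form text. The class name `ChronologyEntry`, its fields, and the helper functions are hypothetical names chosen for illustration; they are not taken from any particular tool or from the process described above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChronologyEntry:
    """One row of an RCA chronology (all field names are hypothetical)."""
    timestamp: datetime    # when the artifact was created
    source: str            # kind of artifact, e.g. "email" or "bug report"
    artifact_id: str       # identifier within its repository
    snippet: str = ""      # optional text excerpt from the artifact
    annotation: str = ""   # optional interpretation by the investigator


def build_chronology(entries):
    """Return the entries ordered from oldest to newest."""
    return sorted(entries, key=lambda e: e.timestamp)


def render(chronology):
    """Format the chronology as plain text, one entry per line."""
    lines = []
    for e in chronology:
        line = f"{e.timestamp:%Y-%m-%d %H:%M} [{e.source}] {e.artifact_id}"
        if e.snippet:
            line += f' "{e.snippet}"'
        if e.annotation:
            line += f" ({e.annotation})"
        lines.append(line)
    return "\n".join(lines)
```

Keeping the snippet and the annotation as separate fields mirrors the distinction drawn above between text quoted from an artifact and the investigator's interpretation of it.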
Unfortunately, the investigator must discover the source material by laboriously searching the repositories that contain artifacts of potential interest. In many cases the process includes searching for keywords and phrases, people's names, and the identifying numbers of key work items, knowledge-base articles, and builds. Moreover, each repository has its own search interface, which makes this process quite tedious.
Therefore, what is needed is a more efficient way to sort and investigate a repository of source material from a software development project.