1. Technical Field
The present invention relates generally to an improved method for organizing and presenting complex, detailed information stored in electronic form. The invention may find particular use in organizations that need to manage large repositories of documents containing related information. Typically, such organizations require changes in one document to be reflected in other related documents.
2. Background Information
Many complex projects, such as software development, drug development and clinical trials, and product development and testing, involve the management of large heterogeneous document repositories. These repositories may contain thousands of documents of various types (text, spreadsheets, presentations, diagrams, programming code, ad-hoc databases, etc.) that have been created during different phases of the project lifecycle. Although the documents may be related to each other, their differing formats and their creation during different phases of the project lifecycle make it difficult to uncover the inter-relationships among them.
For a software project, a document repository may contain documents created throughout the project lifecycle. A typical software project lifecycle may be divided into at least four stages. First, project requirements are defined. The requirements relate to project goals, capabilities and limitations of the software system which the software project is to implement. Second, designs are built around the requirements. Design specifications form a plan for actually implementing a system which achieves the requirements previously defined. Third, the software code is written to reflect the design. Finally, testing is performed to verify the execution of the code and to determine whether the requirements and design specifications are incorporated into the final application.
Therefore, the documents in the software project repository may detail project requirements, design criteria, programming code, test data, defect reports, code review reports, and the like. Furthermore, these documents are typically of varying types, such as the document types described above. Although many of these documents are inter-related, the size and heterogeneity of a typical repository make it difficult to find these inter-relationships. Technical problems also arise when attempting to find these inter-relationships across various types of files. As a result, typical document repositories do not allow for a high level of traceability.
Traceability is important to software project managers for two reasons. First, traceability allows a development team to quickly and easily perform impact analysis. Impact analysis is the process of determining which additional documents may be affected by a change in a given document. Second, traceability allows the project team to perform coverage analysis. Coverage analysis is the process of verifying that the design specification implements the project requirements and that the code, in turn, implements the design specification.
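The two analyses described above can be illustrated with a minimal sketch. It assumes that the inter-relationships among documents are already known and stored as a simple mapping from each document to the documents derived from it. The document names, the `TRACE_LINKS` mapping, and the functions `impact_analysis` and `coverage_gaps` are hypothetical and serve only as an illustration; they do not represent the method of the invention.

```python
from collections import deque

# Hypothetical trace links: each key is a document, and each value lists the
# documents derived from it (requirement -> design -> code -> test).
TRACE_LINKS = {
    "REQ-1": ["DES-1"],
    "REQ-2": ["DES-1", "DES-2"],
    "REQ-3": [],  # no design document implements this requirement
    "DES-1": ["code/login.py"],
    "DES-2": ["code/report.py"],
    "code/login.py": ["test/test_login.py"],
    "code/report.py": [],
}


def impact_analysis(links, changed):
    """Return every document reachable downstream from the changed document."""
    affected = set()
    queue = deque([changed])
    while queue:
        doc = queue.popleft()
        for dependent in links.get(doc, []):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


def coverage_gaps(links, upstream_docs):
    """Return the upstream documents that have no downstream document."""
    return sorted(doc for doc in upstream_docs if not links.get(doc))


# Impact analysis: which documents may be affected by a change to REQ-2?
affected_by_req2 = impact_analysis(TRACE_LINKS, "REQ-2")

# Coverage analysis: which requirements are not implemented by any design?
uncovered = coverage_gaps(TRACE_LINKS, ["REQ-1", "REQ-2", "REQ-3"])
```

In this sketch, impact analysis is a breadth-first traversal of the link graph, and coverage analysis is a check for upstream documents that have no outgoing link. Both depend entirely on the links being present and accurate, which is the difficulty the remainder of this section describes.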
A lack of traceability leads to two types of business problems. One problem is poor software quality. This problem may occur because developers cannot easily determine whether the software fulfills all requirements and has been tested against all test conditions, or because the repository contains incompatible versions of requirements, design, code, etc., as the project evolves. A second problem is increased time and effort, as the developers must manually determine the inter-relations among documents.
Maintaining a consistent software project repository is a critical and well-researched problem in software engineering. In the past, systems have been created that allow developers in a large software project to manually create the inter-relationships among the various elements in the project repository. These commercial software development systems (Integrated Development Environments or IDEs) provide facilities for manually linking related items in the repository via explicit references. However, such an approach is not feasible in many cases for the following reasons.
First, it is very time consuming. A typical repository may have thousands of documents, each covering multiple topics. Manually creating each link can cost a considerable number of man-hours.
Second, a large software project may involve multiple teams, each focusing on different aspects of the project. For example, one team may determine the project requirements, another team may create the design specifications, a third team may build the code, a fourth team may develop test scripts and a fifth team may perform testing and quality assurance. These teams may be working in different locations, and may be affiliated with different companies. When creating a link in the code, the code builder may not realize the complete extent of his or her involvement in relation to the other teams. Thus, relevant links may never be created.
Third, manually creating references causes the links to be brittle. Although a link may be accurate when created, later changes in the requirements or design specifications may create a need for new links or render old links ‘dead.’
Fourth, many large software projects evolve over a period of time, with new functions built over much older “legacy” components and technologies. In such cases a manual process is infeasible as there are few or no individuals who have a working knowledge of the older legacy components.
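The brittleness of manually created links can be shown with a short sketch. It assumes that explicit references are stored as pairs of document identifiers and that a link is ‘dead’ when either endpoint is no longer present in the repository. The identifiers and the function `find_dead_links` are hypothetical and are given only as an illustration.

```python
def find_dead_links(links, repository):
    """Return the links whose source or target is missing from the repository."""
    return [
        (source, target)
        for source, target in links
        if source not in repository or target not in repository
    ]


# Hypothetical explicit references, created by hand when all documents existed.
manual_links = [
    ("REQ-1", "DES-1"),
    ("DES-1", "code/login.py"),
]

# Later, code/login.py is renamed to code/auth.py, but the link is not updated.
current_repository = {"REQ-1", "DES-1", "code/auth.py"}

dead_links = find_dead_links(manual_links, current_repository)
```

A check of this kind can report that a link has gone dead, but it cannot determine that `code/auth.py` is the intended new target, and it cannot detect links that were never created.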
A second approach to maintaining a consistent software project repository has been to enforce a rigid development process and a rigid repository structure. While such an approach is applicable for a single team building the software system from start to finish under a single development methodology, it is impractical when the multi-team dynamics described above are present or when legacy systems are linked to current development projects.
The present invention provides a robust technique for automatically discovering inter-relationships among the various elements in a large software repository that may contain thousands of documents of different formats created at various stages of the project lifecycle.