1. Field of the Invention
The present invention relates to the field of source code auditing and more particularly to software compliance management.
2. Description of the Related Art
A software code audit is a comprehensive analysis of source code in a programming project with the intent of discovering errors, security vulnerabilities or violations of programming conventions. The source code audit is an integral part of the defensive programming paradigm, which attempts to reduce errors before the software is released. Source code auditors generally perform a line-by-line inspection of programming source code to identify errors, security vulnerabilities and programming convention violations. Once the testing and code inspection phases are complete, source code auditors often generate a visual report detailing the code deficiencies revealed by the analysis.
In a simple development environment for a conventional, stand-alone application, source code auditing primarily concerns itself with the integrity of the code in terms of operability and the avoidance of malware elements. In a complex development environment such as a geographically dispersed, multi-developer environment in the mode of open source development, however, code integrity includes not just bug detection but also the detection of non-malicious, albeit unauthorized, source code placed into the source either intentionally or inadvertently. Further, in a world of consolidation in the technology industry, a merger or acquisition generally involves the commandeering and adoption of source code developed outside the control of the acquiring party. The presence of such unauthorized source code can result in substantial liability for the publisher of the source and can inhibit the ability of the publisher to commercially distribute the source as part of a larger application.
Specifically, in an era of open source development, it is not uncommon for developers to build upon the efforts of one another. In fact, the notion of borrowing and extending the source of others forms the foundation of the open source movement. The open source license enables the open source movement by accounting for the copyrights arising from the source code contributions of individual developers. Yet, oftentimes, an open source license can require a model of software distribution that is largely incompatible with the business model of commercial software publishers. As a result, unwittingly incorporating open source code in a commercial software application can implicitly invoke the terms of an open source license despite the incompatibility of that license with the for-profit business model of software publishing.
To account for the risk of open source segments appearing in commercially distributed software, software auditing tools explicitly seek out source code segments of known open source applications. In the most general case, these types of software auditing tools rely upon a database of known text to pattern match against source snippets in the code under analysis. Other, less automated techniques provide a search interface through which an end user supplies text searches and patterns dynamically. Advanced forms of code analysis utilize a knowledge base of “code prints” to identify not only source code snippets produced by third parties, but also the licensing obligations attached to those identified code snippets.
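By way of illustration only, matching against a knowledge base of known source text might be sketched as follows. The sketch assumes that a “code print” is a hash of a few consecutive whitespace-normalized lines; the related art does not specify how code prints are actually computed, and the names normalize, fingerprints, build_knowledge_base and audit, the window size, and the sample project and license are all hypothetical.

```python
import hashlib

WINDOW = 3  # assumed number of consecutive normalized lines per fingerprint


def normalize(source):
    """Remove all whitespace and drop blank lines so that
    reformatting alone does not defeat the match."""
    return ["".join(line.split()) for line in source.splitlines() if line.strip()]


def fingerprints(source, window=WINDOW):
    """Yield (index, digest) for every run of `window` normalized lines.
    The index is 0-based and counts normalized (non-blank) lines."""
    lines = normalize(source)
    for i in range(len(lines) - window + 1):
        chunk = "\n".join(lines[i:i + window])
        yield i, hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def build_knowledge_base(known_sources):
    """Map each fingerprint to (project, license).
    known_sources is an iterable of (project, license, source_text)."""
    kb = {}
    for project, license_name, text in known_sources:
        for _, digest in fingerprints(text):
            kb.setdefault(digest, (project, license_name))
    return kb


def audit(source, kb):
    """Return (index, project, license) for each fingerprint of the
    audited source that also appears in the knowledge base."""
    matches = []
    for i, digest in fingerprints(source):
        if digest in kb:
            matches.append((i, *kb[digest]))
    return matches


if __name__ == "__main__":
    # Hypothetical known open source text and its license.
    known = "int add(int a, int b) {\n    return a + b;\n}\n"
    kb = build_knowledge_base([("libexample", "GPL-2.0", known)])

    # The same function, reformatted, inside code under analysis.
    audited = "void main_loop() {}\n\nint add(int a,int b){\n  return a+b;\n}\n"
    print(audit(audited, kb))
```

A scheme of this kind reports the licensing obligation along with the match, but it detects only near-verbatim copies. Renamed identifiers or reordered statements would escape it.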
One popular way to identify third party source code snippets is to scan source code, using pattern matching and regular expressions, for the copyright statements of others. However, copyright statements vary in form from “Copyright (c) 2007 All Rights Reserved” to merely “© 2007” and the like. Accordingly, pattern matching using only regular expressions is difficult and error-prone. Additionally, in many cases, merely recognizing the presence of a third party source code snippet is not enough. Rather, the identity of the rights holder can be just as important, because an existing licensing agreement with the rights holder may already permit the presence of the identified third party source code snippet. Finally, scanning every file of a development project can produce an unwieldy listing of “hits”, including hits in files that never contribute to a distributable binary. In sum, pattern matching with regular expressions alone tends to produce an inaccurate and overly long list of possible copyright statements and rights holders, with a large number of false positives.
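The difficulty can be illustrated with a minimal sketch of a regular-expression scan for copyright statements. The pattern and the function name find_copyright_statements are assumptions chosen for illustration and are not taken from any particular auditing tool.

```python
import re

# Hypothetical pattern: "Copyright" (optionally followed by "(c)" or the
# copyright sign), or "(c)" or the copyright sign alone, then a four-digit
# year. Any text after the year is taken to be the rights holder.
COPYRIGHT_RE = re.compile(
    r"(?:copyright\s*(?:\(c\)|©)?|\(c\)|©)\s*(\d{4})\s*(.*)",
    re.IGNORECASE,
)


def find_copyright_statements(text):
    """Return (line_number, year, presumed_holder) for each matching line."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        match = COPYRIGHT_RE.search(line)
        if match:
            year, holder = match.group(1), match.group(2).strip()
            hits.append((line_no, year, holder))
    return hits


if __name__ == "__main__":
    sample = (
        "// Copyright (c) 2007 All Rights Reserved\n"
        "// © 2007\n"
        "# see section (c) 2007 revision notes\n"
        "int x = 1;\n"
    )
    for hit in find_copyright_statements(sample):
        print(hit)
```

On this sample the scan reports three hits. The first yields “All Rights Reserved” as the presumed rights holder, the second yields no rights holder at all, and the third is a false positive arising from an ordinary comment. None of the three reliably identifies who holds the rights.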