Computer systems and related technology affect many aspects of society. Indeed, the computer system's ability to process information has transformed the way we live and work. Computer systems now commonly perform a host of tasks (e.g., word processing, scheduling, accounting, etc.) that, prior to the advent of the computer system, were performed manually. More recently, computer systems have been coupled to one another and to other electronic devices to form both wired and wireless computer networks over which the computer systems and other electronic devices can transfer electronic data. Accordingly, the performance of many computing tasks is distributed across a number of different computer systems and/or a number of different computing components.
Software code is typically written by one or more software developers using some type of integrated development environment (IDE). In many cases, developers are given a set of design specifications and, using a programming language, draft software code that will implement the functions described in those design specifications. Depending on the nature and scope of the design specifications (or any subsequent modifications thereto), the software program can be both large and complex.
Enterprise software programs, for example, may involve many hundreds or thousands of software files, each file designed to interact with other files within the program and externally with other software programs and/or operating systems. Often, supplemental programs or databases, such as software repositories, are used to organize, search, and maintain the data and metadata that describe the program and its files. Program metadata consists of information such as the structure of program components, the behavior of those components, how components and subcomponents may relate to each other, and other characteristics useful for organization and control. File metadata consists of information such as the date the file was last modified, the size of the file, the file's relation to other files within the software program, documentation concerning particular modules or constructs, and other characteristics useful for organization and control.
One approach for organizing a software repository includes storing a software program's objects and their corresponding metadata together using an entity-property-value approach (also called the universal schema approach). Using an entity-property-value approach, most data is stored in a table of property ID/value pairs. Thus, a software repository can be organized such that objects are listed alphabetically with the metadata alongside, each portion of metadata corresponding to the appropriate object. For example, a software repository can list a software object and a name and corresponding value for each property of the software object. Related objects can be, for example, shown as a list of related objects headed by the kind of relationship (e.g., objects related to another object by an automatic generation process).
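The entity-property-value layout can be illustrated with a small in-memory SQLite table. This is a minimal sketch under stated assumptions: the table name `property_values`, its columns, the helper `load_object`, and the sample objects and property names are all hypothetical, chosen only to show that every property of every object becomes its own row and that an object must be reassembled from those rows when it is read.

```python
import sqlite3

# Hypothetical EAV ("universal schema") layout: one row per (object, property, value).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE property_values ("
    " object_id TEXT NOT NULL,"
    " property_id TEXT NOT NULL,"
    " value TEXT,"
    " PRIMARY KEY (object_id, property_id))"
)

# Illustrative data only: each property of each software object is a separate row.
rows = [
    ("Customer.cs", "size_bytes", "4096"),
    ("Customer.cs", "last_modified", "2008-06-01"),
    ("Customer.cs", "generated_from", "Customer.xsd"),
    ("Order.cs", "size_bytes", "2048"),
    ("Order.cs", "last_modified", "2008-05-17"),
]
conn.executemany("INSERT INTO property_values VALUES (?, ?, ?)", rows)


def load_object(conn, object_id):
    """Reassemble one object's metadata from its property ID/value rows."""
    cur = conn.execute(
        "SELECT property_id, value FROM property_values"
        " WHERE object_id = ? ORDER BY property_id",
        (object_id,),
    )
    return dict(cur.fetchall())


print(load_object(conn, "Customer.cs"))
```

Because every value lands in the same three columns, a generic browser can display any object without knowing its properties in advance, which is the uniformity the approach is valued for.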
Using an entity-property-value approach, data is stored in a highly uniform way, making it relatively easy to build generic repository APIs and browsers. However, due to the typically finer granularity with which software objects are stored (i.e., per property), querying an entity-property-value based software repository can be complex and inefficient. Many objects can include additional relationships to one another (e.g., based on user preference, code version, access patterns, replication, etc.) that cause them to be frequently accessed together. However, these additional relationships are not easily represented using an entity-property-value approach. Thus, although these objects are related in additional ways, they typically cannot be easily accessed as a group. Accordingly, complex queries may be required to access the objects individually and then group them together before further operations can be performed.
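The query cost can be made concrete by selecting objects that share two property values (here a hypothetical `kind` and `code_version`). A minimal sketch, assuming the same kind of hypothetical `property_values` table as above plus a hypothetical conventional `objects` table: in the entity-property-value layout each property condition needs its own alias of the table (a self-join), whereas a conventional schema with one column per property needs only a single `WHERE` clause. The pivot used to build the conventional table also shows the "access individually, then group" step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE property_values (object_id TEXT, property_id TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO property_values VALUES (?, ?, ?)",
    [  # Illustrative data only.
        ("Customer.cs", "kind", "class"),
        ("Customer.cs", "code_version", "2.0"),
        ("Order.cs", "kind", "class"),
        ("Order.cs", "code_version", "1.0"),
        ("IOrder.cs", "kind", "interface"),
        ("IOrder.cs", "code_version", "2.0"),
    ],
)

# EAV: one self-join (table alias) per property the query constrains.
EAV_QUERY = """
SELECT k.object_id
FROM property_values AS k
JOIN property_values AS v ON v.object_id = k.object_id
WHERE k.property_id = 'kind' AND k.value = 'class'
  AND v.property_id = 'code_version' AND v.value = '2.0'
ORDER BY k.object_id
"""

# Conventional schema for comparison, built by pivoting the EAV rows per object.
conn.execute("CREATE TABLE objects (object_id TEXT PRIMARY KEY, kind TEXT, code_version TEXT)")
conn.execute("""
INSERT INTO objects
SELECT object_id,
       MAX(CASE WHEN property_id = 'kind' THEN value END),
       MAX(CASE WHEN property_id = 'code_version' THEN value END)
FROM property_values
GROUP BY object_id
""")
CONVENTIONAL_QUERY = (
    "SELECT object_id FROM objects"
    " WHERE kind = 'class' AND code_version = '2.0' ORDER BY object_id"
)

eav = [row[0] for row in conn.execute(EAV_QUERY)]
conventional = [row[0] for row in conn.execute(CONVENTIONAL_QUERY)]
print(eav, conventional)
```

Each additional property in the filter adds another join to the EAV query, which is one way the finer granularity turns into query complexity.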
Another approach for organizing a software repository includes storing metadata in XML columns or some other post-relational structure. As opposed to name/value pairs, post-relational structures permit complex data values to be stored in a single table cell. Because some database servers have efficient ways to store XML, an XML column or other post-relational approach can be efficient for hierarchical data (e.g., type definitions of an object oriented software program). Using a post-relational approach, hierarchical data can be flexibly grouped.
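A post-relational column can be approximated with a text cell holding an XML document. This is a minimal sketch under stated assumptions: SQLite has no native XML type, so the XML is stored as `TEXT` and parsed with Python's `xml.etree.ElementTree`; the `types` table, the element and attribute names, and the helper `field_names` are hypothetical. It shows a whole hierarchical type definition stored in, and retrieved from, a single table cell.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per type, the entire definition in one XML-valued cell.
conn.execute("CREATE TABLE types (type_name TEXT PRIMARY KEY, definition TEXT)")
conn.execute(
    "INSERT INTO types VALUES (?, ?)",
    (
        "Customer",
        "<type name='Customer'>"
        "<field name='Id' type='int'/>"
        "<field name='Name' type='string'/>"
        "<method name='Save' returns='void'/>"
        "</type>",
    ),
)


def field_names(conn, type_name):
    """Fetch one cell and walk its subtree; the hierarchy is read as a unit."""
    (xml_text,) = conn.execute(
        "SELECT definition FROM types WHERE type_name = ?", (type_name,)
    ).fetchone()
    root = ET.fromstring(xml_text)
    # Direct <field> children of the <type> element.
    return [field.get("name") for field in root.findall("field")]


print(field_names(conn, "Customer"))
```

One row read returns the entire type with its members, which is why this layout suits data that is naturally hierarchical.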
However, due to the typically coarser granularity with which software objects are stored (i.e., in a hierarchical tree), querying a post-relational based software repository can also be complex and inefficient. For example, objects can be related in ways that do not conform well to a hierarchical structure (e.g., based on user preference, code version, access patterns, replication, etc.), and thus related objects can span different hierarchical trees. Accordingly, queries may be required to access different sub-trees from different post-relational structures and then merge the results together before further operations can be performed.
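The cross-tree problem can be sketched with two separate XML documents standing in for two post-relational cells. A minimal sketch, assuming hypothetical `namespace`/`type` elements, a hypothetical `version` attribute as the non-hierarchical relationship, and a hypothetical helper `types_with_version`: because the version-2.0 objects are scattered across both trees (and at different depths), every tree must be parsed and searched separately and the partial results merged.

```python
import xml.etree.ElementTree as ET

# Hypothetical: two hierarchical documents, e.g. held in two XML-valued cells.
TREES = {
    "Sales": """
<namespace name='Sales'>
  <type name='Customer' version='2.0'/>
  <type name='Order' version='1.0'/>
</namespace>""",
    "Billing": """
<namespace name='Billing'>
  <type name='Invoice' version='2.0'>
    <type name='LineItem' version='2.0'/>
  </type>
</namespace>""",
}


def types_with_version(trees, version):
    """Objects related by version cut across trees, so search each tree and merge."""
    merged = []
    for tree_name, xml_text in trees.items():
        root = ET.fromstring(xml_text.strip())
        # ".//type" matches <type> elements at any depth within this one tree.
        for elem in root.iterfind(".//type"):
            if elem.get("version") == version:
                merged.append(f"{tree_name}.{elem.get('name')}")
    return sorted(merged)


print(types_with_version(TREES, "2.0"))
```

The per-tree loop and the final merge are the extra work the paragraph describes; neither hierarchy alone can answer the question.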
Further, most database tools are designed for use with databases based on conventional schemas (as opposed to universal schemas or post-relational schemas), making their use with entity-property-value and post-relational based software repositories more difficult. As such, not only are these queries typically more complex, but a developer must also often write them without the automated development capabilities included in these database tools.
Accordingly, at the very least, creating software repository queries can consume significant developer resources (that would otherwise be used to develop code). In many cases, creating software repository queries will be beyond the technical expertise of a developer who, while well-trained in code development, may lack expert knowledge in the composition of appropriate database queries.