1. Field of the Invention.
This invention relates to computer system resource management systems and methods in general, and more specifically, to methods for indexing and locating assets, such as application programs which may be written in a variety of different programming languages, in a distributed or network environment.
2. Description of the Related Art.
One of the most common uses of computers is to create, store, and index data for later retrieval. As a result of the burgeoning growth of computer usage, the number of data files available for searching has grown exponentially, leading to an information overload that can overwhelm a data searcher.
To help manage the access to these massive numbers of files, also known as “assets”, a process called “data mining” has evolved. Data mining is defined in Newton's Telecom Dictionary (15th Edition, Miller Freeman Publishing, New York, N.Y.) as “[U]sing sophisticated data search capabilities that use statistical algorithms to discover patterns and correlations in data.” In essence, computers are used to “crawl” through masses of data files, analyze the information contained in the files according to criteria input by the user, and output results to the user which the user can use to study the information further.
To support the explosive growth of computer usage, software development has become a key part of any company engaged in high-technology business. Large companies may have many software development groups located at numerous locations throughout the world, with each group employing hundreds or thousands of employees.
As used herein, complete programs (e.g., Microsoft Word™) developed by the programmers are referred to as “software assets” and the various subroutines used to produce the software asset (e.g., C++ subroutines and programs used to create Microsoft Word™) are referred to as “code assets.” These assets may number in the thousands or more for a single company and vary substantially in complexity, function, and size. For example, an asset may be a single program comprising hundreds of thousands of lines of computer code and designed to perform a multitude of tasks; at the other end of the spectrum, an asset may be a single subroutine comprising three lines of code.
With large numbers of employees focusing their work on the development of these assets, management becomes a critical task. With multiple groups within a company at different locations developing software for a variety of tasks, it is inevitable that duplication of effort will occur.
To avoid such duplication, it is desirable for all of the members of design groups, as well as all of the design groups within a company, to be able to share with each other the assets that they develop, and systems have been developed to assist in the management of such assets. In the software development field, the management, indexing, and retrieval of assets introduces an additional level of complexity not necessarily found in other asset management schemes. In particular, within a single group, assets may be developed in several different programming languages (e.g., Java, C/C++, COBOL, HTML, and/or XML) at the same time. Searching for code assets is more complex and difficult than searching ordinary text, since programmers typically want to search for language-specific constructs and semantics, such as inheritance relations in object-oriented languages, which cannot be captured using standard free-text searches. This makes it difficult for the users of the system to thoroughly search all of the assets.
Accordingly, it would be desirable to have an asset location system which offers the ability for free-text “search engine” style queries, attribute-specific queries, or a mixture of free-text queries and attribute-specific queries.
In our copending application Ser. No. 09/473,554 of common assignee herewith, and hereby incorporated by reference, there is disclosed a method and system for locating assets that provides a capability for the gathering of information about assets contained in data repositories. The technique is adapted to gathering information from either a single data repository or a plurality of data repositories, possibly in disparate locations of an enterprise. The captured information is then consolidated into a single database for access by multiple users. While this technique represents an improvement over conventional techniques of asset retrieval, it does not provide automatic categorization to facilitate search and navigation by the users. Manual categorization has several drawbacks. When done by repository organizers, it requires the presence of an expert in each resource domain at all times. This is a very expensive solution, and it does not scale: in modern systems, the expert can never keep up with the resources that are added to or updated in the repository every day. On the other hand, categorization which is done by users is also less than satisfactory, because users dislike the necessary overhead of describing the category for every resource, and moreover, they may be unqualified to categorize their resources.
It is therefore a primary object of some aspects of the present invention to improve the efficiency of asset location in computer system resource management systems.
It is another object of some aspects of the present invention to categorize code resources automatically using a predefined taxonomy in computer system resource management systems.
It is a further object of some aspects of the present invention to enable users of code repositories to navigate through the repository of code resources in a computer system resource management system according to the category taxonomy without having to actually compose any queries.
It is yet another object of some aspects of the present invention to provide an improved tool for presenting categorization results in a computer system resource management system based on the category information and on the language semantics.
These and other objects of the present invention are attained by a tool that is capable of categorizing code resources automatically into predefined taxonomy trees (that is, into a set of predefined categories). This tool enables the users of code repositories to navigate through the repository according to the category taxonomy without having to actually compose any queries. Moreover, the category information can be used as part of the query criteria by the users.
In addition, this invention describes a tool for presenting the categorization results in a novel manner, based on the category information and on the language semantics.
The invention provides a computer-implemented method for indexing and locating assets stored on a storage device, which is performed by defining asset-specific categories for classification of asset-specific information, identifying stored assets, extracting the asset-specific information from the stored assets, classifying the extracted information in the asset-specific categories according to a predefined set of rules, and storing the classified textual and semantic information for retrieval.
According to a further aspect of the invention, extracting the asset-specific information is performed with a language specific parser.
According to yet another aspect of the invention, the predefined set of rules includes a plurality of predefined sets of rules, wherein each of the predefined sets of rules is applied to a different language specific group.
According to still another aspect of the invention, a syntax of rules in the predefined set of rules is “<condition> → <category> @ <weight-factor>”.
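By way of illustration only, a rule written in the “<condition> → <category> @ <weight-factor>” form might be read into its three parts as in the following Python sketch. The names `Rule`, `parse_rule`, and `RULE_PATTERN`, the acceptance of `->` as an alternative to the arrow, the numeric form of the weight factor, and the sample condition text are all assumptions made for the example; this summary does not specify them.

```python
import re
from dataclasses import dataclass

# Assumed textual form: "<condition> -> <category> @ <weight-factor>".
# Either "->" or the arrow character is accepted as the separator.
RULE_PATTERN = re.compile(
    r"^\s*(?P<condition>.+?)\s*(?:->|→)\s*"
    r"(?P<category>.+?)\s*@\s*(?P<weight>[0-9]*\.?[0-9]+)\s*$"
)


@dataclass(frozen=True)
class Rule:
    condition: str   # what must hold for the rule to fire (format assumed)
    category: str    # category assigned when the condition holds
    weight: float    # weight factor attached to the assignment


def parse_rule(line: str) -> Rule:
    """Split one rule line into condition, category, and weight factor."""
    match = RULE_PATTERN.match(line)
    if match is None:
        raise ValueError(f"not a valid rule: {line!r}")
    return Rule(
        condition=match.group("condition"),
        category=match.group("category"),
        weight=float(match.group("weight")),
    )
```

For example, `parse_rule("imports java.sql → Database @ 0.8")` yields a rule whose condition is `imports java.sql`, whose category is `Database`, and whose weight factor is `0.8`.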
According to an additional aspect of the invention, the storage device includes a plurality of storage devices linked in a communications network.
According to an aspect of the invention, the method includes automatically updating the steps of identifying, extracting, and classifying when a new resource is stored in the storage device.
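The steps recited above (identifying, extracting, classifying, and storing, with a separate rule set for each language group) can be pictured with the following minimal Python sketch. It is not the disclosed implementation: the in-memory repository, the extension table, the substring-style conditions, the additive use of weights, and every name in the code are assumptions chosen to keep the example short, and the `extract` function is only a stand-in for a real language specific parser.

```python
from collections import defaultdict

# Hypothetical rule sets, one per language group:
# (condition substring, category, weight factor).
RULES = {
    "java": [("java.sql", "Database", 0.8), ("javax.swing", "GUI", 0.9)],
    "cpp": [("<sqlite3.h>", "Database", 0.7)],
}
# Assumed mapping from file extension to language group.
EXTENSIONS = {".java": "java", ".cpp": "cpp", ".h": "cpp"}


def identify(repository):
    """Yield (name, language, text) for assets with a recognized extension."""
    for name, text in repository.items():
        for ext, language in EXTENSIONS.items():
            if name.endswith(ext):
                yield name, language, text
                break


def extract(language, text):
    """Stand-in for a language specific parser: keep import/include lines."""
    keyword = "import" if language == "java" else "#include"
    return [ln.strip() for ln in text.splitlines() if ln.strip().startswith(keyword)]


def classify(language, facts):
    """Apply the rule set of the asset's language group to the extracted facts."""
    scores = defaultdict(float)
    for condition, category, weight in RULES.get(language, []):
        if any(condition in fact for fact in facts):
            scores[category] += weight
    return dict(scores)


def build_index(repository):
    """Identify, extract, classify, and store the result per asset."""
    return {
        name: classify(language, extract(language, text))
        for name, language, text in identify(repository)
    }
```

Under these assumptions, re-running `build_index` on a repository after a new asset has been stored corresponds loosely to the automatic updating aspect; an incremental update would process only the new asset.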
The invention provides a computer-implemented method for locating assets stored on a storage device, comprising the steps of identifying stored assets, extracting asset-specific information from the stored assets, classifying the extracted information according to a predefined set of rules, storing the classified textual and semantic information for retrieval, and displaying the stored information in a tree view. The tree view has a first hierarchy of the classified textual and semantic information and a second hierarchy of the classified textual and semantic information, wherein nodes that are represented in the tree view comprise nodes of the first hierarchy and nodes of the second hierarchy.
According to a further aspect of the invention, displaying is achieved by selecting a displayed element in a first display area to define a selected element, and displaying all categories of the stored information in which the selected element is classified.
According to another aspect of the invention, the selected element is classified according to a category relevancy score.
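As a purely illustrative sketch of this display aspect, the following Python function lists every category in which a selected element is classified, most relevant first. How the category relevancy score is computed is not stated in this summary; the sketch simply assumes that each element already carries a numeric score per category, and the names `categories_for` and `index` are hypothetical.

```python
def categories_for(index, element):
    """Return (category, score) pairs for the selected element.

    `index` is assumed to map element names to {category: score} dicts.
    Pairs are ordered by descending score, then alphabetically by category.
    """
    scores = index.get(element, {})
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

An element that has not been classified simply produces an empty list.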
According to an additional aspect of the invention, the first hierarchy includes categories of a computer programming language that were identified in the step of classifying, and the second hierarchy includes a hierarchy of instances of the categories.
According to an aspect of the invention, the second hierarchy includes a class package.
According to still another aspect of the invention, the second hierarchy includes a name space.
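One way to picture a tree view that combines the two hierarchies is sketched below in Python: category nodes (the first hierarchy) sit at the top, and under each category the package or name space path of every classified instance (the second hierarchy) is unfolded. The nested-dictionary representation, the dot-separated qualified names, the flat category level, and the function names are assumptions for illustration only.

```python
def build_tree(classified):
    """Build a nested dict from (category, qualified_name) pairs.

    The first level holds categories; deeper levels follow the package or
    name space segments of each instance down to the instance itself.
    """
    tree = {}
    for category, qualified_name in classified:
        node = tree.setdefault(category, {})
        for part in qualified_name.split("."):
            node = node.setdefault(part, {})
    return tree


def render(tree, depth=0):
    """Return the tree as indented lines, children sorted by name."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * depth + name)
        lines.extend(render(tree[name], depth + 1))
    return lines
```

With this layout a user can descend from a category directly into the packages that contain its instances, without composing a query.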
According to an aspect of the invention, there is a step of preclassifying information that was obtained in the step of extracting, according to a plurality of language specific groups.
According to another aspect of the invention, the predefined set of rules includes a plurality of sets of rules, wherein each set of rules is applied to a different language specific group.
The invention provides a computer software product, comprising a computer-readable medium in which computer program instructions are stored. The instructions, when read by a computer, cause the computer to perform the steps of identifying stored assets on a storage device associated with the computer, extracting asset-specific information from the stored assets, classifying the extracted information according to a predefined set of rules, and storing the classified information for retrieval, and, on a monitor connected to the computer, in a first display area of the monitor displaying the stored information in a tree view which has a first hierarchy of the classified information and a second hierarchy of the classified information, wherein nodes of the first hierarchy are integrated with nodes of the second hierarchy.
According to another aspect of the invention, the step of displaying also includes selecting a displayed element in the first display area, and in a second display area of the monitor, displaying all categories of the stored information in which a selected element is classified.
According to a further aspect of the invention, the selected element is classified according to a category relevancy score.
The invention provides a computer system, comprising a storage device for storage of assets therein, a display monitor, a memory for storage of program instructions, and an execution unit that accesses the program instructions in the memory for execution thereof, wherein the program instructions cause the computer to perform the steps of identifying stored assets on the storage device, extracting asset-specific information from the stored assets, classifying the extracted information according to a predefined set of rules, storing the classified textual and semantic information for retrieval, and displaying in a first display area of the display monitor the stored information in a tree view that has a first hierarchy of the classified textual and semantic information and a second hierarchy of the classified textual and semantic information, wherein nodes of the first hierarchy are integrated with nodes of the second hierarchy.