The general problem addressed by this invention is the low productivity of human knowledge workers who use labor-intensive manual processes to work with collections of computer files. One promising solution strategy for this software productivity problem is to build automated systems to replace manual human effort.
Unfortunately, replacing arbitrary manual processes performed on arbitrary computer files with automated systems is a difficult thing to do. Many challenging subproblems must be solved before competent automated systems can be constructed. As a consequence, the general software productivity problem has not been solved yet, despite large industry investments of time and money over several decades.
The present invention provides one piece of the overall functionality required to implement automated systems for processing collections of computer files. In particular, the current invention has a practical application in the technological arts because it provides both humans and software programs with useful information about the contents of collections that require processing.
Problems to be Solved
The Collection Content Classification Problem is one important problem to solve to enable the construction of automated collection processing systems. It is the problem of how to determine collection content members, content types, content processing actions, and content processing interdependencies. Solving the Collection Content Classification problem is important because a solution would enable application programs to process collections of computer files in more powerful, more automated ways than were previously possible.
Some interesting aspects of the general Collection Content Classification problem are these: arbitrary collection types may be involved, containing arbitrary internal structures, arbitrary numbers of internal products, and arbitrary product types. Arbitrary numbers of files and file types may be involved, requiring arbitrary content processing actions, platform-dependent processing actions, and various administrative preferences for all of the above. The general Collection Content Classification problem is not a simple problem.
The Collection Multiple Product Problem is another important problem. It is the problem of how to represent multiple collection output products within one collection. This problem is important because it is both intuitive and practical to create several output products from one set of related collection content files. Without a solution to the multiple product problem, a separate collection instance would be required for each desired product, thereby increasing software complexity and maintenance costs.
Some interesting aspects of the multiple product problem are these: an arbitrary number of collection products may be involved, and each product may have an arbitrary product type, arbitrary product content, and an arbitrary set of required product-level, platform-dependent processing actions.
The Collection Content Membership Problem is another important problem. It is the problem of how to dynamically determine what directories and files are part of a collection. This problem is important because manually enumerating collection content files is a tedious, error-prone, and non-scalable method.
Some interesting aspects of the Collection Content Membership problem are these: collection content files can belong to separate products on different platforms, content files can be shared among multiple products and platforms, content files can be ignored for particular products and platforms, and content files can be stored outside the host collection subtree for various products and platforms.
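For illustration only, dynamic membership determination can be sketched as a directory walk that skips ignored names and also visits directories stored outside the collection subtree. The function name `discover_members`, the `ignore` glob patterns, and the `external_dirs` parameter are hypothetical and do not appear in this disclosure. The sketch does not model per-product or per-platform membership.

```python
from fnmatch import fnmatch
from pathlib import Path


def discover_members(root, ignore=(), external_dirs=()):
    """Return the content files found under root and any external directories.

    ignore: hypothetical glob patterns; a file is skipped if any component
            of its path relative to the searched directory matches one.
    external_dirs: hypothetical directories outside the collection subtree.
    """
    members = []
    for top in (Path(root), *map(Path, external_dirs)):
        for path in sorted(top.rglob("*")):
            if not path.is_file():
                continue
            rel_parts = path.relative_to(top).parts
            # Skip files whose name or parent directory is ignored.
            if any(fnmatch(part, pat) for part in rel_parts for pat in ignore):
                continue
            members.append(path)
    return members
```

Because the list is recomputed on each call, adding or deleting a file in the collection changes the result without any manually maintained file list.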
The Collection Special Fileset Problem is another important problem. It is the problem of how to identify special content files and then process them in special ways. For example, the normal way to compile Fortran source code files is with code optimization enabled, but some Fortran source files are so big that they cannot be optimized. Therefore a mechanism is needed to process a few large Fortran files out of many in a special way, with code optimization turned off. Without a solution to the Collection Special Fileset problem, automated collection processing systems cannot process special content file cases. This is a significant limitation in real-world industrial software environments, which invariably have special processing situations.
Some interesting aspects of the Collection Special Fileset problem are these: many special files may be involved, although typically only a few files out of many require special handling; multiple sets of special files may be involved; and the special processes for those files may range from simple to complex, and from similar to the norm to very different from it.
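The Fortran example above can be sketched as a small lookup in which named special filesets override the default action for a content type. This is a minimal sketch under stated assumptions: the table names, the `big_*.f` pattern, and the `-O2` and `-O0` flags are illustrative choices and are not taken from this disclosure.

```python
from fnmatch import fnmatch

# Hypothetical default compiler flags, keyed by content type.
DEFAULT_FLAGS = {"fortran-source": ["-O2"]}

# Hypothetical special filesets: (fileset name, filename patterns, flags).
SPECIAL_FILESETS = [
    ("no-optimize", ["big_*.f"], ["-O0"]),
]


def compile_flags(filename, content_type,
                  filesets=SPECIAL_FILESETS, defaults=DEFAULT_FLAGS):
    """Return the flags for one file, letting special filesets override."""
    for _name, patterns, flags in filesets:
        # The first special fileset that claims the file wins.
        if any(fnmatch(filename, pattern) for pattern in patterns):
            return list(flags)
    return list(defaults[content_type])
```

With these tables, `compile_flags("solver.f", "fortran-source")` yields the optimized default, while `compile_flags("big_mesh.f", "fortran-source")` yields the unoptimized special case.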
The Collection Content Type Assignment problem is another important problem. It is the problem of how to dynamically determine a content type for each collection content file. A solution to this problem is important because content types are the primary means for determining automated processing actions for processing content files. Without a proper content type assignment, automated systems cannot easily make decisions on how to process content files.
Some interesting aspects of the Collection Content Type Assignment problem are these: large numbers of content files may be involved, arbitrary user-defined content types must be supported, type assignment methods must not fail on missing filename suffixes, and type assignment mechanisms must use internal parseable type-marker strings when necessary.
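As a hedged illustration of these requirements, a type assignment routine might try the filename suffix first and fall back to a parseable marker string inside the file when the suffix is missing or unrecognized. The suffix table, the `content-type:` marker syntax, and the `"unknown"` result are assumptions made for this sketch, as is the order in which the two methods are tried.

```python
import os
import re

# Hypothetical, user-extensible mapping from filename suffix to content type.
SUFFIX_TYPES = {".c": "c-source", ".h": "c-header", ".f": "fortran-source"}

# Hypothetical internal type-marker syntax, e.g. "# content-type: shell-script".
MARKER = re.compile(r"content-type:\s*([\w-]+)")


def assign_content_type(filename, head_text="", suffix_types=SUFFIX_TYPES):
    """Return a content type for one file without failing on missing suffixes.

    head_text: the first few lines of the file, supplied by the caller.
    """
    suffix = os.path.splitext(filename)[1]
    if suffix in suffix_types:
        return suffix_types[suffix]
    # No usable suffix: look for an internal type-marker string instead.
    match = MARKER.search(head_text)
    if match:
        return match.group(1)
    return "unknown"
```

A caller can support additional user-defined types by passing its own `suffix_types` table.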
The Collection Action Assignment problem is another important problem. This is the problem of how to associate collection content files with appropriate automated file processing actions.
Some interesting aspects of the Collection Action Assignment problem are these: arbitrary processing actions for files, filesets, and products must be supported, processing actions must be chosen in accordance with collection, product, and content types, and processing actions must be sharable, customizable, extensible, and fully user-definable.
The Collection Content Dependency problem is another important problem. This is the problem of how to determine processing dependencies for collection content members. This problem is important because collection content files must be processed in accordance with interdependencies among content files in order to produce valid collection product results.
Some interesting aspects of the Collection Content Dependency problem are these: dependencies exist among files, such as source files depending on include files, dependencies exist among products, such as program products depending on the existence of library products for linking purposes, and dependencies must sometimes be calculated for unknown languages, so an extensible mechanism for dependency calculation is required.
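The two kinds of dependency described above can be sketched as follows, assuming a registry of per-language scanners for file-level dependencies and a topological sort for product-level dependencies. The registry, the function names, and the regular expressions are hypothetical; the patterns recognize only simple quoted include lines and are not a complete parser for either language. The sort uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
import re
from graphlib import TopologicalSorter

# Hypothetical registry mapping a content type to a dependency scanner.
SCANNERS = {}


def register_scanner(content_type, pattern):
    """Register a regex whose first group captures one dependency name."""
    regex = re.compile(pattern, re.MULTILINE)
    SCANNERS[content_type] = regex.findall


# Illustrative patterns for two languages; new languages register their own.
register_scanner("c-source", r'^\s*#\s*include\s+"([^"]+)"')
register_scanner("fortran-source", r"^\s*include\s+'([^']+)'")


def file_dependencies(content_type, text):
    """Return the include files named in text, or [] for unregistered types."""
    scanner = SCANNERS.get(content_type)
    return scanner(text) if scanner else []


def product_build_order(product_deps):
    """Order products so that each is built after the products it needs.

    product_deps maps a product name to the set of products it depends on.
    """
    return list(TopologicalSorter(product_deps).static_order())
```

For example, a program product that links against a library product would be described as `{"program": {"library"}, "library": set()}`, and the library would be ordered first.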
General Shortcomings of the Prior Art
A professional prior art search for the present invention was performed, but produced no meaningful, relevant works of prior art. Therefore the following discussion is general in nature, and highlights the significant conceptual differences between file-oriented mechanisms in the prior art and the novel collection-oriented mechanisms represented by the present invention.
Prior art approaches lack support for collections. This is the largest limitation of all because it prevents the use of high-level collection abstractions that can significantly improve productivity.
Prior art approaches lack collection content listing means to dynamically determine collection content members of each collection product, thereby requiring the use of manually constructed content lists, and increasing software maintenance costs.
Prior art approaches lack collection content typing means to dynamically determine data types for collection content members, thereby preventing flexible, scalable, and automated processing of content according to data type.
Prior art approaches lack extensible collection content dependency means to dynamically determine processing dependencies for collection content members, thereby preventing the easy extension of automated dependency calculations to new programming languages.
Prior art approaches lack product build order means for dynamically determining relative product build order of multiple products within a single collection, thereby preventing the proper construction of multiple collection products.
As can be seen from the above description, prior art mechanisms in general have several important disadvantages. Notably, general prior art mechanisms do not support collections, symbolic content types, or action assignment based on content types. These are the most important limitations of all.
In contrast, the present collection content classifier invention has none of these limitations, as the following disclosure will show.
Specific Shortcomings in Prior Art
Several examples of prior art approaches that classify sets of related computer files are discussed below. The examples fall into two main categories: makefile generator programs and integrated development environment (IDE) programs. Both types of programs classify a list of source files so that the files can be processed efficiently in an automated manner.
Prior Art Makefile Generators
Makefile generator programs generate makefiles for humans who are building software programs. Typically, makefiles contain computer instructions for compiling source code files and linking compiled object files to produce executable files or libraries of object files. In addition, programmers typically include a variety of other useful command sequences in makefiles to increase productivity.
Examples of popular freeware makefile generators include automake, imake, and mkmf (make makefile). Although each program is useful, each program has several important classification shortcomings.
Automake has no dynamic content discovery mechanism; instead it requires programmers to manually list all files that require processing. Neither does it have a mechanism for sharing classification information, so multiple automake files cannot easily share user-provided information. Finally, it uses an input file that must be manually constructed, and so its classification operations are not fully automated.
Imake has no dynamic content discovery mechanism; instead it requires programmers to manually list all files that require processing. It also uses an input file that must be manually constructed, so its classification operations are not fully automated.
Mkmf does have a dynamic content discovery mechanism that dynamically includes all source files in the current directory in the output makefile. However, only the current directory is used to find source files; no other directories are supported. Significantly, all source files in the directory are included, whether they should be included or not. This forces programmers to unnaturally restrict the directory contents to only those files that should be discovered by mkmf. Finally, all files are used to build one product only; files cannot be grouped into multiple products.
Thus these makefile generators, which are characteristic of the prior art, have significant content classification limitations.
Prior Art IDEs
Integrated development environments integrate many software development tools such as editors, compilers, linkers, debuggers, and online documentation into one application program environment. Many IDE programs contain an internal makefile generator to generate makefiles to control the software build process.
However, integrated development environments do not typically have dynamic content discovery mechanisms. Instead, programmers are required to manually identify interesting files for inclusion in the IDE project file. Therefore IDE classification operations are not fully automated. In addition, IDE programs do not typically provide user-customizable means for assigning particular data types to whole projects, or to files within those projects. Thus IDE programs lack user-definable and sharable classification type definition information.
As can be seen from the above descriptions, prior art approaches have several important disadvantages. In contrast, the present collection content classifier invention has none of these limitations, as the following disclosure will show.