The general problem addressed by this invention is the low productivity of human knowledge workers who use labor-intensive manual processes to work with collections of computer files. One promising solution strategy for this software productivity problem is to build automated systems to replace manual human effort.
Unfortunately, replacing arbitrary manual processes performed on arbitrary computer files with automated systems is a difficult thing to do. Many challenging subproblems must be solved before competent automated systems can be constructed. As a consequence, the general software productivity problem has not been solved yet, despite large industry investments of time and money over several decades.
The present invention provides one piece of the overall functionality required to implement automated systems for processing collections of computer files. In particular, the present invention has a practical application in the technological arts because it provides automated collection processing systems with a means for obtaining useful, context-sensitive knowledge about variant computational processes for processing collections.
Introduction to Process Knowledge
This discussion starts at the level of automated collection processing systems, to establish a context for the present invention. Then the discussion is narrowed down to the present invention, a Collection Knowledge System.
The main goal of automated collection processing systems is to process collections of computer files, by automatically generating and executing arbitrary sequences of computer commands that are applied to the collections.
One critical part of automated collection processing systems is automatically calculating a variant computational process to execute on the current collection. This calculation is difficult to carry out in practice for several reasons: the computation can be arbitrarily complex; many files can be involved; many different application programs can be involved; platform-dependent processes can be involved; and various site preferences and policies can be involved. The list of complicating factors is long. Generating and executing arbitrary computational processes is a complex endeavor even for humans, let alone for automated collection processing systems.
For communication convenience, this discussion is now narrowed down to focus on the general domain of software development, and in particular on the field of automated software build systems. This is a useful thing to do for several reasons. First, it will ground the discussion in a practical application in the technical arts. Second, it will bring to mind examples of appropriate complexity to readers who are skilled in the art. Third, it will provide several concrete problems that are solved by the present invention.
Even though the discussion is narrowed here, readers should keep in mind that automated collection processing systems have a much wider application than only to the particular field of software build systems. The goal of automated collection processing systems is to automatically generate and execute arbitrary computational processes on arbitrary collections of computer files.
Software build processes are good examples for the present invention because they effectively illustrate many of the problem factors described above. That is, software build processes can be large, complex, customized, and platform dependent, and can involve many files and application programs. For example, large multi-platform system builds can easily involve tens of computing platforms, hundreds of different software programs on each platform, and tens of thousands of data files that participate in the build. The automatic calculation and execution of such large software build problems is a challenging task, regardless of whether manual or automated means are used.
One of the most difficult aspects of calculating and executing software builds is accommodating variance in the calculated processes. For example, typical industrial software environments often contain large amounts of process variance in tools, processes, policies, data, and in almost everything else involved in computational processes.
Process variance is difficult to handle even for human programmers, because variance usually requires the simultaneous, peaceful co-existence and co-execution of a plurality of variant instances of complex processes. Peaceful co-existence and co-execution are not easy to achieve within industrial software environments. For example, two variant processes might both use a large customer database that is impractical to duplicate. Or multiple working programs in an existing working process might be incompatible with a required new variant program. And so on. In typical cases, many interacting process issues must be resolved before variant processes of significant size can peacefully co-exist in industrial software environments.
As a simple model of variant process complexity, consider a single long strand of colored beads on a string. Beads represent programs, data files, and process steps. Beads can have variant colors, shapes, and versions. Individual beads represent steps in a computational process, individual programs, particular input or output data files, or particular sets of control arguments to particular program beads. Finally, consider that software build problems of medium complexity are represented by several hundreds or thousands of beads on the string.
This bead model can now be used to illustrate how process variance affects complexity. To begin with, it is reasonable to say that most complex industrial software processes must be varied in some way to meet the needs of various computational situations. In the bead model, this is equivalent to saying that most beads on the string will need to be varied at some time, by color, by shape or by version, in order to create a new, particular variant process to meet a new, particular variant software development situation.
As one example, it is often the case that some original data files will be incompatible with a proposed new computing platform bead or a proposed new program bead, requiring that multiple data beads be changed whenever certain platform or program beads are changed. This example illustrates coupling among particular changes within a particular computational process. This example also illustrates the point that it is not always possible to make only one change in a process; in many situations, multiple bead changes must be coordinated in order to create a desired variant computational process.
As a second example of bead model complexity, large industrial software environments can easily contain tens of long product strings and new product version strings, each containing many hundreds or thousands of beads. Further, tens or hundreds of programmers can be continuously improving the various product strings by adding new feature beads, fixing software bug beads, modifying existing feature beads in some way, or cloning, splitting, or merging whole strings of beads. Since each string change must be tested, it follows that many variant combinations of data beads, feature beads, bug fix beads, and process beads must peacefully co-exist together in host industrial software environments for arbitrary periods of time.
Thus it can be seen that automated collection processing systems are not faced with only simple problems involving single complex computational processes. Instead, automated systems face a far more difficult, more general problem that involves whole families of related, complex, coupled, customized, and platform-dependent computational processes.
For each individual computational situation, a competent automated system must calculate, create, and execute a precisely correct variant computational process. In order to do that, automated systems require access to a large amount of variant process knowledge. One mechanism for providing the required knowledge is a Collection Knowledge System, the subject of the present invention.
Problems to be Solved
This section lists several important problems that are faced by automated collection processing systems, and that are solved by the present Collection Knowledge System invention.
The Knowledge Organization Problem is one important problem that must be solved to enable the construction of automated collection processing systems. It is the problem of how to organize knowledge for variant processes, in one place, with one conceptual model, for use by multiple programs in variant processing situations.
Some interesting aspects of the Knowledge Organization Problem are these: an arbitrary number of programs can be involved; an arbitrary amount of knowledge for each program can be involved; knowledge used by programs can have arbitrary structure determined by the program; and knowledge can exist in various binary or textual forms.
The Customized Knowledge Problem is another important problem to solve. It is the problem of how to customize stored knowledge for use in variant processing situations.
Some interesting aspects of the Customized Knowledge Problem are these: knowledge can be customized for arbitrary programs; arbitrary amounts of knowledge can be customized; knowledge can be customized for sites, departments, projects, teams, and for individual people; knowledge can be customized by purpose (for example, debug versus production processes); and various permutations of customized knowledge may even be required.
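As a minimal sketch of how layered customizations might be resolved, the following Python fragment stacks per-person, per-team, per-project, per-department, and per-site settings so that the most specific layer wins. The layer names come from the paragraph above; the precedence order, the function name effective_knowledge, and the sample keys and values are assumptions made only for illustration, not the mechanism of the present invention.

```python
from collections import ChainMap

# Assumed precedence, most specific first. The text above does not fix an order.
LAYER_ORDER = ["person", "team", "project", "department", "site"]


def effective_knowledge(layers):
    """Combine customization layers so that earlier (more specific) layers
    override later (more general) ones. Missing layers are treated as empty."""
    maps = [layers.get(name, {}) for name in LAYER_ORDER]
    return ChainMap(*maps)


# Hypothetical sample data: a site-wide default overridden by one project.
layers = {
    "site": {"optimize": "-O2", "warnings": "-Wall"},
    "project": {"optimize": "-O0"},
}
effective = effective_knowledge(layers)
```

In this sketch the project value for "optimize" hides the site value, while "warnings" falls through to the site layer because no more specific layer defines it.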
The Platform-Dependent Knowledge Problem is another important problem. It is the problem of representing platform-dependent knowledge in ways that promote human understanding, reduce knowledge maintenance costs, provide easy automated access to stored knowledge, and enable effective sharing of platform-dependent knowledge across multiple platforms within particular application programs.
Some interesting aspects of the Platform-Dependent Knowledge Problem include these: many platforms may be involved; platforms can be closely or distantly related; platforms can share a little or a lot of information; new platform knowledge is sometimes added; old platform knowledge is sometimes discarded; and knowledge can be shared among many or only a few platforms within an application.
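One way to picture sharing among closely and distantly related platforms is a parent chain, in which a lookup falls back from a specific platform to progressively more general ones. The sketch below is illustrative only; the platform names, the parent relationships, the keys, and the function name lookup are all assumptions and are not taken from the present disclosure.

```python
# Hypothetical platform hierarchy: each platform names its more general parent.
PLATFORM_PARENT = {
    "linux": "unix",
    "solaris": "unix",
    "unix": "generic",
    "win32": "generic",
    "generic": None,
}

# Hypothetical knowledge, stored once at the most general level where it holds.
PLATFORM_KNOWLEDGE = {
    "generic": {"object_suffix": ".o"},
    "unix": {"path_separator": "/"},
    "win32": {"object_suffix": ".obj", "path_separator": "\\"},
}


def lookup(platform, key):
    """Search the platform, then its ancestors, for the first value of key."""
    while platform is not None:
        entry = PLATFORM_KNOWLEDGE.get(platform, {})
        if key in entry:
            return entry[key]
        platform = PLATFORM_PARENT[platform]
    raise KeyError(key)
```

Under these assumptions, adding a new platform requires only a new parent entry plus whatever values differ from its ancestors, and discarding a platform removes only its own entries.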
The Coupled-Application Knowledge Problem is another important problem. It is the problem of multiple applications being indirectly coupled to each other by their shared use of the same processing knowledge for the same computing purpose. As a consequence of coupling, knowledge changes made for one program may require knowledge changes to be made in other programs. For example, a single knowledge change to enable software “debug” compilations typically requires changes to both compiler and linker control arguments. The compiler is told to insert debugging symbol tables, and the linker is told not to strip symbol tables out of the linked executable file. Thus two bodies of knowledge for two apparently independent programs are coupled by the purpose of debugging.
Some interesting aspects of the Coupled-Application Knowledge Problem are these: multiple applications may be involved in a coupling relationship; applications may be coupled by data file formats, control arguments, or execution sequences; coupling relationships can vary with the current processing purpose; multiple sets of coupled programs may be involved; and multiple coupled processes involving multiple sets of coupled applications may be involved.
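The debug example above can be sketched by keying the knowledge for both programs on a shared purpose, so that a single change of purpose updates the compiler and linker arguments together. This is a minimal illustration under stated assumptions: the table name, the function name options_for, and the choice of GCC-style flags (-g to emit debugging symbols, -O2 to optimize, -s to strip symbols at link time) are examples only and are not prescribed by the present disclosure.

```python
# Hypothetical table: one entry per processing purpose, holding the control
# arguments for every program coupled by that purpose.
COUPLED_KNOWLEDGE = {
    "debug": {
        "compiler": ["-g"],   # insert debugging symbol tables
        "linker": [],         # do not strip symbol tables
    },
    "production": {
        "compiler": ["-O2"],  # optimize, no debugging symbols
        "linker": ["-s"],     # strip symbol tables from the executable
    },
}


def options_for(purpose, program):
    """Return a copy of the control arguments for one program under one purpose."""
    return list(COUPLED_KNOWLEDGE[purpose][program])
```

Because both programs read from the same purpose entry, switching from "production" to "debug" cannot change the compiler arguments while leaving the linker arguments behind.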
The Shared Knowledge Problem is another important problem. It is the problem of how to share knowledge among multiple programs regardless of variant processing purposes or coupling relationships. This problem is not the same as the Coupled-Application Knowledge Problem, which considers indirect coupling among multiple applications according to computational purpose (e.g. debugging). Instead, the Shared Knowledge Problem considers deliberate sharing of knowledge among applications and multiple platforms to reduce multiple copies of the same knowledge.
Some interesting aspects of the Shared Knowledge Problem are these: shared knowledge may be platform dependent; shared knowledge may be customized by site, project, person, purpose, and so on; multiple applications may share one piece of knowledge; and the set of multiple applications that share a piece of knowledge may change with variant processing purpose.
The Scalable Knowledge Delivery Problem is another important problem. It is the problem of how to deliver arbitrary amounts of complex, customized, shared, and platform-dependent knowledge to arbitrary programs, in ways that are resistant to scale-up failure.
Some interesting aspects of the Scalable Knowledge Delivery Problem are these: arbitrary amounts of knowledge can be involved; the format of delivered knowledge can be a text string, a text pair, a list, a text or binary file, a set of files, a directory, or even a tree of files; network filesystem mounting methods such as NFS (Network File System) are sometimes inappropriate or unreliable; frequently used knowledge should be cached for faster retrieval; and cached knowledge must be flushed when the underlying original knowledge is updated or removed.
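The caching and flushing requirement can be illustrated with a small in-memory sketch in which the knowledge store notifies its caching clients whenever an item is updated or removed. The class names KnowledgeStore and CachingClient, the callback scheme, and the sample key are assumptions introduced for illustration; an actual delivery system would also have to address networking and the other delivery formats listed above.

```python
class KnowledgeStore:
    """Holds the original knowledge and notifies listeners of changes."""

    def __init__(self):
        self._data = {}
        self._listeners = []
        self.reads = 0  # counts fetches, to show when the cache is bypassed

    def subscribe(self, callback):
        self._listeners.append(callback)

    def put(self, key, value):
        self._data[key] = value
        self._notify(key)

    def remove(self, key):
        self._data.pop(key, None)
        self._notify(key)

    def fetch(self, key):
        self.reads += 1
        return self._data[key]

    def _notify(self, key):
        for callback in self._listeners:
            callback(key)


class CachingClient:
    """Caches fetched knowledge and flushes an entry when the store changes it."""

    def __init__(self, store):
        self._store = store
        self._cache = {}
        store.subscribe(self._flush)

    def _flush(self, key):
        self._cache.pop(key, None)

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._store.fetch(key)
        return self._cache[key]
```

Repeated calls to get for an unchanged key are served from the cache, while a put or remove on the store flushes the stale entry so that the next get goes back to the original knowledge.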
The Mobile Knowledge Problem is another important problem. It is the problem of how to encapsulate knowledge within a collection, so that knowledge can be shipped around the network in the form of collections of computer (knowledge) files. Importantly, application programs working within the natural filesystem boundaries of the collection directory subtree should be able to use the knowledge stored within the mobile collection. Mobile collections provide an implementation of the idea of location-sensitive knowledge.
Some interesting aspects of the Mobile Knowledge Problem are these: arbitrary amounts of knowledge may be involved; knowledge for multiple programs may be involved; customized knowledge may be involved; variant knowledge may be involved; location-sensitive knowledge should override static knowledge stored in the system if so desired; and mobile knowledge arriving at a site should be installable at the receiving site.
The Workspace Knowledge Problem is another important problem. It is the problem of how to configure a computer filesystem workspace to contain particular sets of hierarchically organized collections that each contain multiple knowledge files, such that the knowledge files become available to application programs that work within the directory subtree that defines the workspace. Workspaces provide an implementation of the idea of location-sensitive knowledge.
Some interesting aspects of the Workspace Knowledge Problem are these: arbitrary amounts of knowledge can be involved; workspaces lower in the subtree should share or “inherit” knowledge stored above the workspaces in the subtree; knowledge located lower in the subtree should override knowledge located higher in the subtree; and knowledge should become available to programs only when their current working directory is within the workspace subtree.
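The inheritance and override behavior described above can be sketched as an upward directory search that starts at the current working directory and stops at the workspace root. In the sketch below, the per-directory folder name ".knowledge", the function name find_knowledge, and its parameters are hypothetical choices made for illustration; the present disclosure does not prescribe this layout.

```python
from pathlib import Path


def find_knowledge(name, cwd, workspace_root):
    """Return the nearest knowledge file called name, searching from cwd up to
    workspace_root, or None if it is absent or cwd is outside the workspace."""
    cwd = Path(cwd).resolve()
    root = Path(workspace_root).resolve()

    # Knowledge is available only when working inside the workspace subtree.
    if cwd != root and root not in cwd.parents:
        return None

    directory = cwd
    while True:
        # ".knowledge" is an assumed per-directory folder of knowledge files.
        candidate = directory / ".knowledge" / name
        if candidate.is_file():
            return candidate  # lower directories are checked first, so they override
        if directory == root:
            return None
        directory = directory.parent
```

Because the search proceeds from the deepest directory upward, a file found lower in the subtree overrides a file of the same name stored higher up, and a file stored only at a higher level is inherited by every directory beneath it.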
The Aggregated Knowledge Problem is another important problem. It is the problem of aggregating various smaller bodies of knowledge into larger bodies of knowledge that are intended to serve a particular focus, purpose, area of endeavor, or problem domain.
Some interesting aspects of the Aggregated Knowledge Problem are these: arbitrary amounts of knowledge can be involved; aggregated collections of knowledge should be named for convenient reference; aggregated knowledge can be associated with particular filesystem locations or subtrees; and aggregated knowledge should be accessible by name, independent of filesystem location.
The Installable Knowledge Problem is another important problem. It is the problem of how to create, install, and maintain smaller, named, encapsulated subsets of system knowledge, thereby reducing the complexity of the overall system knowledge management problem.
Some interesting aspects of the Installable Knowledge Problem are these: arbitrary amounts of knowledge can be involved; previously installed knowledge must be uninstalled before newer installable knowledge can be installed; installable knowledge should not be coupled to previously existing knowledge; programs must dynamically detect and use installable knowledge; and uninstallation must cause the flushing of previously cached versions of the old installable knowledge.
As the foregoing material suggests, knowledge management for supporting variant computational processes is a complex problem. Many important issues must be solved in order to create a competent knowledge management and delivery system.
General Shortcomings of the Prior Art
A professional prior art search for the present invention was performed, but produced no meaningful, relevant works of prior art. Therefore the following discussion is general in nature, and highlights the significant conceptual differences between the program-oriented knowledge storage mechanisms of the prior art, and the novel collection-oriented knowledge management mechanisms represented by the present invention.
Prior art approaches lack support for collections. This is the largest limitation of all because it prevents the use of high-level collection abstractions that can significantly improve productivity.
Prior art approaches lack support for managing many simultaneous and different customizations of program knowledge, thereby making it impossible for one set of knowledge to simultaneously serve the needs of many software programs that participate in many variant computational processes.
Prior art approaches lack support for managing the knowledge of coupled application programs in synchronization, thereby making it difficult for humans to coordinate the actions of chains of coupled programs in variant computational processes, and thereby increasing human programming costs.
Prior art approaches lack support for sharing knowledge among multiple unrelated programs, thereby requiring humans to provide each program with its own copy of shared knowledge, and thereby increasing knowledge maintenance costs.
Prior art approaches lack support for using a single scalable means to deliver operational knowledge to many programs within typical industrial software environments, thereby making it more difficult to centrally manage knowledge, and thereby increasing knowledge maintenance costs.
Prior art approaches lack support for partitioning program knowledge into encapsulated subsets of mobile knowledge that can be easily moved around and utilized within a filesystem or computer network. This discourages the sharing and mobility of knowledge, and discourages the use of mobile, location-sensitive knowledge in particular computing situations.
Prior art approaches lack support for associating knowledge with particular directories in filesystem subtrees, thereby making it impossible to configure hierarchical computer workspaces to contain particular sets of knowledge.
Prior art approaches lack support for aggregating smaller bodies of knowledge into larger, named, bodies of knowledge that can be referenced by name or that can be associated with physical filesystem subtrees. This discourages the association of bodies of aggregated knowledge with particular computational problems or computational workspaces.
Prior art approaches lack support for partitioning program knowledge into encapsulated subsets of installable knowledge that can be individually created, installed, and maintained, thereby increasing the monolithic nature of most stored program knowledge, and thereby increasing knowledge creation and maintenance costs.
As can be seen from the above description, prior art mechanisms in general have several important disadvantages. Notably, they do not provide support for collections, coupled applications, shared knowledge, customized knowledge, installable knowledge, or mobile knowledge.
In contrast, the present Collection Knowledge System has none of these limitations, as the following disclosure will show.
Specific Shortcomings in Prior Art
One main example of prior art knowledge delivery systems is the common technique of storing application program data on a local hard disk, where it can be accessed by an application program.
For example, preference options for spreadsheets and word processors on personal computers are generally stored using this technique. It is fair to say that historically, this particular approach has been the main approach used by the industry to store application program knowledge.
However, as described previously, this approach has many significant limitations with respect to supporting applications that participate in variant computational processes. Indeed, it is fair to say that this simple approach is one of the main causes of difficulty in treating variant processes, for all the reasons listed earlier.
For example, this prior art approach limits the sharing of knowledge among applications. It cannot represent the idea of coupled applications. It cannot associate variant processes with relevant knowledge. It cannot represent different customizations of application knowledge. And so on.
As can be seen from the above description, the main prior art approach used within the software industry has many significant limitations. Most importantly, it is oriented toward storing knowledge for single, isolated applications. It cannot represent knowledge for entire variant processes, and cannot use a combination of customized knowledge from many different application programs to satisfy the knowledge needs of entire variant processes.
In contrast, the present Collection Knowledge System invention has none of these limitations, as the following disclosure will show.