This invention relates to automated software systems for processing collections of computer files in arbitrary ways, thereby improving the productivity of software developers, web media developers, and other humans and computer systems that work with collections of computer files.
The general problem addressed by this invention is the low productivity of human knowledge workers who use labor-intensive manual processes to work with collections of computer files. One promising solution strategy for this software productivity problem is to build automated systems to replace manual human effort.
Unfortunately, replacing arbitrary manual processes performed on arbitrary computer files with automated systems is difficult. Many challenging subproblems must be solved before competent automated systems can be constructed. As a consequence, the general software productivity problem remains unsolved, despite large industry investments of time and money over several decades.
The present invention provides one piece of the overall functionality required to implement automated systems for processing collections of computer files. In particular, the present invention has a practical application in the technological arts because it provides application programs with a convenient, precise, scalable, and fully automated means for recognizing particular collections of files for automated processing.
The Collection Recognition problem is one important problem that must be solved to enable the construction of automated processing systems. It is the problem of how to automatically recognize particular collections of files for automated processing.
Some characteristics of the collection recognition problem that make it difficult to solve include at least these: collections can have arbitrary data types; collections can have arbitrary size and content; collections can have arbitrary internal structure; collections can require arbitrary processing; collections can be arbitrarily located within a filesystem, database, or network search space; only a few interesting collections might be selected from a large pool of collections; selection processes can use internal content or external filesystem attributes; and arbitrary numbers of collections may be involved.
General Shortcomings of the Prior Art
A professional prior art search for the present invention was performed, but produced no meaningful, relevant works of prior art. Therefore the following discussion is general in nature, and highlights the significant conceptual differences between file-oriented mechanisms in the prior art and the novel collection-oriented mechanisms represented by the present invention.
Prior art approaches lack support for collections. This is the largest limitation of all because it prevents the use of high-level collection abstractions that can significantly aid productivity.
Prior art approaches lack user-defined data types for collections of files. This is a significant limitation because user-defined data types are a primary mechanism for carrying relevant semantic information about collections of files.
Prior art approaches lack shared data types for collections of files. This is a significant limitation because sharable type definitions are a primary mechanism for propagation and reuse of important collection type information.
Prior art approaches lack user-defined per-collection instance data. This is a significant limitation because per-instance data is the primary mechanism for augmenting or overriding general type definition information shared among all collections of a particular type.
Prior art approaches lack the ability to use collection type definition information and collection instance data for match criteria in collection recognition searches. This is a significant limitation because collection type definition and collection instance data are both rich sources of useful recognition matching information.
As can be seen from the above description, prior art approaches have several important disadvantages. Notably, prior art approaches do not support collections, do not support user-defined collection instance information, and do not support user-defined collection data types. These are the three most important limitations of all.
In contrast, the present collection recognizer invention has none of these limitations, as the following disclosure will show.
A collection recognizer dynamically detects and selects collections from within a search space, and makes the resulting collection recognition information available to software programs, thereby enabling the construction of fully automated software systems for processing collections of arbitrary computer files.
In operation, a collection recognizer is used by an application program to recognize interesting collections of files for processing. A collection recognizer first detects a set of interesting collection signatures from within a search space using signature detection criteria, thereby forming a first pool of detected collections. From the first pool of detected collections, a second pool of selected collections is created, using various selection criteria. Selection criteria can include search space information, collection instance information, collection content information, and collection type definition information. Ultimately, a collection recognizer returns information about detected and selected collections to a calling program for subsequent processing.
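The two-stage operation described above (detection, then selection) can be illustrated with a minimal sketch. It is not the disclosed implementation. It assumes, purely for illustration, that a collection signature is a marker file named `cspec` in the collection's root directory, that the marker file holds simple "key value" lines, and that selection criteria are ordinary predicate functions. The names `Collection`, `parse_specifier`, `detect`, `select`, and `recognize` are likewise hypothetical.

```python
import os
from dataclasses import dataclass, field

# Assumption: a collection is signaled by a marker file with this name.
SIGNATURE_FILE = "cspec"


@dataclass
class Collection:
    root: str                                      # directory holding the signature
    instance: dict = field(default_factory=dict)   # per-collection instance data


def parse_specifier(path):
    """Parse 'key value' lines into a dict (hypothetical file format)."""
    data = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, _, value = line.partition(" ")
            data[key] = value.strip()
    return data


def detect(search_root):
    """Stage 1: walk the search space and build the pool of detected collections."""
    detected = []
    for dirpath, dirnames, filenames in os.walk(search_root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        if SIGNATURE_FILE in filenames:
            spec_path = os.path.join(dirpath, SIGNATURE_FILE)
            detected.append(Collection(dirpath, parse_specifier(spec_path)))
    return detected


def select(detected, criteria):
    """Stage 2: keep only the collections that satisfy every selection criterion."""
    return [c for c in detected if all(pred(c) for pred in criteria)]


def recognize(search_root, criteria):
    """Return both pools to the calling program."""
    detected = detect(search_root)
    return {"detected": detected, "selected": select(detected, criteria)}
```

A calling program might, for example, invoke `recognize(root, [lambda c: c.instance.get("type") == "web"])` to select only collections whose instance data declares a hypothetical type of `web`.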
Collection recognizers solve the collection recognition problem by providing software programs with a generalized, precise, scalable, customizable, and extensible means for recognizing collections within a filesystem search space. In particular, collection recognizers return information-rich collection data structures to calling software programs. Collection recognizers thus enable automated collection processing systems to recognize collections of arbitrary computer files in more precise, more automated, more scalable, and more knowledgeable ways than were previously possible.
The present collection recognizer invention solves all of the general prior art limitations described previously. Specifically, collection recognizers support collections of files, support user-defined collection types, support shared collection types, support user-defined per-collection instance data, and support use of collection type and instance data in recognition searches.
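As a hedged illustration of how shared type definitions and per-collection instance data might interact, the following sketch layers instance data over a shared type definition so that instance values augment or override type values, and then uses the combined properties as match criteria. The type names, property names, and the functions `effective_properties` and `matches` are assumptions made for this example only.

```python
from collections import ChainMap

# Assumption: shared, user-defined type definitions keyed by type name.
TYPE_DEFINITIONS = {
    "web-site": {"processor": "publish", "languages": "html"},
    "c-program": {"processor": "build", "languages": "c"},
}


def effective_properties(instance):
    """Combine instance data with its type definition; instance values win."""
    type_def = TYPE_DEFINITIONS.get(instance.get("type"), {})
    # ChainMap looks in the first mapping first, so instance data overrides.
    return dict(ChainMap(instance, type_def))


def matches(instance, **wanted):
    """True if every wanted property equals the collection's effective value."""
    props = effective_properties(instance)
    return all(props.get(key) == value for key, value in wanted.items())
```

Under these assumptions, a collection with instance data `{"type": "c-program", "processor": "build-debug"}` inherits `languages` from its type definition while overriding `processor`, and a search such as `matches(instance, languages="c")` can match on either source of information.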
The present collection recognizer invention also has the following additional objects and advantages.
One object of the present invention is to provide a generalized, fully automated collection recognizer means for software programs, thereby enabling the construction of generalized, large-scale, automated collection processing systems.
Another object is to provide sufficient flexibility, extensibility, and capacity to resist scale-up failure, thereby enabling automated collection recognizers and collection processing systems to scale up smoothly.
Another object is to provide a collection recognition model that is independent of search space type, thereby enabling collection recognition searches to be conducted using various search spaces including filesystems, databases, and distributed networks.
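One way to picture independence from search space type is to place a small interface between the recognizer and the search space. The sketch below is illustrative only. The `SearchSpace` protocol, its `containers` method, the two example implementations, and the `cspec` signature name are all assumptions, and the in-memory mapping merely stands in for a database or network index.

```python
import os
from typing import Iterable, Iterator, Protocol, Tuple

SIGNATURE = "cspec"  # assumption: hypothetical signature entry name


class SearchSpace(Protocol):
    """Anything that can enumerate (location, entry names) pairs."""

    def containers(self) -> Iterator[Tuple[str, Iterable[str]]]:
        ...


class FilesystemSpace:
    """Search space backed by a directory tree."""

    def __init__(self, root: str) -> None:
        self.root = root

    def containers(self) -> Iterator[Tuple[str, Iterable[str]]]:
        for dirpath, _dirnames, filenames in os.walk(self.root):
            yield dirpath, filenames


class MappingSpace:
    """In-memory stand-in for a database table or network index."""

    def __init__(self, table: dict) -> None:
        self.table = table

    def containers(self) -> Iterator[Tuple[str, Iterable[str]]]:
        for location, names in self.table.items():
            yield location, names


def detect_collections(space: SearchSpace) -> list:
    """Return locations whose entries include the signature, for any search space."""
    return [loc for loc, names in space.containers() if SIGNATURE in names]
```

Because `detect_collections` depends only on the `containers` method, the same detection logic could run over a filesystem, a database, or a network index without modification.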
Another object is to produce information-rich data structures from the recognition process, containing both collection information and recognition process information, thereby saving application programs the effort of obtaining collection and process information themselves.
Other features and advantages of the present Collection Recognizer invention will become apparent upon further reading of the drawings and disclosure that follow.