The present invention relates generally to processing information in a data repository, and more specifically to executing a batch process on a database or other repository of information based on an analysis of the information in the repository.
A large repository of information can be acted upon or processed in different ways. For example, batch processes can be executed on or using the information of the repository to generate reports or perform other processes. However, as the size of the repository increases, i.e., as the amount of information acted upon by the batch process increases, the time and overhead required to perform the process also increase. For a very large repository, the time required for the process may be many hours. Furthermore, many such processes require selection or designation of one or more parameters used by the process. The parameters can designate, for example, ranges or types of information to be processed, processes to be performed, etc. If the wrong process is executed, or if the process is executed with the wrong parameters, the resulting report or other result may not be what is desired or expected and may not be useful to the party executing the process. In that case, the time and overhead required to perform the process are essentially wasted.
Currently, to avoid such wasted effort, a party executing the process can rely on expert knowledge of the information in the repository. For example, an administrator or operator executing the process can have a detailed level of knowledge of the information and/or the population, processes, etc. that the information represents. Thus, the administrator or operator can designate parameters based on this knowledge to obtain the results desired. However, such a level of knowledge is not always available and is subject to being lost if the expert employee leaves. Another approach is to manually execute a series of queries against the repository to gain insight into the information therein prior to executing the process. However, such an approach requires that the queries be correctly targeted in order to provide results that are useful in selecting specific parameters. Thus, this approach still relies on some level of expert knowledge of the information and/or processes. Furthermore, since such an approach relies on manually specifying and executing the queries, it is not useful or efficient for performing a “what if” type of analysis. Additionally, this approach does not provide an overall view or summary of the contents of the repository. Hence, there is a need for improved methods and systems for executing a batch process on a database or other repository of information.