The present invention is directed to data management and to data quality management and control, and more particularly to a system and method that provide connections to source data, allow users to view data structures, and enable management and manipulation of data contained within possibly heterogeneous data systems. In this application, a data system is any system that contains data and provides management of that data. Data management refers more specifically to the management aspect of a data system. In common usage, the term “database system” typically refers to a relational data system, such as Oracle or DB2.
Data management and data quality management and control are essential to the operation of modern business and government entities. Establishing a database, maintaining it day to day, and testing its data quality are complex tasks, frequently assigned to specially trained operators or information systems groups. The task of maintaining and testing databases is made especially difficult by the existence of different database formats and by the need for business, government and public interest entities that run such different formats to cooperate effectively.
One reason for these difficulties is the current practice of developing separate test and management packages for different data systems. For example, if a company operates an Oracle database, test and management software is typically developed specifically to operate on that Oracle database alone. The same process is repeated for each database operated by a business, government or public interest entity. Because separate systems are developed and used to maintain different data systems, operators must interface separately with each of the data systems being managed and tested. The absence in the prior art of a single interface capable of manipulating different data sources and formats has made this task highly time-consuming and often frustrating for operators.
To better appreciate the problem, it should be pointed out that the applicable user environment is one of mixed technologies and mixed operator skills. Due to the complexity of modern applications and tools, Information Systems (IS) organizations are frequently able to train one person to operate only one or two applications or tools successfully. As is known, most software applications and tools provide only a limited set of features, so IS groups and organizations are forced to assign a number of specialized users to support all the applications required to run the business. The problem is further aggravated by the cost of licensing and implementing each application and tool, in addition to the high labor cost of training the operators.
It will be appreciated that the above problems also fundamentally involve issues of data quality management and control, and ultimately the adequacy of quality assurance. Consider, for example, an organization developing a new inventory system with tracking features. A major concern would be the accurate and reliable update of all forms of inventory information for any particular inventory item as it is processed through its life cycle. To ensure such accuracy and reliability, the newly developed application software is frequently run against test databases to avoid corrupting the operational data. After execution, the test data must be examined for errors, which can range from inconsistent data patterns to complete loss of data. Naturally, one must first be able to recognize the error(s) and then attempt to fix the underlying problem. A traditional technique for finding errors is simply to compare the resulting data with the original data. While merely identifying the presence of an error is relatively straightforward, sophisticated analysis tools may be needed to discover the nature of the errors, and to do so quickly. To be commercially successful, such tools must demand little concentration and effort from human operators; otherwise, operating the tool will interfere with the operator's thought process and may lead to further delays or errors. The above example merely illustrates the type of problems that exist in the data quality control and assurance context. At present, very few tools come close to satisfying users' demands in this regard.
For helpful background information, the interested reader is directed to the disclosures of the following patents: U.S. Pat. Nos. 4,714,989; 4,714,995; 4,769,772; 4,881,166; 5,046,002; 5,058,000; 5,142,470; 5,161,158; 5,239,577; 5,247,664; 5,257,366; 5,278,978; 5,301,302; 5,345,587; 5,381,534; 5,452,450; 5,561,797; 5,581,749; 5,581,758; and 5,630,124; and to the following printed publications:
Arbee L. P. Chen, “A Localized Approach to Distributed Query Processing,” Bell Communications Research, Piscataway, N.J., pp. 188-202, Mar. 26, 1990.
M. Rusinkiewicz et al., “Query Transformation in Heterogeneous Distributed Database Systems,” IEEE, pp. 300-307, 1985.
C. T. Yu et al., “Query Processing in a Fragmented Relational Distributed System: Mermaid,” IEEE Trans. on Software Engineering, vol. SE-11, No. 8, pp. 795-810, August 1985.
M. Rusinkiewicz et al., “An Approach to Query Processing in Federated Database Systems,” Proc. of the Twentieth Annual Hawaii Int'l Conf. on System Sciences, pp. 430-440, 1987.
S. Kang et al., “Global Query Management in Heterogeneous Distributed Database Systems,” Microprocessing and Microprogramming, vol. 38, pp. 377-384, 1993.
Attempts have been made to remedy aspects of the above problems by providing data management tools that can operate across heterogeneous data systems. In this application, “heterogeneous data systems” are systems capable of operating simultaneously with multiple differing data systems. Examples of such data systems include DB2 produced by International Business Machines (IBM) Corporation, Oracle produced by Oracle Corp., Sybase produced by Sybase Inc., flat files and others. When used together, such differing data systems collectively also represent a heterogeneous, distributed data environment or system. Heterogeneous, distributed data systems are also sometimes called federated data systems or multi-database systems. At present, there is a need for convenient and reliable software tools and methods that are independent of any particular data management system and capable of operating in multiple data system environments. Further, there is a need for such tools and methods for the purpose of data quality management and control. However, to the best of applicants' knowledge, none of the known prior art systems provides a robust system for data quality management and control across differing data systems. It is the purpose of this invention to address the problems associated with the prior art and to meet these and other users' needs.