The present application relates generally to an improved data processing apparatus and method and, more specifically, to mechanisms for discovering application information for structured databases.
In large enterprises, a large number of structured and semi-structured data collections reside in multiple information systems, such as wide-table collections (e.g., HBase™, Cassandra™), relational databases (e.g., DB2®, Oracle®), or the like. As is well recognized, massive data collections provide such enterprises great value. Therefore, more enterprises are taking initiatives to collect and integrate such data. However, once an enterprise has collected massive amounts of structured or semi-structured data, the enterprise often has difficulty identifying uses for the collected data. In other words, identifying uses for such massive data collections is critical: for example, which tasks the data should be utilized for and from what perspective the data is valuable to the enterprise.
Nowadays, more enterprise systems employ a service-oriented architecture (SOA) and provide an increasing proportion of resources as (cloud) services. Services use description metadata to describe not only the characteristics of these services, but also the data that drives them. In other words, the released services and their associated descriptions become a rich source of usage information for these provided resources. Traditionally, usage information of such data has not been well documented. More often, the use of some data collections resides in the mind of the employee who caused the data to be collected in the first place. Thus, when the employee is no longer employed by the enterprise, the intended use of the data collection is lost. When necessary, data is manually screened and usage information is manually summarized. Manual screening and annotation of the data is very labor-intensive and time-consuming, and often leads to inaccuracy because the screening and annotations reflect one employee's perspective, and that employee often has deficient knowledge about the data.