1. Field of the Invention
The invention relates to a method for accessing and automatically correlating data from a plurality of external data sources.
Traditionally, computer-based solutions have been created to handle real-life challenges. At a certain point, the developed software began to represent a challenge in itself, as the requirements moved to ever higher levels of abstraction. Software may initially handle only operational data, but sooner or later analytics are introduced, and the industry seems to have crystallized around how analytics should be done, especially considering the vast amount of data available, which is increasing at an uncontrollable pace. Current solutions generally start from the premise that it is not reliable, or even possible, to query the original data source efficiently. A copy of the data must therefore be created in a format that supports the new analytical requirements, which are served by yet another intermediary system that raises the abstraction to a level at which the common developer can build the final solution. This copying and restructuring comes in different flavors, from a complete copy, in the case of ETL (Extract, Transform and Load) processes that create structures for multidimensional querying, to a partial copy, in an EAI (Enterprise Application Integration) environment that consumes web services over different protocols. When going from one to multiple data sources, the legacy approaches are simply repeated to create the so-called data warehouse as a one-size-fits-all solution.
2. Description of the Related Art
A method for accessing structured and unstructured data, such as data from disparate data sources, is known from US-A 2005/251501 PHILLIPS ET AL. A business process may access and integrate data from a variety of data sources. This may include identifying a data source and a subset of information of interest within the data source. The data may also be transformed and operated upon to perform the business process. Furthermore, the processed data can be published to desired destinations in desired formats, e.g. to an Excel spreadsheet or other data visualization tool. The data may be extracted from an electronic document, and the data output interfaces permit a user to select data destinations and formats, e.g. converting data to HTML format. One problem associated with this known method is that the data is extracted from electronic documents, i.e. copied from external data sources. This overlooks the fact that the original data source can often be queried efficiently, because it is a live system: it is being maintained and has received investment for, among other things, a properly sized persistence layer. Such a system sometimes fails to deliver faster or smarter results because of limitations in the technologies or architecture built on top of this layer.