This invention relates to data storage and retrieval and, more particularly, to an agent-based networking system for creating a data warehouse and to a method of accessing the same for real-time retrieval of application specific data.
The typical approach to data mining is to start with data warehousing, that is, with creating an inventory of data, the data warehouse, and removing ambiguous information. The creation of a data warehouse is concerned with schemes and methods of integrating legacy databases so that they can be accessed in a uniform and manageable framework. This involves data storage, data selection, data cleaning and an infrastructure for updating databases once new knowledge or representations are developed.
The data warehouse is then used to extract knowledge about hidden relationships in the data (data mining). The problem with this approach is that data mining can only be performed after the warehouse has been created, a process which can take up to several years. The reasons that data warehousing is so time-consuming are ambiguity and distribution.
The ambiguity results from differences in the query languages and data formats of different databases, and may also be inherent in the information itself, for example, misspelled names or different names for the same street. This becomes all the more significant as data sources proliferate. Consider, for example, the information reaching a television set as a stream of signals that must be cataloged, indexed, and perhaps searched for interesting content at a higher level, such as channel, program, genre, or mood. Or consider the information that could be tracked about callers to a call center (for example, their names, their companies, and the product or service they are calling about).
Distribution, that is, the spreading of an organization's data across multiple databases, makes it difficult to obtain an organization-wide view of the data. Many relationships between the data that are crucial to organizational decision-making remain unknown or incomprehensible, and deriving them requires integrating the data from the various databases. Management of multiple databases on an organization-wide basis is commonly performed by a network management system. Within a network management system, an agent may be located in a workstation or other management device to collect information locally and provide that information to requesting devices when required. The present invention relies on a network of agents to access data in distributed databases and to provide near real-time, application specific information to a network management device.
According to a first aspect of the present invention there is provided an agent-based system for creating and accessing a data warehouse comprising:
a network of interconnected distributed databases;
a user agent connected to the network for initiating an application specific request for data;
a plurality of functional agents for receiving goals from the user agent and for invoking processes for completing the goals; and
a plurality of resource agents, each associated with one of the distributed databases, for receiving and storing goals from the functional agents and obtaining application specific data from appropriate databases for use by the user agent.
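The division of labour among the three kinds of agent can be illustrated with a minimal in-memory sketch. It is an illustration under stated assumptions, not the claimed implementation: every name in it (`Goal`, `UserAgent`, `FunctionalAgent`, `ResourceAgent`, `request`, `handle`, `receive`, `complete`) is hypothetical, each distributed database is stood in for by a plain Python dict, and the functional agent simply broadcasts each goal to all resource agents rather than routing it by any particular strategy. The call-center data is made up to echo the example given earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Goal:
    """Hypothetical unit of work passed from agent to agent."""
    application: str  # application the request serves, e.g. "call_center"
    attribute: str    # data item wanted, e.g. "company"
    key: str          # record key to look up, e.g. a caller ID


@dataclass
class ResourceAgent:
    """Hypothetical agent associated with exactly one database."""
    name: str
    database: dict  # stand-in for a real database: key -> record (dict)
    pending: list = field(default_factory=list)

    def receive(self, goal: Goal) -> None:
        # Store the goal; it is answered later by complete().
        self.pending.append(goal)

    def complete(self) -> list:
        # Answer every stored goal this database can satisfy, then clear them.
        results = []
        for goal in self.pending:
            record = self.database.get(goal.key)
            if record is not None and goal.attribute in record:
                results.append((self.name, goal.key, record[goal.attribute]))
        self.pending.clear()
        return results


class FunctionalAgent:
    """Hypothetical agent that receives goals and invokes resource agents."""

    def __init__(self, resource_agents: list):
        self.resource_agents = resource_agents

    def handle(self, goal: Goal) -> list:
        # Assumption: broadcast the goal to every resource agent.
        for agent in self.resource_agents:
            agent.receive(goal)
        results = []
        for agent in self.resource_agents:
            results.extend(agent.complete())
        return results


class UserAgent:
    """Hypothetical agent that turns an application specific request into goals."""

    def __init__(self, functional_agents: list):
        self.functional_agents = functional_agents

    def request(self, application: str, attributes: list, key: str) -> dict:
        view = {}
        for i, attribute in enumerate(attributes):
            # Spread goals over the functional agents round-robin.
            agent = self.functional_agents[i % len(self.functional_agents)]
            view[attribute] = agent.handle(Goal(application, attribute, key))
        return view


if __name__ == "__main__":
    crm = ResourceAgent("crm", {"c42": {"name": "A. Smith", "company": "Acme"}})
    orders = ResourceAgent("orders", {"c42": {"product": "router"}})
    user = UserAgent([FunctionalAgent([crm, orders])])
    print(user.request("call_center", ["company", "product"], "c42"))
    # {'company': [('crm', 'c42', 'Acme')], 'product': [('orders', 'c42', 'router')]}
```

Here the dictionary returned by `request` plays the role of a small, application specific slice of the warehouse: each value records which database supplied the item, which is one way the user agent could trace or reconcile conflicting answers from different sources.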
According to a second aspect of the invention there is provided a method of generating an application specific data warehouse comprising:
providing a network of interconnected distributed databases;
providing a user agent connected to the network for entering a request for application specific data;
providing a plurality of functional agents for receiving goals from the user agent and for invoking processes for completing the goals; and
providing a plurality of resource agents associated with the databases for receiving and storing goals from the functional agents and obtaining application specific data from appropriate databases.