This invention relates to methods, apparatus and systems for monitoring the activity of nodes on a network, storing data relating to the activity of those nodes in a data warehouse, and providing access to the data through customizable queries. More specifically, this invention relates to methods and systems for transforming data obtained from nodes on a network prior to loading that data into a data warehouse.
Businesses can gain a competitive advantage by using and analyzing strategic data relating to their businesses. Analysis of such data allows managers to make better and more timely decisions, leads to a better understanding of the business, and improves support for customers, all of which ultimately lead to growth. However, basing decisions on data relating to a business requires storing tremendous volumes of data. For example, multi-national companies have networks with nodes located across the globe that generate megabytes of data on an hourly basis. More specifically, banks continuously generate megabytes of data relating to activities of Customer Activated Terminals (CATs), Automated Teller Machines (ATMs), or home service delivery systems, among other activities. Due to the tremendous volume of data which a business may generate during the course of a day, many businesses are opting to store business data in data warehouses.
A data warehouse is a storage and retrieval system capable of storing vast amounts of data. More specifically, a data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data used to support business managers' decision-making process. Thus, data warehouses support informational (i.e., decision support, or DSS) processing by providing a solid foundation of integrated, corporate-wide historical data from which to perform management analysis.
Data warehousing has become increasingly accessible, both economically and technically, to many, if not most, businesses. Large multinational concerns that engage in literally millions of business transactions in a very short period of time need to store and view information relating to those transactions so that they can make decisions that will enhance their business. Just a few years ago, the massive database queries required for millions of business transactions taxed all but the world's largest computers and database systems to the point of being unusable. Today, that is no longer the case. Specialized "niche" market relational database management system (RDBMS) engines for data warehousing have been developed and are readily available at low prices. Multiprocessor server machines are available for under the five-figure mark, and the prices of massive storage devices are also spiraling downward.
A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. One of the most important aspects of the data warehouse environment is that the data within the data warehouse is integrated. Integration requires that, whatever the source or sources of the data eventually stored in the data warehouse, the data must arrive in the data warehouse in a consistent, integrated state. Therefore, before data is loaded into a data warehouse, it must be transformed into the data warehouse format. Thereafter, normally only two kinds of operations occur in the data warehouse: the loading of data after it is transformed, and the access of data.
Once data is loaded into a data warehouse, end users may access it using a separate application program or through an interface provided as part of the database system software. Such programs are sometimes referred to as discovery tools, which are used to retrieve, analyze and present data from data warehouses. They range from very complex modeling tools to relatively simple end-user query tools designed to do no more than hide the complexity of the Structured Query Language (SQL) from the user. Automated tools that search data for trends or relationships may also be used.
Accordingly, there is a need for methods and systems to effectively and efficiently transform data obtained from nodes on a network. Specifically, there is a need to transform operational data into integrated data before uploading such data into a data warehouse.
There is an additional need to transform data obtained from customer activated terminals (CATs) networked together such that the data obtained is integrated prior to loading it into a data warehouse. There is also a need to transform and integrate operational and transaction data obtained from automated teller machines (ATMs) prior to uploading such data into a data warehouse.
There is also a need to transform transactional and/or operational data obtained from networks providing services to customers' homes prior to loading such data into a data warehouse. More specifically, there is a need to integrate data relating to transactions occurring on home banking servers prior to loading that data into a data warehouse.
It is an object of the invention to meet these needs, and others, through a method and system for warehousing data obtained from nodes on a network.
It is a further object of the present invention to provide business decision makers and managers the ability to better define their customer base, analyze trends, and better serve those having a relationship with the corporate entity.
It is a further object of the present invention to provide a system that can be deployed on many different machines and platforms.
It is another object of the present invention to provide a data warehouse that can store more than a terabyte of data, making full use of multiprocessor computers and redundant array of inexpensive disks (RAID) technologies to deliver decision support data as fast as possible.
It is a further object of the present invention to provide a global product, targeted for deployment both domestically and internationally, that serves as a window into a network so that fundamental questions, such as how CATs and/or ATMs are used and how to better serve home banking users, can easily be answered.
It is yet another object of the present invention to provide a system and method for transforming data obtained from an operational environment so that it may be uploaded into a data warehouse.
It is a further object of the present invention to provide a system and method for transforming data obtained from nodes on a network before uploading the data into a data warehouse.
It is also an object of the present invention to provide a system and method for transforming operational and transactional data obtained from CATs on a network before uploading that data into a data warehouse.
It is a further object of the present invention to provide a system and method for transforming operational and transactional data obtained from ATMs connected to a network prior to loading the data into a data warehouse.
It is another object of the present invention to provide a system and method for transforming data obtained from servers providing access to services to customers at sites distant from the service provider prior to loading that data into a data warehouse. For example, it is an object of the present invention to provide a system and method for transforming and integrating data obtained from home banking servers prior to uploading such data into a data warehouse.
The present invention comprises a method and system for integrating operational data received from nodes on a network prior to loading the data into a data warehouse.
To achieve the stated and other objects of the present invention, as embodied and described below, the invention includes a method for preparing and uploading data into a data warehouse comprising the steps of: obtaining, in elementized message format, a set of data from nodes on a network relating to the operation of and transactions occurring on each node and the operation of each node component; storing the set of data obtained as a series of records; transmitting the stored set of data to a data warehouse processor; transforming the transmitted set of data into database-formatted records, wherein said transforming step comprises: determining time zone information for data obtained from each node in the network; rejecting node data having invalid syntax; reporting rejected node data in an audit error log file; calculating the local time associated with data obtained from each node by referencing a time zone table; verifying the data associated with each node by referencing a mnemonic table containing the location of individual node devices by number, name and mnemonic; determining whether any data relating to a transaction is an orphan; computing the total elapsed time for each transaction; writing the transformed data into an output file comprised of records; auditing the transformed data contained in the output file, wherein said transformed data auditing step comprises: verifying the existence of templates, an audit initialization file, and the data warehouse; calculating the number of records contained in the transformed data; determining the beginning and end times for the set of data obtained from the nodes on the network; determining a load control key by querying the data warehouse for the previous load control key and incrementing the result of the query by one; querying the data warehouse to determine whether records currently being audited have previously been uploaded to the data warehouse; building a load control table management utility containing all of the instructions necessary for undertaking the current database load; assigning a unique identification number to each record in the transformed set of data; building a node table management utility for loading data into the data warehouse and associating a load identification number with the transformed set of data; providing an error notification if a record in the transformed set of data was previously loaded into the data warehouse; loading the unique identification number assigned to each record in the transformed set of data, together with the transformed data, into the data warehouse; generating records reporting the availability of nodes on the network and node components, wherein said generating step comprises: obtaining the previous status of nodes and node components; comparing the current status of nodes and node components with their previous status; determining whether the current status of each node in the network or each node component has changed from the previous status of the node or node component; determining the length of time each node and each node component has been in its current state; forming an output file containing the current status of each node component, whether the current state of each node component differs from the status recorded for that node during a previous upload, and how long the node component has been in its current state; auditing records reporting the availability of nodes on the network and node components, wherein said record availability auditing step comprises: counting the number of records reporting the availability of nodes on the network and node components; counting the number of node components that have changed status from their previous states; determining the earliest and latest time for the set of records reporting the availability of nodes on the network and node components; assigning a unique load identification number to the number of records reporting node availability, the number of nodes having changed status, and the earliest and latest time for the set of records reporting node availability; producing a load control table management utility providing a set of instructions for loading audit information concerning the records reporting the availability of nodes on the network and node components; producing a load control table management utility providing a set of instructions for loading the records reporting the availability of nodes on the network and node components; and loading into the data warehouse the number of records reporting node availability, the number of nodes having changed status, the earliest and latest time for the set of records reporting node availability, and the load control identification number.
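The transforming step recited above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each elementized message has already been flattened into a pipe-delimited line (`node|device number|transaction id|START or END|UTC timestamp`), that the time zone table maps a node to a UTC offset in minutes, and that the mnemonic table is keyed by node and device number. The table contents, node names such as `NY01`, and the helpers `parse` and `transform` are hypothetical and not taken from the invention. An orphan is taken here to mean a transaction for which only a start record or only an end record was received.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# Hypothetical reference tables; the real time zone and mnemonic tables are not specified.
TIME_ZONE_TABLE: Dict[str, int] = {"NY01": -300, "LN02": 0}  # node -> UTC offset (minutes)
MNEMONIC_TABLE: Dict[Tuple[str, int], Tuple[str, str]] = {
    ("NY01", 1): ("Card Reader", "CRD"),  # (node, device number) -> (name, mnemonic)
    ("LN02", 1): ("Card Reader", "CRD"),
}


@dataclass
class Transaction:
    node_id: str
    txn_id: str
    start_local: Optional[datetime] = None
    end_local: Optional[datetime] = None


def parse(line: str):
    """Parse one assumed 'node|device_no|txn_id|START or END|UTC ISO time' record.
    Returns None when the syntax is invalid."""
    parts = line.strip().split("|")
    if len(parts) != 5:
        return None
    node_id, device_no, txn_id, event, ts = parts
    if event not in ("START", "END") or not device_no.isdigit():
        return None
    try:
        utc = datetime.fromisoformat(ts)
    except ValueError:
        return None
    return node_id, int(device_no), txn_id, event, utc


def transform(lines: List[str], audit_log: List[str]) -> List[dict]:
    txns: Dict[Tuple[str, str], Transaction] = {}
    for line in lines:
        rec = parse(line)
        if rec is None:  # reject invalid syntax and report it in the audit log
            audit_log.append(f"INVALID SYNTAX: {line.strip()}")
            continue
        node_id, device_no, txn_id, event, utc = rec
        if node_id not in TIME_ZONE_TABLE:  # no time zone information for this node
            audit_log.append(f"UNKNOWN TIME ZONE: {node_id}")
            continue
        if (node_id, device_no) not in MNEMONIC_TABLE:  # verify against mnemonic table
            audit_log.append(f"UNKNOWN DEVICE: {node_id}/{device_no}")
            continue
        local = utc + timedelta(minutes=TIME_ZONE_TABLE[node_id])  # node-local time
        txn = txns.setdefault((node_id, txn_id), Transaction(node_id, txn_id))
        if event == "START":
            txn.start_local = local
        else:
            txn.end_local = local
    rows = []
    for txn in txns.values():
        orphan = txn.start_local is None or txn.end_local is None  # one half missing
        elapsed = None if orphan else (txn.end_local - txn.start_local).total_seconds()
        rows.append({"node": txn.node_id, "txn": txn.txn_id, "start": txn.start_local,
                     "end": txn.end_local, "orphan": orphan, "elapsed_s": elapsed})
    return rows
```

For instance, a `START` at 15:00:00 UTC and an `END` at 15:02:30 UTC from node `NY01` produce local times of 10:00:00 and 10:02:30 and an elapsed time of 150 seconds, while a `START` with no matching `END` is flagged as an orphan and a malformed line goes to the audit log instead of the output.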
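The availability steps of the method, which compare each node component's current status with its previous status and then audit the resulting records, might look like the following Python sketch. It assumes status tables keyed by a hypothetical component name such as `NY01/CRD`, with each previous entry holding the status and the time that state began, and it treats a component with no previous record as having changed. The function names are illustrative, and the load identification number is simply passed in by the caller rather than generated.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Status = Tuple[str, datetime]  # (status, time) pair


def availability(previous: Dict[str, Status],
                 current: Dict[str, Status]) -> Tuple[List[dict], Dict[str, Status]]:
    """previous: component -> (status, in-state-since) from the last upload.
    current: component -> (status, observed_at) from this upload.
    Returns availability rows and the updated status table."""
    rows: List[dict] = []
    state = dict(previous)
    for comp, (status, observed) in sorted(current.items()):
        prev = previous.get(comp)
        changed = prev is None or prev[0] != status   # compare with previous status
        since = observed if changed else prev[1]      # when the current state began
        state[comp] = (status, since)
        rows.append({"component": comp, "status": status, "changed": changed,
                     "in_state_s": (observed - since).total_seconds(),
                     "observed": observed})
    return rows, state


def audit_availability(rows: List[dict], load_id: int) -> dict:
    """Audit summary for one batch of availability records."""
    times = [r["observed"] for r in rows]
    return {"load_id": load_id,
            "record_count": len(rows),
            "changed_count": sum(1 for r in rows if r["changed"]),
            "earliest": min(times) if times else None,
            "latest": max(times) if times else None}
```

For a card reader that stays `UP` from 08:00 to an observation at 09:00, the sketch reports no change and 3600 seconds in state; a display that goes `DOWN` at 09:05 is reported as changed with 0 seconds in state, and the audit summary counts two records, one change, and a time span of 09:00 to 09:05.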
To achieve the stated and other objects of the present invention, as embodied and described below, the invention includes a system for preparing operational data for upload to a data warehouse comprising: an integrated network control computer connected to a network having nodes processing transactions for retrieving and storing data relating to transactions occurring on the nodes; and a data warehouse connected to the integrated network control computer; said data warehouse having a data processor for receiving, transforming, and auditing the data relating to transactions occurring on the nodes.
Additional objects, advantages and novel features of the invention will be set forth in part in the description that follows, and in part will become more apparent to those skilled in the art upon examination of the following or upon learning by practice of the invention.