Data processing systems are known to perform a myriad of useful functions. A data processing system typically comprises a computer system operating one or more software programs. The computer system accesses a data source for the data which it processes. In some cases, the data is stored locally on the computer system. In other cases, the data is stored remotely, e.g., on a database server accessed through a network.
Large retailers have both the incentive and the desire to leverage information gathered about their business transactions and customer relationships. Retailers want to use data to maximize profit, control inventory, secure market share, and achieve a variety of other business goals. One way to make use of such data is to create economic models based on it. The utility of the models is proportional to, and limited by, the amount of data that can be processed in a given time. Therefore, it is useful to have systems that process transaction and customer data as efficiently and rapidly as possible.
The software operating on the computer system can perform many useful functions depending on the intended application. In one example, the data processing system may perform economic and financial modeling and planning, which, given specific sets of input data of interest, is commonly used to estimate or predict the performance and outcome of real systems. An economic-based system will have many variables and influences which determine its behavior. A model is a mathematical expression or representation which predicts the outcome or behavior of the system under a variety of conditions. In one sense, it is relatively easy to review historical data, understand the system's past performance, and state with relative certainty that the system's past behavior was indeed driven by the historical data. A much more difficult task, but one that is extremely valuable, is to generate a mathematical model of the system which predicts how the system will behave, or would have behaved, with different sets of data and assumptions. While forecasting and backcasting using different sets of input data are inherently imprecise, i.e., no model can achieve 100% certainty, the field of probability and statistics has provided many tools which allow such predictions to be made with reasonable certainty and acceptable levels of confidence.
In its basic form, the economic model can be viewed as a mathematical expression that yields a predicted or anticipated outcome when driven by a given set of input data and assumptions. The input data is processed through the mathematical expression, which represents either the expected or the current behavior of the real system. The mathematical expression is formulated or derived from principles of probability and statistics, often by analyzing historical data and corresponding known outcomes, to achieve a best fit of the expected behavior of the system to other sets of data, for both forecasting and backcasting. In other words, the model should be able to predict the outcome or response of the system to a specific set of data being considered or proposed, within a given level of confidence or an acceptable level of uncertainty. As a simple test of the quality of the model, if historical data is processed through the model and the resulting prediction closely aligns with the known historical outcome, then the model is considered to have a high confidence level over that interval. The model should then do a good job of forecasting the outcomes of the system for different sets of input data.
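The simple quality test described above, in which historical data is run through the model and the predictions are compared with the known outcomes, might be sketched as follows. This is an illustrative sketch only: mean absolute percentage error is one common choice of error measure, not one prescribed here, and the function names, the 5% tolerance, and the sample prices and unit sales are all hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence


def backtest_error(
    model: Callable[[float], float],
    historical_inputs: Sequence[float],
    known_outcomes: Sequence[float],
) -> float:
    """Mean absolute percentage error (as a fraction) of the model's
    predictions against the known historical outcomes."""
    if len(historical_inputs) != len(known_outcomes) or not known_outcomes:
        raise ValueError("need equal-length, non-empty input and outcome lists")
    errors = []
    for x, actual in zip(historical_inputs, known_outcomes):
        if actual == 0:
            raise ValueError("percentage error is undefined for a zero outcome")
        predicted = model(x)
        errors.append(abs(predicted - actual) / abs(actual))
    return mean(errors)


def is_high_confidence(model, historical_inputs, known_outcomes, tolerance=0.05) -> bool:
    """Treat the model as well fitted over the interval if its backtest
    error does not exceed a (hypothetical) tolerance."""
    return backtest_error(model, historical_inputs, known_outcomes) <= tolerance


# Hypothetical history: unit sales observed at three price points.
prices = [1.0, 2.0, 3.0]
units_sold = [81.0, 59.0, 40.0]


def linear_model(price: float) -> float:
    # Hypothetical fitted model: units = 100 - 20 * price.
    return 100.0 - 20.0 * price


print(is_high_confidence(linear_model, prices, units_sold))    # True
print(is_high_confidence(lambda p: 50.0, prices, units_sold))  # False
```

The fitted linear model misses the history by about 1% on average and passes, while a constant guess misses by about 26% and fails, which mirrors the distinction between a high-confidence and a low-confidence model over the same interval.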
For the economic model to perform up to expectations, it must have access to real data. In the retail business, the raw data typically comes from the actual retail transactions. FIG. 1 illustrates such a retail data gathering and economic modeling system. Retail store 12 sells merchandise and/or services. The sales data may come in the form of stock keeping unit (SKU) data read from the universal product code (UPC) label or barcode associated with the product. In the case of a grocery store, when the food item is passed over the check-out scanner, the UPC label is read and the product is identified. The store's computer system retrieves a significant amount of information associated with the product, including the price used in the check-out process.
The data associated with the sale of products in the customer transaction is recorded and sent to data storage system 14 by way of communication channel 16. Data storage system 14 may be a mass storage device, or a server connected to a mass storage device, which contains a relational database or other file structure convenient for storing large amounts of data. The mass storage device can comprise one or more magnetic or optical disk drives. Retail store 12 may conduct many thousands of transactions each day, each transaction potentially involving many products, and each product having 30 or more data fields. The raw data generated by retail store 12 over time can be massive. Data storage system 14 may be required to store data for many retail stores, and if the data storage system is operated by an independent data processing vendor, it may have to store data for many different retailers and other business clients.
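A back-of-the-envelope calculation illustrates how quickly the raw data accumulates. In the sketch below, only the figure of 30 data fields per product comes from the description above; the transaction rate, the number of products per transaction, the fixed 8 bytes per field, and the store count are hypothetical values chosen for illustration.

```python
def raw_data_bytes(
    transactions_per_day: int,
    items_per_transaction: int,
    fields_per_item: int = 30,
    bytes_per_field: int = 8,
    days: int = 365,
    stores: int = 1,
) -> int:
    """Rough size of the raw transaction data, assuming every line item
    carries the same number of fixed-width fields (a simplification)."""
    per_day = transactions_per_day * items_per_transaction * fields_per_item * bytes_per_field
    return per_day * days * stores


# Hypothetical store: 5,000 transactions per day, 20 products per transaction.
one_store = raw_data_bytes(5_000, 20)
chain = raw_data_bytes(5_000, 20, stores=100)
print(f"{one_store / 1e9:.2f} GB per store per year")  # 8.76 GB per store per year
print(f"{chain / 1e9:.2f} GB for 100 stores")          # 876.00 GB for 100 stores
```

Under these assumptions a single store produces roughly 8.76 GB of raw line-item data per year, and a 100-store retailer approaches a terabyte before indexes, history, or other clients are counted.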
The economic model executes by way of software or computer programs running on computer system 18. The modeling software accesses data, processes the data, and generates reports, model parameters, or otherwise makes recommendations. To access the needed data, the modeling software operating on computer system 18 sends requests to, and receives data from, data storage system 14 over communication channel 20. The data storage system 14 accesses the requested data and transmits the data over communication channel 20 back to computer system 18. The computer system 18 processes the retrieved data according to the algorithms of the economic model.
The data processing system has certain data throughput limitations or bottlenecks which slow down the execution of the economic model and the generation of the reports and predicted outcomes of the system. The bottlenecks occur in communication channel 20 and in the data access from data storage system 14.
The total raw dataset from retail store 12 is commonly stored in a relational database or other formal file structure. The modeling application running on computer system 18 makes requests for specific data components of the overall dataset in data storage system 14 via communication channel 20. Data storage system 14 identifies the storage location, retrieves the data from the storage location, and sends the requested data over communication channel 20 back to computer system 18.
One bottleneck arises from the large number of requests for data components that must be processed by data storage system 14. The central processing unit (CPU) on data storage system 14 must make requests into the relational database file structure to retrieve each data component or segment. These database accesses take time to execute. In general, computer system 18 is capable of processing data more rapidly than data storage system 14 can locate and retrieve the data. A significant portion of the delays in the software execution can be attributed to database acquisition latency.
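The effect of database acquisition latency can be illustrated with a simple cost model in which every data component requires one lookup by data storage system 14 followed by processing on computer system 18. The sketch assumes that retrieval and computation run one after the other without overlap, and the `Workload` class, request count, and per-request timings are hypothetical values for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    n_requests: int               # data components requested by the modeling software
    lookup_latency_s: float       # time for storage system 14 to locate and return one component
    compute_per_request_s: float  # time for computer system 18 to process one component

    def retrieval_time_s(self) -> float:
        return self.n_requests * self.lookup_latency_s

    def compute_time_s(self) -> float:
        return self.n_requests * self.compute_per_request_s

    def retrieval_fraction(self) -> float:
        """Share of total run time spent waiting on data acquisition,
        assuming retrieval and computation do not overlap."""
        retrieval = self.retrieval_time_s()
        total = retrieval + self.compute_time_s()
        return retrieval / total if total > 0 else 0.0


# Hypothetical run: 100,000 lookups at 5 ms each, 1 ms of processing per component.
w = Workload(n_requests=100_000, lookup_latency_s=0.005, compute_per_request_s=0.001)
print(w.retrieval_time_s(), w.compute_time_s())  # about 500 s and 100 s
print(round(w.retrieval_fraction(), 3))          # 0.833
```

Because computer system 18 processes each component faster than data storage system 14 can locate it, roughly five sixths of the run time in this example is spent waiting on the database rather than computing.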
Another bottleneck arises from the massive amounts of data traversing communication channel 20. Communication channel 20 has a finite bandwidth and can transmit only so much data over a given time frame. The problem becomes acute when multiple modeling applications running on computer system 18 all try to access their data over communication channel 20 within a common time frame. The issue is even more apparent when multiple applications running on multiple computer systems, such as computer system 18, all try to use communication channel 20 simultaneously.
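The contention on communication channel 20 can be approximated by assuming that its bandwidth is divided evenly among the applications transferring data at the same time and ignoring protocol overhead. The function name, the 10 GB dataset, and the 1 Gbit/s channel rate below are hypothetical values for illustration, not characteristics of any particular system.

```python
def transfer_time_s(dataset_bytes: float, channel_bytes_per_s: float, concurrent_apps: int = 1) -> float:
    """Time to move one application's dataset over a channel whose bandwidth
    is split evenly among `concurrent_apps` simultaneous transfers."""
    if concurrent_apps < 1 or channel_bytes_per_s <= 0:
        raise ValueError("need at least one application and a positive bandwidth")
    per_app_rate = channel_bytes_per_s / concurrent_apps
    return dataset_bytes / per_app_rate


DATASET = 10_000_000_000  # 10 GB per modeling application (hypothetical)
CHANNEL = 125_000_000     # 1 Gbit/s expressed in bytes per second (hypothetical)
for apps in (1, 4, 8):
    print(apps, transfer_time_s(DATASET, CHANNEL, apps))
# 1 80.0
# 4 320.0
# 8 640.0
```

Under this even-sharing assumption, each application's transfer time grows linearly with the number of applications contending for the channel, so eight concurrent models each wait eight times longer for the same data.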
The massive amounts of data being accessed from the relational database or other formal file structure on data storage system 14, and then transferred over communication channel 20 to feed the multiple economic modeling applications, slow down the software execution and reduce the model efficiency. The increased execution time of the software translates into higher costs of operating the computer system and generating the needed model outputs. In practice, users are often limited in how frequently they can run the economic modeling applications because of the time and cost involved in accessing and processing the data.
A need exists for an efficient data storage and acquisition process to run high data throughput processing applications such as economic modeling.