The present disclosure relates to techniques for managing data in computing systems, and more specifically, to techniques for orchestrating data into definitions to aid processing of large data sets in computing systems.
Today, many industries rely on high-end computing systems, such as mainframes, to process high-volume data, support transaction-driven applications, and run their most critical processes. Examples of such industries include banking, finance, health care, insurance, public utilities, government, and a multitude of other public and private enterprises. In general, these computing systems are capable of performing large-scale transaction processing (e.g., thousands of transactions per second), supporting thousands of users and application programs, managing large amounts of information stored in databases, handling large-bandwidth communication, etc.
Most workloads on these high-end computing systems fall into one of two categories: batch processing or on-line processing. Batch processing, in general, refers to the execution of a number of jobs (or workloads) on the computing system without user interaction. Typically, with batch processing, high volumes of data are collected, stored for a period of time, and subsequently processed without any intervention from a user. Examples of batch processing jobs include payroll and billing systems, inventory system updates, scheduled database backups, and the like. On-line processing, on the other hand, generally refers to processing that occurs interactively with a user. For example, on-line processing is typically characterized by a high volume of short interactions between many (e.g., thousands of) users and the computing system, with an immediate (real-time) response generally required for each interaction. Examples of on-line processing include automated teller machine (ATM) transactions (e.g., deposits, withdrawals, inquiries, transfers, etc.), debit/credit card payments, airline reservation systems, inventory control and production scheduling, etc.
Due, in part, to the explosion in both the amount of data used in computing systems and the number of applications that perform analytics (processing) on the data, computing systems have faced challenges in providing real-time data processing and analytics. Accordingly, there is a need for improved techniques for processing large volumes of data within these computing systems.