Online transaction processing (OLTP) and online analytical processing (OLAP) are two types of database systems. An OLTP system is used to manage and process transactions. Typical examples of such transaction processing systems are sales order entry systems and banking transaction processing systems. An OLAP system is used to analyze data in order to generate reports for business analysts. Typical reports include aggregated sales statistics grouped by geographical region, product category, or customer classification.
Initial attempts to execute OLAP queries on the operational OLTP database were abandoned because OLAP query processing led to resource contention and severely hurt the mission-critical transaction processing. Therefore, a data staging architecture was devised in which transaction processing is carried out on a dedicated OLTP database system and a separate data warehouse system is installed for OLAP query processing. Periodically, e.g., during the night, the OLTP database changes are extracted, transformed to the layout of the data warehouse schema, and loaded into the data warehouse. This data staging and its associated ETL (Extract-Transform-Load) process incur the problem of data staleness: because the ETL process can only be executed periodically, OLAP queries see the data only as of the most recent load.
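The staleness introduced by periodic ETL can be illustrated with a minimal sketch. It assumes two in-memory SQLite databases standing in for the OLTP system and the data warehouse; the table names (`orders`, `sales_by_region`) and the `run_etl` helper are hypothetical, and the sketch reloads the whole aggregate rather than extracting only the changes.

```python
import sqlite3

# Hypothetical OLTP database holding individual orders.
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
oltp.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("EU", 10.0), ("EU", 5.0), ("US", 7.0)],
)

# Hypothetical data warehouse holding pre-aggregated sales per region.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL)")


def run_etl(src, dst):
    """Extract orders, transform them into per-region totals, load the warehouse."""
    rows = src.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region"
    ).fetchall()
    with dst:  # commit the reload as one transaction
        dst.execute("DELETE FROM sales_by_region")
        dst.executemany("INSERT INTO sales_by_region VALUES (?, ?)", rows)


run_etl(oltp, warehouse)  # the periodic (e.g., nightly) load

# A transaction arrives after the load; the warehouse does not see it.
oltp.execute("INSERT INTO orders (region, amount) VALUES ('US', 100.0)")

stale = dict(warehouse.execute("SELECT region, total FROM sales_by_region"))
fresh = dict(oltp.execute("SELECT region, SUM(amount) FROM orders GROUP BY region"))
```

Until `run_etl` runs again, `stale` and `fresh` disagree for the `US` region, which is the staleness problem described above.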
Real-time (operational) business intelligence requires executing OLAP queries on the current, up-to-date state of the transactional OLTP data. As a solution, an existing hybrid system based on a main-memory database has been proposed that handles both OLTP and OLAP simultaneously, using hardware-assisted replication mechanisms to maintain consistent snapshots of the transactional data. The queries of an OLAP query session are executed on the same, arbitrarily current and consistent snapshot. These snapshots are created by forking the OLTP process, which creates a consistent virtual memory snapshot. The system keeps snapshots arbitrarily current by periodically forking a new snapshot and thus starting a new OLAP query session process.
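The fork-based snapshot idea can be sketched as follows. This is only an illustration under stated assumptions, not the hybrid system's actual implementation: it assumes a POSIX platform where `os.fork` is available, uses a plain dictionary as the "transactional data", and the helper names `start_olap_session` and `collect` are hypothetical.

```python
import json
import os


def start_olap_session(data, query):
    """Fork a child that evaluates `query` on a snapshot of `data`.

    The child sees memory as it was at the moment of the fork; later
    writes by the parent are isolated from it by copy-on-write.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: the OLAP query session on the snapshot.
        os.close(read_fd)
        result = query(data)
        with os.fdopen(write_fd, "w") as out:
            json.dump(result, out)
        os._exit(0)  # leave without running the parent's remaining code
    # Parent process: continues OLTP work.
    os.close(write_fd)
    return pid, read_fd


def collect(pid, read_fd):
    """Read the child's result and reap the child process."""
    with os.fdopen(read_fd) as src:
        result = json.load(src)
    os.waitpid(pid, 0)
    return result


accounts = {"alice": 100, "bob": 50}
pid, fd = start_olap_session(accounts, lambda a: sum(a.values()))

# OLTP updates applied after the snapshot was taken.
accounts["alice"] -= 30
accounts["carol"] = 500

snapshot_total = collect(pid, fd)    # total as of the fork
live_total = sum(accounts.values())  # total including later updates
```

The child reports the total as of the fork, regardless of the updates the parent applies afterwards. In CPython, reference counting writes to object headers, so far fewer pages stay shared than in a system written in C or C++; the sketch shows the isolation semantics only.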
Even though the existing hybrid system looks promising, it too may have technical problems. Forking a large process typically takes time on the order of milliseconds, because a large number of page table entries (PTEs) must be replicated. According to one reference, 384 MB of data spans about 100K pages. Periodic forking will therefore impact the performance of online data processing. In addition, forking performs a "big bang" copy of all PTEs and does not limit the copy to the small delta that may have changed since the previous snapshot.
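The page count quoted above can be checked with a short calculation. The sketch assumes a 4 KiB page size and 8-byte PTEs (typical for x86-64, but not stated in the source), and the helper name `page_count` is hypothetical.

```python
def page_count(data_bytes, page_bytes=4096):
    """Number of pages needed to hold data_bytes (ceiling division)."""
    return -(-data_bytes // page_bytes)


data_bytes = 384 * 1024 * 1024  # 384 MB of data
pages = page_count(data_bytes)  # one PTE per mapped page
pte_bytes = pages * 8           # assumed 8-byte PTEs

print(pages)      # 98304, i.e., roughly 100K pages
print(pte_bytes)  # 786432 bytes of PTEs to replicate per fork
```

Under these assumptions every fork replicates about 98K PTEs, however few pages have actually changed since the previous snapshot.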