Regulatory and competitive pressure is growing across industries to improve the quality, consistency, and availability of reported data. Storage and processing demands are increasing along several dimensions: granularity, online history, redundancy, and collections that join data in new combinations. In addition, as data is increasingly shared across departments within a company, intra-day versioning is becoming necessary to manage discrepancies between departments with different timing needs. Departments are also starting to look for a path from batch processing to incremental real-time and stream data management.
While demand for efficient and consistent data management is growing, many large companies are replacing failing ACID (Atomicity, Consistency, Isolation, and Durability) architectures with scalable BASE (Basically Available, Soft state, Eventual consistency) architectures. As these companies release parts of their cloud-scaling systems as open source, solutions for viewing and analyzing very large data sets are becoming commonplace. Tools to manage the movement of data sets, however, have not kept pace with these hyper-scale analysis engines. Large companies are scrambling to protect themselves from the growing likelihood of outages because they lack the means to manage the availability of large data streams.
Many other companies face the same inability to replicate growing data sets. ACID architectures are costly, complex, and ill-suited to ensuring that data is consistent and available across space and time (e.g., for department data sharing and forensics). The bar for availability, consistency, and governance of these growing data sets keeps rising.