An increasing number of companies and other enterprises are reducing their costs by migrating portions of their information technology infrastructure to cloud service providers. For example, virtual data centers and other types of systems comprising distributed virtual infrastructure are coming into widespread use. Commercially available virtualization software such as VMware® vSphere™ may be used by cloud service providers to build a variety of different types of virtual infrastructure, including private and public cloud computing and storage systems, which may be distributed across hundreds of interconnected computers, storage devices and other physical machines. Typical cloud service offerings include, for example, Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS).
In cloud-based information processing system arrangements of the type described above, a wide variety of different hardware and software products are often deployed, many of which may be from different vendors, resulting in a complex system configuration. As the complexity of such cloud infrastructure increases, so does the need for accurate and efficient management of data.
Conventional approaches to data management in cloud infrastructure and other types of complex information technology (IT) infrastructure are deficient in a number of respects. For example, many data management techniques take a fragmented or partial approach to handling issues such as data provenance, versioning, volatility, derivation, indexing, materialization and state. As a result, expressions such as policies, assertions, constraints and rules relating to the data are often neither visible nor accessible, and accordingly can be difficult to assess, enforce and audit. Expressions of this type may, for instance, be hidden in procedural code and schedules, which are hard to change. This lack of visibility unduly limits the actions that can be taken, and may raise doubts about the validity of data analyses.
It is therefore often necessary to make assumptions regarding the data to be managed, which can be problematic. For example, optimistic assumptions are made in some cases (e.g., “let's assume the information is current”) while pessimistic ones are made in other cases (e.g., “there's an old timestamp on the file, so let's go back to the source instead”). Such assumptions may be inaccurate and can substantially undermine system performance when carrying out a variety of common data processing operations.
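By way of illustration only, the following Python sketch shows how such assumptions may end up embedded in procedural code. All names (`read_optimistic`, `read_pessimistic`, `MAX_AGE`) and the 24-hour threshold are hypothetical and are not drawn from any particular system; the sketch is intended only to make the two kinds of assumption concrete.

```python
from datetime import datetime, timedelta

# Hypothetical staleness threshold, buried in code rather than
# declared as a visible, auditable policy.
MAX_AGE = timedelta(hours=24)


def read_optimistic(cache, key, source):
    """Optimistic assumption: any cached copy is treated as current."""
    if key in cache:
        value, _timestamp = cache[key]
        return value, "cache"
    return source[key], "source"


def read_pessimistic(cache, key, source, now):
    """Pessimistic assumption: a copy with an old timestamp is distrusted."""
    if key in cache:
        value, timestamp = cache[key]
        if now - timestamp <= MAX_AGE:
            return value, "cache"
    # Old timestamp (or no cached copy): go back to the source instead.
    return source[key], "source"


# Example data: the cached copy is 36 hours old and the source has changed.
now = datetime(2024, 1, 2, 12, 0)
source = {"report": "v2"}
cache = {"report": ("v1", datetime(2024, 1, 1, 0, 0))}

optimistic_result = read_optimistic(cache, "report", source)
pessimistic_result = read_pessimistic(cache, "report", source, now)
```

In this example the optimistic reader returns the outdated cached value, while the pessimistic reader returns to the source. Had the source been unchanged, the pessimistic reader would have incurred the cost of a source access for no benefit. In neither case is the governing assumption visible outside the code itself.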