An increasing number of companies and other enterprises are reducing their costs by migrating portions of their information technology infrastructure to cloud service providers. For example, virtual data centers and other types of systems comprising distributed virtual infrastructure are coming into widespread use. Commercially available virtualization software such as VMware® vSphere™ may be used by cloud service providers to build a variety of different types of virtual infrastructure, including private and public cloud computing and storage systems, which may be distributed across hundreds of interconnected computers, storage devices and other physical machines. Typical cloud service offerings include, for example, Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS).
In cloud-based information processing systems of the type described above, a wide variety of hardware and software products are often deployed, many of them from different vendors, resulting in a complex system configuration. As the complexity of such cloud infrastructure increases, so does the need for accurate and efficient processing of data.
Existing approaches to information assembly handle the associated processes inflexibly. For example, such approaches generally do not consider data set provenance, versioning, volatility, derivation, indexing, materialization, and state, either in terms of their implications for those processes or in terms of how resulting issues can be remediated. Assertions, rules and constraints governing processes are generally neither visible nor assessable.
From an information assembly perspective, there is no unified description or repository for metadata on data sets, no explicit representation of such metadata that supports reasoning or recommendations, and no easy way to assess assertions about whether the data sets used in information assembly are fit for purpose. This combination limits the actions that can be taken, causes process errors, and raises doubts about the validity of process outcomes. Existing approaches may make optimistic assumptions in some cases (“let's assume the usual information was fine”) and pessimistic ones in other cases (“there's an input file missing, so let's abort the process”). Such assumptions may be inaccurate and can substantially undermine system performance when carrying out a variety of different processing operations.
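By way of illustration only, the two kinds of assumption noted above can be sketched as follows. This is a minimal, hypothetical sketch and does not represent any particular existing system: the `InputFile` record, its `stale` flag, the two `assemble_*` functions, and the sample file names are all assumptions introduced for this example. In particular, having the optimistic policy skip missing inputs is an assumption of the sketch rather than something stated above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class InputFile:
    """Hypothetical input to an information assembly process."""
    name: str
    content: Optional[str]  # None models a missing input file
    stale: bool = False     # hypothetical marker for outdated content


def assemble_optimistic(inputs: List[InputFile]) -> Dict[str, str]:
    """Optimistic policy: assume whatever is present is fine.

    Staleness is never checked, and missing inputs are silently skipped.
    """
    return {f.name: f.content for f in inputs if f.content is not None}


def assemble_pessimistic(inputs: List[InputFile]) -> Dict[str, str]:
    """Pessimistic policy: abort the whole process if any input is missing."""
    for f in inputs:
        if f.content is None:
            raise RuntimeError(f"input file missing: {f.name}")
    return {f.name: f.content for f in inputs}


# Hypothetical sample inputs: one present but stale, one missing.
inputs = [
    InputFile("orders.csv", "id,total\n1,10", stale=True),
    InputFile("customers.csv", None),
]

# The optimistic policy proceeds and uses the stale data set unquestioned.
optimistic_result = assemble_optimistic(inputs)

# The pessimistic policy discards all work because one input is absent.
pessimistic_error = None
try:
    pessimistic_result = assemble_pessimistic(inputs)
except RuntimeError as err:
    pessimistic_result = None
    pessimistic_error = str(err)
```

In this sketch neither policy consults any metadata about the data sets. The optimistic path yields an outcome of doubtful validity, and the pessimistic path yields no outcome at all, even where a partial or remediated result might have been acceptable.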