Embodiments disclosed herein relate to computer software which transparently enforces policies in distributed processing infrastructures.
Today, for large-scale data processing systems supporting cloud computing use cases, distributed file systems such as Hadoop have been proposed. While Hadoop-based systems provide distributed file system capabilities with a decentralized architecture allowing superior levels of business resiliency even if entire racks of server and storage systems become unavailable due to network connectivity loss, hardware failure, or a disaster, Hadoop solutions (and distributed computing solutions in general) are unable to support policy-driven service level agreements for external customers in a transparent manner.
Hadoop is a software framework that supports data-intensive distributed applications. Hadoop enables applications to work with thousands of computational independent computers and petabytes of data. The Hadoop distributed file system (HDFS) is a distributed, scalable, and portable filesystem for the Hadoop framework. A large Hadoop cluster may include a dedicated name node which hosts a filesystem index to manage the HDFS, as well as multiple data nodes which may store data and perform operations on the data. Today, Hadoop and other distributed processing infrastructures assume that all data nodes in their systems have the same characteristics. (Hadoop is a trademark of the Apache Software Foundation.)