Apache Hadoop is a software framework that supports data-intensive, distributed applications. (Note: the term(s) “APACHE” and/or “HADOOP” may be subject to trademark rights in various jurisdictions throughout the world and are used here only in reference to the products or services properly denominated by the marks to the extent that such trademark rights may exist.) Apache Hadoop supports execution of applications on large clusters of commodity hardware. The Apache Hadoop framework: (i) allegedly can provide data motion to applications; and (ii) implements a computational paradigm named MapReduce, where an application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes. Apache Hadoop allegedly enables applications to work with thousands of computation-independent computers and quantities of data in the petabyte range. The entire Apache Hadoop platform is now commonly considered to include: (i) the Hadoop kernel; (ii) MapReduce; and (iii) Hadoop Distributed File System (HDFS). In this document, software frameworks that support distributed applications will generically be referred to as “distributed application support frameworks” (DASFs).
MapReduce is a programming model for processing large data sets. MapReduce is typically used to do distributed computing on clusters of computers. MapReduce allegedly provides regular programmers the ability to produce parallel distributed programs, by writing relatively simple Map( ) and Reduce( ) functions, which focus on the logic of the specific problem at hand. The MapReduce framework automatically takes care of: (i) marshalling the distributed servers; (ii) running the various tasks in parallel; (iii) managing all communications and data transfers between the various parts of the system; (iv) providing for redundancy and failures; and (v) overall management of the whole process. MapReduce libraries have been written in many programming languages, with a popular implementation being the above-mentioned Apache Hadoop. In this document, MapReduce (also possibly referred to as map reduce, map/reduce, or MR) shall generically refer to any programming model (now known or to be developed in the future) for processing large data sets that use Map( ) and Reduce( ) functions.