Petascale analytics requires an efficient map-reduce framework that can leverage advances in processor technologies to improve the performance/cost ratio. Communication bottlenecks are a major roadblock to high-throughput map-reduce over large quantities of data (for example, terabytes to petabytes), and current map-reduce frameworks suffer from such bottlenecks. Further, existing map-reduce frameworks are not able to leverage hybrid systems that provide accelerators. Additionally, scheduling in such map-reduce systems is primarily centralized. Accordingly, there exists a need for distributed map-reduce over large clusters.