Computer users, in both business and home settings, utilize many different types of computing devices to send electronic mail, generate work product, and process data. A plurality of computing devices, networks, and/or systems may be needed to process the large amounts of data related to these and other computer uses. Programming models, such as MapReduce, may be used to process and generate such large data sets.
Utilizing a MapReduce process, a user may cause a map function to process and partition input data, such as key/value pairs, to generate a set of partitioned data. The partitioned data may then be distributed to one or more nodes so that each partition may be processed separately and independently. Subsequently, a reduce function may merge the processed partitions in order to output a merged data record.
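By way of illustration only, the map, partition, and reduce steps described above may be sketched in a single process as a word count. The names `map_fn`, `partition`, `reduce_fn`, and `map_reduce`, the word-count task, and the use of a CRC32 hash to assign keys to nodes are assumptions made for this sketch and are not drawn from any particular MapReduce implementation.

```python
import zlib
from collections import defaultdict


def map_fn(key, value):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in value.split():
        yield word.lower(), 1


def partition(pairs, num_nodes):
    """Group intermediate pairs by key and assign each key to one node."""
    buckets = [defaultdict(list) for _ in range(num_nodes)]
    for key, value in pairs:
        # Hypothetical assignment rule: a stable hash of the key modulo node count.
        node = zlib.crc32(key.encode("utf-8")) % num_nodes
        buckets[node][key].append(value)
    return buckets


def reduce_fn(key, values):
    """Reduce step: merge all values emitted for one key."""
    return key, sum(values)


def map_reduce(records, num_nodes=3):
    """Run map, partition, and reduce in sequence; nodes are simulated as buckets."""
    intermediate = [pair for key, value in records for pair in map_fn(key, value)]
    merged = {}
    for bucket in partition(intermediate, num_nodes):
        # Each bucket could be handled by a separate node, independently of the others.
        for key, values in bucket.items():
            out_key, out_value = reduce_fn(key, values)
            merged[out_key] = out_value
    return merged


if __name__ == "__main__":
    print(map_reduce([(0, "a b a"), (1, "b c")]))
```

In this sketch the buckets are processed in a loop rather than on separate machines; the point is only that each bucket can be reduced without reference to any other bucket.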
At least two general assumptions are made when large data sets are processed utilizing a MapReduce process. First, the amount of data distributed to each of the one or more nodes will be approximately the same. Second, the data input into the MapReduce process will be in key/value format, such that each item of data will occupy about the same amount of memory on a particular computing device. However, when the data input into the MapReduce process is in a graphical form, the foregoing assumptions may no longer be valid.
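As a hedged illustration of how these assumptions may fail, the following sketch assumes that data in a graphical form means a graph stored as one adjacency-list record per vertex. The star-shaped example graph, the function name `node_loads`, and the rule of assigning vertex `v` to node `v % num_nodes` are hypothetical choices made only for this example.

```python
def node_loads(adjacency, num_nodes):
    """Return (records per node, edge entries per node) for a vertex-id partition."""
    record_counts = [0] * num_nodes
    edge_counts = [0] * num_nodes
    for vertex, neighbors in adjacency.items():
        node = vertex % num_nodes  # hypothetical assignment rule
        record_counts[node] += 1
        edge_counts[node] += len(neighbors)  # record size grows with vertex degree
    return record_counts, edge_counts


def star_graph(num_leaves):
    """Build a star: vertex 0 is linked to every leaf, each leaf only to vertex 0."""
    adjacency = {0: list(range(1, num_leaves + 1))}
    for leaf in range(1, num_leaves + 1):
        adjacency[leaf] = [0]
    return adjacency


if __name__ == "__main__":
    records, edges = node_loads(star_graph(8), num_nodes=3)
    print(records)  # [3, 3, 3]
    print(edges)    # [10, 3, 3]
```

Here every node receives the same number of records, yet the node holding the hub vertex receives more than three times the edge data of the others, and the hub record itself is eight times larger than any leaf record.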