Machine learning (ML) algorithms are increasingly used to extract and analyze information from datasets. As datasets grow in size for applications such as topic modeling, recommender systems, and internet search queries, there is a need for scalable implementations of ML algorithms that can operate on large datasets. Present implementations of ML algorithms require manual tuning on specialized hardware, and methods for parallelizing individual learning algorithms across a cluster of machines must be implemented manually.
Parallel processing is used to increase the speed of execution and the amount of data that can be processed. However, using a distributed network or a plurality of processors means that there exists a larger plurality of possible execution strategies for a job. One problem is that the task of selecting a good execution strategy from that plurality, especially when implementing a plurality of ML algorithms, falls on the programmer.