The present invention relates generally to a coordinated version control system, and more particularly, but not by way of limitation, to a coordinated version control system for using a rule-based approach to facilitate asynchronous execution, introducing a server side coordinate version control to achieve version consistency, and introducing a learner side version enhancement to further enforce the version consistency.
Many existing machine learning algorithms partition the parameters into multiple disjoint parameter sets and ask distributed parameter servers to aggregate different sets. Meanwhile, contemporary parameter servers adopt asynchronous execution to tolerate slow learners.
Conventionally, in asynchronous execution (on a server side), the constraint that aggregation only happens after all parameters are collected is relaxed. Instead, server can carry out aggregation when a proportion of parameters are received, then send back to clients. Also, in asynchronous execution (on a client side), a learner can continue training without waiting for the arrival of the latest aggregated parameters.
However, there is a technical problem in the conventional techniques that, as a result of this combined design of the server side and the client side, the conventional techniques may create undesirable mismatches between the versions of aggregated parameter sets returned by different servers. Such an issue can lead to slow convergence, or even a complete inability to converge in certain cases.