Episode mining, or frequent pattern mining, is a useful tool in various data-intensive services, such as human resource services, financial services, and information technology services. Episode mining may consume significant computing resources to analyze large amounts of data, particularly if the data must be analyzed within a time constraint or if the analysis is requested to be performed substantially in real time.
Infrastructure providers often have enough processing capacity to perform such data-intensive services in their own private computing clouds, but some applications with heavy logging activity, such as call centers, business chat operations, and web logs, may be hard to parallelize in real time for analytics purposes. In such cases, using a plurality of parallelized execution environments may increase performance by distributing the processing among them.
However, conventional methods of load balancing do not provide adequate performance. In particular, conventional load balancing techniques typically assign a substantially equal amount of data to each resource, or rely mainly on the resources' data computing capacities to determine the amount of data assigned to each resource. Moreover, existing load balancing systems cannot guarantee (and in many cases do not even attempt to ensure) that computing sub-components finish substantially in unison. As such, the slowest resource ends up lengthening the makespan of the computation, because the computation as a whole is complete only when the last resource finishes.
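By way of illustration only, the effect of the slowest resource on the makespan can be sketched in a few lines of Python. The function names, the processing rates, and the amount of data below are hypothetical values chosen for the example and do not come from any particular system; the sketch also assumes that each resource processes its share at a constant rate.

```python
# Illustrative sketch only: all names and numbers here are hypothetical.

def equal_split(total_units, num_resources):
    """Assign a substantially equal amount of data to each resource."""
    return [total_units / num_resources] * num_resources

def finish_times(shares, rates):
    """Time each resource needs for its share, assuming a constant rate."""
    return [share / rate for share, rate in zip(shares, rates)]

def makespan(times):
    """The computation is complete only when the last resource finishes."""
    return max(times)

# Assumed effective processing rates (data units per second) of three resources.
rates = [4.0, 2.0, 1.0]
total_units = 120.0

shares = equal_split(total_units, len(rates))
times = finish_times(shares, rates)
span = makespan(times)

# Time each resource sits idle while waiting for the slowest one.
idle = [span - t for t in times]

print("shares:", shares)        # 40.0 units each
print("finish times:", times)   # 10.0 s, 20.0 s, 40.0 s
print("makespan:", span)        # 40.0 s, set by the slowest resource
print("idle time:", idle)       # 30.0 s, 20.0 s, 0.0 s
```

Under these assumed numbers, the fastest resource finishes after 10 seconds and then waits 30 seconds for the slowest one, so the resources do not finish substantially in unison and the makespan equals the finish time of the slowest resource.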