Many computational algorithms process segmented data; examples include segmented reduction and sparse matrix-vector multiplication (SpMV). These algorithms are often implemented on massively parallel processors, where the input data is typically divided and each portion is allocated to a different processor. However, balancing the workload across diverse datasets is particularly difficult. For example, many real-world datasets consist mostly of short or zero-length segments, together with a small number of segments that are orders of magnitude longer than the rest. As a result, the many processors assigned short or zero-length segments have very few computations to perform, while the few processors assigned the long segments perform most of the work.
Contemporary parallel decomposition strategies are inadequate because datasets are typically divided based on a single component of the data. For example, a dataset in the form of a matrix may be divided by row or by column. In another example, a dataset in the form of a number of variable-length lists may be divided evenly by list index. These decomposition strategies fail to balance the workload according to, for example, the computational complexity of the segments assigned to each processor. Thus, there is a need to address these issues and/or other issues associated with the prior art.
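The imbalance caused by dividing segmented data evenly by row index can be illustrated with a minimal sketch. It assumes the segments are described by a CSR-style `row_offsets` array (segment `i` spans `row_offsets[i]` to `row_offsets[i+1]`) and treats the number of elements in a processor's segments as its work. The helper `row_split_work` and the example segment lengths are hypothetical and serve only to illustrate the problem.

```python
from itertools import accumulate


def row_split_work(row_offsets, num_procs):
    """Hypothetical helper: divide rows evenly by row index and return the
    number of segment elements (the work) assigned to each processor."""
    num_rows = len(row_offsets) - 1
    chunk = -(-num_rows // num_procs)  # ceiling division: rows per processor
    work = []
    for p in range(num_procs):
        start = min(p * chunk, num_rows)
        end = min(start + chunk, num_rows)
        # Work is the count of elements in rows [start, end).
        work.append(row_offsets[end] - row_offsets[start])
    return work


# Hypothetical segment lengths: mostly short or empty, one very long.
lengths = [1, 0, 2, 0, 1, 0, 0, 1000]
row_offsets = [0] + list(accumulate(lengths))
print(row_split_work(row_offsets, 4))  # [1, 2, 1, 1000]
```

Each of the four processors receives exactly two rows, yet the last one processes 1000 of the 1004 elements. The parallel run time is therefore bounded by that single processor while the others sit mostly idle.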