In parallel computing, one of the keys to fast and effective processing is the ability of multiple compute elements to access and process multiple data sets stored in multiple memory modules. This accessing and processing must be performed in parallel, that is, simultaneously, to maximize the effective processing rate of the system. However, achieving such simultaneous operation requires complex programming of the various compute elements and data-management elements. What is needed are systems and methods that reduce the complexity of programming such systems.