While these advantages can readily be achieved where sets of data to be processed by the processing units or nodes are independent, a problem arises when the result of processing in one or more nodes affects the processing operations taking place in another node or nodes.
If, for example, code is written to perform in parallel what has hitherto been an essentially serial operation, the dependencies between the various parallel operations may become extremely complex, to the point where analysis of potential conflicts is prohibitively difficult. Such code may require a great deal of testing and correction in order to deal with the synchronising messages which are necessarily exchanged between parallel strands of an application program.
Such a situation can arise in the analysis of the substantial quantities of data, extending to many millions of records, obtained in commercial activities such as insurance and retailing. Speedy analysis is essential if marketing strategy is to be effectively and flexibly matched to customer needs and preferences, and a number of processes have been developed to recognise marketing trends in order to respond effectively thereto.
In one such process, referred to as data mining, records are analysed serially to develop a model which can be used for predicting trends. The model is dynamically updated as the analysis proceeds, and each processing step is required to be taken in accordance with the model that is current at that step. Data mining has accordingly traditionally been performed as a serial, single-processor operation. Data mining is necessarily a computationally intensive process, and considerable advantage could be gained by applying parallel processing techniques to it and to similar processes.
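The dependency described above may be illustrated by a minimal sketch. It is not taken from any particular data mining process: the frequency-count model, the threshold rule and the names mine_serially and mine_in_independent_halves are all hypothetical, chosen only to show that a decision taken against the current model can change if the records are divided between processors that do not share that model.

```python
from collections import Counter


def mine_serially(records, threshold=2):
    """Analyse records one at a time against a model updated at every step."""
    model = Counter()  # hypothetical model: how often each item has been seen so far
    flagged = []       # items judged to be "trending" by the model current at that step
    for item in records:
        # The decision for this record depends on every record processed before it.
        if model[item] >= threshold:
            flagged.append(item)
        # The model is then updated before the next record is examined.
        model[item] += 1
    return model, flagged


def mine_in_independent_halves(records, threshold=2):
    """Naive split: each half starts from an empty model and never sees the other."""
    mid = len(records) // 2
    _, left = mine_serially(records[:mid], threshold)
    _, right = mine_serially(records[mid:], threshold)
    return left + right


records = ["a", "b", "a", "a", "b", "a"]
model, flagged = mine_serially(records)
naive_flagged = mine_in_independent_halves(records)
print(flagged)        # serial result
print(naive_flagged)  # result when the halves are treated as independent
```

With this input the serial pass flags the fourth and sixth records, whereas the naive split flags none, because neither half ever accumulates enough history in its own model. Reconciling the two results would require the halves to exchange model updates, which is the kind of synchronisation that makes parallelising such a process difficult.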