Big data can be defined as any data that is too large, too complex, and/or too expensive to process using existing technologies and architectures.
Conventional parallel processing approaches have used threaded architectures in an attempt to scale processing power. This approach has proven only somewhat useful, because threads in such architectures typically share resources such as memory, I/O, disk, CPU, and other system resources. Because of this sharing, threads must be carefully managed, and this management often means that “parallel” threads are not truly asynchronous or independently parallel. Left unmanaged, a shared, threaded architecture can result in competition for resources between threads. This competition can cause issues such as lock contention, race conditions, and blocking, among others. Even when adequate CPU bandwidth is available, these issues can cause bottlenecks, artificial delays, and/or overall sub-optimal use of resources.
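The resource competition described above can be illustrated with a minimal sketch. It assumes Python's standard `threading` module; the function names `shared_counter` and `independent_counters` are hypothetical and chosen only for illustration. In the first function, every thread updates one shared value, so a lock must serialize each update and the threads spend time waiting on one another. In the second, each thread works on its own private value and the results are combined only after all threads finish.

```python
import threading


def shared_counter(num_threads: int, increments: int) -> int:
    """All threads update one shared total; a lock serializes every update."""
    total = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal total
        for _ in range(increments):
            # Only one thread may hold the lock at a time, so the other
            # threads block here even if spare CPU capacity exists.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def independent_counters(num_threads: int, increments: int) -> int:
    """Each thread owns a private count; results are merged after join."""
    partial = [0] * num_threads

    def worker(index: int) -> None:
        count = 0  # local to this thread, so no lock is needed
        for _ in range(increments):
            count += 1
        partial[index] = count  # one write to a slot no other thread uses

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


if __name__ == "__main__":
    print(shared_counter(4, 1000))
    print(independent_counters(4, 1000))
```

Both functions return the same total, but only the first requires per-update coordination between threads. Note that this sketch isolates only the shared-state aspect: in a threaded design the workers still share the process's memory, I/O, and CPU resources, and in CPython they also share the interpreter lock.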
It is challenging to design systems that process data in a parallel fashion while providing a high degree of flexibility.