Modern hardware computing platforms offer many new capabilities and capacities compared with older hardware. They are multi-core, can support very large amounts of shared memory, and are designed for multi-threaded operation. Fully utilizing these features makes it possible to run very high-performance, scalable, real-time applications on inexpensive hardware. Taking full advantage of these benefits, however, requires integrating operating system functionality into the application.
In particular, multi-core hardware computing platforms can provide a real multiplier in computing power because of their multiple cores, and very high-speed memory access because of the large caches dedicated to each core. Several significant challenges stand in the way of realizing this potential. The primary ones are how to avoid data cache thrashing, how to prevent data in shared memory from being updated by multiple processor cores simultaneously, and how to spread the load evenly over the available processors. A general-purpose operating system uses generic algorithms for all of these, which take no account of application-specific behaviours. The result is sub-optimal utilization of the cores, caches, and shared memory.
Current practice for making optimal use of available multi-core computer resources takes one of the following approaches. First, the application may be hard-coded to control its own scheduling directly, with only a minimal executive for hardware access (i.e., key operating system functionality is integrated into the application). Done properly, this can result in very efficient use of the multi-core hardware. However, such applications are difficult to program and have little flexibility. Developing them is expensive, time-consuming, and error-prone, because programmers must develop the application directly for the underlying hardware architecture (something that is generally abstracted by the operating system).
Second, the application may be implemented by dividing it into application subsystems, each with its own data storage and executable. This is often done with a horizontal scaling technique in which data is streamed between the different subsystems. This approach is quite common in event processing applications, where memory can be segregated among the application subsystems. However, such segmentation precludes global access to shared memory, resulting in data duplication, increased latency, and a great deal of otherwise unnecessary encoding and decoding of data for transfer between the different subsystems.
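The encoding and decoding cost described above can be illustrated with a minimal Python sketch. The record layout and the `encode_event`, `decode_event`, and `stream_through` helpers are hypothetical and chosen only for illustration; the point is simply that every boundary between subsystems pays for one encode and one decode, and leaves a separate copy of the data on each side.

```python
import struct

# Hypothetical fixed-size event record (an assumption for this sketch):
# 64-bit id, 32-bit type code, 64-bit float value, little-endian, unpadded.
EVENT_FORMAT = "<QId"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)


def encode_event(event_id, event_type, value):
    """Serialize an event so it can be streamed to another subsystem."""
    return struct.pack(EVENT_FORMAT, event_id, event_type, value)


def decode_event(payload):
    """Rebuild the event on the receiving side as a new tuple (a second copy)."""
    return struct.unpack(EVENT_FORMAT, payload)


def stream_through(event, hops):
    """Pass an event across `hops` subsystem boundaries, counting conversions."""
    conversions = 0
    for _ in range(hops):
        payload = encode_event(*event)  # sending subsystem encodes
        event = decode_event(payload)   # receiving subsystem decodes
        conversions += 2
    return event, conversions
```

With direct access to shared memory, none of these conversions would be needed; in the segmented design their number grows with the number of subsystems an event passes through.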
Third, the data may be divided horizontally into different memory pools, with a different processor, each executing the same application, responsible for each memory pool. This is reasonably efficient if the data can be broken down in that way, but it requires additional processing to route requests to the correct “pool”, does not solve any of the issues with shared memory, and increases overall latency. Scaling is accomplished by breaking the data into more pools and adding more processors.
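The routing step that this approach requires might look like the following minimal sketch. The `pool_for_key` function and the `PooledStore` class are hypothetical names, and hashing a string key with CRC-32 is an assumption made for illustration; a real system could partition by any stable property of the data.

```python
import zlib


def pool_for_key(key, num_pools):
    """Map a request key to a pool index using a stable hash (assumed scheme)."""
    if num_pools <= 0:
        raise ValueError("num_pools must be positive")
    return zlib.crc32(key.encode("utf-8")) % num_pools


class PooledStore:
    """Data split across independent pools; each pool would have its own processor."""

    def __init__(self, num_pools):
        self.pools = [dict() for _ in range(num_pools)]

    def put(self, key, value):
        # Every request pays for the routing computation before touching data.
        self.pools[pool_for_key(key, len(self.pools))][key] = value

    def get(self, key):
        return self.pools[pool_for_key(key, len(self.pools))].get(key)
```

Every request incurs the routing computation before any data is touched. With simple modulo routing such as this, changing the number of pools also remaps most keys, so scaling out implies redistributing the data.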
Fourth, some operating systems make use of “slab” allocators for memory management, in which memory allocation requests for identically sized chunks of memory are grouped into slabs. This reduces memory fragmentation and allows free lists to be used for allocation. Unfortunately, such allocators are more prone to cache thrashing: because the data is not segregated using application-specific knowledge of its use or expected lifetime, the active data is spread out over large regions of memory.
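The grouping of same-sized requests into slabs with free lists can be sketched as follows. This is a simplified illustration rather than the implementation used by any particular operating system; the `Slab` and `SlabAllocator` classes, the `(size, slab, slot)` handle, and the default of four slots per slab are all assumptions.

```python
class Slab:
    """A block of equal-sized slots plus a free list of slot indices."""

    def __init__(self, object_size, slots_per_slab):
        self.buffer = bytearray(object_size * slots_per_slab)
        # Reversed so that pop() hands out slot 0 first.
        self.free_slots = list(range(slots_per_slab - 1, -1, -1))


class SlabAllocator:
    """Groups allocation requests by size only (hypothetical sketch)."""

    def __init__(self, slots_per_slab=4):
        self.slots_per_slab = slots_per_slab
        self.slabs = {}  # object_size -> list of Slab

    def allocate(self, object_size):
        """Return a (size, slab index, slot index) handle for a free slot."""
        slabs = self.slabs.setdefault(object_size, [])
        for slab_index, slab in enumerate(slabs):
            if slab.free_slots:
                return (object_size, slab_index, slab.free_slots.pop())
        # No free slot in any existing slab of this size: add a new slab.
        slabs.append(Slab(object_size, self.slots_per_slab))
        return (object_size, len(slabs) - 1, slabs[-1].free_slots.pop())

    def free(self, handle):
        """Return a slot to its slab's free list for reuse."""
        object_size, slab_index, slot = handle
        self.slabs[object_size][slab_index].free_slots.append(slot)
```

Because the only grouping criterion here is object size, two objects of the same size with unrelated access patterns or lifetimes can land in the same slab, while related objects of different sizes are placed in different slabs. That is the source of the scattering of active data described above.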
A need therefore exists for an improved data management method and system. Accordingly, a solution that addresses, at least in part, the above and other shortcomings is desired.