The present invention generally relates to data processing systems for use in computer systems, and more particularly to systems capable of performing big data analytics as well as devices therefor.
Big data analytics is a relatively new approach to managing large amounts of data. As used herein, the term “big data” is used to describe unstructured and semi-structured data in such large volumes (for example, petabytes or exabytes of data) as to be immensely cumbersome to load into a relational database for analysis. Instead of the conventional approach of extracting information from data sets, where an operator defines criteria that are used for data analysis, big data analytics refers to a process by which the data themselves are used to generate their own search strategies based on commonalities of events, for example recurrent data structures or abnormal events, that is, unique data structures that do not match the rest of the data set. One of the prerequisites for this kind of data-driven analysis is to have data sets that are as large as possible, which in turn means that they need to be processed in the most efficient way. In most cases, the analysis involves massive parallel processing as done, for example, on a graphics processing unit (GPU). The “general purpose” type of the work load performed by a GPU has led to the term “general purpose graphics processing unit” or “GPGPU” for the processor and “GPGPU computing” for this type of computational analysis with a GPU.
Big data analytics has become the method of choice in fields like astronomy, where no experimental intervention can be applied to preselect data. Rather, data are accumulated and analyzed essentially without applying any kind of filtering. Another exemplary case underscoring the importance of big data analytics is a study of breast cancer survivors with a somewhat surprising outcome: the phenotypical expression and configuration of non-cancerous stromal cells were equally or even more determinative of the survival rate of patients than the actual characteristics of the tumor cells. Interestingly, attention had not been paid to the former until big data analytics going far beyond the immediate focus of the study was applied, in which, without preselection by an operator, all available data were loaded into the system for analysis. This example illustrates how seemingly unrelated data can hold clues to solving complex problems, and underscores the need to feed the processing units with data sets that are as complete and all-encompassing as possible, without applying preselection or bias of any sort.
The lack of bias or preselection further underscores that the data sets used in big data analytics are exactly what the name describes, meaning that data sets in excess of terabytes are not the exception but rather the norm. Conventional computer systems are not designed to digest data on massive scales for a number of reasons. General purpose central processing units (CPUs) are very good at performing a highly diverse workload, but the limitation in the number of cores, which determines the number of possible concurrent threads (including Intel's HyperThreading), prevents CPUs from being well suited to massively parallel analytics of large data sets. For this reason, GPUs characterized by a large array of special purpose processors have been adapted to perform general purpose computing, leading to the evolution of GPGPUs. However, even with the highest-end GPGPU expansion cards currently available, for example, the Tesla series of graphics expansion cards commercially available from nVidia Corporation, the on-board (local) volatile memory (referred to as a local frame buffer, or LFB) functionally integrated with the GPGPU on the graphics expansion card is limited to 6 GB, which can only hold a fraction of the data designated to be analyzed in any given scenario. Moreover, the data need to be loaded from a host system (for example, a personal computer or server) through a PCIe (peripheral component interconnect express, or PCI Express) root complex, which typically involves accessing the data on a hard disk drive or, in a more advanced configuration, on NAND flash-based solid state drives (SSDs), which receive data from a larger storage array in the back-end of a server array. Either type of drive will read the data out to the main system memory which, in turn, forwards the data to the LFB through a direct memory access (DMA) channel.
While functional, this process has drawbacks in the form of multiple protocol and data format conversions and many hops from one station to another within the computer system, adding latencies and potential bus congestion. In other words, the current challenge in systems used to perform big data analytics is that their performance is no longer defined by the computational resources but rather by the I/O limitations of the systems.
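As a rough illustration of the capacity gap between an LFB and a big data set, the following sketch estimates how many times a local frame buffer would have to be refilled to pass a data set through it once. The function name `lfb_refills` is hypothetical, and treating 6 GB and one petabyte as binary units is an assumption made only for the sake of the arithmetic.

```python
def lfb_refills(dataset_bytes: int, lfb_bytes: int) -> int:
    """Return the number of LFB-sized chunks needed to stream the data set once."""
    if dataset_bytes < 0 or lfb_bytes <= 0:
        raise ValueError("dataset_bytes must be >= 0 and lfb_bytes must be > 0")
    # Ceiling division using integers only, so very large sizes stay exact.
    return -(-dataset_bytes // lfb_bytes)


GIB = 2**30  # assumption: "GB" read as a binary gigabyte
PIB = 2**50  # assumption: "petabyte" read as a binary petabyte

refills = lfb_refills(1 * PIB, 6 * GIB)
print(refills)  # 174763
```

Under these assumptions, a single pass over one petabyte requires well over one hundred thousand refills of a 6 GB LFB, and every refill has to travel the full path from storage to the graphics expansion card.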
Another difference compared to current mainstream computing is that the data made available to GPGPUs are often not modified. Instead, they are loaded, and the computational analysis generates a new set of data in the form of additional paradigms or parameters that can be applied against specific aspects or the whole of the original data set. However, the original data are not changed since they are the reference and may be needed again at any later time. This changes the prerequisites for SSDs serving as last tier storage media before the data are loaded into a volatile memory buffer. Specifically, with respect to loading the data into the SSD, most of the transactions will be sequential writes of large files, whereas small, random access writes may be negligible. In the case of data reads to the LFB, a mixed load of data comprising large sequential transfers and smaller transfers with a more random access pattern is probably the most realistic scenario.
As previously noted, a particular characteristic of big data analytics is the unstructured or semi-structured nature of the information. Unlike structured information, which as used herein refers to a relational database ordered in records and arranged in a format that database software can easily process, big data information is typically in the form of raw sets of mixed objects, for example, MRI images, outputs of multiple sensors, video clips, and so on. Each object contains a data part, e.g., a bitmap of the MRI image, and a metadata part, e.g., a description of the MRI image, information about the patient, MRI type, and diagnosis.
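By way of illustration only, an object having a data part and a metadata part might be represented as in the following minimal sketch. The class name `RawObject`, the helper `select_by_metadata`, the field names, and all values are hypothetical placeholders and do not describe any particular file format or system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RawObject:
    """One unstructured object: an opaque data part plus a metadata part."""
    data: bytes                                   # e.g., the bitmap of an MRI image
    metadata: Dict[str, str] = field(default_factory=dict)


def select_by_metadata(objects: List[RawObject], key: str, value: str) -> List[RawObject]:
    """Return the objects whose metadata contains key == value."""
    return [obj for obj in objects if obj.metadata.get(key) == value]


# A raw, mixed set of objects; every value here is a placeholder.
mixed_set = [
    RawObject(data=bytes(16), metadata={"kind": "mri", "mri_type": "placeholder-type"}),
    RawObject(data=bytes(8), metadata={"kind": "sensor"}),
    RawObject(data=bytes(32), metadata={"kind": "video"}),
]

mri_objects = select_by_metadata(mixed_set, "kind", "mri")
print(len(mri_objects))  # 1
```

In this sketch the metadata is a free-form mapping rather than a fixed record layout, reflecting that objects of different kinds need not share the same set of fields.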
The massive amount of data gathered and subjected to analytics typically requires a distributed processing scheme. That is, the data are stored in different nodes. However, each node in the system can process data from any other node. In other words, the storage is accumulated within the nodes' capacity and the processing power is spread across all nodes, forming a large space of parallel processing.
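The distributed scheme described above can be sketched, under simplifying assumptions, as a set of nodes that each store part of the data while any stored object can be fetched and processed regardless of which node holds it. The classes `Node` and `Cluster` and their methods are hypothetical, and threads within a single process stand in for physically separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


class Node:
    """A node that stores a subset of the objects (key -> raw bytes)."""

    def __init__(self, name: str, objects: Dict[str, bytes]):
        self.name = name
        self.objects = dict(objects)


class Cluster:
    """Storage is the sum of the nodes' capacity; processing is spread across nodes."""

    def __init__(self, nodes: Iterable[Node]):
        self.nodes = list(nodes)

    def total_stored_bytes(self) -> int:
        return sum(len(v) for n in self.nodes for v in n.objects.values())

    def fetch(self, key: str) -> bytes:
        # Any node may request an object held by any other node.
        for node in self.nodes:
            if key in node.objects:
                return node.objects[key]
        raise KeyError(key)

    def process_all(self, func: Callable[[bytes], int]) -> Dict[str, int]:
        keys = sorted(k for n in self.nodes for k in n.objects)
        # One worker per node models the processing power being spread out.
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            results = pool.map(lambda k: func(self.fetch(k)), keys)
            return dict(zip(keys, results))


cluster = Cluster([Node("node0", {"a": b"abc"}), Node("node1", {"b": b"de"})])
print(cluster.total_stored_bytes())  # 5
print(cluster.process_all(len))      # {'a': 3, 'b': 2}
```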
Funneling all data through the PCIe root complex of a host system may eventually result in bus contention and delays in data access. Specifically, in most current approaches, data are read from a solid state drive to the volatile system memory, then copied to a second location in the system memory pinned to the GPU, and finally transferred via the PCIe root complex to the graphics expansion card where the data are stored in the LFB. Alternatively, a peer-to-peer data transfer can be used to transfer data directly from one device to another, but the data still have to pass through the PCIe root complex. Similar constraints are found in modern gaming applications where texture maps are pushing the boundaries of the LFB of gaming graphics expansion cards. US patent application 2011/0292058 discloses a non-volatile memory space assigned to an Intel Larrabee (LRB)-type graphics processor for fast access of texture data from the SSD as well as a method for detecting whether the requested data are in the non-volatile memory and then arbitrating the access accordingly.
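The conventional staged transfer described above can be modeled, purely for illustration, as a sequence of buffer copies. In the following sketch the function name `staged_transfer` is hypothetical, and ordinary in-memory byte buffers stand in for the SSD, the system memory, the pinned memory region, and the LFB; no actual device, driver, or DMA transfer is involved.

```python
from typing import List, Tuple


def staged_transfer(ssd_block: bytes) -> Tuple[bytes, List[str]]:
    """Copy a block along the conventional path and record each hop taken."""
    hops: List[str] = []

    # Hop 1: the drive reads the block out to volatile system memory.
    system_buf = bytearray(ssd_block)
    hops.append("SSD -> system memory")

    # Hop 2: the block is copied to a second location pinned to the GPU.
    pinned_buf = bytearray(system_buf)
    hops.append("system memory -> pinned system memory")

    # Hop 3: the block crosses the PCIe root complex into the LFB.
    lfb = bytes(pinned_buf)
    hops.append("pinned system memory -> LFB (via PCIe root complex)")

    return lfb, hops


payload = bytes(range(8))
lfb_contents, path = staged_transfer(payload)
bytes_moved = len(payload) * len(path)

print(len(path))    # 3
print(bytes_moved)  # 24
```

Even in this simplified model, every block is copied three times before it reaches the LFB, so the total number of bytes moved is a multiple of the size of the data actually analyzed.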
Given the complexity and lack of optimization of the above discussed data transfer scheme between non-volatile storage and the local on-board volatile memory of a graphics expansion card, including all latencies and possible contentions at any of the hops between the origin in the SSD and the final destination in the LFB, it is clear that more efficient storage and processing systems are needed for performing big data analytics.