1. Field
Embodiments of the invention relate to a framework for Extremely Large Complex Objects (XLCOs).
2. Description of the Related Art
FIG. 1 illustrates, in a block diagram, normal file system access in the prior art. In FIG. 1, application code 100 of an application is accessing a “normal” file system 120. In FIG. 1, the application code 100 (e.g., written in the C programming language or other programming language) includes a file access request. That is, the file access request is embedded in the application code 100. It may also be said that the application code 100 utilizes a language specific file access method to interface with the underlying Operating System file system in order to perform file access. Generally, there are Open and Close file routines between which a variety of data Read and Write routines are executed. For example, the file access request may be OpenFile(filename) or getBytes(fileHandle, startLocation, numBytes). The file access request is issued to an Operating System (OS)/file system 120, which executes the file access request against disk 122. That is, the OS Input/Output (I/O) subsystem handles the actual reading and writing of data to the disk 122.
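The Open, Read, and Close sequence described above can be sketched in C as follows. This is only an illustrative sketch using the standard C library. The names get_bytes and read_range are hypothetical stand-ins modeled on the getBytes(fileHandle, startLocation, numBytes) example and do not come from any particular file system interface.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper modeled on getBytes(fileHandle, startLocation, numBytes).
   Reads up to num_bytes from file_handle, starting at start_location.
   Returns the number of bytes actually read, or -1 if the seek fails. */
long get_bytes(FILE *file_handle, long start_location,
               size_t num_bytes, unsigned char *buffer)
{
    assert(file_handle != NULL && buffer != NULL);
    if (fseek(file_handle, start_location, SEEK_SET) != 0) {
        return -1;
    }
    return (long)fread(buffer, 1, num_bytes, file_handle);
}

/* Hypothetical wrapper showing the full sequence: Open, Read, Close.
   Returns the number of bytes read, or -1 if the file cannot be opened. */
long read_range(const char *filename, long start_location,
                size_t num_bytes, unsigned char *buffer)
{
    FILE *file_handle = fopen(filename, "rb");   /* Open routine */
    if (file_handle == NULL) {
        return -1;
    }
    long bytes_read = get_bytes(file_handle, start_location,
                                num_bytes, buffer); /* Read routine */
    fclose(file_handle);                          /* Close routine */
    return bytes_read;
}
```

In this sketch the application code carries the file access logic itself, and the C library passes each request to the OS I/O subsystem, which performs the actual disk access.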
Over the years, computer applications have grown in both size and complexity. Older applications typically accessed datasets composed of flat files. These datasets were small in size and could be processed using a few Central Processing Unit (CPU) cycles and a limited amount of memory. Modern applications demand more storage, processing, and memory. These modern applications perform functions such as document processing, report archiving, and processing of image, video and voice. As applications become more advanced, there is a need to combine data formats (also referred to as data types) of multiple types into a single file.
The ability to process extremely large files is constrained by file system limitations or the processing program limitations, whichever is the smaller. Current file system limitations are typically in the multiple Gigabyte (Gbyte) range, while program processing limitations are often constrained to the amount of data that can be held in memory at one time, typically in the single Gbyte range.
The need for even more data storage and processing capacity will continue to grow as larger datasets (e.g., from multi-dimensional, multi-formatted, time variant and data-streaming applications) start to take hold. An example of a data-streaming problem is: “How do you store video data from all the video sources in a city and how can this data later be efficiently searched using image recognition software to identify and track a specific individual?” A current solution to this problem is to process the video data as the video data is being captured and then to either discard the video data or store a subset of the total video data stream. This may or may not be sufficient depending on the nature of the problem at hand. In the above example, if one is tracking an individual in real time, then it is sufficient to know where the person is and to “guess” as to where the person is going. On the other hand, if one is trying to backtrack an individual's movement over time, then this is not possible unless the data has been stored somewhere in a fashion that allows the data to be retrieved and processed efficiently.
Current general solutions to the large file problem involve taking advantage of multiple computer system enhancements:
1. Increasing the number of bits used by the OS to determine file sizes.
2. Increasing the number of available processors (number of processors and number of CPUs within a processor) contained within a computer system.
3. Increasing the amount of memory available to the computer system.
These computer system enhancements, while providing improvements in computer storage and processing capacity, are incapable of providing a general solution to the problem due to a number of shortcomings.
With reference to increasing the number of bits used by the OS to determine file sizes, there are a number of shortcomings. For example, there may be a need to store a file larger than the existing file size limit. Historically, this has been true and there is no reason to believe that this will not be true in the foreseeable future. Also, not all files need to be extremely large. Many files are in the Kilobyte (Kbyte) or Megabyte (Mbyte) size by nature of their content. So using large numbers of bits to determine the file size imposes file system penalties for the smaller files. In addition, users will have to purchase “newer” OSs and the computer systems they run on in order to support the larger file sizes. Moreover, in conventional systems, these large files can only contain a single data format. Also, the file size is limited by the size of the physical storage device. Furthermore, it would be hard to create “larger systems/processing complexes” composed of multiple systems if all of the multiple systems do not support the same OS constraints.
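The relationship between the number of bits used for file sizes and the resulting file size limit can be illustrated with a small C sketch. The function names max_file_size and min_bits_needed are hypothetical, and the sketch assumes a simple unsigned size field. Actual file systems may reserve bits or impose other limits.

```c
#include <assert.h>
#include <stdint.h>

/* Largest file size, in bytes, that an unsigned size field of `bits`
   bits can represent. Assumes 0 <= bits <= 64. */
uint64_t max_file_size(unsigned bits)
{
    assert(bits <= 64);
    if (bits == 0) {
        return 0;
    }
    if (bits == 64) {
        return UINT64_MAX;   /* avoid shifting by the full width */
    }
    return ((uint64_t)1 << bits) - 1;
}

/* Smallest number of bits needed to represent file_size bytes. */
unsigned min_bits_needed(uint64_t file_size)
{
    unsigned bits = 0;
    while (file_size != 0) {
        bits++;
        file_size >>= 1;
    }
    return bits;
}
```

Under these assumptions, a 32-bit size field limits a file to just under 4 Gbytes, while a 4 Kbyte file needs only 13 bits, so most of a wider size field goes unused for such a file.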
With reference to increasing the number of available processors contained within the computer system, there are a number of shortcomings. For example, to take advantage of multiple processor systems, the OS must support multiprocessor processing, either in the form of multi-threading or a specialty language that allows an application to address large numbers of processors (sometimes referred to as “cores”) directly. Also, the disadvantage of the specialty languages is that the applications become very system dependent and, thus, hard and expensive to maintain. Moreover, the disadvantage of multi-threading is that it is not scalable across multiple computer systems. In addition, the file cannot be processed by multiple threads or multiple applications (on different systems) in parallel.
With reference to increasing the amount of memory available to the computer system, there are a number of shortcomings. For example, there will always be a need for more real memory in a system. Also, processing of a large file cannot be spread over multiple systems.