In the computing industry, it is common to store data in commercial database systems and to retrieve such data using a database management system. There are two general types of database management systems, relational and object-oriented. Relational database systems are good for managing large amounts of data, while object-oriented database systems are good for expressing complex relationships among objects. Relational database systems are good for data retrieval but provide little or no support for data manipulation, while object-oriented systems excel at data manipulation but provide little or no support for data query and retrieval. Depending on the task at hand, there are various systems available which are suited for particular tasks. In order to manage simple data without queries, a traditional database system is not even necessary. A simple file system will suffice. In order to manage simple data with queries, a relational database system would be ideal. If the subject data is complex without queries, an object-oriented database system would be used. Finally, for the case where the subject data is complex and requires query capabilities, one would want to use an object-relational database management system.
Attempts to combine the inherent functionalities have been made; however, as the two models are fundamentally different, integrating the two is quite difficult. Relational database systems are based on two-dimensional tables in which each item appears in a row. Relationships among the data are expressed by comparing the values stored in these tables. The object model is based on the tight integration of code and data, flexible data types, and references. Object-oriented databases have their origins in the object-oriented programming paradigm. In this paradigm, users are concerned with objects and operations on those objects. For example, instead of having to think of a "DEPT tuple" plus a collection of corresponding "EMP tuples" that include "foreign key values" that reference the "primary key value" in that "DEPT tuple," the user should be able to think directly of a department object that contains a corresponding set of employee objects.
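By way of illustration only, the contrast between the two views of the department example can be sketched as follows. The table names, column names, and class names are hypothetical and are not taken from any particular system; Python's built-in sqlite3 module stands in for a relational DBMS.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List

# Relational view: the DEPT/EMP relationship exists only as matching key values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp (
        emp_no  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_no INTEGER REFERENCES dept(dept_no)  -- foreign key value
    );
    INSERT INTO dept VALUES (10, 'Research');
    INSERT INTO emp VALUES (1, 'Ada', 10);
    INSERT INTO emp VALUES (2, 'Grace', 10);
""")
rows = conn.execute(
    "SELECT e.name FROM emp e JOIN dept d ON e.dept_no = d.dept_no "
    "WHERE d.name = 'Research' ORDER BY e.emp_no"
).fetchall()
relational_names = [name for (name,) in rows]


# Object view: the department object directly contains its employee objects.
@dataclass
class Employee:
    name: str


@dataclass
class Department:
    name: str
    employees: List[Employee] = field(default_factory=list)


research = Department("Research", [Employee("Ada"), Employee("Grace")])
object_names = [e.name for e in research.employees]
```

Both views yield the same employees; the difference lies in whether the user reasons about key comparisons or navigates from the department object to its members.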
It is a basic tenet of the OO approach that everything is an object. Some objects are primitive and immutable (integers, strings, etc.). Other objects--typically user-created--are more complex and mutable. These more complex objects correspond to variables of arbitrary internal complexity. Every object has a type (the OO term is "class"). Individual objects are sometimes referred to as object instances. An intrinsic aspect of any given type consists of the set of operators or functions (the OO term is "methods") that can be applied to objects of that type. Furthermore, all objects are encapsulated. This means that the representation or internal structure of a given object is not visible to the users of that object. Instead, users know only that the object is capable of performing certain functions. The advantage of encapsulation is that it allows the internal representation of objects to be changed without requiring any of the applications that use those objects to be rewritten. In other words, encapsulation implies data independence. Every object has a unique identity called the OID (object ID), which can be used as an address for pointers from other parts of the database.
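The data independence that follows from encapsulation can be sketched, under purely illustrative assumptions, as two classes that expose the same methods over different internal representations. The class names, the method names, and the counter used to assign OIDs are all hypothetical.

```python
import itertools

# Hypothetical OID generator: each new object receives a unique identity.
_next_oid = itertools.count(1)


class Account:
    """Representation 1: a single running total held as integer cents."""

    def __init__(self):
        self.oid = next(_next_oid)
        self._cents = 0

    def deposit(self, amount: float) -> None:
        self._cents += round(amount * 100)

    def balance(self) -> float:
        return self._cents / 100


class LedgerAccount:
    """Representation 2: every deposit kept as a separate ledger entry."""

    def __init__(self):
        self.oid = next(_next_oid)
        self._entries = []

    def deposit(self, amount: float) -> None:
        self._entries.append(round(amount * 100))

    def balance(self) -> float:
        return sum(self._entries) / 100


def application(account) -> float:
    # The application uses only the methods and never touches the internals,
    # so it need not be rewritten when the representation changes.
    account.deposit(12.50)
    account.deposit(0.25)
    return account.balance()


first, second = Account(), LedgerAccount()
result_first = application(first)
result_second = application(second)
```

The same application function runs unchanged against either representation, and the two objects remain distinguishable by their OIDs.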
Relational database systems support a small, fixed collection of data types (e.g. integers, dates, and strings), which has proven adequate for traditional application domains such as administrative data processing. In many application domains, however, much more complex kinds of data must be handled. Typically, this complex data has been stored in operating system file systems or specialized data structures, rather than in a DBMS. Examples of domains with complex data include computer aided design and modeling (CAD/CAM), multimedia repositories, and document management. As the amount of data grows, the many features offered by a DBMS for data management--for example, reduced application development time, concurrency control and recovery, indexing support, and query capabilities--become increasingly attractive and necessary. In order to support such applications, a DBMS must support complex data types. Object-oriented concepts have strongly influenced efforts to enhance database support for complex data. As mentioned before, there exist, in the prior art, relational database management systems which support these functions with regard to simple data. A relational DBMS could conceivably store complex data types. For example, images, videos, etc. could be stored as blobs ("binary large objects") in current relational systems. A blob is just a long stream of bytes, and the DBMS's support consists of storing and retrieving blobs in such a manner that a user does not have to worry about the size of the blob; a blob can span several pages. All further processing of the blob has to be done by the user's application program, in the host language in which the SQL language is embedded. This solution is not efficient because all the blobs in a collection must be retrieved even if most of them could be filtered out of the answer by applying a user-defined function within the DBMS.
Although object-oriented databases of the prior art support storage of complex data, they fail to provide the query and indexing capabilities to manage such complex data. There is a need for a database management system which can provide the features and functionality of traditional relational database management systems, but for use with complex data types. As a result of this need, there has been a drive towards the development of object-relational database management systems.
Object-relational databases can be thought of as an attempt to extend relational database systems with the functionality necessary to support a broader class of applications, and in many ways, provide a bridge between relational and object-oriented paradigms. There are several object-relational database management systems (ORDBMSs) in the market today. These include the Informix Universal Server, UniSQL, and O2. The approach taken by the current trends in object-relational technology is to extend the functionality of existing relational DBMSs by adding new data types. Traditional systems offered limited flexibility in the data types available. Data is stored in tables, and the type of each field value is limited to a simple atomic type. This limited type system has been extended in three ways: user-defined abstract types, constructed types, and reference types. Collectively, these are referred to as complex types. As an example, take a JPEG image. This type is not one of a typical DBMS's built-in types, but can be defined by a user in an ORDBMS, to store image data compressed using the JPEG standard. Allowing users to define arbitrary new data types is a key feature of ORDBMSs. The ORDBMS allows users to store and retrieve objects of type jpeg_image, just like an object of any other type, such as integer. New data types usually need to have type-specific operations defined by the user who creates them. For example, one might define operations on an image data type such as compress, rotate, shrink, and crop. The combination of the data type and its associated methods is called an Abstract Data Type (ADT). The label "abstract" is applied to these data types because the database system does not need to know how an ADT's data is stored, nor how the ADT's methods work. It merely needs to know what methods are available and the input and output types for the methods. Hiding of ADT internals is called encapsulation.
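As a minimal sketch of an abstract data type with type-specific operations, the following toy image type offers rotate and crop methods over a hidden pixel grid. It is an assumption made for illustration: it does not implement JPEG compression, and the class name ToyImage and its method signatures are hypothetical rather than those of any ORDBMS.

```python
from typing import List, Tuple


class ToyImage:
    """Toy image ADT: callers see the methods, not the stored pixel grid."""

    def __init__(self, pixels: List[List[int]]):
        self._pixels = [row[:] for row in pixels]  # hidden representation

    def size(self) -> Tuple[int, int]:
        # Returns (width, height).
        return (len(self._pixels[0]), len(self._pixels))

    def rotate(self) -> "ToyImage":
        # Rotate 90 degrees clockwise: reverse the rows, then transpose.
        return ToyImage([list(col) for col in zip(*self._pixels[::-1])])

    def crop(self, left: int, top: int, width: int, height: int) -> "ToyImage":
        rows = self._pixels[top:top + height]
        return ToyImage([row[left:left + width] for row in rows])

    def to_rows(self) -> List[List[int]]:
        return [row[:] for row in self._pixels]


image = ToyImage([[1, 2, 3],
                  [4, 5, 6]])
rotated = image.rotate()
cropped = image.crop(left=1, top=0, width=2, height=1)
```

A system hosting such a type would need to know only the method names and their input and output types, for instance that rotate takes an image and returns an image.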
When an object is especially large, object identifiers (OIDs) become significant. Storing copies of a large value in multiple constructed type objects may use much more space than storing the value once and referring to it elsewhere through reference type objects. This additional storage requirement can affect both disk usage and buffer management.
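The storage trade-off described above can be made concrete with a small sketch. The one-megabyte payload, the dictionary used as an object store, and the 8-byte size assumed for a stored OID are all illustrative assumptions.

```python
payload = bytes(1_000_000)   # stand-in for one large value, e.g. an image
OID_SIZE = 8                 # assumed size in bytes of a stored reference

# Copy semantics: every containing object embeds the large value itself.
by_copy = [
    {"title": "front cover", "image": payload},
    {"title": "back cover", "image": payload},
]

# Reference semantics: the value is stored once, and objects hold its OID.
object_store = {}            # hypothetical store mapping OID -> value


def put(value: bytes) -> int:
    oid = len(object_store) + 1
    object_store[oid] = value
    return oid


image_oid = put(payload)
by_reference = [
    {"title": "front cover", "image_ref": image_oid},
    {"title": "back cover", "image_ref": image_oid},
]


def bytes_stored_by_copy(records) -> int:
    # Each record carries its own full copy of the value.
    return sum(len(r["image"]) for r in records)


def bytes_stored_by_reference(records, store) -> int:
    # Each distinct value is stored once, plus one OID per record.
    distinct = {r["image_ref"] for r in records}
    return sum(len(store[oid]) for oid in distinct) + OID_SIZE * len(records)


copy_total = bytes_stored_by_copy(by_copy)
reference_total = bytes_stored_by_reference(by_reference, object_store)
```

Under these assumptions the copying layout stores the value twice, while the reference layout stores it once plus two small OIDs.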
Large ADT objects complicate the layout of data on disk. This problem is well understood, and has been solved in essentially all ORDBMSs and OODBMSs. User-defined ADTs can be quite large. In particular, they can be bigger than a single disk page. Large ADTs, like blobs, require special storage, typically in a different location on disk from the tuples that contain them. Disk-based pointers are maintained from the tuples to the objects they contain.
The final issue in the prior art of ORDBMSs is efficiency. When complex objects are stored as blobs in a purely relational database management system, the entire object must be retrieved from storage and transferred to the client. Any and all processing of the object must be done by the client itself. However, in an ORDBMS, performance is improved because methods are executed by the server, not the client. As a trivial example, consider the query, "Find all books with more than 20 chapters." In a traditional relational DBMS, books might be represented as blobs and the client will have to retrieve each book and scan it to decide if it meets the criteria. In contrast, with proper OO support, the server can execute the "number of chapters" method and only those books that satisfy the condition will be transmitted to the client.
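The book query can be sketched with Python's built-in sqlite3 module, whose create_function call registers a user-defined function that the database engine evaluates while filtering rows. This is an analogy only: SQLite runs in the same process as the application, so no network transfer is actually avoided here, and the table layout, the chapter marker format, and the function name chapter_count are hypothetical.

```python
import sqlite3


def chapter_count(body: bytes) -> int:
    # Assumed blob format: every chapter starts with a "CHAPTER <n>" line.
    return body.count(b"\nCHAPTER ")


def make_book(chapters: int) -> bytes:
    return b"".join(b"\nCHAPTER %d\ntext" % i for i in range(1, chapters + 1))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, body BLOB)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("Short", make_book(3)), ("Long", make_book(25)), ("Medium", make_book(20))],
)

# Blob approach: every book is fetched, and the client scans each one.
all_rows = conn.execute("SELECT title, body FROM books").fetchall()
client_side = sorted(t for t, body in all_rows if chapter_count(body) > 20)

# Method-in-the-DBMS approach: the engine applies the function during the
# query, so only qualifying titles are returned to the caller.
conn.create_function("chapter_count", 1, chapter_count)
server_side = [
    title
    for (title,) in conn.execute(
        "SELECT title FROM books WHERE chapter_count(body) > 20 ORDER BY title"
    )
]
```

Both approaches give the same answer, but the first moves all three book bodies to the caller while the second returns a single title.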
This is one aspect of performance in the sense that only the required data is retrieved and transmitted to the client. Another aspect of efficiency has to do with address spaces. If the storage system runs in a different address space from the user program, then an address space switch must occur to process each command. Because of the address space switch, the command will run two to three orders of magnitude slower than in the non-persistent case. Such a performance hit is unacceptable to users, which is why persistent storage systems are designed to execute in the same address space as the user program. Avoiding an address space change provides much higher performance. The advantage of using a persistent language is that in the persistent language world, updates are very "lightweight," that is, they take a very small amount of time. As a result, expressing updates in a low-level language such as C++ is fundamentally different from expressing them in a high-level notation such as SQL. In C++, or in any other third generation programming language, updates are fundamentally lightweight, that is, they modify a single storage location.
A final aspect of performance is the data transfer between the DBMS and the address space where the function will execute. In the prior art, data is transported from the DBMS, whose data resides on a hard disk or some other similar device, to the application. This type of data transfer places an enormous burden on network resources and causes unacceptable delays. Disk Input/Output (I/O) is a major source of delay in the prior art. In a situation where there are large amounts of transactional data in a traditional relational DBMS, it is desirable to perform complex calculations on this data. The limiting factor on such data manipulation is speed and performance. In the prior art, large amounts of data are retrieved by the DBMS and provided as input to a function which executes on the data and returns an output value. However, I/O bottlenecks occur when the data needs to be transferred from disk to memory. For example, when large amounts of transactional data must be operated upon in order to provide real-time or close to real-time data analysis, the actual transactional data must be transferred from the DBMS and delivered to the client where the computational process is executed. This has traditionally resulted in significant delays. The present invention aims to correct this deficiency and allow for high speed, efficient processing of data retrieved from a database on a real-time basis.
The prior discussion can be applied in the context of Enterprise Resource Planning systems. These transactional systems are employed by companies to automate and manage their business processes on a daily basis. These online transaction processing systems are designed to provide integrated processing of all business routines and transactions. They include enterprise-wide, integrated solutions, as well as specialized applications for individual, departmental functions. They mirror all of the business-critical processes of the enterprise--finance, manufacturing, sales, and human resources. The R/2 and R/3 Systems from SAP AG are one example of such a transactional system.
It is advantageous to be able to analyze the transactional data generated by such systems. In the prior art, companies have employed computers to analyze business process data and provide decision support. Traditionally, data from the transactional systems were batch uploaded to a data warehouse. Analysis was performed on the data from the data warehouse. The analysis was not performed on a real-time basis. The present invention aims to correct that deficiency.
It is an object of the present invention to provide a system and method that can perform complex manipulations on large amounts of transactional data in real-time.
It is a further object of the present invention to eliminate the I/O bottlenecks which traditionally occur when large amounts of transactional data are transferred to the client for processing.
It is a further object of the present invention to provide a system and method enabling the storage of transactional data in optimized data structures providing suitable representations of complex data structures, like networks or trees, based on object references.
It is a further object of the present invention to keep the transactional data stored in optimized data structures correlated to the transactional data being updated on the transactional system.
It is a further object of the present invention to provide a system and method whereby complex objects can be stored in an object-oriented environment.
It is a further object of the present invention to provide a system whereby said complex objects can be queried using traditional relational database techniques and the SQL language.
It is a further object of the present invention to provide a system where the complex objects are subject to sophisticated transaction management systems as in a relational environment.