A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The present invention relates generally to data access and processing in a distributed computing system and, more particularly, to a system implementing methodology for improving data streaming of objects in distributed computer environments.
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as “records” having “fields” of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added or removed from data files, information retrieved from or updated in such files, and so forth, all without user knowledge of underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level. The general construction and operation of a database management system is known in the art. See e.g., Date, C., An Introduction to Database Systems, Volume I and II, Addison Wesley, 1990; the disclosure of which is hereby incorporated by reference.
DBMS systems have long since moved from a centralized mainframe environment to a de-centralized or distributed environment. One or more PC “client” systems, for instance, may be connected via a network to one or more server-based database systems (SQL database server). Well-known examples of computer networks include local-area networks (LANs) where the computers are geographically close together (e.g., in the same building), and wide-area networks (WANs) where the computers are farther apart and are connected by telephone lines or radio waves.
Often, networks are configured as “client/server” networks, such that each computer on the network is either a “client” or a “server.” Servers are powerful computers or processes dedicated to managing shared resources, such as storage (i.e., disk drives), printers, modems, or the like. Servers are often dedicated, meaning that they perform no other tasks besides their server tasks. For instance, a database server is a computer system that manages database information, including processing database queries from various clients. The client part of this client-server architecture typically comprises PCs or workstations which rely on a server to perform some operations. Typically, a client runs a “client application” that relies on a server to perform some operations, such as returning particular database information. Often, client-server architecture is thought of as a “two-tier architecture,” one in which the user interface runs on the client or “front end” and the database is stored on the server or “back end.” The actual business rules or application logic driving operation of the application can run on either the client or the server (or even be partitioned between the two). In a typical deployment of such a system, a client application, such as one created by an information service (IS) shop, resides on all of the client or end-user machines. Such client applications interact with host database engines (e.g., Sybase® Adaptive Server™), executing business logic which traditionally ran at the client machines.
More recently, the development model has shifted from standard client/server or two-tier development to a three-tier (or n-tier), component-based development model. This newer client/server architecture introduces three well-defined and separate processes, each typically running on a different platform. A “first tier” provides the user interface, which runs on the user's computer (i.e., the client). Next, a “second tier” provides the functional modules that actually process data. This middle tier typically runs on a server, often called an “application server.” A “third tier” furnishes a database management system (DBMS) that stores the data required by the middle tier. This tier may run on a second server called the database server.
The three-tier design has many advantages over traditional two-tier or single-tier designs. For example, the added modularity makes it easier to modify or replace one tier without affecting the other tiers. Separating the application functions from the database functions makes it easier to implement load balancing. Thus, by partitioning applications cleanly into presentation, application logic, and data sections, the result will be enhanced scalability, reusability, security, and manageability.
In a typical client/server environment, the client knows about the database directly and can submit a database query for retrieving a result set which is generally returned as a tabular data set. In a three-tier environment, particularly a component-based one, the client never communicates directly with the database. Instead, the client typically communicates through one or more components. Components themselves are defined using one or more interfaces, where each interface is a collection of methods. In general, components return information via output parameters. In the conventional, standard client/server development model, in contrast, information is often returned from databases in the form of tabular result sets, via a database interface such as Open Database Connectivity (i.e., ODBC, available from Microsoft Corp. of Redmond, Wash.) or Java Database Connectivity (i.e., JDBC, available from Sun Microsystems of Mountain View, Calif.). A typical three-tier environment would, for example, include a middle tier comprising business objects implementing business rules (logic) for a particular organization. The business objects, not the client, communicate with the database.
For their part, application writers or developers like to write object-oriented programs using modern object-oriented programming techniques. At the same time, however, these developers prefer to have their data (i.e., the data employed by the application) stored in a database having relational tables, as that is an easy way of storing and retrieving data. A particular problem arises when one wants to retrieve data from the database for use (e.g., manipulation) within one's program: how is this “flat” data converted into objects? In this regard, “object” refers to the specific programming construct that defines associated data members and methods (typically, including data hiding and containment), such as an object instantiated from a C++ class, a Java class, an Object Pascal class, or the like.
One of the advantages of Java as an object-oriented language over C++ is in Java's ability to flatten objects into a standard binary representation. This ability to flatten objects allows the persisting of objects in files or databases, or transmission of objects between applications across a network. Because the representation is standard, applications written by different vendors can exchange objects without having to revert to a proprietary protocol. This standard representation was developed by Sun Microsystems and will be referred to herein as Sun serialization.
Sun serialization is a protocol for converting between a Java object and its binary representation. The binary representation is an array of bytes coded to represent the Java object using the Sun serialization protocol. How the Java object is represented within its particular host virtual runtime environment (virtual machine or VM) is irrelevant to its binary encoding. A Java object itself is a collection of data fields whose values are interpreted by a Java class. The Java class of an object may specify one or more named typed fields, whose values are contained in the object. Java classes can be “subclassed,” meaning other classes can inherit the named typed fields of a particular class, and provide additional named typed fields.
When an object is serialized using Sun serialization, a description of the object's class is serialized along with it. The class description is the template that allows the object to be reconstructed. Such a template allows a meaningful interpretation of the object's data, without which the data would just be a stream of bytes. The class description includes details of the class field names and types. With this information, another goal is achieved: the description acts as versioning information. Classes can be modified over time as the development process dictates, and an object serialized under an earlier version of a particular class must be able to be “deserialized” as a newer version of the class. This would, in general, be impossible without the serialization's inclusion of class field names and types. When deserialization takes place, the old class description can be compared to the newer description and fields can be mapped as needed.
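By way of illustration only, the following minimal sketch uses the standard Java serialization classes (`ObjectOutputStream` and `ObjectInputStream`) to show that the class name and field names travel inside the serialized bytes. The `Employee` class, its fields, and the `SunSerializationDemo` helper are hypothetical names chosen for this example and are not part of the disclosure.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; the name and fields are illustrative only.
class Employee implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    int salary;

    Employee(String name, int salary) {
        this.name = name;
        this.salary = salary;
    }
}

class SunSerializationDemo {
    // Serialize an object with the standard Java mechanism.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuild the object; the embedded class description guides reconstruction.
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the given ASCII text appears somewhere in the serialized bytes.
    static boolean containsText(byte[] data, String text) {
        // ISO-8859-1 maps every byte to exactly one char, so the search is byte-accurate.
        String haystack = new String(data, StandardCharsets.ISO_8859_1);
        return haystack.contains(text);
    }
}
```

Serializing `new Employee("Ada", 100)` and searching the result for `"Employee"` or `"salary"` finds both strings, which shows that the description accompanies the field values in the stream.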
The inclusion of the detailed class description in the object serialization makes those serializations portable and “versionable.” Unfortunately, however, this is done today at the expense of sometimes considerable size required to represent the descriptions. A time penalty also results, from the time taken to write the description. Accordingly, a better solution is sought.
What is desired is a solution providing the ability to create and stream objects, particularly Java objects, in a manner which does not incur a considerable size or resource penalty. Moreover, such a solution should preserve portability. The present invention fulfills this and other needs.
A distributed (e.g., client/server) computing environment is described which, in accordance with the present invention, simplifies the use of objects in distributed applications or other instances where transfer of objects is required. In particular, the invention provides an improved methodology for streaming objects (e.g., Java objects) stored and managed remotely (e.g., objects stored and managed in relational databases) to clients in a highly efficient manner. Once at the clients, the objects may be executed or otherwise manipulated locally as desired.
The present invention may be implemented by extending an existing streaming methodology or protocol, such as Sybase Tabular Data Stream (TDS) protocol or other comparable streaming protocol. Streaming is modified to include a class identifier approach of the present invention for supporting object serialization. A Class ID (referred to hereafter as ACI) serialization is provided as a protocol for converting between a Java object and a binary representation. Like Sun serialization, it operates to provide object serialization. Unlike Sun serialization, however, the class description required in ACI is dramatically smaller, thereby minimizing the time penalty and storage requirements usually incurred in representing class description information in a stream.
ACI is intended for an environment in which all classes ever involved in any serialization are known by the environment (as is often the case). Each class known to the environment is represented by a compact numeric identifier, and it is this identifier alone that is used to represent the class description in the serialization. A table of the class identifiers is kept at the beginning of each serialization. An ACI serialization is much smaller than a Sun serialization but, without further enhancement, this compactness would come at the expense of portability. In accordance with the present invention, however, a simple transformation is applied so that any ACI serialization can be converted to a portable serialization.
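The class identifier approach may be pictured with the following minimal sketch. It is offered under stated assumptions and does not reproduce the actual wire format: the byte layout, the restriction to integer fields, and the names `AciClassRegistry`, `AciDecodedObject`, and `AciSketch` are all hypothetical. The registry stands in for the environment that knows every class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for "the environment": it knows every class and its ID.
class AciClassRegistry {
    private final Map<String, Integer> idByName = new HashMap<>();
    private final Map<Integer, String> nameById = new HashMap<>();

    void register(int classId, String className) {
        idByName.put(className, classId);
        nameById.put(classId, className);
    }

    int idOf(String className) {
        Integer id = idByName.get(className);
        if (id == null) throw new IllegalArgumentException("unknown class: " + className);
        return id;
    }

    String nameOf(int classId) {
        String name = nameById.get(classId);
        if (name == null) throw new IllegalArgumentException("unknown class ID: " + classId);
        return name;
    }
}

// Result of decoding: the resolved class name and the raw field values.
class AciDecodedObject {
    final String className;
    final int[] fields;

    AciDecodedObject(String className, int[] fields) {
        this.className = className;
        this.fields = fields;
    }
}

class AciSketch {
    // Assumed layout: [table size][class IDs...][table index][field count][int fields...]
    static byte[] write(AciClassRegistry registry, String className, int[] fields) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(1);                       // class ID table with one entry
            out.writeInt(registry.idOf(className));  // the ID alone stands for the class
            out.writeShort(0);                       // the object refers to table entry 0
            out.writeShort(fields.length);
            for (int value : fields) out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static AciDecodedObject read(AciClassRegistry registry, byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int[] classIds = new int[in.readUnsignedShort()];
            for (int i = 0; i < classIds.length; i++) classIds[i] = in.readInt();
            int tableIndex = in.readUnsignedShort();
            int[] fields = new int[in.readUnsignedShort()];
            for (int i = 0; i < fields.length; i++) fields[i] = in.readInt();
            // Only the registry can turn the numeric ID back into a class.
            return new AciDecodedObject(registry.nameOf(classIds[tableIndex]), fields);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch no class or field names appear in the stream, so a reader lacking the same registry cannot interpret the bytes. That is the portability cost noted above.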
Class Descriptor serialization (ACD) is also provided for achieving portability. An ACD serialization is identical to an ACI serialization except that the class identifier table at the beginning of the ACI serialization is replaced by a table of class descriptors. These class descriptors contain virtually the same information as Sun class descriptors, so an ACD serialization has the same portability characteristics as Sun serialization. Converting between ACI and ACD serializations is a very simple and computationally frugal process. Because the two are otherwise identical (apart from the class tables), only the class table contents need change. The environment maintains a correspondence between the ACI class identifiers and ACD class descriptors. In this manner, the present invention provides the ability to create and stream objects, particularly Java objects, in a manner which does not incur a substantial size or resource penalty.
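The table-swapping conversion described above may be sketched as follows. This is a hypothetical illustration only: the byte layout, the descriptor contents (a class name plus field names), and the names `SketchClassDescriptor` and `AciAcdConverter` are assumptions made for the example. The converter rewrites only the leading class table and copies the remainder of the serialization unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class descriptor: a class name plus its field names.
class SketchClassDescriptor {
    final String className;
    final List<String> fieldNames;

    SketchClassDescriptor(String className, List<String> fieldNames) {
        this.className = className;
        this.fieldNames = fieldNames;
    }
}

// Hypothetical environment holding the ID-to-descriptor correspondence.
class AciAcdConverter {
    private final Map<Integer, SketchClassDescriptor> descriptorById = new HashMap<>();
    private final Map<String, Integer> idByClassName = new HashMap<>();

    void register(int classId, SketchClassDescriptor descriptor) {
        descriptorById.put(classId, descriptor);
        idByClassName.put(descriptor.className, classId);
    }

    // Assumed ACI layout: [table size][class IDs...][body]; the body is copied as-is.
    byte[] aciToAcd(byte[] aci) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(aci));
             DataOutputStream out = new DataOutputStream(bytes)) {
            int tableSize = in.readUnsignedShort();
            out.writeShort(tableSize);
            for (int i = 0; i < tableSize; i++) {
                int classId = in.readInt();
                SketchClassDescriptor d = descriptorById.get(classId);
                if (d == null) throw new IllegalArgumentException("unknown class ID: " + classId);
                out.writeUTF(d.className);            // descriptor replaces the numeric ID
                out.writeShort(d.fieldNames.size());
                for (String field : d.fieldNames) out.writeUTF(field);
            }
            copyRest(in, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reverse direction: each descriptor collapses back to its numeric ID.
    byte[] acdToAci(byte[] acd) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(acd));
             DataOutputStream out = new DataOutputStream(bytes)) {
            int tableSize = in.readUnsignedShort();
            out.writeShort(tableSize);
            for (int i = 0; i < tableSize; i++) {
                String className = in.readUTF();
                int fieldCount = in.readUnsignedShort();
                for (int f = 0; f < fieldCount; f++) in.readUTF(); // skip field names
                Integer classId = idByClassName.get(className);
                if (classId == null) throw new IllegalArgumentException("unknown class: " + className);
                out.writeInt(classId);
            }
            copyRest(in, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Copies the object data following the class table without interpreting it.
    private static void copyRest(DataInputStream in, DataOutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) out.write(buffer, 0, n);
    }
}
```

Because the object data is never parsed in either direction, the cost of conversion in this sketch depends only on the size of the class table plus a plain byte copy.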