This application includes a transmittal under 37 C.F.R. § 1.52(e) of a Computer Program Listing Appendix comprising duplicate compact discs (2), respectively labeled "Copy 1" and "Copy 2". The discs are IBM-PC machine formatted and Microsoft® Windows Operating System compatible, and include identical copies of the following list of files:
All of the material disclosed in the Computer Program Listing Appendix is hereby incorporated by reference into the present application.
A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
1. Field of the Invention
The present invention relates generally to distributed (e.g., three-tier) computing systems and, more particularly, to a system and methods for improving data access and operation in distributed computer environments.
2. Description of the Background Art
Today, most computers are linked to other computer systems via a computer network. Well-known examples of computer networks include local-area networks (LANs) where the computers are geographically close together (e.g., in the same building), and wide-area networks (WANs) where the computers are farther apart and are connected by telephone lines or radio waves.
Often, networks are configured as "client/server" networks, such that each computer on the network is either a "client" or a "server." Servers are powerful computers or processes dedicated to managing shared resources, such as storage (i.e., disk drives), printers, modems, or the like. Servers are often dedicated, meaning that they perform no other tasks besides their server tasks. For instance, a database server is a computer system that manages database information, including processing database queries from various clients. The client part of this client/server architecture typically comprises PCs or workstations that rely on a server to perform some operations. Typically, a client runs a "client application" that relies on a server to perform some operations, such as returning particular database information. Often, client/server architecture is thought of as a "two-tier architecture," one in which the user interface runs on the client or "front end" and the database is stored on the server or "back end." The actual business rules or application logic driving operation of the application can run on either the client or the server (or even be partitioned between the two). In a typical deployment of such a system, a client application, such as one created by an information service (IS) shop, resides on all of the client or end-user machines. Such client applications interact with host database engines (e.g., Sybase® Adaptive Server®), executing business logic that traditionally ran at the client machines.
More recently, the development model has shifted from standard client/server or two-tier development to a three-tier, component-based development model. This newer client/server architecture introduces three well-defined and separate processes, each typically running on a different platform. A "first tier" provides the user interface, which runs on the user's computer (i.e., the client). Next, a "second tier" provides the functional modules that actually process data. This middle tier typically runs on a server, often called an "application server." A "third tier" furnishes a database management system (DBMS) that stores the data required by the middle tier. This tier may run on a second server called the database server.
The three-tier design has many advantages over traditional two-tier or single-tier designs. For example, the added modularity makes it easier to modify or replace one tier without affecting the other tiers. Separating the application functions from the database functions makes it easier to implement load balancing. Thus, by partitioning applications cleanly into presentation, application logic, and data sections, the result will be enhanced scalability, reusability, security, and manageability.
Three-tier database systems are well documented in the patent and trade literature; see, e.g., U.S. Pat. No. 6,266,666, entitled "Component transaction server for developing and deploying transaction-intensive business applications," the disclosure of which is hereby incorporated by reference.
In the three-tier model, communication must occur among the various tiers, such as from a client to a middle tier, and from the middle tier to a back-end database. A large volume of message traffic flows between the client and the database, with the middle tier positioned in between. One advantage of employing a middle tier is that connections to the database can be pooled in a central (middleware) tier, thus allowing more efficient access to the database. In particular, database connections, which are expensive in terms of system and network resources, are cached in the middle tier.
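By way of illustration only, the connection caching described above may be sketched as follows. This is a minimal sketch in Python, not the implementation of any particular application server; the names ConnectionPool and fake_connect, and the dictionary used as a stand-in for a database connection, are hypothetical and introduced solely for this example.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """A fixed-size pool that hands out reusable connections."""

    def __init__(self, connect, size):
        # `connect` is a hypothetical factory that opens one connection.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # pay the connection cost once, up front

    @contextmanager
    def connection(self, timeout=None):
        conn = self._idle.get(timeout=timeout)  # blocks while all are in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return to the pool instead of closing


# Stand-in for an expensive database connection (assumption for this demo).
opened = []


def fake_connect():
    conn = {"id": len(opened)}
    opened.append(conn)
    return conn


pool = ConnectionPool(fake_connect, size=2)

served = []
for request in range(5):  # five client requests share two connections
    with pool.connection() as conn:
        served.append(conn["id"])
```

In this sketch, five requests are served while only two connections are ever opened, which is the saving that pooling in the middle tier is intended to provide.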
Another advantage of the middle tier is to offload certain computations from the back-end database, particularly those pertaining to business logic (i.e., business objects). Exploiting this advantage, a system administrator would deploy a middle tier on a separate server computer, one that was physically separate from the computer hosting the back-end database. More recently, however, hardware vendors have released more powerful computers such that both the middle tier and the back-end database may now easily run on the same host, a single physical computer. One such computer is Sun's StarFire computer (Sun Microsystems of Mountain View, Calif.); it employs 64 processors, running under a 64-bit operating system, with access to a 64 GB memory space. As a result of this more powerful hardware architecture now available, the approach of deploying a middle tier on a separate physical computer is no longer a necessity. In some instances, it may be more cost effective to deploy and maintain the middle tier and the back-end database on the same computer.
Typically, any business logic modeled on a middle tier requires substantial access to the back-end database. For example, SQL queries may be passed from the middle tier to the database, with corresponding result sets being returned to the middle tier (and then on to the relevant client). If a particular query result is large, a correspondingly large data set (and accompanying messages) must be transmitted back to the middle tier. Therefore, in a classic configuration, where a middle tier exists on a separate machine, a large amount of network communication occurs between the middle tier and the database. In the instance where the middle tier and database reside on a single computer, physical (e.g., Ethernet) network traffic is avoided. However, the communication process is still resource intensive, as the underlying communication protocol stack (e.g., TCP/IP) is still used to effect communication between the middle tier and the database. Accordingly, system performance is negatively impacted.
Another disadvantage is the potential for a breach of security. Even when the middle tier and database are on the same physical machine, it is still possible for an unauthorized individual to gain access to the communications occurring between the two. Again, this results from the underlying communication protocol stack employed to effect the communication. Although the communications may be encrypted (e.g., using SSL, Secure Sockets Layer), such encryption adds overhead to the system, thus impacting overall system performance.
To date, attempts to address the foregoing problems have focused on optimizing network communication. For example, using a "loop back" optimization, communication between two processes (e.g., a middle tier and a database) may be improved if both are residing on one host. Here, the host is specified to be a local host. As a result, certain driver-level optimizations may occur at the level of the underlying TCP/IP driver (used to effect communication). However, that approach has the distinct disadvantage of affecting the visibility of the host (across the entire network). Moreover, the approach still relies on network communication occurring between the middle tier and the database, even though both processes may reside on one physical machine. Accordingly, a better solution is sought.
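For purposes of illustration, the following minimal Python sketch shows two parties on one host exchanging a request and a reply over the local-host (loopback) address. It does not reproduce any driver-level optimization; it merely shows that, even on a single machine, the exchange is still expressed as socket (TCP/IP) communication. The function name serve_once and the message contents are hypothetical, and a thread stands in for the database process to keep the example self-contained.

```python
import socket
import threading


def serve_once(listener):
    """Accept one connection and answer one request (the 'database' side)."""
    conn, _ = listener.accept()
    with conn:
        request = conn.recv(1024)
        conn.sendall(b"result for: " + request)
    # Leaving the `with` block closes the connection, signalling end of reply.


# Bind to the loopback address; port 0 lets the operating system pick a port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

server = threading.Thread(target=serve_once, args=(listener,))
server.start()

# The 'middle tier' side: same host, yet the data still crosses the TCP stack.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"SELECT 1")
    chunks = []
    while True:
        chunk = client.recv(1024)
        if not chunk:  # empty read means the server closed the connection
            break
        chunks.append(chunk)
    reply = b"".join(chunks)

server.join()
listener.close()
```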
The following definitions are offered for purposes of illustration, not limitation, in order to assist with understanding the discussion that follows.
Application server (also appserver): A program that handles all application operations between users and an organization's back-end business applications or databases. Application servers are typically used for complex transaction-based applications. To support high-end needs, an application server has to have built-in redundancy, monitors for high-availability, high-performance distributed application services, and support for complex database access.
Driver: A program that performs a set of specialized tasks, such as serving as a translator between different programs and/or devices.
Enterprise JavaBeans (also, EJB and Enterprise Java Beans): EJB is a widely-adopted server-side component architecture for the Java 2 Platform, Enterprise Edition (J2EE™), that enables rapid development of mission-critical applications that are versatile, reusable, and portable across middleware while protecting IT investment and preventing vendor lock-in. EJB is a specification that defines an EJB component architecture and the interfaces between the EJB technology-enabled server and the component. For further description, see, e.g., Enterprise JavaBean Specification, Version 2.0, available from Sun Microsystems.
Interprocess communication (IPC): A capability supported by many operating systems that allows one process to communicate with another process. The processes can be running on the same computer or on different computers connected through a network. IPC enables one application to control another application, and enables several applications to share the same data without interfering with one another. Examples of IPC in the Microsoft Windows environment include Dynamic Data Exchange (DDE) and Windows Clipboard.
Java: A general purpose programming language developed by Sun Microsystems. Java is an object-oriented language similar to C++, but simplified to eliminate language features that cause common programming errors. Java source code files (files with a .java extension) are compiled into a format called bytecode (files with a .class extension), which can then be executed by a Java interpreter. Compiled Java code can run on most computers because Java interpreters and runtime environments, known as Java Virtual Machines (JVMs), exist for most operating systems, including UNIX, the Macintosh OS, and Windows. Bytecode can also be converted directly into machine language instructions by a just-in-time (JIT) compiler.
Java Beans (also JavaBeans): A specification developed by Sun Microsystems that defines how Java objects interact. An object that conforms to this specification is called a Java Bean, and is similar to an ActiveX control. It can be used by any application that understands the Java Beans format. The principal difference between ActiveX controls and Java Beans is that ActiveX controls can be developed in any programming language but executed only on a Windows platform, whereas Java Beans can be developed only in Java, but can run on any platform.
Java Native Interface (JNI): A Java programming interface that allows developers to access the languages of a host system and determine the way Java integrates with native code. JNI allows Java code that runs within a Java Virtual Machine (JVM) to operate with applications and libraries written in other languages, such as C, C++, and assembly. Programmers use the JNI to write native methods to handle those situations when an application cannot be written entirely in the Java programming language, especially for low-level operating system calls. For further description, see, e.g., Java Native Interface Specification, available from Sun Microsystems of Mountain View, Calif. Additional description may be found in the patent literature; see, e.g., U.S. Pat. No. 6,066,181.
Process: An executing program; sometimes used interchangeably with "task."
Semaphore: A hardware or software flag. In multitasking systems, a semaphore is a variable with a value that indicates the status of a common resource. It is used to lock the resource that is being used. A process needing the resource checks the semaphore to determine the resource's status and then decides how to proceed.
Thread: A part of a program that can execute independently of other parts. Operating systems that support multithreading enable programmers to design programs whose threaded parts can execute concurrently.
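By way of example only, the semaphore and thread definitions above may be illustrated together with the following minimal Python sketch. The names resource_free, balance, and deposit are hypothetical. Four threads update one shared value, and a semaphore with an initial count of one indicates whether that shared resource is free or in use.

```python
import threading

resource_free = threading.Semaphore(1)  # count 1 = free, count 0 = in use
balance = 0  # the common resource shared by all threads


def deposit(times):
    """Increment the shared balance, one guarded update at a time."""
    global balance
    for _ in range(times):
        with resource_free:  # wait until free, then mark the resource in use
            balance += 1     # only one thread at a time runs this line
        # Leaving the `with` block marks the resource free again.


threads = [threading.Thread(target=deposit, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every update is made while holding the semaphore, no increments are lost and the final balance equals the total number of updates requested.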
In accordance with the present invention, a multi-tier database system is modified such that a middle-tier application server (EJB server) and a database server run on the same host computer and communicate via shared-memory interprocess communication. The system includes a JDBC (driver) thread that attaches to the database server, specifically by attaching to the database server's shared memory segment. Operation of the JDBC driver is modified in accordance with the present invention to provide direct access between the middle tier (i.e., EJB server) and the database server, when the two are operating on the same host computer.
The present invention introduces the notion of "external engines." The database server itself has an engine. Each engine, in turn, is a separate process (i.e., executing process). A "task" or "process" is a program in execution together with virtual memory containing instructions, data, and context information, such as descriptors for open files and communication channels. In accordance with the present invention, a multitude of engines are instantiated. Each engine communicates with other engines using the interprocess communication (IPC) technique of shared memory. In this manner, Enterprise Java Beans (EJB) support may itself be implemented as an external engine. More particularly, in the currently preferred embodiment, EJB services may be provided by an existing application server, the Sybase Enterprise Application Server (hereinafter, Application Server), which classically operates in the middle tier. In accordance with the present invention, the Application Server is modified to operate as an external engine. As part of this modification, the Application Server communicates with the back-end database server (e.g., Sybase Adaptive Server Enterprise, ASE) using shared-memory IPC. Further, a "fast path" high-speed shared memory driver is provided. The fast path memory driver attaches to a shared memory segment of the back-end database (e.g., Sybase ASE) and thereafter provides rapid data exchange/sharing between separate processes (e.g., separate UNIX or Windows NT processes), using shared-memory IPC.
One or more engines attach to the shared memory segment of the database server. One of those engines is the Application Server (EJB) engine, the process associated with the EJB server. In operation, during procurement of the first database connection, the JDBC driver or thread attaches to the shared memory segment by way of its key, an operating system handle (e.g., integer or logical name) that uniquely identifies the segment. All of the database server's in-memory data structures that are visible in shared memory, including for example locks, buffers, and the like, are visible to the attached engines. These data structures include send and receive buffers, which are employed for facilitating communication between the various clients/EJB server and the database server. Because the JDBC thread is attached to the shared memory segment, the thread's process (that is, the EJB engine process) is also attached to the shared memory segment. Even when the initial JDBC thread terminates, the process may remain attached. As a result, all subsequent threads (of the EJB engine) can automatically benefit from this existing attachment.
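For purposes of illustration, not limitation, attachment by key and the use of send and receive buffers may be sketched as follows. This minimal Python sketch uses the standard multiprocessing.shared_memory module, with the segment name playing the role of the key. The buffer layout (two 64-byte buffers, each with a one-byte length prefix) and the helper names put and get are assumptions made for this example and are not taken from the database server. Both handles are opened in one Python process solely to keep the sketch self-contained; in the system described, the attaching side would be a separate process.

```python
from multiprocessing import shared_memory

BUF = 64  # bytes per buffer (hypothetical layout for this sketch)

# Database-server side: create the segment. Its name acts as the key.
server_seg = shared_memory.SharedMemory(create=True, size=2 * BUF)
key = server_seg.name

# Engine side: attach to the existing segment knowing only the key.
engine_seg = shared_memory.SharedMemory(name=key)


def put(seg, offset, payload):
    """Write a short message at `offset`, prefixed by a one-byte length."""
    seg.buf[offset] = len(payload)
    seg.buf[offset + 1:offset + 1 + len(payload)] = payload


def get(seg, offset):
    """Read back a message written by put() at `offset`."""
    length = seg.buf[offset]
    return bytes(seg.buf[offset + 1:offset + 1 + length])


# Bytes [0, BUF) carry requests to the server; [BUF, 2*BUF) carry replies.
put(engine_seg, 0, b"SELECT 1")   # engine writes a request
request = get(server_seg, 0)      # server sees it through its own handle
put(server_seg, BUF, b"1")        # server writes a reply
reply = get(engine_seg, BUF)      # engine reads the reply directly

engine_seg.close()   # detach the engine side
server_seg.close()
server_seg.unlink()  # the creator removes the segment
```

No socket or protocol stack is involved in this exchange; each side reads what the other wrote directly from the shared segment.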
Operation is summarized as follows. The first client that is making a connection to the database (e.g., Java program executing at the Application Server) attaches to the back-end database shared memory. Here, a thread that is executing in the context of the Application Server (process) attaches to the back-end database shared memory. By virtue of the fact that a thread (e.g., Java program) has attached, the underlying process (i.e., Application Server process) also attaches to the shared memory. The attachment, which entails attachment to the underlying shared-memory "key" (i.e., an operating system construct), is done once. Upon attachment, all of the back-end database in-memory (public) data structures are visible to the Application Server, and thus may be accessed directly (using the shared-memory IPC technique). All subsequent clients requiring connectivity to the database (i.e., all subsequent Java program threads running in the Application Server) make use of the fact that a process (i.e., the Application Server itself) has already attached to the database's memory space. These subsequent threads do not need to make individual attachments but, instead, take advantage of the existing attachment. Accordingly, all communications of the subsequent threads may occur over high-speed shared-memory IPC.
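The attach-once behavior summarized above may be sketched, by way of example only, as follows. In this minimal Python sketch the first thread that requests connectivity performs the attachment, and every later thread in the same process reuses it. The names get_attachment, client_thread, and attach_calls are hypothetical, and the segment created at the top is merely a stand-in for the database server's shared memory.

```python
import threading
from multiprocessing import shared_memory

# Stand-in for the database side: publish a segment and note its key.
db_seg = shared_memory.SharedMemory(create=True, size=16)
db_seg.buf[:5] = b"ready"
key = db_seg.name

_attach_lock = threading.Lock()
_attachment = None  # process-wide attachment, shared by all threads
attach_calls = 0    # counts how many real attachments were made


def get_attachment(segment_key):
    """Attach on first use; later callers reuse the existing attachment."""
    global _attachment, attach_calls
    with _attach_lock:
        if _attachment is None:  # only the first caller attaches
            _attachment = shared_memory.SharedMemory(name=segment_key)
            attach_calls += 1
        return _attachment


seen = []


def client_thread():
    seg = get_attachment(key)
    seen.append(bytes(seg.buf[:5]))  # read the shared data directly


threads = [threading.Thread(target=client_thread) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

_attachment.close()
db_seg.close()
db_seg.unlink()
```

Eight threads read the shared data, yet only one attachment is made; the lock ensures that concurrent first requests do not each attempt their own attachment.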