E-commerce is becoming an increasingly common part of everyday life. Purchase enquiries and purchase orders for goods and services are made over electronic networks, most commonly the public internet. High-volume e-commerce applications demand an infrastructure that offers high availability, guaranteed quality of service (QoS) and good response time, supported by load balancing, fault tolerance and stability. Such systems are deployed over a cluster in which the nodes host application servers (with their applications) and database instances (a master database instance and replicas) to share the workload and to provide high availability and improved response time.
One known approach for implementing e-commerce applications is J2EE (Java 2 Platform, Enterprise Edition, published by Sun Microsystems, Inc.). J2EE is a set of coordinated specifications and practices that together enable software solutions for developing, deploying, and managing multi-tier server-centric applications. J2EE is also a platform for building and using web services.
The primary technologies in the J2EE platform are: Java API for XML-Based RPC (JAX-RPC), JavaServer Pages, Java Servlets, Enterprise JavaBeans components, J2EE Connector Architecture, J2EE Management Model, J2EE Deployment API, Java Management Extensions (JMX), J2EE Authorization Contract for Containers, Java API for XML Registries (JAXR), Java Message Service (JMS), Java Naming and Directory Interface (JNDI), Java Transaction API (JTA), CORBA, and JDBC data access API.
A known e-commerce architecture has a tiered development and deployment approach for the application. The different tiers of an e-commerce application are (i) the view or user interface tier, (ii) the controller or application logic tier, and (iii) the model or application's persistent data model tier. These tiers, known as the MVC (i.e. model, view, and controller) architecture, are deployed over web, application and database servers respectively. As shown in FIG. 1, an MVC architecture 10 has a human actor 12 who interacts with a web service client computer 14. The client computer 14 runs a browser application (that is, a client to a J2EE program that invokes the web service), and interacts with application servers over a public network 16, such as the internet, using a suitable protocol (e.g. HTTP/HTTPS). An application server 18, deploying J2EE applications, has a servlet container 20 within which reside multiple application Java servlets 22. The container 20 implements the J2EE servlet specifications and executes the servlets 22 at runtime. The output 24 of the servlet container 20 is an RMI/IIOP (i.e. RMI over IIOP) invocation, passed to an Entity/Enterprise Java Bean (EJB) container 26. The EJB container 26 has multiple application EJBs 28. The output 30 from the EJB container 26 is a JDBC API call, which makes read/write calls on a database 32.
One approach to deploying a multi-tiered architecture is to cluster the web, application and database tiers to improve the end-to-end application performance. As shown in FIG. 2, an architecture 50 includes the web service client 14, in communication with a network dispatcher program 52. A cluster of nodes 54-58 hosts multiple application servers 59-62 and database instances 64-68. The dispatcher program 52 distributes requests equally to the nodes 54-58. The database instances 64-68 are replicated across several nodes to gain performance benefits and higher availability in case of database failures. The network dispatcher 52 (or Virtual IP) abstracts the cluster from the client application 14 and provides a single interface for interacting with the cluster of nodes 54-58.
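The equal distribution of requests performed by a dispatcher such as the dispatcher 52 can be illustrated with a minimal sketch. The sketch assumes a simple round-robin policy, which is only one way of distributing requests equally; the class name RoundRobinDispatcher, the method pickNode and the node names are hypothetical and are not taken from any particular dispatcher product.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical dispatcher: hands out cluster nodes in strict rotation
// (round-robin is an assumed policy, chosen only for illustration).
class RoundRobinDispatcher {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger(0);

    RoundRobinDispatcher(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = new ArrayList<>(nodes); // defensive copy
    }

    // Returns the node that should serve the next request.
    String pickNode() {
        // floorMod keeps the index valid even if the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), nodes.size());
        return nodes.get(i);
    }
}
```

With three nodes, successive calls to pickNode return the first, second and third node and then wrap around to the first, so each node receives the same share of requests while the client sees a single entry point.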
Turning then to the application servers 59-62, the Application Servlets 22 have the same function as described above. Each Application Logic 82 is a set of Java classes that house the business logic that the application uses to fulfil client requests. The business logic could be anything; for example, validating the data sent by the client 12 before it is persisted in the database 70. The Application Session Beans 84 are Enterprise Java Beans (EJB) as explained above. Session beans are Java components that house application logic requiring ‘ACID’ support. ACID stands for Atomicity, Consistency, Isolation, and Durability. The J2EE container (such as the IBM WebSphere Application Server or the BEA WebLogic Application Server) offers ACID support to the Session Beans 84.
The data access layers 72-76 are deployed to replace Entity Beans, and to access the database directly. A network dispatcher 78 is deployed with the same principles as explained above with reference to the dispatcher 52, to route database requests to one of the database nodes in the replica cluster 64-68.
Read operations are routed to the replica database instances 64-68, and updates, inserts and deletes are routed to a master database 70, by the respective data access layer 72-76 and the network dispatcher 78. If the application demands a read immediately following a write, the data access layer 72-76 either has to be stateful between transactions, so that it can route such a query to the master, or it provides stale data to the application by routing the query to a replica. The replication infrastructure works independently in the background and is not integrated with the data access layer, so it does not notify the data access layer when it completes its replication jobs. This makes the data access layer 72-76 less smart: it continues to forward all queries that follow an insert, delete or update to the master 70 even after the data has been replicated to the replicas, thereby under-utilizing the resources of the cluster.
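The routing behaviour just described, including its drawback, can be sketched as follows. This is a simplified illustration only: the class ReadWriteRouter, the enum Target and the test for a leading SELECT keyword are assumptions made for the sketch and do not represent any vendor's data access layer.

```java
import java.util.Locale;

// Hypothetical stateful data access layer router.
class ReadWriteRouter {
    enum Target { MASTER, REPLICA }

    // Remembers whether a write has been seen. Because the replication
    // infrastructure sends no completion notice, this flag is never cleared.
    private boolean writeSeen = false;

    Target route(String sql) {
        // Assumption: any statement not starting with SELECT is a write.
        boolean isRead = sql.trim().toUpperCase(Locale.ROOT).startsWith("SELECT");
        if (!isRead) {
            writeSeen = true;
            return Target.MASTER;   // inserts, updates and deletes go to the master
        }
        // Reads after a write are pinned to the master to avoid stale data.
        return writeSeen ? Target.MASTER : Target.REPLICA;
    }
}
```

In this sketch a read issued before any write is sent to a replica, but every read issued after the first write is sent to the master, even long after replication would have completed. That is the under-utilization of the replicas noted above.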
Another approach, suited to applications that have a very large database, is to implement the database as a master and partition topology. As shown in FIG. 3, an architecture 100 once again has a network dispatcher 52. Each application server 102-106 has application servlets, application logic and application session bean(s) in common with the servers 59-62 of FIG. 2. However, an application entity bean(s) layer 108-112 replaces the data access layer 72-76. A primary database instance 114 exists and responds to read/write requests from the respective application entity bean(s) 108-112. Additionally, the primary database instance 114 exists as discrete partitions 116. The primary database instance 114 has knowledge of the partitioned database instances in the cluster and also maintains the information on how the data is partitioned and which node in the partition carries which part of the data. This information is used to build the index at the primary database. Once a query is submitted, the primary database 114:
    i) analyzes the query,
    ii) splits it into parts that match the data partitions,
    iii) routes the individual parts to the partitioned database nodes 116n,
    iv) gathers results from each of the partitions involved in the query execution,
    v) performs any database operation(s) on the result collection that cannot be performed by the underlying partitions individually, because the operation requires a complete view of the results from all the partitions,
    vi) composes the final result set, and
    vii) answers the query to the client.
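Steps (i) to (vii) above can be illustrated with a minimal in-memory sketch. The sketch makes several assumptions for brevity: each partition is modelled as a list of integers rather than a database node, the query is a fixed "values greater than a threshold, in ascending order" request, and the names PrimaryCoordinator, runOnPartition and selectGreaterThanSorted are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a primary instance in front of data partitions.
class PrimaryCoordinator {
    private final List<List<Integer>> partitions;

    PrimaryCoordinator(List<List<Integer>> partitions) {
        this.partitions = partitions;
    }

    // Sub-query that a single partition can answer alone: a local filter.
    private static List<Integer> runOnPartition(List<Integer> rows, int threshold) {
        List<Integer> matches = new ArrayList<>();
        for (int value : rows) {
            if (value > threshold) {
                matches.add(value);
            }
        }
        return matches;
    }

    // Answers the query "all values > threshold, in ascending order".
    List<Integer> selectGreaterThanSorted(int threshold) {
        List<Integer> gathered = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            // Steps (ii)-(iv): send the filter to each partition, gather results.
            gathered.addAll(runOnPartition(partition, threshold));
        }
        // Step (v): the global ordering needs the results of all partitions.
        Collections.sort(gathered);
        // Steps (vi)-(vii): the sorted list is the final result set.
        return gathered;
    }
}
```

The filter can be evaluated by each partition independently, whereas the ordering requires the combined results, which is why that operation is carried out by the coordinating instance rather than by the partitions.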
The partitioned databases 116n are database instances that each carry a part of the database 114. For example, a large table T could be partitioned across two database instances such that the first database carries the first half of the rows (tuples) of that table and the second database carries the second half. Database partitioning can also be achieved by placing different tables on different database servers. For example, table T1 is placed on server S1 and table T2 on server S2.
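The two partitioning schemes in this example can be sketched as simple lookups. This is an illustration under assumptions: the class PartitionMap, the instance names DB1 and DB2, and the use of a numeric row key with a single split point are hypothetical choices for representing "first half" and "second half" of table T.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical partition metadata of the kind a primary instance might keep.
class PartitionMap {
    private final long splitKey;
    private final Map<String, String> tableToServer = new HashMap<>();

    PartitionMap(long splitKey) {
        this.splitKey = splitKey;
        // Table-level placement from the example: T1 on S1, T2 on S2.
        tableToServer.put("T1", "S1");
        tableToServer.put("T2", "S2");
    }

    // Row-level partitioning of table T: keys below the split point live in
    // the first instance, the remaining keys in the second.
    String instanceForRow(long rowKey) {
        return rowKey < splitKey ? "DB1" : "DB2";
    }

    // Table-level partitioning: each table lives wholly on one server.
    String serverForTable(String table) {
        String server = tableToServer.get(table);
        if (server == null) {
            throw new IllegalArgumentException("unknown table: " + table);
        }
        return server;
    }
}
```

A query that touches only rows on one side of the split point, or only one of the tables, can be answered by a single partition; a query that spans both requires results from more than one partition to be combined.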
However, there are the following limitations in deploying distributed systems over such solutions:
    1. The deployment of the data partitions is very specific to the database vendor and product. The data partition deployment and query routing logic is not an industry standard, and that makes the application tightly coupled with the database product and vendor.
    2. The database products providing data partitioning may need extra database administration, as the solution is an extension to the standard database technology.
    3. The single database instance acting as the primary interface to the partitioned datasets abstracts the partitioned database instances; however, it acts as an intermediate stop point before the query is routed to the partitioned node carrying the data relevant to the query. The application must first connect to the primary database instance, and the primary database instance then connects to the secondary instance, making the system less efficient in certain situations, as discussed later in this section.
    4. There exist smart techniques to deploy the primary instance and the partition instances to offer fault tolerance. However, if the solution is designed and deployed to have a single primary instance as the single point of interface to the database system, the risk of database failure increases because the primary instance is a single point of failure.
The primary instance analyzes the query to determine which data partition or partitions hold the data the query needs. If a single data partition suffices, the primary instance routes the full query to that partition. If multiple partitions are involved, the primary instance splits the query into parts that can be routed to individual partitions and, if required, takes responsibility for processing the results from each partition (such as performing a join operation) before sending the result back to the application.
If the query workload and the data are well analyzed before the data is partitioned, there will be fewer instances where a query spans multiple data partitions. In OLTP applications, the queries are less complex and in most cases can be answered by a single partition. It would therefore be more efficient for such applications to route the query directly to the partition, rather than routing the query to the primary instance and then having it routed on to the partition. However, an enterprise system enabling such direct routing should also support the other features of the primary database instance, such as splitting queries across different partitions and joining their results, in a way that is transparent to the application and that can be adopted as an industry standard, so that enterprise system vendors can incorporate the solution in their frameworks. Without such support, J2EE applications either become tightly coupled with the database vendor or have to encapsulate the data partition logic within the application tier, and both options make application portability complex. This drives a need for Enterprise Systems, such as J2EE frameworks, to enable application deployment over partitioned databases in a transparent and loosely coupled way.
The invention is directed to overcoming or at least reducing one or more of these problems.