The client/server model, which emerged in the late 1980s, is a versatile and modular software architecture devised to improve usability, flexibility, interoperability and scalability compared with the centralized, mainframe-based, time-sharing computing that was the norm at the time. The client/server architecture has since progressively and completely replaced the earlier mainframe software architectures, in which all intelligence resided in the central host computer and users interacted with the host through dumb terminals. Where mainframes are still in use, it is only as powerful servers within client/server architectures, in which dumb terminals have also been replaced by intelligent graphical user interfaces (GUIs) capable of processing by themselves the data they receive from and transmit to the servers.
In modern data processing systems, a widely used client/server architecture capable of supporting a large number of remotely located clients is the so-called 3-tier architecture. An example of such an architecture is illustrated in FIG. 1. The master tier 100 is traditionally built around a database system 120, possibly a large or very large repository of all the data necessary for the daily operation of a business organization, company or enterprise, i.e., for conducting all sorts of commercial and administrative operations. The database is most often of the relational type, i.e., it is under the control of a relational database management system (RDBMS). It is typically administered through one or more master servers 112 by administrators of the data processing system from GUIs 140. Administrators are generally the only users of the system authorized to update the database contents directly.
The intermediate or middle tier of the exemplary 3-tier system of FIG. 1 is the application tier 200, from which all the specific software applications 240 of the organization that owns the data processing system are run. This collection of specific applications, often referred to collectively as middleware, is the proprietary software of the organization. It is used to serve all of the organization's remote clients from its data repository 120 through the master servers 110. Remote clients form the third tier 300 of the 3-tier architecture. Queries from the client tier 300 are thus processed and answered by the specific applications of the intermediate tier 200, using data fetched from the master tier 100.
In a 3-tier architecture, when a larger number of remote clients must be served, the system is scaled to maintain overall performance by adding independent processing nodes to the middle tier, thereby increasing the overall processing power of the data processing system. Hence, the application tier 200 generally comprises several independent processing nodes, referred to in the following description as slave nodes 210. To prevent the master tier 100 from being overwhelmed by data requests from a growing number of slave nodes, a common practice is to have the application processes 240 work on pieces of data fetched from the master database and stored in each application node for as long as necessary. In the exemplary system of FIG. 1, this takes the form of cache files 250 on which the application processes 240 can work without incurring the long delay of retrieving the data from the master database through the master servers each time it is needed. In such a data processing system, processing power and software applications are thus distributed, i.e., partitioned, over as many nodes 210 as needed to reach the processing power required to serve all remote clients 300 of the system. The same applies to the cache files 250, which are likewise distributed.
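The effect of the cache files on master-tier load can be illustrated with a minimal read-through cache sketch. It assumes a simple in-memory key/value model of the data; the classes MasterDatabase and SlaveNode, the method fetch, and the key "item-42" are hypothetical illustrations, not part of the described system. Each slave node goes to the master tier only on a cache miss and then serves later queries from its local copy.

```python
from dataclasses import dataclass, field


@dataclass
class MasterDatabase:
    """Hypothetical stand-in for the master tier: a key/value store that
    counts how many reads the slave nodes send to it."""
    rows: dict = field(default_factory=dict)
    reads: int = 0

    def fetch(self, key):
        self.reads += 1
        return self.rows[key]


@dataclass
class SlaveNode:
    """Hypothetical slave node of the application tier with a local cache."""
    master: MasterDatabase
    cache: dict = field(default_factory=dict)

    def get(self, key):
        # Read-through: contact the master tier only on a cache miss,
        # then keep the fetched data locally for later client queries.
        if key not in self.cache:
            self.cache[key] = self.master.fetch(key)
        return self.cache[key]


master = MasterDatabase(rows={"item-42": 120})
nodes = [SlaveNode(master) for _ in range(3)]

# Each of the three slave nodes serves 100 client queries for the same data.
for node in nodes:
    for _ in range(100):
        node.get("item-42")

print(master.reads)  # 3: one miss per node instead of 300 master reads
```

With three slave nodes, 300 client queries cost the master tier only three reads, which is the traffic and occupancy reduction the cache files are meant to provide.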
In such a distributed computing environment, however, it has been proved that some desirable properties of a distributed data system cannot all be guaranteed simultaneously. As illustrated in FIG. 2, these expected properties 40 of a distributed data processing system are consistency, availability and scalability. A theorem known as the CAP theorem states that a distributed system can satisfy any two of these properties at the same time, but not all three. CAP, which stands for consistency, availability and partition tolerance, was first conjectured in 2000 by E. Brewer, professor at the University of California, Berkeley, USA. A proof of the theorem was later given in a paper by N. Lynch and S. Gilbert, published in 2002 in ACM SIGACT News, vol. 33, issue 2, pages 51-59. The CAP partition tolerance property is tightly linked to scalability since, as discussed above, the overall processing power of the system is obtained by distributing, i.e., partitioning, it over independent processing nodes.
Consistency and availability 41 can both be fully met in 3-tier architectures only if the data used by the middle-tier applications always come from the master database. This comes at the expense of very heavy downstream traffic from the master tier 100 to the application tier 200 just to answer queries from the client tier 300, and of a correspondingly high occupancy of the master database. This conflicts with the administration and updating of the master database by administrative users (140), even though the proportion of writes into the database is generally relatively low. Access to the database and traffic on the network between the data and application tiers are clearly bottlenecks that limit performance as the number of users in the client tier increases.
Availability and scalability 42 are achievable in a 3-tier architecture such as the exemplary one shown in FIG. 1 by using distributed cache files 250 to overcome the above problems of database occupancy and heavy traffic between the data and application tiers. However, in this case there is no guarantee that the contents of the cache files are consistent across slave nodes, or with the contents of the master database, since the cache files are distributed over independent computing nodes.
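The consistency problem can be made concrete with a minimal sketch. It assumes the same simple in-memory key/value model, with hypothetical MasterDatabase and SlaveNode classes and a hypothetical key "item-42"; it models plain read-through caching with no invalidation, not the method of the invention. An administrator update to the master database reaches only the nodes that fetch the data afterwards.

```python
class MasterDatabase:
    """Hypothetical master tier: the authoritative copy of the data."""

    def __init__(self, rows):
        self.rows = dict(rows)

    def fetch(self, key):
        return self.rows[key]

    def update(self, key, value):
        # Administrators write to the master database only; nothing here
        # notifies the slave nodes that their cached copies are outdated.
        self.rows[key] = value


class SlaveNode:
    """Hypothetical slave node with a local, independently populated cache."""

    def __init__(self, master):
        self.master = master
        self.cache = {}

    def get(self, key):
        # Read-through caching without invalidation.
        if key not in self.cache:
            self.cache[key] = self.master.fetch(key)
        return self.cache[key]


master = MasterDatabase({"item-42": 120})
node_a, node_b = SlaveNode(master), SlaveNode(master)

node_a.get("item-42")          # node A caches the value 120
master.update("item-42", 135)  # administrator updates the master tier
node_b.get("item-42")          # node B caches the new value 135

# The same query now gets different answers depending on the node:
print(node_a.get("item-42"), node_b.get("item-42"), master.fetch("item-42"))
# 120 135 135
```

Both nodes remain available and neither contacts the master tier again, yet node A keeps serving a value that no longer matches the master database or node B.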
It is thus an object of the invention to provide a solution to this problem. In a 3-tier client/server architecture where client applications and replicated files are distributed over a plurality of independent slave nodes, the invention discloses a method and a system for maintaining strong consistency of the replicated file contents, together with full availability, while preserving some scalability 43.
Further objects, features and advantages of the present invention will become apparent to those skilled in the art upon examination of the following description with reference to the accompanying drawings. It is intended that any additional advantages be incorporated herein.