The present invention relates to distributed databases, and more particularly to an independent distributed relational database system operating over a local area network (LAN) or a wide area network (WAN).
Databases comprise one of the most widely used applications found in computing today. A database is a collection of related information about a subject organized in a useful manner that provides a base for procedures such as retrieving information, drawing conclusions and making decisions. A distributed database is a variation in which information is distributed or spread over a number of sites which are connected through a communication network.
A key problem in current database design is providing equal database access to all users whether they are local or remote. For example, to provide equal access to sales agents with their portable computers, to executives working from home, to work groups at satellite offices, to business partners, and to suppliers, presents a challenge to existing database design. Advantageously, each user should be able to use and change selected information from their computer, with the same performance and functionality that they would enjoy at a workstation located at head office with the server.
While the prior art includes numerous database management systems, none of the existing systems provide xe2x80x9ccompletely equal accessxe2x80x9d. Known systems which allow off-site users to work with in-office information systems require the remote users to access an office LAN or central database server through expensive, slow, and often insecure dial-up lines, WAN links, and remote-access products. The major problems associated with the prior art approaches can be classified under Performance, Scalability, Reliability, Availability, Autonomy, and Security.
Performance. The remote user experiences inferior performance because the user is forced to access data at a remote location using slow modem, or WAN, connections. Furthermore, the actual data is retransmitted every time it is accessed, thereby requiring fast and/or expensive connections in order to achieve acceptable performance.
Scalability. The central server must be able to support all local and remote users. As users are added, the central server eventually becomes the bottleneck. Known systems are typically limited to about 1000 concurrent users.
Reliability. The central server must be regularly backed up. If a problem occurs, work done since the last backup is lost and all the dependent users must re-enter their recent work. Connection faults are also a common problem for remote users of known systems. When there is a connection fault, the user is interrupted until the connection is re-established.
From the foregoing, it will be appreciated that reliance on a central site or service is undesirable because that site could become a bottleneck as well as a point of failure.
Availability. In known systems, all remote users depend on a central server. If the central server is down, then all users are down and cannot work with the database until the server recovers. Windows of acceptable down time are measured in seconds or minutes and servers are typically required to deliver better than 98% availability during working hours.
Autonomy. The remote users are partly or fully dependent on the server. Remote users will not always have an on-line data connection, e.g. modem or WAN, but, a remote user can only work with the database when on-line and connected. For example, sales agents on the road, or executives in airplanes, cannot use the database without very slow, expensive and unreliable cellular or satellite datalinks to keep them on-line to the central server.
Security. Remote access links in conventional systems are often not encrypted. Even when the links are protected, the organization loses control of any data sent to the remote machine. The remote user, who may not be an employee, but a supplier or customer, can use the information however they want.
Currently there are two main approaches to sharing relational databases: traditional distributed databases, and traditional replication systems.
Traditional distributed databases distribute the servers only, so that each site stores some subset (none, all, or some) of the rows and columns in each table. Clients access the distributed database by connecting normally to their local server. When a client makes a request which involves data stored at other servers, the network handles the request appropriately and returns the expected result so that to the client it appears as though the request was handled locally. Multi-site integrity is controlled through complex two-phase commit and equivalent protocols.
The main problem with this approach is that a database transaction must be performed on-line and involve every server which stores information involved in the transaction. This is costly in performance, since the speed of network connections between distributed servers affects the speed of the client""s transaction. It is also fragile, since all involved servers must be available for the transaction to succeed. If any server(s) are not available, then the client receives either an incomplete result or the client""s transaction fails completely which results are both undesirable.
With respect to replication, traditional replication systems replicate data primarily between servers but may also be used to replicate data to client machines. The main problems with known replication systems are that they are not peer systems, are complex to administer and maintain, and have integrity problems because of replication granularity. They are not peer systems because they distinguish between xe2x80x9cmasterxe2x80x9d and xe2x80x9creplicaxe2x80x9d databases and cannot support fully equal operation at all sites, local and remote. They are complex to administer and maintain because distribution rules are typically configured using row and column selection for every table at every site. For example, a simple change operation (e.g. xe2x80x9cSite 3 now needs Customer #531""s informationxe2x80x9d) typically requires extensive and error-prone changes to row and column selections in multiple database tables for that site.
Further, existing replication systems have integrity problems because they typically use record-level or field-level replication granularity. Consider a Customer table with fields Address, City, State, ZipCode, PaymentTerms, and CreditRating. With record-level granularity, the fields of an entire record are replicated together, which gives false collisions that are tedious for administrators to review; for example, changing a customer""s Address at one site and the CreditRating at another will result in a collision even though the two fields are unrelated (i.e., the person changing the Address knows that the customer moved, while the person changing the CreditRating knows that the customer failed to pay promptly). With field-level granularity, each field in a record is replicated separately, which solves the false collisions of record-level replication but causes integrity problems by not reporting collisions when two related fields are changed at different sites; for example, changing a customer""s Address at one site and the ZipCode at another should result in a collision (i.e., the person changing the Address because he knows that the customer has moved must also know enough to change the ZipCode).
The IDDB database according to the present invention overcomes the disadvantages associated with the prior art. The present invention provides a database architecture in which all users at all sites work off-line with local data, that is, all application transactions are against the local database only, and every site locally stores xe2x80x9call and onlyxe2x80x9d the data it needs. This means that application transactions are not network-dependent and therefore do not suffer speed or availability problems when the network or remote sites are down or loaded. The on-line transactions only occur in the background, including a periodic synchronization between sites that transmit any changes to data that is of interest to the site. If the background operations are interrupted or the network is temporarily unavailable, the user does not see new changes made at other sites until the datalink is again established, but otherwise the user remains unaffected. According to the present invention, no site acts as a xe2x80x9cserverxe2x80x9d for any other site, however, some sites may store more data or have more users than others, but all sites are logically peers.
Another problem with prior art systems concerns propagation mechanisms. While the prior art includes numerous propagation mechanisms, the existing systems suffer a number of drawbacks.
Firstly, known transaction-log-based systems are limited to databases which maintain and publish transaction logs. Some databases do not allow access to their transaction logs, and some do not have transaction logs at all.
Secondly, the known systems are generally limited to hub-and-spoke topologies and/or unidirectional update transmission. It will be appreciate that such systems are not scalable. As such these systems are not capable of supporting peer-to-peer and other non-hub-and-spoke topologies, nor can they support update-anywhere operation where updates may occur at any site.
Thirdly, the known systems are typically limited to capturing the transactions originally used by the application making the updates. This technique traditionally either requires inserting special middleware between the application and the database to capture the transactions as they occur, which would require changes to all applications manipulating the database including interactive queries and updates, or else requires reading the transaction log as above. As a result, this technique is not useful if the application(s) do not use transactions.
Fourthly, known systems do not have the capability to add transactions when updates made at a site whose database does not support transactions are propagated to a site whose database does support transactions. It will be appreciated that this becomes a safety issue when data is moved from a low-integrity database into a high-integrity database.
Accordingly, there also remains a need for a transaction update mechanism which is not limited to any particular relational database product and which does not require transaction logs. Such a transaction update mechanism should not be limited any topology and should support update-anywhere processing. Such a transaction update mechanism should also be capable of propagating data within transactions even if the application did not utilize transactions. Furthermore, such a transaction mechanism should be capable of propagating data within transactions even if the database where the update originally occurred does not support transactions.
The present invention provides an architecture for an independent distributed database or IDDB. In the IDDB, all sites, i.e. nodes, are peers and no site acts as a server for another. This means that unlike conventional database replication systems, the distributed database according to the present invention does not distinguish between xe2x80x9cmasterxe2x80x9d and xe2x80x9cslavexe2x80x9d sites, or xe2x80x9cprimaryxe2x80x9d and xe2x80x9csecondaryxe2x80x9d sites, or xe2x80x9cservicexe2x80x9d and xe2x80x9creplicaxe2x80x9d sites. With the IDDB, any subset of sites continue to operate normally without the need for a master site.
Each site stores xe2x80x9call and onlyxe2x80x9d the data it needs. It is a feature of the present invention that users work off-line with local data, and all application transactions are against the local database. Sites sharing the same data synchronize their changes periodically in the background and changes made at one site become visible to all the other interested sites. It is a feature of the IDDB that there are no on-line or distributed application transactions because all application transactions are local. There are network transactions for performing replication and housekeeping functions, but they operate in the background and are not visible to the application, or the user.
In respect of the shortcomings associated with the prior art architectures as described above, the database architecture according to the present invention provides a significant improvement in these areas.
Performance. According to the invention, all users utilize local databases to which they have high-speed (i.e. network or same machine) access. There is no dependency on remote datalinks for any part of normal operation. According to another aspect, the background sync transactions are faster because only changed data is transmitted, and then only once to each affected site. This feature greatly reduces the bandwidth requirements and thereby allows the use of slow (and inexpensive) modem links for most business applications.
Scalability. According to the present invention, there is no central server requirement. Thus, no site acts as a server for any other site and as a result no site becomes a bottleneck to user expansion (as commonly experienced with the central server architecture of known systems). As a result, the communications load and hardware requirements at each site are independent of the size of the network. For example, if a sales agent using his notebook is working with 200 customers, the communications load is defined by the changes made to those 200 customers and the local database will store only those 200 customers. If next year there are ten times as many sales agents, then each sales agent will still be storing about 200 customers in their local, i.e. notebook, computer, and the agent is still working with his 200 customers and the communications and local database loads for the agent remain unchanged regardless of the total size of the network or the total number of sites and users. The IDDB according to the present invention runs an application as easily at 10 sites as it does at 10,000.
Reliability. It is feature of the present invention that redundancy is built into the network, thereby reducing or eliminating the need for backups. If a site is destroyed, the IDDB application is reinstalled with a blank database and connected to the network. Once re-attached to the network, the application receives an initial download and recovers all of its information from the other sites connected in the network to achieve normal operation. The only data that would be lost are the changes made at the site since the last sync operation, however, no users at the other sites are affected or need to re-enter data. The IDDB provides full reliability because no site depends on another site for its operation.
Availability. According to the invention, if one site is down, no other site is affected because no site depends on another and all work at a site is done off-line by default. If all other sites in the network are down for a week or a month, and users at the remaining sites continue working, the users will eventually notice that their changes are not being seen by anyone else and that no one else""s changes are appearing to them. As a result, windows of acceptable down time can be measured in days or weeks, not seconds or minutes as in prior art systems. According to this aspect of the invention, the IDDB provides improved availability primarily because it always fully replicates all data.
Autonomy. According to the present invention, the sites are fully independent of each other and also independent of the communications link. For example, sales agents who are on the road, and executives who are travelling in airplanes, can continue working as usual regardless of whether they are currently connected to a modem or a network link. It is a feature of the IDDB that all data needed for an application(s) is actually stored at each local site. This means that users on the IDDB are able to work with data without knowing exactly where else, i.e. at other sites in the network, the data is also stored.
Furthermore, the IDDB exhibits fragmentation independence, that is, sites in the IDDB operate as though the database is not fragmented at all, because for each site its local copy of a table is the whole table. Each site, however, will by definition have some fragment of the database, defined by the information its users need.
Security. It is a feature of the IDDB database according to the present invention that all communication links are encrypted. All data stored locally, even on an untrusted machine run by a potentially untrusted user, is secured so that it can be accessed only through a legitimate application running on the system.
According to another aspect of the invention, the IDDB features a network architecture which comprises one or more application networks. An application network is defined as the set of all sites running a given IDDB application. The application network is a virtual network running on top of a physical network connection. It is a feature of the invention that a given site may run several different IDDB applications at the same time.
The network architecture preferably comprises a network structure that allows all sites to communicate efficiently and effectively. In particular, the network structure preferably has the capability to distinguish between stable sites and transient sites in order to minimize dependencies on transient sites. A stable site is defined as a site which features high availability and forms a long-term component of the application network, for example a site or node located within the organization that owns or operates the application. A transient site, on the other hand, comprises a site which is either intermittently available or a short-term participant in the application network, for example a computer belonging to a mobile user or users outside of the parent organization.
The network structure according to the present invention also features fault detection and repair mechanisms, including automatic network reconfiguration.
The network structure also comprises suitable sub-networks for each activity group. An activity group is defined as the group of sites presently collaborating on a given activity, i.e. storing a copy of that activity""s data (or some sub-set thereof). Preferably, the network structure provides the capability to manage dependencies on transient sites which are participating and provides effective automatic error recovery and reconfiguration.
The independent database according to the present invention also features the capability to replicate updates, so that any change made in an activity at a site becomes visible to all sites belonging to the activity group in that application network. According to this aspect of the invention, updates are propagated. To do this efficiently, two sites must be able to agree on the xe2x80x9cagexe2x80x9d of each piece of data in the database, so that newer versions correctly update older ones without introducing unnecessary updates when both sites already have the same version of the data. Accordingly, the present invention includes mechanisms to allow fragment age agreement and accommodation of relative clock drift between sites, and the means for providing consistent local time stamping when there are several, and possibly inconsistent, local clocks at the same site.
Based on the activity as the unit of collaboration, the replication rules according to the present invention feature ease of implementation and administration. A simple change (e.g., xe2x80x9cSite 3 now needs Customer #531""s informationxe2x80x9d) requires a simple command only (xe2x80x9cattach to Customer #531xe2x80x9d), and the IDDBMS automatically includes all related information in related tables. Using the fragment as the unit of replication, fields with a common update responsibility are replicated as a unit; changes to unrelated fields (e.g., Address and CreditRating) never result in false collisions requiring tedious administration, and changes to related fields (e.g., Address and ZipCode) are always correctly identified as collisions.
The IDDB according to the present invention also features a novel independent distributed database management system (IDDBMS). According to this aspect of the invention, a database comprises a collection of activities that can be collaborated on by various users at various sites and services that users and sites can selectively use. The IDDBMS according to the present invention provides a mechanism whereby a site, working off-line from all others, can create a new record and therefore a new key. The new keys are generated off-line in such a manner that the generated key is guaranteed to be unique across the entire database. In addition, the IDDBMS includes means for correctly handling record deletion and record modification across the entire database.
In another aspect, the database management system (IDDBMS) according to the present invention includes means for replicating modified data. The means for replicating modified data comprises a fine-grained replication process based on record fragments. A record fragment according to the present invention is a piece of an individual record, and comprises a subset of columns in a record.
In another aspect, the IDDBMS according to the present invention includes means for determining whether a fragment has been damaged and means for recovering a damaged fragment.
In yet another aspect, the IDDBMS according to the present invention includes means for securing the information transmitted across the application networks. Since each site may be part of several application networks (i.e. if the user has installed multiple IDDB applications), the security of each application must be isolated so that each application provider can separately handle the user""s permissions, password change requirements, and other security details for the application regardless of the user""s access privileges to other applications running at the same site. In particular, a user having privileges in one application must not have the capability to use this authority to gain greater access to the database of another application. According to this aspect of the invention, the IDDBMS includes means for ensuring that the application""s database can be read and written only through a legitimate application program and by legitimate users. In particular, the IDDBMS prevents a user from bypassing the application and inspecting or changing the physical contents of the local database file.
Another feature of the IDDBMS according to the present invention is the elimination of the need for distributed query processing. In a traditional distributed database, query optimization is critical for the performance of the system. In the present invention, query processing is simplified because transactions do not depend on the availability of other sites in the system, i.e. all database transactions are local. Furthermore, the need for a distributed transaction manager is also eliminated.
In yet another aspect, the IDDB according to the present invention provides a means for operating inherently incompatible commercially available Database Management Systems (DBMS). According to this aspect of the invention, the IDDB utilizes a DBMS-independent channel, for example, ODBC (Open Database Connectivity), for accessing the database product, and the IDDB separates the distribution and security controls from the physical database. This feature allows existing database management systems (DBMS""s), such as, ORACLE(trademark), INGRES(trademark), SYBASE(trademark), PARADOX(trademark) and ACCESS(trademark) products, to be used together transparently at different sites on the same application network in the IDDB.
In a first aspect, the present invention provides a distributed relational database system for a computer network, said system comprising: a plurality of sites; each of said sites including processing means for storing and retrieving information locally and independent of said other sites, and wherein each of said sites is the logical peer of said other sites; said sites having means for connecting to said network and communicating with other sites connected to the network; and said processing means including means for transferring selected information stored locally by connecting to said network and transferring said selected information to other sites connected to the network.
In a second aspect, there is provided a security structure for a distributed relational database system having a plurality of sites connected to a computer network and having means for communicating over the computer network, said security structure comprising: (a) a trusted root, an organization certification authority, and an application certification authority; (b) said trusted root having means for generating license certificates for validating said organization certification authority; (c) said organization certification authority having means for generating license certificates for validating said application authority; and (d) said application certification authority having means for generating license certificates for selected sites wherein said selected sites belong to an application network and said selected sites use said license certificates for validating each other.
In a third aspect, the present invention provides a method for determining a reference time between sites belonging to a distributed relational database system and being coupled by a computer network, said sites having local processing means and time generators, said method comprising the steps of: (a) sending a first message from an initiator site to a receiver site at a start time; (b) determining an arrival time when said first message is received at said receiver site; (c) said receiver site sending a second message to said initiator site in response to receipt of said first message; (d) determining a reply time when said second message is received at said initiator site; (e) said initiator site determining a reference time from the midpoint of the interval between said start and reply times, and said receiver site using said arrival time as its reference time.
In another aspect, the present invention provides a method for determining a reference time between sites in a distributed relational database system, the sites being coupled by a computer network and having local processing means and clocks, said method comprising the steps of: (a) sending a first message from an initiator site to a receiver site at a time t1; (b) said receiver site determining a time t2 when said first message is received; (c) said receiver site sending a second message at time t3 to said initiator site in response to receipt of said first message; (d) said initiator site determining a time t4 when said second message is received; (e) after said second message is received, said initiator site sending a third message at time t5 to said receiver site; (f) said receiver site determining a time t6 when said third message is received; (g) said initiator site determining a first time value by calculating a midpoint for the interval between said time t1 and said time t4, and generating a first time difference by comparing said first time value with said time t2 when said first message was received by said receiver site; (h) said receiver site determining a second time value by calculating a midpoint for the interval between said time t3 and said time t6, and said receiver site generating a second time difference by comparing said second time value with said time t4 when said second message was received by said initiator site; (i) averaging said first and second time differences to produce an average time difference, wherein said initiator site uses a reference time relative to its local clock, and said receiver site uses said average time difference to calculate a corresponding reference time relative to its local clock.
In yet a further aspect, the present invention provides for a distributed relational database system comprising sites coupled by a computer network and the sites having local processing means and clocks, a method for checking the clocks at the sites, said method comprising the steps of: (a) identifying a designated time keeper site from among said sites; (b) determining a time difference value between the clock of said designated time keeper site and the clock at the other site; (c) generating a time-stamp at said other site by off-setting the time of the local clock at said other site with said time difference value.
According to another aspect, the present invention provides a mechanism for specifying transactions that should be used during replication operations i.e. xe2x80x9creplication transactionsxe2x80x9d. The transaction mechanism for replication according to this aspect of the present invention includes the following features.
The transaction mechanism operates automatically and transparently. The transaction mechanism does not require any form of runtime intervention by users, database administrators (DBAs), programmers, or other user intervention.
The transaction mechanism is not tied to transaction logs. It will be appreciated that this feature is particularly advantageous since transaction logs are not always available as some vendors"" databases do not publish their log formats/APIs and other databases do not have logs at all.
The transaction mechanism supports all ACID, i.e. Atomicity, Consistency, Isolation, Durability, requirements. These requirements are preserved in the transactions to the extent supported by the underlying database.
The transaction mechanism supports all typical transaction requirements. This includes the transaction requirements typical of the highest-integrity systems, such as banking and market trading systems.
The transaction mechanism preserves peer-to-peer and update-anywhere operation. The transaction mechanism is not limited to hub-and-spoke or master-site or source-target operation. The transaction mechanism supports a true peer environment wherein updates may originate at any site.
The transaction mechanism provides xe2x80x9cbetter-than-nativexe2x80x9d integrity and improves atomicity, consistency, isolation, and/or durability in most situations.