1. Field of the Invention
This invention relates to distributed data systems, and more particularly to data locking for multi-threaded processes in distributed data systems.
2. Description of the Related Art
In distributed data systems, data may be stored in several locations. Such locations may include servers, computers, or other devices with storage devices or access to storage devices. Storage devices may include hard drives, memory, registers, and other media where data can be stored and retrieved. A distributed data system may span a large network or combination of networks, for example on the Internet or a local intranet, or simply involve a plurality of storage devices connected to a computing device. The data may be distributed in blocks of specific sizes, by file, or in any fashion consistent with the space constraints of available storage devices.
Distributed data may facilitate scalability, fail-safe techniques, and security. For example, a server may distribute activity to remain scalable with respect to network traffic. In this example, distributed data may include state information for each process and/or computing system over which a client-server interaction is distributed in an enterprise system. The distributed data may include a snapshot of an interaction or a session between a web browser and a web server. The snapshot or session state may include one or more of the state of the web browser process(es), the state of the computing system hosting the web browser, the state of the web server process(es), the state of the computing system hosting the web server, the state of the computing system hosting an application server providing content to the web server, the state of the application server process(es), and the state of one or more applications, processes and/or threads hosted by the application server or optionally on any other system involved in the interaction.
An enterprise computing system storing distributed session data is only one example of a distributed data system. A distributed data system may store and share any type of data across a plurality of computing nodes. Distributed data systems may provide for load balancing and failover to improve the overall quality of service of the system.
Primary data may be defined as a global instance of distributed data accessible by one or more processes. The term “process” is used herein to refer to a computer process. A distributed data system may include primary data stored within a distributed store. Local data may be defined as an instance of distributed data stored locally with respect to a process. Local data may provide read and/or write access to portions of the distributed data for a process. The local data may be used to update the primary data of the distributed store.
A client-server environment may use distributed data, for example. The distributed data may include session data for one or more sessions. A session may include a series of user-application interactions that may be tracked by one or more servers. Sessions may be used for maintaining user-specific states, and may include persistent objects (e.g. handles to Enterprise Java Beans and/or database record sets) and authenticated user identities, among other interactions. For example, a session may be used to track a validated user login followed by a series of directed activities for that particular user. The session may reside in a server. For each request, a client may transmit a session ID in a cookie or, if the client does not allow cookies, the server may automatically write a session ID into a URL. The session ID may be used as a database key to access persistent objects associated with the client. Types of sessions may include, but are not limited to, distributed sessions and local sessions. Distributed sessions may be distributed among multiple servers, for example in a cluster, whereas local sessions may be bound to an individual server. In other systems, distributed data may include other types of data and may not necessarily include session data.
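The session ID handling described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory `sessions` table standing in for the database, a hypothetical cookie and URL parameter name `sessionid`, and hypothetical helper functions; none of these names come from any particular server product.

```python
import uuid
from urllib.parse import urlencode, urlparse, parse_qs

SESSION_KEY = "sessionid"  # hypothetical cookie / URL parameter name
sessions = {}              # hypothetical store: session ID -> persistent objects


def create_session(user):
    """Create a session and return its ID, used as the lookup key."""
    session_id = uuid.uuid4().hex
    sessions[session_id] = {"user": user}
    return session_id


def attach_session(url, session_id, cookies_allowed):
    """Return (url, headers) carrying the session ID by cookie or by URL."""
    if cookies_allowed:
        return url, {"Set-Cookie": f"{SESSION_KEY}={session_id}"}
    # Client does not allow cookies: write the session ID into the URL.
    separator = "&" if urlparse(url).query else "?"
    return url + separator + urlencode({SESSION_KEY: session_id}), {}


def lookup_session(url, cookies):
    """Find the session from a cookie first, otherwise from the URL."""
    session_id = cookies.get(SESSION_KEY)
    if session_id is None:
        values = parse_qs(urlparse(url).query).get(SESSION_KEY, [])
        session_id = values[0] if values else None
    return sessions.get(session_id)


sid = create_session("alice")
rewritten_url, _ = attach_session("http://example.com/cart", sid, cookies_allowed=False)
```

In this sketch, a later request for `rewritten_url` with no cookies resolves to the same session as a request that presents the session ID in a cookie.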
Client-server applications may store distributed session information as snapshots of the states of participating processes, resources, and computing systems to minimize data loss in case of failure. Current techniques for accessing state information from distributed sessions may result in inconsistent distributed data and consume significant amounts of resources.
A portion of distributed data may be retrieved and written by multiple processes concurrently, resulting in a risk of data loss. For example, a first process may access a portion of local data representing an instance of a portion of distributed data, while a second process may access a portion of local data representing another instance of the same portion of distributed data. Then, the first process may update the primary data. The second process may update the primary data after the first process. Because the second process's local instance does not reflect the first process's changes, portions of the primary data updated by the first process may be overwritten, resulting in loss of data. This data loss may be referred to as “data clobbering.”
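The sequence of events that leads to data clobbering can be reproduced in a small, deterministic sketch. The dictionary `primary` and the helpers `read_local` and `write_back` are hypothetical stand-ins for the primary data and for whatever mechanism copies data between a process and the distributed store; the two "processes" are simulated sequentially so the outcome does not depend on timing.

```python
import copy

# Hypothetical primary data: the global instance held by the distributed store.
primary = {"cart": ["book"], "user": "alice"}


def read_local(store):
    """Return a local instance (a deep copy) of the primary data."""
    return copy.deepcopy(store)


def write_back(store, local):
    """Overwrite the primary data with the entire local instance."""
    store.clear()
    store.update(copy.deepcopy(local))


# Two processes each obtain a local instance of the same portion of data.
local_a = read_local(primary)
local_b = read_local(primary)

local_a["cart"].append("pen")      # first process adds an item
local_b["user"] = "alice.smith"    # second process edits a different field

write_back(primary, local_a)       # first update reaches the primary data
write_back(primary, local_b)       # second update overwrites the first

print(primary)  # the "pen" added by the first process is no longer present
```

The second write succeeds without error, which is what makes the loss easy to miss: the primary data simply no longer contains the first process's change.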
In distributed data systems, it may be desirable for a process to access portions of distributed data using the same or similar semantics used in accessing portions of local non-distributed data. Typically, to help prevent data clobbering, a distributed data system may include a lock mechanism. The lock mechanism may grant locks to processes for portions of primary data. While a process holds a lock for a portion of primary data, other processes may not access the locked portion. Other processes may hold locks for other portions of primary data. Managing locks may be a complex task for the primary or backend portions of distributed data that are accessible by multiple different processes in a distributed system. This complexity may be even greater for systems including multithreaded processes in which multiple threads of a process share access to a local instance of a portion of distributed data.
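A lock mechanism of the general kind described above might be sketched as follows. This is an illustrative, single-machine approximation only: the `LockManager` class, the portion name `"counter"`, and the use of threads in place of distributed processes are all assumptions made for the example, not a description of any particular system's implementation.

```python
import threading


class LockManager:
    """Hypothetical lock mechanism: one lock per named portion of primary data."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the table of locks itself
        self._locks = {}

    def lock_for(self, portion):
        """Return the lock for a portion, creating it on first use."""
        with self._guard:
            if portion not in self._locks:
                self._locks[portion] = threading.Lock()
            return self._locks[portion]


primary = {"counter": 0}  # hypothetical primary data
manager = LockManager()


def increment(times):
    for _ in range(times):
        # Hold the lock for the whole read-modify-write cycle so that no
        # other thread can access the locked portion in between.
        with manager.lock_for("counter"):
            local = primary["counter"]      # read into a local instance
            primary["counter"] = local + 1  # update the primary data


threads = [threading.Thread(target=increment, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(primary["counter"])
```

Because each portion has its own lock, a holder of the lock for one portion does not block access to other portions, which mirrors the statement that other processes may hold locks for other portions of primary data.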