The field of this invention pertains to computer systems, and more particularly, is directed to multi-threaded processing in a computer system.
A typical computer system comprises a collection of software that is accessible by a variety of executing threads, i.e., applications.
As shown in FIG. 2, software code may be organized into software libraries 101. A software library 101 is a collection of software packages 103. A software package 103 is a collection of related procedures 105 and/or functions 107, collectively termed routines, stored together in the computer system for continued use as a unit. Software packages 103 provide a method of encapsulating and storing related procedures 105 and functions 107 together as a unit.
Procedures 105 and functions 107 each consist of a set of computer language statements functionally grouped together as an executable unit to perform a specific task. Procedures 105 and functions 107 are essentially equivalent constructs, except that functions 107 return a single value to the caller, while procedures 105 do not return any value. Procedures 105 and functions 107 may be called explicitly by applications or user stations of a computer system.
A typical multi-context/shared data architecture for a computer system, as depicted in FIG. 3, has a plurality of software packages in a software library 201. Each software package has a related context associated with it. A context in this multi-context/shared data architecture is a collection of data required by the routines of a software package to execute properly. Thus, the software package 202 is associated with the context 203, the software package 204 is associated with the context 205 and the software package 206 is associated with the context 207.
The data in each context in the software library 201 is designated as, i.e., designed to be, shared data, also referred to as global or exposed data. Thus, threads executing the software library, either consecutively or concurrently, access the same data in a context when they call the same routines of a software package.
Threads are lightweight processes that exist within a larger process. In a multi-context/shared data architecture, threads share the same software code and related data, but have their own individual program counters, machine registers and stacks. A process is a mechanism that can execute a series of steps; some systems refer to processes as “jobs” or “tasks”.
For example, if there are two threads 304 and 306, as depicted in FIG. 4A, executing consecutively, and both call the same procedure 305 in the software package 302, then both threads 304 and 306 share data in the context 303. However, because the two threads 304 and 306 execute consecutively, i.e., one after the other, there are no data access contention issues. As one thread executes before the other, the first thread to execute finishes accessing the shared data in the context 303 before the second thread begins to execute. Executing consecutively, threads 304 and 306 will not access the same data at the same time.
Yet, if the same two threads 304 and 306 execute concurrently, and both threads 304 and 306 call the same procedure 305 in the software package 302, the situation raises data access contention concerns. This is because both threads 304 and 306 may potentially attempt to access the same data in the context 303 at essentially the same time.
In such a situation, one thread may change the value of a data item that a second thread is relying on to have an original, unchanged value. This can result in faulty processing by the second thread. Thus, when two or more threads may potentially access the same data at the same time, a mutual exclusivity mechanism, i.e., a mutex, is generally required to manage access to the shared data in the context 303.
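The hazard described above can be illustrated with a minimal Python sketch. It replays, in a single thread, one possible interleaving of two unsynchronized threads that each read a data item and then write back a value based on what they read. The names `interleaved_update`, `context_303` (as a dictionary) and `balance` are hypothetical and chosen only for illustration.

```python
def interleaved_update():
    """Replay one unlucky interleaving of two unsynchronized threads."""
    context_303 = {"balance": 100}  # hypothetical shared data item

    # Both "threads" read the item before either writes it back.
    first_read = context_303["balance"]
    second_read = context_303["balance"]

    # The first thread writes its result.
    context_303["balance"] = first_read + 10
    # The second thread relied on the original value, so its write
    # silently discards the first thread's change.
    context_303["balance"] = second_read + 5

    return context_303["balance"]
```

With this interleaving the function returns 105, whereas consecutive execution of the same two updates would yield 115.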
A mutex may be a semaphore 312. When a first thread 304 calls 308 a procedure 305 in a software package 302, thereby accessing data in the related context 303, the semaphore 312 associated with context 303 is set. When a second thread 306 thereafter calls 310 the procedure 305 in the software package 302, involving access to the data in the related context 303, the second thread 306 first checks the context 303 associated semaphore 312. As semaphore 312 has been set by the first thread 304, the second thread 306 must wait to gain access to data in the context 303.
When the first thread 304 exits from the procedure 305 in the software package 302, it no longer requires access to the data in the related context 303. Thus, the semaphore 312 is reset to indicate that the data in the context 303 is not being accessed. Sometime thereafter, when the second thread 306 checks the semaphore 312 and discovers that it indicates that the data in the context 303 is not being accessed, then the semaphore 312 is once again set, and the second thread 306 continues its processing, executing the procedure 305.
As with the first thread 304, when the second thread 306 exits the procedure 305 in the software package 302, it no longer requires access to the data in the related context 303. Thus, the semaphore 312 is once more reset to indicate that the data in the context 303 is not being accessed.
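The set, wait and reset sequence described above may be sketched as follows. This is an illustrative sketch only, assuming Python's standard `threading` module; the dictionary standing in for context 303, the `counter` field, and the function names are hypothetical. As in the description, the semaphore is held for the whole duration of the procedure call.

```python
import threading

# Hypothetical stand-in for context 303: data shared by every caller.
context_303 = {"counter": 0}

# A binary semaphore plays the role of semaphore 312.
semaphore_312 = threading.Semaphore(1)


def procedure_305(iterations):
    """Routine of the package; it reads and writes the shared context."""
    # Entering the block "sets" the semaphore; a second caller blocks here.
    with semaphore_312:
        for _ in range(iterations):
            context_303["counter"] += 1
    # Leaving the block "resets" the semaphore so a waiting thread may proceed.


def run_two_threads(iterations=10000):
    """Start two threads (standing in for 304 and 306) that call procedure_305."""
    context_303["counter"] = 0
    threads = [
        threading.Thread(target=procedure_305, args=(iterations,))
        for _ in range(2)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return context_303["counter"]
```

Because only one thread at a time can be inside `procedure_305`, the final counter equals twice the iteration count regardless of how the threads are scheduled.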
A package that uses shared data and relies on a mutex to protect the data from access contentions is referred to as a thread-safe package. An advantage of a thread-safe package is that threads may share the same data. Another advantage of a thread-safe package is that it conserves memory space. The same data is used by each thread calling a particular routine in a specific software package. Thus, in computer systems where memory space and/or allocation are at a premium, thread-safe packages are favored.
A disadvantage of thread-safe packages is that their use can affect the processing time of concurrently executing threads. For example, if ten threads execute concurrently and all call a routine in the same software package at essentially the same time, then nine threads must wait until the first thread to gain access to the data in the related context exits the associated routine. Thereafter, eight threads must wait until the second thread to gain access to the data exits the associated routine, and so on. The tenth thread to gain access to the respective data, thereby continuing its processing, may be forced to wait a relatively long time to do so. For time-critical threads, this wait may cause established system performance requirements to be missed.
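The queuing effect in the ten-thread example can be estimated with a short calculation. The sketch below assumes, purely for illustration, that all threads arrive at the same instant and that each holds the mutex for the same fixed time; neither assumption comes from the description, and the function name `wait_times` is hypothetical.

```python
def wait_times(thread_count, hold_time):
    """Return the time each thread waits before it gains access.

    Assumes simultaneous arrival and an identical hold time per thread,
    so the k-th thread to gain access (k = 0, 1, ...) waits k * hold_time.
    """
    return [position * hold_time for position in range(thread_count)]


# Ten threads, each holding the mutex for one time unit.
waits = wait_times(10, 1.0)
longest_wait = waits[-1]   # wait of the tenth thread to gain access
total_wait = sum(waits)    # combined waiting time across all threads
```

Under these assumptions the tenth thread waits nine hold times, and the combined waiting time grows roughly with the square of the number of threads.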
In an alternative multi-context/unshared data architecture, an example of which is depicted in FIG. 4B, the data in each context is designated unshared, or private, or local. With this architecture, threads executing either consecutively or concurrently create a copy of the context for each software package that they access via a routine call. The software packages in a multi-context/unshared data architecture do not rely on a mutex to protect data from concurrent access. Thus, these packages may be referred to as non thread-safe packages.
Non thread-safe packages, rather than using mutexes for data protection, rely on the accessing threads to first make a copy of the package's context, and thereafter execute the respective routines using the context copy.
For example, if two threads 404 and 406 execute consecutively, and both threads call the same procedure 405 in the software package 402, then both threads will need to access the data in the related context 403. As the software package 402 is non thread-safe, meaning its related data in the context 403 is not to be shared, when a first thread 404 calls 408 procedure 405 in the software package 402, a copy 412 of the context 403 is created for this first thread's use. Likewise, when a second thread 406 calls 410 procedure 405, a copy 414 of the context 403 is created for this second thread's use.
The processing is similar if both threads 404 and 406 execute concurrently. When the first thread 404 calls 408 procedure 405 in the software package 402, a copy 412 of the context 403 is created for this first thread's use. When the second thread 406 calls 410 procedure 405, a copy 414 of the context 403 is created for this second thread's use.
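The copy-per-thread behavior described above may be sketched as follows. This is a minimal illustration, assuming Python's standard `copy` and `threading` modules; the layout of the context dictionary and the function names are hypothetical. Each thread makes its own copy of the context before executing the procedure, so no mutex is involved.

```python
import copy
import threading

# Hypothetical stand-in for context 403; it is never modified directly.
context_403 = {"counter": 0, "buffer": []}


def procedure_405(private_context, value):
    """Routine of the non thread-safe package; it works on a context copy."""
    private_context["counter"] += 1
    private_context["buffer"].append(value)
    return private_context


def thread_body(value, results, index):
    # Each thread first creates its own copy (compare copies 412 and 414).
    private_context = copy.deepcopy(context_403)
    results[index] = procedure_405(private_context, value)


def run_unshared():
    """Run two threads concurrently, each on its own context copy."""
    results = [None, None]
    threads = [
        threading.Thread(target=thread_body, args=(value, results, index))
        for index, value in enumerate(("a", "b"))
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Each returned copy reflects only the changes made by its own thread, and the original context is left untouched.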
An advantage of a non thread-safe package is that, when more than one thread concurrently calls one of its routines, thread processing may be expected to be quicker than if the package were thread-safe and, therefore, used a mutex to control access to its shared data. For example, as shown in FIG. 4B, the second thread 406 does not have to wait for the first thread 404 to execute the procedure 405 before it may execute the procedure 405. Because each thread 404 and 406 in this system uses its own copy of the context 403, coordination and control of the data access between these threads is not required.
However, a disadvantage of a non thread-safe package is that it does not allow for the sharing of data, which may be a requirement or a desired feature of the system. Further, the use of a non thread-safe package can entail significantly more memory than the use of an equivalent thread-safe package. For each thread that executes, i.e., runs, in a multi-thread/unshared data system, a copy of the context for each package accessed must be created. For example, if there are ten threads executing, and they all call a routine in the same software package, and the related context is one megabyte in size, then an additional ten megabytes of memory must be available for creating the context copies. For systems in which the memory space is small, or, for whatever reason, is at a premium, the multi-context/unshared data architecture can prove unworkable.
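The memory cost in the ten-thread example follows from a simple product. The sketch below is illustrative only; the function name `copy_memory_bytes` is hypothetical, and one megabyte is taken to be 1024 * 1024 bytes, which is an assumption rather than something the example specifies.

```python
MEGABYTE = 1024 * 1024  # assumed size of one megabyte in bytes


def copy_memory_bytes(thread_count, context_bytes, packages_per_thread=1):
    """Additional memory needed when every thread copies each context it uses."""
    return thread_count * packages_per_thread * context_bytes


# Ten threads, one package each, with a one-megabyte context.
additional = copy_memory_bytes(10, 1 * MEGABYTE)
additional_megabytes = additional // MEGABYTE
```

The cost scales linearly with both the number of threads and the number of packages each thread accesses, whereas a shared context is held in memory only once.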
An additional problem with either architecture, i.e., multi-context/shared data or multi-context/unshared data, is that a thread calling a particular routine in a software package must be cognizant of the individual software package's associated context. Further, the thread, if it calls several routines from a variety of software packages, must keep track of which context is associated with which software package. This can add to the data management requirements, thereby increasing the overall complexity of a thread's processing.
For example, in a multi-context/shared data system, a thread calling a routine in a first software package must be able to identify the first software package's associated context, and distinguish it from the contexts associated with other software packages, in order to correctly manage the mutex for the first software package. Additionally, each context may have a different access mechanism, requiring a thread to keep track of and use different access mechanisms for different contexts. In a computer system using multi-context/unshared data, a thread calling a routine in a first software package must be able to identify the first software package's associated context, and distinguish it from the contexts associated with other software packages, in order to create a copy of the correct context. Likewise, in a computer system using multi-context/unshared data, a thread may have to access the individual contexts of various software packages differently, in order to create the respective copies thereof.
It would, therefore, be advantageous to provide a simple, single, uniform interface for threads to use to access the various data segments of a software library, or any collection of software units and their related data.
Further, it would be advantageous to provide a mechanism whereby threads can access both shared and unshared data in the same software unit, e.g., a software package. This would provide flexibility in handling both memory allocation issues and processing time concerns in a computer system. A computer system better suited to multi-thread processing would thereby be created.
The invention provides a computer system with a single, common interface to the data segments of a collection of software units. The invention further provides apparatus and methods for allowing threads to access both shared and unshared data in a single software unit.
In an embodiment, a collection of software units is comprised of units that are associated with data segments of shared data, i.e., thread-safe software units, and units that are associated with data segments of unshared data, i.e., non thread-safe software units. A collection of location variables, generally stored in a location data segment, is associated with the collection of software units. Each location variable is associated with an address for a data segment for a software unit.
Each time a thread is to execute software in the collection of software units it creates a version of the respective location data segment. If the thread will thereafter access a software unit that has associated shared data, the thread sets a pointer in its location data segment version to the equivalent value in the respective location data segment. In this manner, the thread will access the associated data segment of the software unit when it accesses the software unit.
If the thread will access a software unit that has associated unshared data, the thread creates a copy of the respective unshared data segment. The thread then sets a pointer in its location data segment version to point to the address of the unshared data segment copy. In this manner, the thread will access its copy of the unshared data segment when it accesses the respective software unit.
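The handling of shared and unshared data segments through a thread's version of the location data segment may be sketched as follows. This is a simplified illustration, not the claimed implementation: Python dictionaries stand in for the location data segment and the data segments, object references stand in for addresses, and the names `location_segment`, `new_location_version` and `bind_unit` are hypothetical.

```python
import copy

# Hypothetical location data segment for the collection of software units:
# each location variable maps a unit name to that unit's data segment.
location_segment = {
    "shared_unit": {"hits": 0},        # thread-safe unit, shared data
    "unshared_unit": {"scratch": []},  # non thread-safe unit, unshared data
}


def new_location_version():
    """Create a thread's own, initially empty, location data segment version."""
    return {}


def bind_unit(version, segment, unit_name, shared):
    """Set the thread's pointer for one software unit and return its target."""
    if shared:
        # Shared data: point at the very same data segment.
        version[unit_name] = segment[unit_name]
    else:
        # Unshared data: point at a private copy of the data segment.
        version[unit_name] = copy.deepcopy(segment[unit_name])
    return version[unit_name]
```

A thread then reaches every data segment through one lookup in its own version, whether the data is shared or private. In this sketch, access to the shared segment would still need a mutex if several threads modify it concurrently.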
Thus, a general object of the invention is to provide a flexible multi-thread environment in which each thread can access data segments associated with a collection of software units via a single interface. A further general object of the invention is to allow threads to access both shared and unshared data in the same collection of software units. Other and further objects, features, aspects and advantages of the invention will become better understood with the following detailed description of the accompanying drawings.