It is well known that a shared memory is a memory area shared by two or more processes (or threads) in a multi-task environment and usually used for supporting high-speed data transmission. Typically, each process using the shared memory conforms to a set of rules that prohibits two or more processes from simultaneously accessing (writing or reading) the same memory area.
Specifically, a mutex can be utilized to prevent two or more processes from simultaneously accessing the shared memory. As is well known to those skilled in the art, a mutex is an inter-process synchronization mechanism, wherein a process owning the mutex enjoys exclusive access to the shared memory, and only after the mutex is released can another process gain exclusive access to the shared memory.
In the prior art, a shared memory typically contains a header section and a data section, where the header section holds critical control data such as semaphores, size of memory block, pointers to available locations, etc.; and the data section is an area where a process performs a data operation, such as reading/writing data, while accessing the shared memory. In general, a process owning a mutex performs data operations within the data section based on the control data in the header section. Hence, the control data in the header section is critical to the correct operation of systems using this shared memory. If data in the header section is corrupted, the systems using the shared memory area will probably stop functioning. “Data corruption” usually means that control data blocks in the header section of a shared memory are damaged, lost, or inconsistent with the data status of the data section. That is, the control data in the header section no longer correctly reflects the data status in the data section.
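The header/data split described above can be sketched as follows. This is a minimal illustration, not a layout taken from any particular system: the three header fields (`capacity`, `write_offset`, `used`), the names `HEADER`, `create_region`, `read_header`, and `header_is_plausible` are all assumptions, and a plain `bytearray` stands in for a real shared memory segment.

```python
import struct

# Hypothetical header layout (an assumption for illustration):
# data-section capacity, next write offset, number of bytes in use.
HEADER = struct.Struct("<III")


def create_region(capacity: int) -> bytearray:
    """Allocate header + data section and initialise the control data."""
    region = bytearray(HEADER.size + capacity)
    HEADER.pack_into(region, 0, capacity, 0, 0)
    return region


def read_header(region) -> tuple:
    """Return (capacity, write_offset, used) from the header section."""
    return HEADER.unpack_from(region, 0)


def header_is_plausible(region) -> bool:
    """Check that the control data still agrees with the data section size."""
    capacity, write_offset, used = read_header(region)
    return (
        capacity == len(region) - HEADER.size
        and write_offset <= capacity
        and used <= capacity
    )
```

A check such as `header_is_plausible` only detects gross damage, for example an offset pointing outside the data section; it cannot by itself prove that the header matches the actual contents of the data section.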
If data corruption takes place in the header section of a shared memory, remedial steps are taken by providing supporting scripts that solve the problem (deleting and re-creating the shared memory), or by restarting all applications using the shared memory, or by rebooting the machine. However, in each of these cases, the applications are subject to down time. A shared memory is usually used as a high-speed mechanism for inter-process communication, and thus any down time is highly undesirable.
As described above, when a plurality of threads or processes intend to use a shared memory, a mutex is usually utilized to control exclusive access to the shared memory segments, including the header section. That is to say, only a thread or process owning the mutex is entitled to perform operations with the shared memory area. A typical procedure to exclusively access the shared memory by a process or thread includes the steps of:
(1) locking the mutex to get an exclusive access to the shared memory, wherein after locking, only the current process or thread can access the shared memory, while the other processes or threads keep waiting until the mutex is released;
(2) reading the control data in the “header section” of the shared memory, in order to know how many bytes are available for writing (or reading), and where to write (or read) etc.;
(3) if the shared memory is available for writing (or reading), then starting data operation within the “data section” of the shared memory;
(4) after data operation, updating the “header section” using a new status of the “data section,” i.e. writing (or reading) a new location, a new available space to write (or read), etc.; and,
(5) after steps (3) and (4) are fully completed, releasing the mutex.
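Steps (1) through (5) can be sketched as below. This is a simplified illustration under stated assumptions: threads and a `threading.Lock` stand in for processes and an inter-process mutex, a `bytearray` stands in for the shared memory, and the two-field header (`capacity`, `used`) and the names `make_region` and `append` are hypothetical.

```python
import struct
import threading

# Hypothetical two-field header: data-section capacity, bytes already used.
HEADER = struct.Struct("<II")
lock = threading.Lock()  # stands in for the inter-process mutex


def make_region(capacity: int) -> bytearray:
    region = bytearray(HEADER.size + capacity)
    HEADER.pack_into(region, 0, capacity, 0)
    return region


def append(region: bytearray, payload: bytes) -> bool:
    """Append payload to the data section; return False if it does not fit."""
    with lock:  # (1) lock the mutex; (5) it is released when the block exits
        # (2) read the control data from the header section
        capacity, used = HEADER.unpack_from(region, 0)
        # (3) check availability, then operate on the data section
        if used + len(payload) > capacity:
            return False
        start = HEADER.size + used
        region[start:start + len(payload)] = payload
        # (4) update the header to reflect the new state of the data section
        HEADER.pack_into(region, 0, capacity, used + len(payload))
        return True
```

Because every caller reads and updates the header only while holding the lock, concurrent calls to `append` cannot observe a half-updated header as long as no caller fails between steps (3) and (4).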
If the process currently owning the mutex crashes while accessing the shared memory, e.g. while updating the header section in step (4), the update is very likely to be left incomplete. The control data in the header section will then no longer be consistent with the data status in the data section, i.e. data corruption will occur in the header section of the shared memory. As described previously, such data corruption will prevent the shared memory from functioning normally and may even cause the data to be lost completely.
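The failure mode described above can be reproduced in a small simulation. The sketch below is only illustrative: the crash is simulated with an exception (`SimulatedCrash`), the header holds two hypothetical fields (`write_offset`, `free_bytes`) that are updated one after the other, and the invariant `write_offset + free_bytes == CAPACITY` is an assumption chosen to make the inconsistency easy to detect. Bounds checking is omitted for brevity.

```python
import struct

CAPACITY = 16
# Hypothetical header: next write offset, free bytes remaining.
HEADER = struct.Struct("<II")


class SimulatedCrash(Exception):
    """Stands in for a process dying part-way through step (4)."""


def new_region() -> bytearray:
    region = bytearray(HEADER.size + CAPACITY)
    HEADER.pack_into(region, 0, 0, CAPACITY)
    return region


def write(region: bytearray, payload: bytes, crash_mid_update: bool = False) -> None:
    write_offset, free_bytes = HEADER.unpack_from(region, 0)
    start = HEADER.size + write_offset
    region[start:start + len(payload)] = payload  # data operation
    # The header is updated field by field, so the update is not atomic.
    struct.pack_into("<I", region, 0, write_offset + len(payload))
    if crash_mid_update:
        raise SimulatedCrash("crashed after updating only the first field")
    struct.pack_into("<I", region, 4, free_bytes - len(payload))


def header_consistent(region: bytearray) -> bool:
    write_offset, free_bytes = HEADER.unpack_from(region, 0)
    return write_offset + free_bytes == CAPACITY
```

After a normal `write`, `header_consistent` holds; after a call with `crash_mid_update=True`, the write offset has advanced but the free-byte count has not, so the header no longer describes the data section.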
Moreover, in the environment of an operating system such as Windows, if a process crashes for some reason while it holds a locked mutex, the process never releases its ownership of that mutex. At this point, the mutex is considered to be abandoned, and any other process that subsequently tries to obtain the mutex by calling the Windows function “WaitForSingleObject( )” receives the return code WAIT_ABANDONED. This means that although the mutex is in an idle status, it cannot be owned by other processes for exclusive access to the shared memory. Hence, other processes, failing to obtain an available mutex, are unable to access the shared memory.
Therefore, there is a need for a technical solution capable of solving the shared memory data corruption found in the prior art. Moreover, there is a need for a technical solution that ensures, in the environment of an operating system such as Windows, that a process can obtain another available mutex when a mutex is in an abandoned status.