1. Field of the Invention
The present invention relates, in general, to managing shared resources in a computer system with a multithread processing environment, and, more particularly, to software, systems and methods for determining ownership of a lock to a computer system resource that can be owned by multiple threads.
2. Relevant Background
Computer system designers and analysts face the ongoing and often difficult task of determining how to fix or improve operation of a computer system that has experienced an unexpected exception or is failing to operate as designed (e.g., is experiencing errors caused by software problems or “bugs”). When a problem or bug in the computer system software is serious enough to stop or interrupt the execution of a running program, this failure is known as a crash. In other cases, a computer system does not crash but simply fails to operate as efficiently and quickly as possible because one or more slow or inefficient processes block access by other processes to computer system resources, such as memory or a processor. Hence, one problem facing system designers and analysts is how to make computer systems, including operating systems, more effective in managing concurrent processing and shared resources.
Computer systems often support processes with multiple threads of execution (i.e., threads) that can work together on a single computational task. The term “thread” in a general sense refers merely to a simple execution path through application software and the kernel of an operating system executing on the computer. Threads share an address space, open files, and other resources, but each thread typically has its own stack in memory that contains the thread execution history, with one frame for each procedure or function called but not yet returned from. For example, a frame in the thread stack may include pointers to read functions, write functions, and resources to which the thread is waiting for access. Similarly, each thread of the process typically has its own stack maintained by the system processor, divided into a set of active frames and a set of inactive frames for threads or functions that have been called and returned. One of the challenges in using multithreading is to synchronize the threads and processes so that they do not interfere with each other. This is typically accomplished through mutual exclusion locks (mutex locks), which are used to ensure that only one thread or process at a time performs a particular task or has access to specific items of shared data.
A thread typically attempts to “acquire” a lock before executing a critical section of code or accessing specific items of shared data. If no other thread presently holds the lock, the thread acquires the lock by setting the lock to a locked state. After acquiring the lock, the thread is free to execute the critical section of code or manipulate the items of shared data without interference from other threads. While the thread holds the lock, other threads attempting to acquire the lock will “block” waiting for the lock, and will not be able to proceed until the lock is released. After the thread completes the task, it releases the lock, thereby allowing other threads to acquire the lock. The kernel maintains a list of threads waiting to obtain the lock (e.g., sleeping on a lock) to know what threads to wake up when the lock is released.
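By way of illustration only, the acquire, block, and release behavior described above can be sketched at the user level in Python with the standard `threading` module. The names `Counter`, `run_workers`, and `contended_acquire_fails` are hypothetical and are introduced solely for this sketch; the sketch does not represent the kernel-level lock implementation of any particular operating system.

```python
import threading


class Counter:
    """Shared data guarded by a mutual exclusion lock (hypothetical example)."""

    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def increment(self):
        # Acquire the lock; the calling thread blocks here while another
        # thread holds it.
        with self.lock:
            # Critical section: only one thread at a time runs this line.
            self.value += 1
        # The lock is released on leaving the with-block, which allows a
        # waiting thread to acquire it.


def run_workers(num_threads=4, increments=1000):
    """Start several threads that all update the same counter."""
    counter = Counter()

    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


def contended_acquire_fails():
    """Show that a held lock cannot be acquired again without waiting."""
    lock = threading.Lock()
    lock.acquire()
    try:
        # A non-blocking attempt on a held lock returns False instead of
        # putting the caller to sleep.
        return lock.acquire(blocking=False)
    finally:
        lock.release()
```

In this sketch, a call such as `run_workers(4, 1000)` is expected to return the full count of increments because each update is made while the lock is held, and `contended_acquire_fails()` returns `False` because the lock is already in the locked state when the second attempt is made.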
To assist in identifying bugs in the software operating on a computer system such as those that cause unexpected or unacceptable hangs, software applications are often configured to write a copy of the memory image of the existing state of the application or kernel at the time of the crash or exception into a file. These memory image files are sometimes called core files or core dumps. The system-level commands or programs in the operating system, i.e., the kernel software, are of particular interest to system analysts in correcting bugs in a crashed computer system. For example, in UNIX®-based systems, the kernel is the program that contains the device drivers, the memory management routines, the scheduler, and system calls. Often, fixing bugs begins with analysis of these programs, which have their state stored in a core file. Similarly, at the user level or in the user space, programs or binaries (e.g., binary, machine readable forms of programs that have been compiled or assembled) can have their state stored in user core files for later use in identifying the bugs causing the user applications to crash or run ineffectively.
However, debugging a program, application, or kernel based solely on the core file can be a very difficult and time-consuming task in multithreading environments in which crashes or inefficiencies are caused by hangs due to a thread blocking access to a particular shared resource. In some cases, the kernel stores ownership information for each active lock indicating the thread of execution holding the lock. In contrast, many multi-process and multithreading environments utilize locks that can be owned by multiple owners, i.e., multiple owner locks. Typically, the kernel does not store ownership information for multiple owner locks because it would be difficult and inefficient to maintain an arbitrary length list of owners that generally is only useful for debugging owner programs or threads. When a core file is examined that includes a multiple owner lock, a debugger can readily identify the lock that is causing operation problems such as an unacceptable hang, but debugging then becomes difficult as the debugger cannot readily identify the thread that owned the lock at the time of the hang.
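The difference between the two kinds of locks can be illustrated with a minimal Python sketch. The classes `OwnedMutex` and `SharedLock` are hypothetical simplifications introduced only for this illustration and are not taken from any actual kernel; in particular, `SharedLock` models only the shared (multiple owner) side of a lock and assumes that a simple holder count is the only state kept.

```python
import threading


class OwnedMutex:
    """Single owner lock that records which thread holds it (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None  # identifier of the holding thread, or None

    def acquire(self):
        self._lock.acquire()
        # One field is enough to record the single owner.
        self.owner = threading.get_ident()

    def release(self):
        self.owner = None
        self._lock.release()


class SharedLock:
    """Multiple owner lock that keeps only a holder count (hypothetical)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the count itself
        self.holders = 0                # how many owners, but not which ones

    def acquire_shared(self):
        with self._guard:
            self.holders += 1

    def release_shared(self):
        with self._guard:
            self.holders -= 1
```

Under these assumptions, an examination of an `OwnedMutex` reveals the holding thread directly through its `owner` field, whereas an examination of a `SharedLock` reveals only that some number of threads hold it. This corresponds to the situation in which a debugger can identify the lock involved in a hang but cannot readily identify the owning threads from the lock state alone.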
Hence, there remains a need for improved methods and mechanisms for determining ownership of a multiple owner lock based on a crash dump or core file from a computer system, or on the memory of a live system, having a multithreading operating environment.