The present invention generally relates to systems for accessing trace data produced in a data storage system, and more particularly, to systems and techniques which provide a host computer system with remote access to such trace data in a continuous manner as the trace data is produced in the data storage system.
Most types of computer systems have a requirement to maintain data for prolonged periods of time. To meet this requirement, a typical computer system includes a coupling to a data storage system which the computer system can access to store and retrieve the data. The computer system may be coupled to the data storage system via a high speed data transfer interface (e.g., a small computer system interface (SCSI), a fiber-optic channel interface such as Fibre Channel or ESCON, or the like), or the coupling may be formed over a computer network such as a Storage Area Network (SAN) that may link a plurality of computer systems to one or more high-capacity data storage systems. Through an interoperation of software (e.g., applications, operating systems, protocols and the like) and hardware (e.g., circuitry) in both the computer system and the data storage system, the computer system is able to access data within storage media (e.g., disk drives) that the data storage system controls.
Within a typical data storage system, one or more processors (e.g., Central Processing Units or CPUs) operate according to prescribed software program(s) to manage and control access to the data within the storage media in the data storage system on behalf of the computer systems that request access to such data. Such data storage system software programs are generally considered the operating system or control program for the data storage system. For example, within a high-capacity data storage system such as one of the Symmetrix line of data storage systems manufactured by EMC Corporation of Hopkinton, Mass., U.S.A., a front end interface provides a coupling for the data storage system to one or more computer system(s) (via direct interfaces or via a SAN) while a back end interface provides a coupling to the storage media devices (e.g., disk drives) within the data storage system that store data. The front and back end interfaces are coupled by one or more data buses which allow the interfaces to interoperate with each other. A cache memory system is accessible on the data bus for use by the front and back end interfaces to temporarily store data during processing. A processor operating within the front end interface (e.g., on a circuit board that operates as the front end interface) operates a software program (e.g., firmware or microcode) that performs high speed processing of data (and requests for such data) between the front end interface and the remotely connected computer systems. Likewise, the back end interface includes a processor that operates a software program to handle tasks associated with accessing (e.g., reading and writing) data to and from the storage devices within the data storage system based on the requests received by the front end interface.
Due to the complex operation of a typical data storage system, the software programs which perform (e.g., execute or otherwise operate) on processor(s) such as the front and back end interfaces within a data storage system can become quite large and complex. By way of example, the microcode software program which provides the access request processing operations for a front end interface within a Symmetrix data storage system may be many thousands of lines of code in length. During the design, development and testing of such complex data storage system software control programs, software developers frequently include the ability for the software program to operate in a "trace mode" which allows the program to trace the occurrence of certain trace events during the program's operation. Essentially, trace mode operation causes the software program to capture trace data as defined by a software developer in relation to an occurrence of the certain defined or selected trace events.
Before operating a software program in trace mode in a conventional data storage system, a software developer is able to define one or more trace events and associated trace data which is to be captured upon occurrence of each trace event. The software developer can then operate the software program in trace mode. While operating in trace mode, the software program in the data storage system is able to detect occurrences of each trace event during operation of the software program. Upon detection of a trace event, the software program performs or calls a designated trace routine (which is itself generally considered part of the software control program) which is responsible for capturing trace data (e.g., variable or data structure values, data access request formats, command parameters, and so forth) related to the trace event. The trace routine places the trace data, which may include the current values of data structures, parameters, input/output request values, and so forth that are relevant to the trace event as designated by the software developer, within a trace buffer in the cache memory system within the data storage system. The trace buffer is typically a reserved area of the cache memory system which is limited in size, for example, to sixteen or thirty-two megabytes (MB).
After the software developer operates the software program in trace mode for a certain period of time in order to exercise the features of the program which would typically cause the trace event(s) to occur, the software developer can halt the operation of the software program. At this point, the trace buffer in the cache memory system contains the trace data which the trace routines captured during operation of the software program at the occurrence of each trace event. The software developer can then view the trace data within the trace buffer in the cache memory system by interaction, for instance, with a service processor (e.g., a computer with a keyboard and monitor) which is integrated as part of the data storage system. By reviewing the trace data, the software developer can determine whether the software program for which the trace data was generated performed properly in the data storage system during its operation.
Some data storage system configurations include a service processor that allows the software developer to download the trace data from the trace buffer onto a removable storage medium such as a floppy disk so that the trace data can be transported to another computer system for further analysis. By viewing the trace data according to these techniques, the software developer can debug the software program to determine whether or not it is operating properly.
Conventional systems and techniques for obtaining access to trace data produced as a result of operation of a data storage system suffer from a number of deficiencies.
One such deficiency relates to the limited size and/or capacity of a trace buffer within the cache of a data storage system. As noted above, in a typical conventional implementation the trace buffer in a data storage system is limited in size, for example, to 16 or 32 MB. Due to this limited size or capacity, trace routines which place trace data into the trace buffer manage it as a circular queue and are thus able to write trace data to it continually. For example, a software developer may define a number of trace events for which trace data is to be produced during trace mode operation of a software program under test within a data storage system. During operation of the software program in trace mode, as each trace event occurs, the software program activates one or more trace routines which place (i.e., write) a certain amount of trace data into the trace buffer.
Depending upon the frequency of occurrence of the trace events (i.e., the time between traced events), or the size or amount of trace data written for each trace event to the trace buffer, and/or how long (e.g., how much time) or how fast or slow the software program continues to operate in trace mode, trace routines may place trace data into the trace buffer at varying rates and in various amounts. The trace buffer may become completely full with trace data at some point during the operation of the software program in trace mode. That is, so much trace data may be created that the trace buffer area in the cache memory is fully consumed with trace data. However, since conventional trace routines operate the trace buffer as a circular trace buffer, the routines that produce the trace data begin to re-write trace data at the start of the trace buffer if the trace buffer becomes completely full with trace data. Stated differently, once the trace buffer is filled with trace data, conventional trace routines begin to overwrite any existing trace data at the beginning of the trace buffer with the most recently generated (i.e., the newest) trace data. The trace routines continue in this manner by writing trace data generated for each trace event at a location in the trace buffer corresponding to the end of the most recently written portion of trace data. This is problematic since trace data may be lost (i.e., overwritten) after the trace routines begin overwriting old trace data at the start of the trace buffer with new trace data.
Conventional trace routines, which are typically incorporated as part of the software control program operating in trace mode within the data storage system, maintain a trace buffer pointer indicating the current location to which any new trace data is to be written upon the occurrence of the next trace event. Each time a trace routine adds trace data to the trace buffer, the trace routine updates the trace buffer pointer to point to the end of the trace data in the trace buffer.
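The wraparound behavior and pointer update described above can be modeled in a few lines. The following is a minimal sketch, not the data storage system's microcode: it assumes a byte-addressed buffer, and the class name `CircularTraceBuffer` and method `write` are hypothetical illustrations of a trace routine appending one trace record and advancing the trace buffer pointer.

```python
class CircularTraceBuffer:
    """Hypothetical model of a fixed-size circular trace buffer in cache memory."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.data = bytearray(size)
        self.pointer = 0  # offset at which the next trace record will be written

    def write(self, record: bytes) -> None:
        """Trace routine: append a record, wrapping to the start of the buffer."""
        if len(record) > self.size:
            raise ValueError("record larger than trace buffer")
        # Part of the record that fits before the end of the buffer.
        first = min(len(record), self.size - self.pointer)
        self.data[self.pointer:self.pointer + first] = record[:first]
        # Remainder (if any) overwrites the oldest trace data at the start.
        rest = len(record) - first
        self.data[0:rest] = record[first:]
        # Advance the trace buffer pointer to the end of the newly written data.
        self.pointer = (self.pointer + len(record)) % self.size
```

For example, writing `b"ABCDEF"` and then `b"GHIJ"` into an 8-byte buffer leaves `b"IJCDEFGH"` with the pointer at offset 2: the oldest bytes `AB` have been overwritten, which is exactly the data loss the conventional approach suffers.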
Another problem with the conventional approach to accessing trace data in a data storage system is that a software developer may be limited in the amount of time that he or she is able to operate a software program in trace mode while still being able to accurately capture trace data from the trace buffer after halting operation of the software program. If a software developer allows the software program to operate for too long, older trace data that trace routines place in the trace buffer during the early stages of operation of the software program is likely to be overwritten by trace data generated in later stages of operation. Thus, trace data can be lost, and it is difficult to perform an accurate analysis of prolonged operation of the software program due to incomplete or missing (i.e., overwritten) trace data.
Likewise, conventional trace data access techniques tend to restrict the number of different trace events that a software developer can select for capture of trace data during trace mode operation of a software program. This is because each trace event causes a certain amount of trace data to be placed into the trace buffer. Some events may cause trace routines to capture large amounts of trace data while other events may require the capture of only limited amounts of trace data. Accordingly, if trace data for many different trace events is to be captured in the trace buffer during trace mode operation of a software program, or if a small number of trace events are selected but each trace event produces large amounts of trace data, the software developer may be inclined to operate the software control program in trace mode for only a short period of time in an attempt to avoid the problem of trace data being overwritten in the trace buffer, as explained above.
To illustrate these problems further, it is difficult if not impossible to select a large number of trace events for which trace data is to be generated during operation of a software program in trace mode, and then to perform the software program in trace mode in a data storage system for a prolonged period of time (e.g., many hours or days) under heavy load conditions, without exhausting the initial capacity of the trace buffer thus causing the loss (e.g., the overwriting) of trace data. Accordingly, conventional approaches to accessing trace data in a data storage system provide very limited ability to perform long-term analysis of extended data storage system control program operation by collecting trace data generated for many different trace events during such an extended data storage system operation.
Further still, even if the problems of overwriting trace data and limited trace buffer capacity are not of major concern to a software developer, conventional trace data access approaches provide limited access to the trace data from computer systems other than the data storage system in which the trace data is generated; access is generally confined to a service processor computer system which is directly coupled to and highly integrated into the data storage system. Using conventional trace data access approaches, the software developer must establish and configure trace events on the service processor, which is typically a console interface that is physically integrated into the data storage system. The service processor does not typically provide an interface for accessing the trace data remotely.
The software developer must thus activate trace mode operation of the data storage system while being physically present at the data storage system. Upon completion of the operation of the software program in trace mode, the software developer can then manually download or otherwise copy the trace data from the service processor in the data storage system to a removable medium such as a disk, whose contents can then be printed for analysis at a remote location. Other conventional trace data access alternatives include capturing a screen copy or "dump" of the trace data in the trace buffer via the service processor, once the software program operating in trace mode has been halted.
Embodiments of the present invention significantly overcome these and other deficiencies associated with conventional data storage system trace data access techniques. In particular, embodiments of the invention provide mechanisms and techniques which allow for the continuous and substantially real-time capture of trace data during operation of a software program in trace mode in a data storage system without concern for the effects of trace data being overwritten by more recently generated trace data in the trace buffer. Using embodiments of the invention, a software developer need not be overly concerned about defining too many trace events which may occur to completely fill the trace buffer with trace data prior to being able to extract the trace data from the trace buffer. In other words, embodiments of the invention allow a software developer to define as many or as few trace events as necessary to properly test and analyze the operation of a software program in a data storage system without concern for conventional trace buffer and trace data access limitations.
Embodiments of the invention can also access trace data in a trace buffer in an automatic, real-time and dynamically adjustable manner such that if trace events begin to occur in rapid succession, the techniques of embodiments of the invention which operate to capture and access such trace data will keep pace with the more rapid creation of the trace data in a trace buffer. As will be explained, by providing a dynamically adjustable adaptive timing algorithm, if trace data begins to rapidly fill a trace buffer, embodiments of the invention can speed up trace data access to keep pace and extract the trace data at a rate which is substantially commensurate with the rate at which the trace data is placed into the trace buffer. Accordingly, if trace routines wrap past the end of the trace buffer and begin overwriting older trace data formerly written at the beginning of the trace buffer, embodiments of the invention operate to capture the older trace data before it is overwritten with newer trace data. In this manner, the system of the invention can allow a software program in a data storage system to operate indefinitely in trace mode while continually adapting and capturing trace data placed into the trace buffer. This allows trace data to be accurately captured for prolonged periods of operation of a software program in trace mode in a data storage system.
Embodiments of the invention also provide for the ability to remotely access (e.g., read and extract from the data storage system) the trace data without requiring a software developer to manually download or copy trace data from the service processor console on a data storage system. In particular, embodiments provide an event trace system call or routine implemented within a data storage system (e.g., implemented as an additional trace routine) that can be remotely activated and operated by a trace capture process performing (e.g., executing), for example, on a host computer system coupled to the data storage system. The event trace routine operates in the data storage system to access trace data in the trace buffer and can return the trace data to the trace capture process. The event trace routine is also able to return the current value of a trace buffer pointer. Using the event trace routine, the trace capture process can operate in a remote host computer system to either obtain the value of the current trace buffer pointer and/or to obtain trace data from the trace buffer.
An interface (e.g., system call interface providing parameters) to the event trace routine allows for a specification of a location at which to begin reading trace data from the trace buffer, as well as an amount of trace data that is to be read beginning at that location. If these parameter values are set to a predetermined value (e.g., are both set to 0), then the event trace routine returns the current value of the trace buffer pointer and no trace data is returned. Alternatively, if values are specified for a trace buffer pointer location and an amount of data to read, then the event trace routine returns the amount of trace data from the specified location.
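The two modes of the event trace routine interface can be sketched as a single function. This is a minimal sketch under stated assumptions: the name `event_trace`, its parameter names, and passing the buffer and pointer in explicitly are hypothetical, and requests that cross the end of the buffer are assumed to have been split by the caller.

```python
def event_trace(trace_buffer: bytes, current_pointer: int,
                read_position: int, read_length: int) -> tuple[int, bytes]:
    """Hypothetical event trace routine.

    (0, 0) is the predetermined "pointer only" request: return the current
    trace buffer pointer and no trace data. Otherwise return the pointer
    together with read_length bytes starting at read_position.
    """
    if read_position == 0 and read_length == 0:
        return current_pointer, b""
    if (read_position < 0 or read_length < 0
            or read_position + read_length > len(trace_buffer)):
        # Assumption: reads never wrap; callers split wrapped ranges themselves.
        raise ValueError("requested range lies outside the trace buffer")
    return current_pointer, bytes(trace_buffer[read_position:read_position + read_length])
```

A pointer query such as `event_trace(buf, 2, 0, 0)` returns `(2, b"")`, while a data request such as `event_trace(buf, 2, 6, 2)` returns the pointer together with the two bytes at offsets 6 and 7.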
Using this system call interface, a remotely operating trace capture process can detect advancement of the trace buffer pointer via call(s) to the event trace routine, and can then use the event trace routine to access trace data placed into the trace buffer as a result of the advancement of the trace buffer pointer. Since the system call can be activated remotely by one or more host computer systems operating one or more trace capture process(es) configured according to embodiments of the invention, trace data can be extracted from the trace buffer and transmitted to the trace capture process(es) operating within the host computer system(s) that is/are distant or remotely located from the data storage system. The trace data can then be stored remotely for analysis of the performance of the software control program(s) that produced the trace data.
In particular, the system of the invention provides method embodiments which include a method for accessing trace data produced in a data storage system. The method comprises detecting availability of trace data in a trace buffer in a data storage system and in response to detecting, providing at least one request for the trace data in the trace buffer. The method then receives the trace data from the trace buffer in response to the at least one request and repeats the steps of detecting, providing at least one request and receiving the trace data such that trace data is continually accessed from the trace buffer. Using this method, embodiments of the invention are able to extract trace data when availability within the trace buffer is detected, thus preventing the problem of conventional trace data access systems which encounter the loss of trace data due to limited trace buffer capacity.
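The detect, request, receive and repeat cycle of this method can be sketched as a polling loop on the host side. This is a minimal sketch under stated assumptions: `capture_trace`, `query_pointer` and `read_trace` are hypothetical names, where `query_pointer` could be implemented as an event trace call with both parameters set to 0 and `read_trace` as an event trace call for a given location and amount. For brevity the sketch assumes the trace buffer pointer does not wrap past the end of the buffer and runs for a fixed number of polls rather than indefinitely.

```python
import time
from typing import Callable


def capture_trace(query_pointer: Callable[[], int],
                  read_trace: Callable[[int, int], bytes],
                  polls: int, interval_s: float = 0.0) -> bytes:
    """Hypothetical trace capture loop: detect, request, receive, repeat.

    Simplification: assumes the pointer never wraps past the end of the buffer.
    """
    captured = bytearray()
    previous = query_pointer()        # baseline trace buffer pointer
    for _ in range(polls):
        current = query_pointer()     # detect: has the pointer advanced?
        if current != previous:
            # Request and receive the trace data added since the last query.
            captured += read_trace(previous, current - previous)
            previous = current
        else:
            time.sleep(interval_s)    # nothing new yet; wait, then query again
    return bytes(captured)
```

With a simulated buffer `b"abcdefghij"` whose pointer advances through the values 0, 3, 3 and 7, three polls capture `b"abcdefg"`: three bytes on the first poll, nothing on the second, and four more on the third.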
In one embodiment, the step of detecting availability of trace data in the trace buffer includes the steps of querying the data storage system to determine if trace data has been placed in the trace buffer, and if trace data has been placed in the trace buffer, proceeding to perform the steps of providing, receiving, and repeating, and if trace data has not been placed in the trace buffer, waiting a predetermined amount of time and repeating the step of querying. Such a query may retrieve trace buffer pointer information which can be compared with previous values of the trace buffer pointer to determine if the trace buffer pointer has moved, thus indicating the presence of additional trace data in the trace buffer. The predetermined amount of time to wait between such queries may be determined by an adaptive timing algorithm which can adjust the amount of time to wait based on factors such as the amount of trace data added to the trace buffer, a number of trace events for which trace data is produced, a speed of performance of a software program which produces the trace data, or other factors.
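The text leaves the adaptive timing algorithm open, so the following is only one possible policy, labeled as an assumption: the function `next_wait`, the quarter-buffer threshold, the halving and doubling factors, and the bounds `min_wait`/`max_wait` are all hypothetical choices. It adjusts the wait between queries based on the amount of trace data added since the previous query, one of the factors named above.

```python
def next_wait(current_wait: float, bytes_added: int, buffer_size: int,
              min_wait: float = 0.01, max_wait: float = 5.0) -> float:
    """Hypothetical adaptive policy for the delay (seconds) between queries."""
    if bytes_added == 0:
        new_wait = current_wait * 2      # idle: back off
    elif bytes_added > buffer_size // 4:
        new_wait = current_wait / 4      # buffer filling quickly: query much sooner
    else:
        new_wait = current_wait / 2      # moderate activity: query somewhat sooner
    # Keep the wait within configured bounds.
    return min(max(new_wait, min_wait), max_wait)
```

The intent is that the wait shrinks quickly when a large share of the buffer is consumed between queries, so that the capture process reads trace data before the trace routines wrap around and overwrite it, while backing off when the program under test is quiet.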
In one embodiment, the step of providing a request for a value of the trace buffer pointer comprises the step of providing a call (e.g., a system call such as a remote procedure call or RPC) to an event trace operation in the data storage system. The event trace operation, which may be a routine embedded in the operating program or microcode of a data storage system, can return a value for the trace buffer pointer equal to a current trace buffer pointer position in the trace buffer in the data storage system.
In another embodiment, the step of providing at least one request for the trace data in the trace buffer comprises the step of providing a call to an event trace operation in the data storage system. In this embodiment, the event trace operation receives at least one request for the trace data in the trace buffer, and in response to the at least one request, the event trace operation performs the steps of accessing the trace data from the trace buffer in the data storage system and returning the trace data accessed from the trace buffer in response to the step of accessing.
As such, the event trace operation can either return just the trace buffer pointer value or, if requested, can return this value in addition to trace data read from the trace buffer.
In one embodiment, the general operation can be performed remotely from the data storage system, such that the steps of detecting availability of trace data in a trace buffer, providing at least one request for the trace data, and receiving the trace data from the trace buffer are performed by a trace capture process operating in a host computer system coupled to the data storage system, while the event trace operation is performed in the data storage system (e.g., via a remote system call from the trace capture process) to extract the trace data that a processor in the data storage system places in the trace buffer in response to detecting a trace event.
To perform the step of providing at least one request for the trace data in the trace buffer, one embodiment comprises the step of calculating an amount of trace data to be requested from the trace buffer based upon the value of a current trace buffer pointer associated with the trace buffer in the data storage system and a previous trace buffer pointer. Then, to retrieve the trace data, the request received by the event trace operation indicates the amount of trace data calculated in the step of calculating, such that the event trace operation accesses that amount of trace data in the trace buffer.
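The calculation of the amount of trace data from the previous and current trace buffer pointers reduces to modular arithmetic on a circular buffer. A minimal sketch, assuming byte offsets into a buffer of known size; the function name `trace_amount` is hypothetical.

```python
def trace_amount(previous_pointer: int, current_pointer: int, buffer_size: int) -> int:
    """Hypothetical helper: bytes written since the previous pointer query.

    Assumes at most one wrap around the circular buffer between queries.
    """
    return (current_pointer - previous_pointer) % buffer_size
```

For a 1024-byte buffer, moving from offset 100 to 250 yields 150 bytes, and moving from offset 1000 around to 24 yields 48 bytes. Note the assumption: if a full buffer's worth or more of trace data is written between two queries, the pointers alone cannot reveal it, which is why the interval between queries must keep pace with the rate at which trace data is produced.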
In the data storage system, the trace buffer in one embodiment is a circular trace buffer, and the step of providing at least one request for the trace data in the trace buffer further comprises the step of determining that the trace data available in the trace buffer wraps around, that is, extends past the end of the trace buffer and continues from the start of the trace buffer. In this instance, this embodiment can detect when trace data has completely filled the trace buffer and is currently being written (i.e., by a software program operating in trace mode) beginning at the start of the trace buffer so as to overwrite older trace data in the trace buffer. Accordingly, this embodiment provides a first request for a first portion of trace data from the trace buffer, in which the first request specifies access to trace data from a previous trace buffer location to the end of the trace buffer, and then provides a second request for a second portion of trace data from the trace buffer. The second request specifies access to trace data from the start of the trace buffer to the current trace buffer location. In this manner, even if trace data is written past the end of the trace buffer and wraps to its start, the system of the invention is able to detect this and to access all of the new trace data in the trace buffer.
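The split into a first and a second request can be sketched as a function that turns the previous and current trace buffer pointers into a list of (read position, read length) requests. This is a minimal sketch; the name `plan_requests` and the tuple representation of a request are hypothetical.

```python
def plan_requests(previous_pointer: int, current_pointer: int,
                  buffer_size: int) -> list[tuple[int, int]]:
    """Hypothetical helper: (read_position, read_length) requests for new trace data."""
    if current_pointer == previous_pointer:
        return []                                    # no new trace data
    if current_pointer > previous_pointer:
        return [(previous_pointer, current_pointer - previous_pointer)]
    # Pointer wrapped: first read from the previous location to the end ...
    requests = [(previous_pointer, buffer_size - previous_pointer)]
    # ... then from the start of the buffer to the current location.
    if current_pointer > 0:
        requests.append((0, current_pointer))
    return requests
```

A side effect of this design is that a data request never takes the form (0, 0), so data requests cannot be mistaken for the "pointer only" request described earlier: the second request is issued only when its length is nonzero, and the first request in the wrapped case always starts after offset 0.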
In one embodiment, the operation of the trace capture process can also establish trace events for which the data storage system is to generate trace data in the trace buffer and can activate event tracing to cause the data storage system to begin detecting trace events for which trace data is generated and placed in the trace buffer. The trace capture process can also store the trace data in a trace database in response to the step of receiving the trace data.
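The text does not specify the form of the trace database, so the following only illustrates the storing step using Python's standard `sqlite3` module. The table name `trace_chunks`, its columns, and the helpers `open_trace_db` and `store_chunk` are hypothetical; each received portion of trace data is stored along with its receive time and its offset in the trace buffer.

```python
import sqlite3
import time


def open_trace_db(path: str = ":memory:") -> sqlite3.Connection:
    """Hypothetical trace database: one row per portion of trace data received."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trace_chunks ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " received_at REAL NOT NULL,"
        " buffer_offset INTEGER NOT NULL,"
        " data BLOB NOT NULL)"
    )
    return conn


def store_chunk(conn: sqlite3.Connection, buffer_offset: int, data: bytes) -> None:
    """Append received trace data; the id column preserves arrival order."""
    with conn:  # commits the insert on success
        conn.execute(
            "INSERT INTO trace_chunks (received_at, buffer_offset, data) VALUES (?, ?, ?)",
            (time.time(), buffer_offset, data),
        )
```

Recording the buffer offset alongside each portion makes it possible to check during later analysis that consecutive portions are contiguous, for example that the portion read up to the end of the buffer is followed by one starting at offset 0.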
Other method embodiments of the invention operate within a data storage system. In particular, one such method embodiment provides trace data to a host computer system by detecting a trace event and in response to detecting the trace event, placing trace data associated with the trace event in a trace buffer. The method in a data storage system also receives at least one request for the trace data from a remote computer system and in response to the at least one request, forwards the trace data from the trace buffer to the remote computer system. The method also consecutively performs the steps of detecting, placing, receiving and forwarding at a rate such that trace data placed into the trace buffer is forwarded to the remote computer system. A similar embodiment of the invention performs these steps or operations entirely within a data storage system, such that the request for trace data and the response to it occur within the data storage system rather than to or from the remote computer system.
In other method embodiments within a data storage system, the request is a call to activate an event trace routine in the data storage system. The event trace routine performs the operation of receiving the request for the trace data. The request includes a trace buffer read position and an amount of trace data to read from the trace buffer. The event trace routine determines whether the request is only for the value of the current trace buffer pointer and, if so, obtains the current trace buffer pointer value and returns it. Alternatively, if the request includes a specification of trace data to read from the trace buffer (for example, by indicating a location at which, and an amount in which, to read trace data from the trace buffer), then the event trace routine reads trace data from the trace buffer, beginning at the trace buffer read position specified in the request, until the amount of trace data specified in the request has been read, and returns the current trace buffer pointer together with the trace data read from the trace buffer.
Other embodiments of the invention include a computerized device, such as a host computer system, configured to access trace data according to the trace capture process operations disclosed herein as embodiments of the invention. In such embodiments, the computerized device includes at least one interface, such as a host interface, coupled to a data storage system, a processor, a memory encoded with a trace capture application, and an interconnection mechanism coupling the processor, the at least one interface, and the memory. In embodiments of the computerized device, the processor performs the trace capture application in the memory to provide a trace capture process, that when performed, causes the computerized device to access trace data according to the method embodiments of the invention.
Other embodiments of the invention include data storage systems equipped to perform the method operations disclosed herein as embodiments of the invention. That is, embodiments of the invention include a data storage system equipped with an event trace routine which operates as explained herein or, alternatively, a data storage system equipped with both such an event trace routine and a trace capture process which also operates (e.g., executes) within the data storage system and works in conjunction with the event trace routine to capture trace data.
In particular, in one embodiment, a data storage system is provided which includes at least one interface, a cache memory encoded with a trace buffer, at least one processor operating a software program in trace mode, and an interconnection mechanism coupling the at least one interface, the cache memory and the at least one processor. In such embodiments of a data storage system, the processor(s) performs at least one trace routine including an event trace routine to cause the data storage system to perform according to the event trace routine operations explained herein as embodiments of the invention.
Other arrangements of the invention that are disclosed herein include software programs to perform the method embodiment operations summarized above and disclosed in detail below. More particularly, a computer program product is disclosed which has a computer-readable medium including computer program logic encoded thereon to provide access to trace data. The computer program logic, when executed on at least one processor within a computing system, causes the processor to perform the operations (e.g., the methods) indicated herein as embodiments of the invention. Such arrangements of the invention are typically provided as software, code or other data on a computer readable medium such as an optical medium (e.g., CD-ROM), floppy or hard disk, or another medium such as firmware or microcode in one or more ROM, RAM or PROM chips, or as an Application Specific Integrated Circuit (ASIC). The software or firmware or other such configurations can be installed onto a computer system to cause the computer system to perform the techniques explained herein as embodiments of the invention.
It is to be understood that the system of the invention can be embodied strictly as a software program, as software and hardware, or as hardware alone. Example embodiments of the invention may be implemented within EMC's Symmetrix line of data storage systems and software manufactured by EMC Corporation of Hopkinton, Mass., USA.