The invention is generally directed to optimizing electronic storage devices and systems and, more particularly, to a method and apparatus for measuring and optimizing spatial structures of electronic storage workloads.
In modern computers and related devices and systems, the ever-increasing flow of data demands efficient non-volatile storage devices to store and access data. Many different entities, both within a computer and external to it, demand ever more access to data. Within most computers, several different devices exist to accommodate these demands. For example, Random Access Memory (RAM) and flash memory are used for fast and efficient access to data, but are limited in storage space. They are typically used for temporary storage of data. Larger devices such as tape drives are used to store larger amounts of data, and have relatively slower access. Storage devices such as disk drives are used most prominently for storing large amounts of data in computer databases as well as other devices. The data access rate of disk drives can vary widely depending on the access pattern and the data organization. One reason for this is that access characteristics can vary greatly among devices and different applications. Thus, proper storage system planning is required in order to allocate space to, and to optimize the use of, a computer's overall memory and storage devices.
Storage planning involves the assessment of data storage activity within a computer or system. This planning is greatly affected by the different types of access activity of applications that may be running on a system. For example, some applications, such as data mining, perform a large proportion of read operations. Other applications, such as transactional processing, perform a large number of write operations. Some applications perform input/output (I/O) operations in small amounts, such as email messages, whereas others use large ones, such as databases. Some applications perform I/O operations that are highly sequential in nature, again like data mining, whereas others are random, such as email and transactional operations. Some applications access storage space frequently, such as transactional processing and email indexing, whereas others access storage space rarely, such as email data for old or archived messages. Some applications are bursty, accessing data at widely varying rates, for example, many accesses followed by periods of time with few accesses. Other applications access data in a non-sequential manner. Still others access data in a continuous manner, in steady amounts and at a steady pace. Other access characteristics, such as the spatial locality of the data access, can exist as well, further complicating storage management.
For mass-produced computers, access characteristics may be measured by defining a single logical volume, wherein the logical volume is the storage area available on the disk. Within this area, access characteristics exist that may vary according to the particular applications that utilize particular portions of the area, such as blocks or partitions. These characteristics may include data access rates, access patterns, burstiness of the access, locality of the access, and other characteristics. The access characteristics of the area may be monitored using conventional techniques. This information may be used to plan the memory and storage devices in a computer or system. However, difficulties in planning arise as a result of the predetermined nature of the block divisions or partitions.
Within these blocks, access characteristics can vary widely. Access characteristics, such as those discussed above, can differ greatly depending on which application or applications access the particular portions of the storage area. In conventional methods of monitoring the space, the data defining the varying characteristics are typically averaged over the entire storage space, diminishing the effectiveness of monitoring the space. For example, typical file systems within a personal computer contain both frequently used files, such as indices, and less frequently used files, such as email data and older messages. In an ideal computer system, the more frequently used files would be best stored in a fast device. Similarly, the less frequently used files could be stored in a less expensive, slower device. This can be true of other devices and systems. Without the capability to identify areas of such disparate access, the goal of judicious use of storage space is not fully realized.
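The effect of averaging over the entire storage space can be illustrated with a short calculation. The following is only a sketch: the per-region access rates are hypothetical figures chosen for illustration, and the "twice the average" threshold is an assumption, not a criterion stated in this description.

```python
# Hypothetical access rates (accesses per second) for ten equal regions
# of one logical volume. These figures are illustrative, not measured.
rates = [900, 20, 15, 25, 10, 10, 5, 5, 5, 5]

# A conventional monitor that averages over the whole volume reports one number.
volume_average = sum(rates) / len(rates)

# Looking at regions individually exposes the one region that dominates.
# The 2x-average threshold is an assumed, arbitrary cutoff.
hot_regions = [i for i, rate in enumerate(rates) if rate > 2 * volume_average]

print(f"volume-wide average: {volume_average} accesses/s")
print(f"regions well above the average: {hot_regions}")
```

In this made-up case the single average describes none of the regions well: it is far below the rate of the busiest region and far above the rate of every other region.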
Thus, it would be useful to provide a new method and apparatus for more intelligently measuring and optimizing the spatial structure of electronic storage devices. This would improve the management of storage use, thus improving the overall performance of such devices. As will be seen, the invention does this in an elegant manner.
A method and apparatus are provided for measuring and optimizing the orientation of data access of an electronic storage device according to data access characteristics. These characteristics may be derived from a trace taken of the access activity of the storage space. The method includes monitoring storage access activity in an area of storage space and gathering data pertaining to one or more storage access characteristics. The method further includes measuring characteristics of the storage access activity of at least two individual portions of the storage space. These activities may be measured according to a predetermined parameter. These measurements may also include temporal, spatial, and other quantified factors.
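As one possible illustration of measuring per-portion characteristics from a trace, the sketch below groups trace records by fixed-size portion and summarizes each portion. The trace record layout (timestamp, offset, size, read flag), the function name measure_portions, and the choice of statistics are all assumptions made for this example; the description does not prescribe a trace format.

```python
from collections import defaultdict

def measure_portions(trace, portion_size):
    """Summarize access activity for each fixed-size portion of a storage area.

    trace: iterable of (timestamp_s, offset_bytes, size_bytes, is_read)
           tuples. This record layout is an assumption for illustration.
    Returns {portion_index: {"accesses", "read_fraction", "bytes"}}.
    """
    raw = defaultdict(lambda: {"accesses": 0, "reads": 0, "bytes": 0})
    for _timestamp, offset, size, is_read in trace:
        # Simplification: an access is attributed to the portion where it starts.
        stats = raw[offset // portion_size]
        stats["accesses"] += 1
        stats["reads"] += 1 if is_read else 0
        stats["bytes"] += size
    return {
        index: {
            "accesses": s["accesses"],
            "read_fraction": s["reads"] / s["accesses"],
            "bytes": s["bytes"],
        }
        for index, s in raw.items()
    }

# Hypothetical four-record trace over a storage area split into 1 MiB portions.
trace = [
    (0.0, 0, 4096, True),
    (0.1, 4096, 4096, True),
    (0.2, 8192, 4096, False),
    (5.0, 1_048_576, 65536, False),
]
summary = measure_portions(trace, portion_size=1_048_576)
for index in sorted(summary):
    print(index, summary[index])
```

Temporal factors such as burstiness could be derived from the timestamps in the same pass; they are omitted here to keep the sketch short.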
Using this history of access activity, more judicious use of storage space may be obtained where conventional methods have failed. Depending on the homogeneity of their access characteristics, the individual portions may then be left alone, merged with other similar portions, or further subdivided into sub-portions, which may in turn be merged, divided, or left alone. At each merger or division, a determination can be made of whether the characteristics of storage access activity of one individual portion or sub-portion are similar to those of another portion according to predetermined criteria. If the characteristics are similar, the two portions may be merged into a single portion. If the characteristics are not similar, the portion can then be compared with another portion. This process may continue until the data access factors are optimized according to predetermined criteria.
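The subdivide-and-merge process described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method: it uses a single characteristic (an access count per block), splits portions in half, treats a portion as homogeneous when its highest and lowest counts differ by no more than a tolerance, and merges only adjacent portions. The function names and the tolerance value are hypothetical.

```python
def mean(counts, lo, hi):
    """Average access count over blocks lo..hi-1."""
    return sum(counts[lo:hi]) / (hi - lo)

def is_homogeneous(counts, lo, hi, tol):
    """Assumed criterion: the spread of counts in the portion is within tol."""
    block = counts[lo:hi]
    return max(block) - min(block) <= tol

def subdivide(counts, lo, hi, tol):
    """Halve a portion recursively until each piece is homogeneous."""
    if hi - lo <= 1 or is_homogeneous(counts, lo, hi, tol):
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return subdivide(counts, lo, mid, tol) + subdivide(counts, mid, hi, tol)

def merge_similar(counts, portions, tol):
    """Merge each portion into its left neighbor when their means are close."""
    merged = [portions[0]]
    for lo, hi in portions[1:]:
        prev_lo, prev_hi = merged[-1]
        if abs(mean(counts, prev_lo, prev_hi) - mean(counts, lo, hi)) <= tol:
            merged[-1] = (prev_lo, hi)
        else:
            merged.append((lo, hi))
    return merged

# Hypothetical access counts for eight blocks: two busy, six nearly idle.
counts = [100, 98, 2, 3, 1, 2, 3, 2]
split = subdivide(counts, 0, len(counts), tol=5)
portions = merge_similar(counts, split, tol=5)
print(split)     # portions after subdivision
print(portions)  # portions after merging similar neighbors
```

With these made-up counts, halving alone leaves the idle blocks in two separate portions because the split boundaries are fixed; the merge step then joins them, which is why both operations are useful together. A fuller treatment would compare several characteristics at once and could consider non-adjacent portions.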