The present invention relates to a method of, and apparatus for, building a virtual representation of the performance of a data storage resource forming part of a networked electronic data store.
Traditionally, electronic data is stored locally on a user's computer system by means of a data storage resource such as a hard disk drive (HDD) or other storage media. However, the increasing prevalence of data-heavy resources (for example, real-time high definition video) has led to an increased demand for storage capacity.
An increasingly popular area is what is known as “cloud computing”. Cloud computing provides a set of scalable and often virtual resources over a network such as an Ethernet or the Internet. A “cloud” comprises a consolidated storage system having large storage capacity (typically at the multi-petabyte level) which may serve independent customers (e.g. the cloud acts as a storage service provider) or business units within an organisation (e.g. the cloud acts as a common corporate data store). In essence, cloud architecture means that the users generally do not own the physical computing resources they use and, instead, purchase usage from a third-party provider in a service-orientated architecture, or access a common corporate data store.
“Cloud”-type storage service providers are attractive to small to medium sized enterprises which do not typically have the resources to invest in over-provisioned storage infrastructures which will never be used efficiently. Storage service providers offer such users access to the storage services that they require without the need for capital expenditure on hardware and software solutions. In addition, the cost of hardware is becoming increasingly small in comparison to the cost of maintaining and managing a data storage resource. This makes the “cloud” approach even more attractive to businesses. In many cases, service providers provide services in the manner of a utility service, billed, for example, on the basis of the resources consumed by the user or on a periodic basis.
It is known for the provision of services by a service provider to be covered by service level agreements (SLAs). An SLA is a negotiated agreement between a service provider offering a service and a client requiring use of the service. The SLA records a common agreement regarding the quality of service (QoS) to be delivered to the client. For example, in the field of data storage provision, the QoS may relate to minimum levels of performance, reliability, storage capacity, data bandwidth or read/write latency which can be guaranteed by the service provider. These factors form part of the QoS guaranteed to the client as part of an SLA. Therefore, when a service provider enters into an SLA with a client, it is important that the service provider has the resources necessary to provide the specified level or type of QoS forming part of that SLA, i.e. that the service provider can meet the standards of service demanded by the client as defined in the SLA.
Currently, requests for access to a data storage resource are accepted without any knowledge of the current status or capabilities of the storage system. However, the performance of a given data storage resource is heavily dependent upon the demands placed upon it. For example, if a number of users are using a large proportion of the bandwidth of the data storage resource (possibly in excess of that agreed in their respective SLAs), then the service provider may not be able to meet the required QoS for a new SLA.
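By way of illustration only, the following minimal sketch shows the kind of check that cannot be made when the current load on the data storage resource is unknown. The function name `can_admit`, its parameters and the figures used are hypothetical and are chosen purely for this example; the sketch assumes that bandwidth is the only QoS parameter of interest and that current usage can be measured.

```python
def can_admit(capacity_mbps, current_usage_mbps, requested_mbps):
    """Return True if the requested bandwidth fits in the remaining headroom.

    capacity_mbps      -- total bandwidth of the storage resource (assumed known)
    current_usage_mbps -- bandwidth currently consumed by each existing user
    requested_mbps     -- minimum bandwidth required by the new SLA
    """
    headroom = capacity_mbps - sum(current_usage_mbps)
    return requested_mbps <= headroom


# Hypothetical figures: a 1000 MB/s resource with three active users.
usage = [400, 350, 200]
print(can_admit(1000, usage, 100))  # False: only 50 MB/s of headroom remains
```

Without the measured values in `usage`, no such decision can be made, and the request is accepted regardless of whether the QoS can be met.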
Typically, because real-time data relating to the data storage resource is not available, the only way to circumvent this problem is to heavily over-provision the data storage resource, i.e. to have sufficient available capability to ensure that the QoS standards are met. However, this approach is wasteful of resources and uneconomical because a significant proportion of the data storage resource must be kept free for use during abnormally heavy traffic conditions, and so is rarely used. Consequently, existing service-orientated storage providers can only guard against “worst case” scenarios of abnormally heavy load.
Therefore, known storage provision arrangements suffer from the technical problem that current and predicted storage resource information cannot be easily obtained. This means that real-time conditional QoS guarantees on storage resource access cannot be made.
The papers “CHAMELEON: a self-evolving, fully-adaptive resource arbitrator for storage systems”, S. Uttamchandani et al, USENIX Technical Conference, Anaheim, Calif., (April, 2005) and “QoS Support for Intelligent Storage Devices”, J. C. Wu and S. A. Brandt, Computer Science Department, University of California, Santa Cruz disclose an alternative approach, in which connections which are exceeding their agreed SLA are rate-limited or “throttled”. This ensures a fairer distribution of resources between user connections, so that any loss of QoS is spread equally amongst the connections. However, data storage resources are increasingly used for data transfers that require a minimum constant data bandwidth, such as the streaming of video content. The throttling of connections equally in times of high demand may not be appropriate in these circumstances.
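The limitation noted above can be seen in a deliberately simplified sketch of throttling. This is not the algorithm of either cited paper; it merely caps each connection at its agreed rate. The function name `throttle`, the connection labels and the figures are hypothetical.

```python
def throttle(demands_mbps, agreed_mbps):
    """Cap each connection's bandwidth at the rate agreed in its SLA.

    demands_mbps -- bandwidth each connection is currently requesting
    agreed_mbps  -- bandwidth agreed for each connection (same keys)
    """
    return {
        conn: min(demand, agreed_mbps[conn])
        for conn, demand in demands_mbps.items()
    }


# Hypothetical figures: every connection has agreed 100 MB/s.
demands = {"backup": 300, "database": 80, "video": 120}
agreed = {"backup": 100, "database": 100, "video": 100}
print(throttle(demands, agreed))
# {'backup': 100, 'database': 80, 'video': 100}
```

In this example the video stream is limited in exactly the same way as the backup job, even though a stream that needs a constant 120 MB/s may fail outright at 100 MB/s whereas the backup merely takes longer.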
A similar approach, albeit from a QoS perspective, is disclosed in “Polus: Growing storage QoS management beyond ‘A four-year old kid’”, S. Uttamchandani et al, USENIX Conference on File and Storage Technologies (FAST '04), 2004. This document discloses an arrangement whereby the need for system administrators to write code that maps QoS goals to low level system actions within a storage area network (SAN) is removed.
Alternatively, hard drives can be modeled at the internal component level as disclosed in “An introduction to disk drive modeling”, C. Ruemmler and J. Wilkes, IEEE Computer 27 (3): 17-29, March 1994. However, such modeling is heavily resource-intensive and cannot be performed in real-time.
It is currently difficult to model the real-time current and future storage performance of a data store in order to address the technical limitations of current systems for managing storage. Modeling of a data store (which consists of multiple connections accessing data across a set of many drives, all of which may be in different states) in order to provide real-time QoS guarantees is extremely processor-intensive and requires significant computing power. This means that such an intensive modeling process cannot be carried out efficiently in real-time.