Generally described, most corporate and governmental entities utilize computer systems, computer networks, and integrated devices that generate vast amounts of electronic data. In some cases, data is generated, processed, and discarded after serving an intended purpose. In other cases, corporate, governmental, or other entities require that data be stored and maintained for future use. Such storage methods and systems may be generically referred to as “archival.” Accordingly, many data-generating users look for storage solutions that correspond to the type and amount of data to be archived.
A growing category of archival is known as data warehousing. Data warehousing refers to the various activities involved in the acquisition, management, and aggregation of data from various sources into a centralized repository, such as a database. The database may be hosted by one or more servers, at least some of which may be physically proximate. Additionally, the central data warehouse may be a virtualized central repository in which a number of distributed servers pool and share data. In a typical application, a data warehouse stores time-oriented data that may be gathered from disparate sources. Data warehousing may be distinguished from the broader category of data archival in that the data warehouse maintains the stored data in a static manner. Because warehoused data cannot be modified (only added to or deleted), it may be used for analysis over time or by type. The data warehouse may also include metadata used to organize and characterize the data.
In addition to the ability to store and retrieve data, many database storage solutions also include some type of data restoration process or system that enables data recovery in the event of a hardware and/or software failure. This is generally referred to as storage recovery. One embodiment for storage recovery relates to “mirrored” storage solutions in which one or more identical, redundant data repositories are maintained to replicate, or mirror, the archived data contained in a primary repository. In the event some or all of the data is lost from the primary repository, one or more complete copies of the data exist in the mirrored storage repositories. Accordingly, mirrored storage solutions allow previously stored data to be recovered in the event that the primary storage repository fails. However, once the primary storage repository fails, the data warehouse cannot continue to collect new data. Accordingly, any new data transmitted to the data warehouse would be lost, or the data processing system may have to shut down.
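The limitation of mirrored storage described above may be illustrated with a minimal sketch. The class names `Repository` and `MirroredWarehouse`, and the in-memory lists standing in for physical storage, are hypothetical and are used only for illustration; they do not describe any particular product. The sketch assumes that new data enters only through the primary repository, so a mirror can restore previously stored data but cannot accept new data while the primary is down.

```python
class Repository:
    """Hypothetical in-memory stand-in for a storage repository."""

    def __init__(self, name):
        self.name = name
        self.records = []
        self.available = True

    def append(self, record):
        # A failed repository rejects writes.
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self.records.append(record)


class MirroredWarehouse:
    """Hypothetical warehouse with one primary and one or more mirrors."""

    def __init__(self, primary, mirrors):
        self.primary = primary
        self.mirrors = mirrors

    def collect(self, record):
        # New data is accepted only through the primary; the mirrors
        # merely replicate what the primary has already stored.
        self.primary.append(record)
        for mirror in self.mirrors:
            mirror.append(record)

    def recover(self):
        # Restore the primary from the first available mirror.
        source = next(m for m in self.mirrors if m.available)
        self.primary.records = list(source.records)
        self.primary.available = True
```

In this sketch, a record passed to `collect` while the primary is unavailable raises an error and is never stored. A later call to `recover` restores only the data that was collected before the failure.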
One attempt to provide additional data warehouse fault tolerance, referred to generally as failover support, relates to the use of a clustered database to transfer data to an alternate collection point in the event of a primary repository failure. In accordance with this embodiment, a database is installed across two or more servers that are linked together, such that each server in the clustered database is logically viewed as a node on the network. To provide for true failover support, the server nodes do not share storage or processing resources. Environments in which storage and processing resources are not shared between nodes are generally referred to as “shared-nothing” architectures. Shared-nothing environments are better suited to large, complex databases supporting unpredictable queries, as in data warehousing. Although a shared-nothing environment potentially allows for continued data collection in the event of a failure, the costs involved in providing and maintaining multiple servers for storage redundancy are prohibitive for many potential users. Accordingly, a clustered database approach may not present an affordable solution for many data warehouse applications.
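The failover behavior of a clustered database may be sketched as follows. The names `ClusterNode` and `ClusteredCollector` are hypothetical, and the in-memory list held by each node is an assumption standing in for that node's own dedicated storage; the sketch is not a description of any specific clustered database product. Each node keeps its own records and shares nothing with the others, and incoming data is routed to the first node that accepts it.

```python
class ClusterNode:
    """Hypothetical cluster node with its own storage (shares nothing)."""

    def __init__(self, name):
        self.name = name
        self.records = []      # storage private to this node
        self.available = True

    def store(self, record):
        # A failed node rejects writes.
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self.records.append(record)


class ClusteredCollector:
    """Hypothetical collector that fails over to an alternate node."""

    def __init__(self, nodes):
        self.nodes = nodes     # ordered: primary first, alternates after

    def collect(self, record):
        # Try each node in order and return the name of the node
        # that accepted the record.
        for node in self.nodes:
            try:
                node.store(record)
                return node.name
            except ConnectionError:
                continue
        raise RuntimeError("no node available to collect data")
```

Unlike the mirrored arrangement, collection continues after the primary node fails, but only because a second, fully provisioned server is standing by, which is the source of the cost noted above.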
Therefore, there is a need for a resource-efficient, fault-tolerant solution for data warehousing that will provide continuity of the data warehouse function in the event of a network, hardware, or software failure.