While client database platforms (i.e., home and business desktop computers) use hardware of much lower quality than server platforms, even server-class hardware (controllers, drivers, disks, and so forth) can cause “physical” data corruption such that a read operation does not return what the database application wrote to the data store. This problem is more prevalent on client database platforms than on server database platforms for various reasons, including without limitation the increased probability of a client machine being arbitrarily powered off in the midst of a write operation due to an unexpected power outage (which in turn leads to torn pages and potential database corruptions), whereas it is more common for server database systems to utilize uninterruptible power supplies to mitigate problems from power outages. Media decay is another source of “physical” data corruptions, where the physical storage medium quite literally wears out over time. Yet another reliability concern is the detection of and recovery from “logical” corruptions caused by software errors, whether inadvertent (e.g., bugs) or pernicious (e.g., viruses).
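By way of illustration only, one common way to detect that a read does not return what was written (for example, after a torn page) is to store a checksum with each page and verify it on every read. The following is a minimal sketch under stated assumptions, not the mechanism of any particular storage engine: the 8192-byte page size, the 4-byte CRC-32 header, and the names `seal_page` and `page_is_intact` are all hypothetical.

```python
import zlib

PAGE_SIZE = 8192      # assumed page size, for illustration only
CHECKSUM_BYTES = 4    # CRC-32 stored at the front of each page (assumed layout)


def seal_page(payload: bytes) -> bytes:
    """Pad the payload to a full page body and prepend its CRC-32."""
    body_size = PAGE_SIZE - CHECKSUM_BYTES
    if len(payload) > body_size:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(body_size, b"\x00")
    return zlib.crc32(body).to_bytes(CHECKSUM_BYTES, "big") + body


def page_is_intact(page: bytes) -> bool:
    """Return True if the stored checksum matches the page body as read."""
    if len(page) != PAGE_SIZE:
        return False
    stored = int.from_bytes(page[:CHECKSUM_BYTES], "big")
    return stored == zlib.crc32(page[CHECKSUM_BYTES:])
```

In this sketch a torn page can be simulated by combining the first half of a newly written page with the second half of the page it was meant to overwrite; `page_is_intact` then reports the mismatch, provided the two versions differ in the part that was not written.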
Traditionally, maintenance and repair of databases (and database file systems) has fallen to database managers and the like having a well-developed skill set and deep knowledge of database systems, or at least to individuals who are familiar with and regularly use database systems; that is, by and large, to persons relatively skilled with regard to database technologies. On the other hand, typical consumer and business end-users of operating systems and application programs rarely work with databases and are largely ill-equipped to deal with database maintenance and repair issues.
While the disparate level of skill between these two groups has been largely irrelevant in the past, a database-implemented file system for a hardware/software interface system (such as the hardware/software interface system disclosed in the Related Patent Applications) creates a scenario where these lesser-skilled end-users will be faced with database maintenance and repair issues they will largely be unable to resolve. Thus a business/consumer database-implemented operating system file system, or “database file system” (DBFS) for short, must be able to detect corruptions and recover its databases to a transactionally consistent state and, in cases of unrecoverable data loss, the DBFS must then guarantee logical data consistency at the level at which atomic change units to said data are maintained (i.e., at the “item” level for an item-based DBFS). Moreover, for DBFSs running by default in a lazy commit mode, the durability of transactions committed just before an abnormal shutdown is not guaranteed and must be accounted for and corrected.
Moreover, while business/consumer end-users will greatly benefit from automating DBFS maintenance and recovery, database managers and those with greater database skills will also benefit from a technical solution for general database maintenance and repair. It is commonplace in the art for database administrators to utilize database tools (for example, the database tuning advisor provided with SQL Server 2000), but these tools do not directly address reliability; instead they provide a means by which backups of the database are administered and managed, and they do so not in a mostly-automated fashion but with substantial database administrator involvement, particularly when database backups are not available or other repair issues arise. Thus an automated solution to address database reliability would also be beneficial for database administrators and other skilled database users, and the invention described in the Parent Patent Application provides one overarching solution.
Various embodiments of the invention of the Parent Patent Application are directed to a data reliability system (DRS) for a DBFS wherein the DRS comprises a framework and a set of policies for performing database administration (DBA) tasks automatically and with little or no direct involvement by an end-user (and thus is essentially transparent to said end-user). For several embodiments, the DRS framework implements mechanisms for plugging error and event notifications, policies, and error/event handling algorithms into the DRS. More particularly, for these embodiments the DRS is a background thread that is in charge of maintaining and repairing the DBFS in the background, and thus at the highest level the DRS guards and maintains the overall health of the DBFS. For certain embodiments, the DRS comprises the following features with regard to physical data corruption: (1) responding to and correcting data corruptions at a page level for all page types; and (2) attempting a second level of recovery (rebuild or restore) for index page corruptions (clustered and non-clustered), data page corruptions, and page corruptions in the log file. Thus, for certain embodiments, the DRS comprises functionality for: (i) handling repair/restore data corruption cases; (ii) improving the reliability and availability of the system; and (iii) keeping a DRS error/event history table for a skilled third party to troubleshoot database or storage engine problems if necessary.
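By way of illustration only, the pluggable framework described above might be sketched as follows: handlers are registered per page type, a background thread consumes corruption notifications, each handler is tried in order (page-level repair first, then a second-level rebuild or restore), and every outcome is appended to a history table. This is a minimal sketch under stated assumptions, not the implementation of the Parent Patent Application; the names `CorruptionEvent`, `ReliabilitySketch`, `register`, `notify`, and the tuple layout of `history` are all hypothetical.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CorruptionEvent:
    page_id: int
    page_type: str  # e.g. "index", "data", or "log" (assumed labels)


# A handler returns True if it recovered the page, False otherwise.
Handler = Callable[[CorruptionEvent], bool]


class ReliabilitySketch:
    """Hypothetical background worker with pluggable recovery handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}
        self._events: "queue.Queue" = queue.Queue()
        # History table: (page_id, page_type, outcome) per handled event.
        self.history: List[Tuple[int, str, str]] = []
        self._worker = threading.Thread(target=self._run, daemon=True)

    def register(self, page_type: str, handler: Handler) -> None:
        """Plug in a handler; handlers run in registration order."""
        self._handlers.setdefault(page_type, []).append(handler)

    def start(self) -> None:
        self._worker.start()

    def notify(self, event: CorruptionEvent) -> None:
        """Queue a corruption notification for the background thread."""
        self._events.put(event)

    def stop(self) -> None:
        """Drain queued events, then shut the background thread down."""
        self._events.put(None)
        self._worker.join()

    def _run(self) -> None:
        while True:
            event = self._events.get()
            if event is None:
                break
            outcome = "unrecovered"
            for handler in self._handlers.get(event.page_type, []):
                if handler(event):
                    outcome = handler.__name__
                    break
            self.history.append((event.page_id, event.page_type, outcome))
```

In use, a caller would register a page-level repair handler followed by a rebuild or restore handler for each page type, call `start()`, and report corruptions through `notify()`; the `history` list then plays the role of the error/event history table available for later troubleshooting.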
While the foregoing embodiments described and claimed in the Parent Patent Application largely address physical data corruption (i.e., correcting corrupted data in a database stored on the physical storage medium), a robust DRS should also address logical data corruptions to entities (e.g., items, extensions, and/or relationships) representatively stored in the data store in order to ensure that all such entities in said data store are both consistent and conform to the data model rules.
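By way of illustration only, a logical consistency check over stored entities might resemble the following sketch. The data model rules shown here (every item has a non-empty name and a stored identifier matching its key, and every relationship refers to items that exist) are assumptions chosen for the example and are not rules stated in this document; the names `Item`, `Relationship`, and `find_logical_violations` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Item:
    item_id: int
    name: str


@dataclass
class Relationship:
    source_id: int
    target_id: int


def find_logical_violations(
    items: Dict[int, Item], relationships: List[Relationship]
) -> List[str]:
    """Return a description of each violated (assumed) data model rule."""
    violations: List[str] = []
    for key, item in items.items():
        # Assumed rule: the stored identifier must match the lookup key.
        if item.item_id != key:
            violations.append(f"item {key}: stored id {item.item_id} does not match")
        # Assumed rule: every item must carry a non-empty name.
        if not item.name:
            violations.append(f"item {key}: empty name")
    for rel in relationships:
        # Assumed rule: both ends of a relationship must be existing items.
        for end in (rel.source_id, rel.target_id):
            if end not in items:
                violations.append(
                    f"relationship {rel.source_id}->{rel.target_id}: missing item {end}"
                )
    return violations
```

An empty result indicates that the entities satisfy the assumed rules; a non-empty result identifies the entities a repair step would need to address.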