An application database may handle data in groups or chunks, called pages in some conventions, each of a particular minimum quantity that represents a viable logical object within the database. Tasks of a database include managing and storing large quantities of data, reading and writing data with appropriately short access times, and maintaining data integrity. A database generally includes primary and secondary storage sites for respectively storing main or original data, and replicated data on a remote mirror site. The replicated and stored mirror data can be accessed for backup and disaster recovery to enable fast and accurate system recovery. Once redundant operations are established by way of block-level disk array replication, the database has the benefit of duplicate data copies. During normal operations, primary volumes remain online to all hosts and process both read and write input/output (I/O) operations. In the event of a disaster or system failure, a secondary data copy can be quickly invoked to enable recovery with high data integrity.
When the system recovers data from the remote disk array mirror site, database integrity is ensured only when data pages are fully present. Otherwise, the database cannot successfully start. Generally, a database starts by generating a call to open the database, scanning the storage media, allocating physical memory and executable processes to handle I/O operations, and opening control files that contain information coordinating data and log files and indicating file status. The startup process also includes opening the files identified by the control files. If some files are incomplete, linkages between files are missing and the database is unable to start. An interrupted replication transfer can leave only part of a page at the mirror site, a condition termed a “torn page.” Some databases may fail to handle the torn page phenomenon and attempt to use the incomplete page at startup, possibly resulting in an abortive startup or data corruption.
Referring to FIG. 1, a schematic block diagram illustrates a torn application page phenomenon and handling by a server, such as Microsoft™ SQL Server, that may result in faulty startup or data corruption. In an example of a common transaction, a host application 102 executing in a primary site 100 performs a large write operation, for example an 80 KByte write, using an 8 K/1 K file system 104. For a storage that is used in a virtual mode, the 8 K/1 K file system 104 performs any write operation larger than 8 KBytes by dividing the data into individual 8 KByte blocks, transparent to the host application 102, and transferring the blocks separately. If fewer than 8 KBytes remain after the division, the 8 K/1 K file system 104 sends the remaining data in 1 KByte blocks. Alternatively, the storage can be used in a physical or raw mode. In virtual mode operations, the 8 K/1 K file system 104 handles a large application write, which represents a single application page, by transparently breaking the page into multiple fragments. In a relatively unsophisticated server, the receiving storage array has no way to determine or identify whether all fragments of a page have been received in a transfer, making the system susceptible to the problem of a torn page at a receiving array for any write that is larger than the file system block size.
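The division performed by the 8 K/1 K file system 104 can be illustrated with a minimal Python sketch. The function name split_write and the rounding of a final partial KByte up to a whole 1 KByte block are assumptions made for illustration and are not taken from any particular file system implementation.

```python
BLOCK = 8 * 1024   # full file system block size, in bytes
FRAGMENT = 1024    # block size used for the remainder, in bytes


def split_write(length):
    """Return the sizes of the individual transfers for one application write.

    Hypothetical helper: full 8 KByte blocks first, then the remainder
    in 1 KByte blocks (a partial final KByte is rounded up, by assumption).
    """
    full_blocks, remainder = divmod(length, BLOCK)
    sizes = [BLOCK] * full_blocks
    # Ceiling division: number of 1 KByte blocks needed for the remainder.
    sizes += [FRAGMENT] * (-(-remainder // FRAGMENT))
    return sizes


if __name__ == "__main__":
    # An 80 KByte page becomes ten separate 8 KByte transfers.
    print(len(split_write(80 * 1024)))
```

Each entry in the returned list is a separate transfer, so any write that yields more than one entry can be left incomplete at the receiving array.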
In the illustrative example, the 8 K/1 K file system 104 divides the 80 KByte write into ten individual 8 KByte write operations and transfers the ten 8 KByte blocks into a Primary Storage 106. A mirroring link 108 transfers the individual blocks to a remote storage array or storage mirror 110 for replicated storage. A sophisticated application, for example a Veritas™ file system or Oracle™ database, can use a combination of an intent log, the data write, and a commit signal to resolve the torn page problem. For example, a file system can use the intent log, a circular activity log containing records of the system's intention to update a structure, to ensure integrity. The intent log records pending changes to file system structure and ensures log records are written in advance of changes to the system. In the case of system failure, pending changes in the file system are either nullified or completed. Normally, the intent log records only changes to the file system structure and not file data changes. A less sophisticated application such as the Microsoft™ SQL Server simply sends the application data page and assumes or hopes that the page is intact at the remote array 110 at the time of a failover.
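The combination of an intent log, a data write, and a commit signal can be sketched in Python as follows. The record layout, a list of (kind, page_id) tuples, and the function name recover are hypothetical and do not represent the format used by any Veritas™ or Oracle™ product; the sketch shows only the general principle that a change without a commit record is treated as pending.

```python
def recover(log):
    """Classify pages after a failure from an ordered list of log records.

    Each record is a (kind, page_id) tuple with kind "intent", "data",
    or "commit" (hypothetical layout). Returns (committed, pending):
    committed pages are safe to use, pending pages must be nullified
    or completed before use.
    """
    pending, committed = set(), set()
    for kind, page_id in log:
        if kind == "intent":
            # The intention to update is logged before any data is written.
            pending.add(page_id)
        elif kind == "commit" and page_id in pending:
            # The commit signal marks every fragment as safely written.
            pending.discard(page_id)
            committed.add(page_id)
    return committed, pending


if __name__ == "__main__":
    log = [("intent", 1), ("data", 1), ("commit", 1),
           ("intent", 2), ("data", 2)]  # failure before page 2 commits
    print(recover(log))
```

In this sketch page 2 is reported as pending rather than being used in a possibly torn state, which is the protection that an application lacking such a log does not have.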
As shown in FIG. 1, an 80 KByte application page disk write may be transparently divided into ten 8 KByte fragments by the 8 K/1 K file system 104, written to the primary storage 106, and mirrored to arrays in the remote storage 110. If a disaster occurs at the primary site 100 before the remote array 110 receives and destages to disk 112 all ten of the fragments, then an error may result on application startup at the remote array 110. For example, even if nine of the ten fragment writes that make up the application page had been correctly received and destaged to disk 112, the remote array 110 would contain an incomplete or torn application page. An unsophisticated application may designate the torn page as a corrupted and unusable database and refuse startup, possibly nullifying an implemented disaster recovery response or plan.
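The nine-of-ten case can be made concrete with a short Python sketch. The function name page_state is hypothetical, and the sketch assumes the expected fragment count is known, which is precisely the information the remote array 110 lacks in the arrangement of FIG. 1; the count is supplied here only to make the failure visible.

```python
def page_state(received_fragments, total_fragments=10):
    """Report whether a mirrored application page is complete or torn.

    received_fragments: indices (0-based) of fragments destaged to disk.
    total_fragments: assumed known here; hypothetical, for illustration.
    """
    expected = set(range(total_fragments))
    missing = sorted(expected - set(received_fragments))
    if missing:
        return ("torn", missing)
    return ("complete", [])


if __name__ == "__main__":
    # Disaster strikes after nine of the ten 8 KByte fragments arrive.
    print(page_state(range(9)))
```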
In a particular example, Microsoft SQL Server™ used with various Windows™ operating systems uses data pages of a size that is inconsistent with the size of the units written to disk. As a result, a corrupted database is a possible consequence of a power failure, disk driver or physical disk difficulties, or other disaster. Corruption can occur because each time the operating system writes an 8 KByte SQL Server data page, the page is broken into 512 byte pages. After the first 512 bytes of data are written, SQL Server assumes the entire 8 KBytes have been written to disk successfully. If power or other failure occurs before all 512 byte pages are written to disk, the SQL Server cannot detect the failure. The condition is the described “torn page.” Because no information is available to narrow the extent of the corruption, corruption of a single data page renders the entire database corrupt. SQL Server attempts to limit resulting damage through usage of “torn page detection” that can be enabled to identify torn pages. Torn page detection does not prevent corruption but only marks the database as corrupt, so that the database can be restored from the latest backup, at the cost of downtime and the loss of data generated since the last backup.
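The general idea of detecting, without preventing, a torn page can be sketched in Python. The scheme shown, in which the last byte of every 512 byte unit carries the same version stamp, is a simplified assumption chosen for illustration and is not SQL Server's actual on-disk format; the names stamp_page, is_torn, and partial_overwrite are hypothetical.

```python
SECTOR = 512       # size of each unit written to disk, in bytes
PAGE = 8 * 1024    # size of one data page, in bytes


def stamp_page(payload, version):
    """Build an 8 KByte page whose 512 byte units all end with one stamp byte."""
    per_sector = SECTOR - 1            # one byte per unit is reserved
    sectors = PAGE // SECTOR
    if len(payload) > per_sector * sectors:
        raise ValueError("payload does not fit in one page")
    payload = payload.ljust(per_sector * sectors, b"\0")
    stamp = bytes([version % 256])
    return b"".join(
        payload[i * per_sector:(i + 1) * per_sector] + stamp
        for i in range(sectors)
    )


def is_torn(page):
    """A page is torn if its units do not all carry the same stamp."""
    stamps = {page[i + SECTOR - 1] for i in range(0, len(page), SECTOR)}
    return len(stamps) != 1


def partial_overwrite(old_page, new_page, sectors_written):
    """Simulate a failure after only sectors_written units reached disk."""
    cut = sectors_written * SECTOR
    return new_page[:cut] + old_page[cut:]


if __name__ == "__main__":
    old = stamp_page(b"old contents", 1)
    new = stamp_page(b"new contents", 2)
    print(is_torn(partial_overwrite(old, new, 1)))
```

The check identifies the mismatch only when the page is next read and cannot reconstruct the missing units, which is consistent with the limitation described above: the damage is flagged, and recovery still depends on a backup.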