A computer file that records actions that have occurred is commonly referred to as a log file. A log file represents a durable and consistent recording of all the work that was done during a single continuous time interval. Typically, when an action that is to be logged occurs, a log implementation appends or otherwise writes a record to the log file. Log files may be text files, binary files, or data files (stateful or stateless). For example, servers typically maintain log files listing requests made to the server. Web servers maintain log files containing a page request history. Log clients, on the other hand, query the log file to acquire information regarding prior actions recorded in the log file. Examples of log clients include log analysis software, file analysis tools, transaction managers, queue managers, and databases.
Unfortunately, log files can experience “amnesia”, which occurs when one or more records of the log file are undesirably lost. The term “amnesia” is used because the log implementation is often not initially aware that records have been lost, and remains unaware until the loss is detected. The detection of amnesia is a complex problem even when operating on a simple log. However, as will now be described, some logs are even more complicated, thereby rendering the detection of amnesia quite a difficult problem indeed.
One type of log is called an “infinite” log. In an infinite log, once a record is written to the log file, the record is available for all time. Of course, an infinite log continues to grow, at a pace that depends on the rate at which records are written to the log and on the size of those records. In many implementations, the infinite log may even grow to an unwieldy size well before the passage of time renders the log irrelevant.
A “circular” log is a log that somewhat simulates an infinite log without the same growth problems inherent in an infinite log. A circular log is a log where certain records are removed from the log in order to save storage space. A circular log implementation uses a protocol called a “checkpoint” protocol to allow this to occur safely.
A typical checkpoint protocol executes a checkpoint in response to a particular trigger. Conventional checkpoint triggers include, for example, 1) free space available in the log dropping below a threshold level, 2) the expiration of a timer, and 3) the detection of a user request to execute a checkpoint. In response to a checkpoint trigger, the log implementation contacts each log client that uses the circular log, and requests that each log client identify the earliest record in which that log client is still interested. For each log client, that earliest record is called a “low water mark”. The log implementation then calculates a “global low water mark” by determining the earliest of the low water marks of all of the log clients. The log implementation then records the global low water mark, and the space previously used for records whose identifiers (often referred to as “Log Sequence Numbers” or “LSNs”) are earlier than the global low water mark is released, thereby freeing up storage. As this checkpoint process repeats, the global low water mark advances and aging records are deleted.
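By way of illustration only, the checkpoint protocol described above may be sketched as follows. This is a minimal in-memory sketch under stated assumptions, not a description of any particular log implementation: the names `LogClient`, `CircularLog`, `report_low_water_mark`, and `checkpoint` are hypothetical, LSNs are assumed to be consecutive integers, and the trigger logic (free space threshold, timer, or user request) is omitted. The sketch also assumes that the global low water mark never moves backward and that a checkpoint with no registered clients releases nothing.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LogClient:
    """Hypothetical log client that tracks the earliest LSN it still needs."""
    name: str
    low_water_mark: int

    def report_low_water_mark(self) -> int:
        # In a real system this would be a query to the client.
        return self.low_water_mark


@dataclass
class CircularLog:
    """Toy circular log: records keyed by integer LSN (an assumption)."""
    records: Dict[int, str] = field(default_factory=dict)
    next_lsn: int = 0
    global_low_water_mark: int = 0

    def append(self, payload: str) -> int:
        """Write a record and return the LSN assigned to it."""
        lsn = self.next_lsn
        self.records[lsn] = payload
        self.next_lsn += 1
        return lsn

    def checkpoint(self, clients: List[LogClient]) -> int:
        """Run one checkpoint and return the new global low water mark."""
        if not clients:
            # Assumption: with no clients to consult, release nothing.
            return self.global_low_water_mark

        # The global low water mark is the earliest of the client marks.
        earliest = min(c.report_low_water_mark() for c in clients)

        # Assumption: the recorded mark only ever advances.
        new_mark = max(self.global_low_water_mark, earliest)

        # Release the space held by records earlier than the new mark.
        for lsn in [l for l in self.records if l < new_mark]:
            del self.records[lsn]

        self.global_low_water_mark = new_mark
        return new_mark
```

In this sketch, if ten records are appended and two clients report low water marks of 4 and 7, a checkpoint sets the global low water mark to 4 and releases the records with LSNs 0 through 3. The records from 4 onward are retained because at least one client may still need them.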