Computer operating systems (OS) employ file systems to map the complexity of physical or virtual storage hardware to logical abstractions that can be easily manipulated. File systems are part of the storage stack of modern systems, and may be implemented as kernel services, user services, firmware, network services, virtualized services, and more, as well as combinations thereof. Modern file systems use directories (a.k.a. folders) and directory entries to keep track of the file names on a file system as stored within diverse storage media, including magnetic hard drives, Flash memory drives and other solid-state devices, floppies, tapes, or optical media such as compact disks, DVDs, Blu-ray, and the like. In such file systems, the directory entry for a file typically points to a list of blocks that contain the file's data. The exact format of the directory entry and block list varies depending on the specific type of file system (e.g., Linux ext2, FAT32, HFS+, NTFS, or UDF), but this general approach is widely used because it is simple and provides access to files and their contents with a minimum of overhead.
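By way of illustration only, the relationship between a directory entry and its block list can be sketched as follows. The names `DirectoryEntry`, `read_file`, and `BLOCK_SIZE`, and the tiny block size, are hypothetical and do not correspond to the on-disk format of any particular file system.

```python
from dataclasses import dataclass, field
from typing import Dict, List

BLOCK_SIZE = 4  # hypothetical, deliberately tiny block size for illustration


@dataclass
class DirectoryEntry:
    """Toy directory entry: a file name, a size, and a list of block numbers."""
    name: str
    size: int
    blocks: List[int] = field(default_factory=list)


def read_file(entry: DirectoryEntry, disk: Dict[int, bytes]) -> bytes:
    """Follow the entry's block list in order and trim to the recorded size."""
    data = b"".join(disk[b] for b in entry.blocks)
    return data[:entry.size]


# The blocks need not be contiguous or in ascending order on the medium.
disk = {7: b"hell", 2: b"o wo", 9: b"rld\x00"}
entry = DirectoryEntry(name="greeting.txt", size=11, blocks=[7, 2, 9])
contents = read_file(entry, disk)
```

In this sketch the only cost of reaching a file's contents is one lookup of the entry followed by one read per listed block, which is the low overhead the passage refers to.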
As used herein, “File Allocation Table” or “FAT” (e.g., FAT, FAT16, FAT32, exFAT, or the like) is a file system designed by Microsoft Corporation that uses index tables (i.e., file allocation tables) that contain entries for each cluster or unit of disk space allocation for files and directories.
As used herein, “second extended filesystem” or “Linux ext2” is a file system for the Linux operating-system kernel.
As used herein, “fast file system” or “FFS” is a file system for the Berkeley Software Distribution (BSD) operating-system kernels.
As used herein, “Unix file system” or “UFS” is a file system for the BSD and Solaris operating-system kernels.
As used herein, “ZFS” is a file system for the Solaris (now Oracle) operating-system kernels.
As used herein, “Hierarchical File System Plus” or “HFS+” is a file system developed by Apple Inc. as the primary file system used in Macintosh computers or other systems running Mac OS. It is also one of the formats used by the iPod digital music player.
As used herein, “New Technology File System” or “NTFS” is a proprietary file system developed by Microsoft Corporation for its Windows line of operating systems.
As used herein, “Universal Disk Format” or “UDF” is an open vendor-neutral file system for computer data storage for a broad range of media including DVDs and new optical disc formats.
As used herein, a “kernel” is the main component of a computer operating system that bridges between applications and the actual data processing done at the hardware level.
As used herein, “metadata” is data and information used to describe files including directory information, registry information, namespace information, superblocks, cluster groups (e.g., FAT cluster information), inodes, inode and block bitmaps, journals, and the like.
As used herein, “erasing” (also referred to herein as “deleting” or “conventional erasing” or “conventional deleting”) refers to the conventional process that causes transitions from normal operating-system-mediated data and program availability, to the loss of availability and gain of related storage space. In some operating systems, such as Unix, conventional erasing is referred to as “unlinking”.
As used herein, “destructive conventional erasing” (also referred to herein as “destructive conventional deleting”) refers to overwriting a file's data blocks one or more times with a known pattern (such as all ones or all zeroes) or with random data, ensuring that the contents cannot be recovered.
As used herein, “dematerialize” means to render a file inaccessible, but not destroy the data blocks of the file, nor release the data blocks associated with the file for reallocation. For example, in some embodiments, dematerializing a file includes modifying metadata associated with a file such that the file cannot be reconstituted by reading the file's metadata and such that the file's data blocks are marked as unavailable (sometimes referred to herein as “occupied”).
As used herein, “materialize” is the reverse of dematerialize—that is, “materialize” means to make a file reappear precisely as the file appeared prior to dematerialization. Therefore, as used herein, dematerialization is considered to be a reversible process.
As used herein, an “irreversible dematerialization” is a process which renders a file inaccessible and releases the data blocks associated with the file for reallocation.
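The distinction among the three terms defined above can be sketched with a toy in-memory model. This is an illustrative sketch only, not the method of any embodiment: the class `ToyVolume`, its method names, and the assumption that the information needed for materialization is kept in a record held outside the volume's directory are all hypothetical.

```python
class ToyVolume:
    """Toy volume: block data, a per-block 'occupied' flag, and a directory."""

    def __init__(self, n_blocks):
        self.data = [b""] * n_blocks
        self.occupied = [False] * n_blocks
        self.directory = {}  # file name -> list of block numbers

    def create(self, name, chunks):
        free = [i for i, used in enumerate(self.occupied) if not used]
        if len(free) < len(chunks):
            raise OSError("no space left on toy volume")
        blocks = free[:len(chunks)]
        for b, chunk in zip(blocks, chunks):
            self.data[b] = chunk
            self.occupied[b] = True
        self.directory[name] = blocks

    def read(self, name):
        return b"".join(self.data[b] for b in self.directory[name])

    def dematerialize(self, name):
        """Remove the metadata but leave the blocks intact and still occupied.

        Assumption: the returned record is stored somewhere other than the
        volume's own metadata, so the file cannot be found by reading it.
        """
        blocks = self.directory.pop(name)
        return (name, tuple(blocks))

    def materialize(self, record):
        """Reverse of dematerialize: restore the metadata from the record."""
        name, blocks = record
        self.directory[name] = list(blocks)

    def dematerialize_irreversibly(self, name):
        """Remove the metadata and release the blocks for reallocation."""
        for b in self.directory.pop(name):
            self.occupied[b] = False


vol = ToyVolume(n_blocks=4)
vol.create("a.bin", [b"AA", b"BB"])
record = vol.dematerialize("a.bin")   # file vanishes, blocks stay reserved
vol.create("b.bin", [b"CC"])          # cannot land on the reserved blocks
vol.materialize(record)               # file reappears unchanged
```

Because the dematerialized file's blocks remain marked as occupied, the later `create` call cannot overwrite them, which is what keeps the process reversible in this model.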
As used herein, “mounting” refers to making a storage medium (e.g., any non-volatile or volatile, read-write or read-only storage medium such as magnetic hard drives, floppies, CDs and DVDs, Flash, RAM, networks and cloud servers, tapes, Shingled devices, Phase Change Memory devices, and the like) that is operatively coupled to a computer accessible through the computer's file system. A Microsoft Windows operating system generally automatically mounts any storage medium that is attached to a computer running the Windows operating system. Similarly, OS X (Apple Inc.), Linux, and others may also automatically mount media that become available to their respective operating systems.
Often, it is necessary to conventionally delete files from a file system for various reasons, including the need to free up space they are using, the need to replace the file with a more recent version, and the need to remove the file so that its data will no longer be accessible to users of the file system. In order to conventionally delete a file, most file systems accomplish at least two tasks: marking the file's directory entry as “unused,” and making the data blocks that the file was using available to subsequently created files. For some file systems, additional information may also be marked as unused or freed, such as inodes, block bitmaps, indirect data blocks, and more.
If the goal of conventionally deleting the file is to ensure that nobody can ever recover the data contained in the file, file systems perform a destructive conventional erase that overwrites the file's data blocks one or more times with a known pattern such as all ones, all zeroes, random data, a combination thereof, or the like, ensuring that the contents cannot be recovered. While this approach is very secure, it is also very slow. For example, a destructive conventional erase of all of the files on a terabyte hard drive could require many hours to overwrite all the data.
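As a rough sketch of why destructive conventional erasing is slow, the following models an overwrite of selected blocks and estimates the time for a whole drive. The function names, the in-memory `bytearray` standing in for a storage medium, and the sustained write rate of 100 MB/s are assumptions chosen for illustration; actual devices and overwrite schemes vary.

```python
import os


def destructive_erase(disk, blocks, block_size, patterns=(b"\xff", b"\x00", None)):
    """Overwrite each listed block once per pattern; None means random data."""
    for pattern in patterns:
        for b in blocks:
            start = b * block_size
            if pattern is None:
                fill = os.urandom(block_size)
            else:
                fill = pattern * block_size
            disk[start:start + block_size] = fill


def erase_hours(capacity_bytes, passes, bytes_per_second):
    """Time to write every byte of the medium `passes` times, in hours."""
    return capacity_bytes * passes / bytes_per_second / 3600


# Three blocks of four bytes; erase blocks 0 and 2 with ones, then zeroes.
disk = bytearray(b"secretdata!!")
destructive_erase(disk, blocks=[0, 2], block_size=4, patterns=(b"\xff", b"\x00"))

# Assumed figures: a 10**12-byte drive, three passes, 100 MB/s sustained writes.
hours = erase_hours(10**12, passes=3, bytes_per_second=100 * 10**6)
```

Under these assumed figures the estimate is a little over eight hours, and the cost grows linearly with both the capacity and the number of passes.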
Instead, many modern file systems take a much simpler, but less secure, approach: they mark directory entries as “unused” and leave most of the other data on the disk untouched. This approach sets a status flag in the directory entry, changing a single word or other small amount of information on disk, and writes the directory entry back to disk. At this point, the file is considered conventionally deleted from the point of view of the file system and the directory entry is available for reuse for future files that might be created or written, but the entry is largely unchanged otherwise.
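The small size of this change can be illustrated with a hypothetical 16-byte directory entry. The layout below (a one-byte status flag, an 11-byte name, and a 4-byte first-block number) and the flag values are assumptions made for the example, not the format of any particular file system.

```python
import struct

# Hypothetical little-endian layout: status flag, 11-byte name, first block.
ENTRY = struct.Struct("<B11sI")
IN_USE, UNUSED = 0x01, 0x00


def mark_unused(raw: bytes) -> bytes:
    """Return the entry with only its status flag changed to UNUSED."""
    return bytes([UNUSED]) + raw[1:]


before = ENTRY.pack(IN_USE, b"REPORT  TXT", 42)
after = mark_unused(before)

# Offsets of the bytes that differ between the two versions of the entry.
changed = [i for i, (a, b) in enumerate(zip(before, after)) if a != b]
flag, name, first_block = ENTRY.unpack(after)
```

Only the flag byte differs after the update; the name and the block pointer survive in the entry, which is why the entry is described as largely unchanged.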
In conventional systems, after marking the directory entry as “unused,” the file system makes the blocks that the file was using available for use by other files. This can be done in several ways, the most common of which is a bitmap or a free list. In file systems such as Linux ext2, a bitmap record uses a single bit for each allocation unit (an allocation unit consists of one or more blocks) in the file system, with one value (1, for example) indicating that the corresponding space is free, and the other value (0) indicating that the corresponding space is incorporated into a file and thus unavailable for use. In such a system, the file system frees the space associated with a file by setting the bits associated with the space to 1. This marking is arbitrary but consistent within a file system; NTFS uses the reverse convention. In file systems such as Ext4 (fourth extended filesystem, a journaling file system for Linux), XFS (a high-performance journaling file system created by Silicon Graphics, Inc.), BTRFS (B-tree file system, a General Public License (GPL) experimental copy-on-write file system for Linux), and others, an extent (e.g., start+end block#) is used rather than a bitmap.
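The two bookkeeping styles mentioned above can be sketched briefly. The sketch follows the convention used in this passage, in which a bit value of 1 marks an allocation unit as free; the function names and the least-significant-bit-first ordering within each byte are assumptions made for illustration.

```python
def free_units(bitmap: bytearray, units) -> None:
    """Mark each allocation unit as free by setting its bit to 1."""
    for u in units:
        bitmap[u // 8] |= 1 << (u % 8)


def is_free(bitmap: bytearray, unit: int) -> bool:
    """Report whether the bit for the given allocation unit is 1 (free)."""
    return bool((bitmap[unit // 8] >> (unit % 8)) & 1)


def to_extents(blocks):
    """Collapse block numbers into inclusive (start, end) extents."""
    extents = []
    for b in sorted(blocks):
        if extents and b == extents[-1][1] + 1:
            extents[-1] = (extents[-1][0], b)   # extend the current run
        else:
            extents.append((b, b))              # start a new run
    return extents


bitmap = bytearray(2)                 # 16 units, all initially in use (0)
released = [3, 4, 5, 9, 10]           # units that belonged to a deleted file
free_units(bitmap, released)
extents = to_extents(released)
```

Freeing the file's space touches only the bitmap bits or the extent records; the contents of the released blocks are not read or written.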
No other activity is typically necessary for conventional erasing; thus, file systems concerned with efficiency do not destroy the structures in the blocks themselves that describe the relationship of the blocks to the now-conventionally deleted file. A major drawback of a conventional delete (in situations where it is desired to prevent recovery of the conventionally deleted file) is that it is relatively straightforward to recover a file that has been conventionally deleted if no other files have reused the directory entry or media blocks (i.e., there is a window of opportunity to recover a file fully after it has been conventionally deleted; this window closes when and if the directory and/or data blocks of the file have been recycled). In file systems such as UDF, a list of blocks that are available is maintained (UDF actually uses extents—ranges of blocks—rather than individual block numbers, but the approach is the same). The identifiers for blocks that were used in the now-conventionally deleted file are added to the list of blocks available for reuse without necessarily altering the data within the blocks themselves. Not changing block content makes it straightforward to recover the file and its contents using the flagged directory entry and associated (unmodified) block pointers, as long as the data blocks have not been reallocated to another file.
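The recovery window described above can be sketched with a toy model. The names `Entry`, `conventional_delete`, and `recover`, and the use of a plain list as the free list, are hypothetical; the sketch only shows that recovery succeeds while the flagged entry's blocks are still on the free list and fails once any of them has been reallocated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    name: str
    blocks: List[int]
    deleted: bool = False


def conventional_delete(entry, free_list):
    """Flag the entry and release its blocks; block contents are untouched."""
    entry.deleted = True
    free_list.extend(entry.blocks)


def recover(entry, disk, free_list):
    """Rebuild the file from the flagged entry if no block has been reused."""
    if not entry.deleted or not all(b in free_list for b in entry.blocks):
        return None
    return b"".join(disk[b] for b in entry.blocks)


disk = {0: b"top ", 1: b"secr", 2: b"et"}
free_list = []
entry = Entry("plan.txt", [0, 1, 2])

conventional_delete(entry, free_list)
recovered = recover(entry, disk, free_list)     # window still open

# A later file reuses block 1, which closes the window for full recovery.
free_list.remove(1)
disk[1] = b"new!"
recovered_late = recover(entry, disk, free_list)
```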
Another problem associated with conventional erasing is that conventional erasing is done via standard operating system commands (often called “system calls”), which make the process inefficiently slow. Conventional erasing is slow because it acts on only one file at a time with independent operating system commands that have long setup latencies. When the software loaded is valuable and owners are concerned about potential piracy, conventional erase is not adequate.
U.S. Pat. No. 7,565,695 to Michael Burtscher (hereinafter, “Burtscher”), titled “SYSTEM AND METHOD FOR DIRECTLY ACCESSING DATA FROM A DATA STORAGE MEDIUM” issued Jul. 21, 2009, and is incorporated herein by reference. Burtscher describes systems and methods for scanning files for pestware on a protected computer. In one variation, locations of each of a plurality of files in a file storage device of the protected computer are identified while substantially circumventing an operating system of the protected computer. Information from each of the plurality of files is retrieved and analyzed so as to determine whether any of the plurality of files are potential pestware files. In variations, the operating system is circumvented while the information from each of the plurality of files is retrieved. In other variations, before information is retrieved from each of the plurality of files, a listing of the plurality of files is sorted according to the locations of the files on the storage device so as to reduce, even further, the time required to access the plurality of files.
U.S. Patent Application Publication 2006/0277183 to Tony Nichols et al. (hereinafter, “Nichols et al.”), titled “SYSTEM AND METHOD FOR NEUTRALIZING LOCKED PESTWARE FILES” published Dec. 7, 2006, and is incorporated herein by reference. Nichols et al. describe systems and methods for scanning and deleting pestware on a protected computer. In one variation, the presence of a pestware file on the storage device is detected while an operating system of the protected computer is limiting access to the pestware file via the operating system. In order to mitigate any undesirable consequences the pestware might cause, a listing of a plurality of pointers to data for the pestware file is altered while the operating system continues to limit access to the file via the operating system. In this way, the operating system will be unable to locate and launch the pestware file. In systems where the files are organized in an NTFS format, a master file table (MFT) bitmap may be removed as well.
U.S. Pat. No. 5,794,052 to Henry N. Harding (hereinafter, “Harding”), titled “METHOD OF SOFTWARE INSTALLATION AND SETUP” issued Aug. 11, 1998, and is incorporated herein by reference. Harding describes a method for reducing the time needed for setting up a computer system in a user selected language version of a disk operating system by pre-installing a plurality of modules for different language versions of disk operating systems. Upon initial power on by an end user, a minimal disk operating system runs a software setup program which installs the end user selected language version of the disk operating system and merges certain factory loaded files into the user selected language operating system. A software installation program is then run which implements the changes necessitated by each of the modules for proper operation thereby resulting in a disk operating system that is properly configured for the operation of the combination of software programs. The computer system is then re-booted to implement the changes to the configuration of the disk operating system.
U.S. Pat. No. 6,681,391 to Phillip J. Marino et al. (hereinafter, “Marino et al.”), titled “METHOD AND SYSTEM FOR INSTALLING SOFTWARE ON A COMPUTER SYSTEM” issued Jan. 20, 2004, and is incorporated herein by reference. Marino et al. describe a method and system for installing software on a computer that generates an installation order that ensures that a component required for the functioning of another component is already installed. Furthermore, it makes possible generating good installation orders to allow related components, e.g., in a software suite, to be installed close together, thus reducing disk swapping. The method and system take into account the existing configuration on a computer and allow removal of components along with dynamic reconfiguration of a computing system in response to a user's choice of an application program to launch. In accordance with the invention, preferably a developer includes information about the component's relationship with other components, e.g., a specific requirement for a preinstalled component or a requirement that a particular component not be present, thus requiring its removal. To remove the possibility of a single identifier referring to more than one component, the preferred embodiments of the Marino et al. invention use globally unique identifiers to label individual components.
U.S. Pat. No. 7,143,067 to Richard W. Cheston et al. (hereinafter, “Cheston et al.”), titled “SYSTEM AND METHOD FOR INSTALLING PERSONAL COMPUTER SOFTWARE” issued Nov. 28, 2006, and is incorporated herein by reference. Cheston et al. describe a system and method for installing a customized set of software on a personal computer, tailored to the requirements of the prospective user and avoiding unnecessary software and attendant license fees. Software (all that may be desired) in unusable form is loaded onto the personal computer, then selected software (that which a particular user may require and/or desire) is converted (decompressed and/or decrypted) to produce usable versions of the selected software while the other software may be erased, if desired, to free up space in storage. The selection of software is based on the user's function (department and/or mission) and may be supplemented by a user selection from a menu, based on a selection utility.
U.S. Patent Application Publication 2003/0037326 to Ryan Burkhardt et al. (hereinafter, “Burkhardt et al.”), titled “METHOD AND SYSTEM FOR INSTALLING STAGED PROGRAMS ON A DESTINATION COMPUTER USING A REFERENCE SYSTEM IMAGE” published Feb. 20, 2003, and is incorporated herein by reference. Burkhardt et al. describe a computerized method and system for installing programs on a destination computer. A reference computer having an operating system installed thereon stores one or more partially installed, staged programs and/or one or more fully installed programs. The operating system, installed programs, and staged programs define a reference image that is copied to a destination computer. With a configuration file script, a user selects at least one of the staged programs for installation on the destination computer. The script further directs an installation utility to attach the selected program to complete the installation thereof on the destination computer and to detach the remaining programs not selected for installation.
U.S. Patent Application Publication 2005/0055688 to Gaston M. Barajas et al. (hereinafter, “Barajas et al.”), titled “INTEGRATED RAPID INSTALL SYSTEM FOR GENERIC SOFTWARE IMAGES” published Mar. 10, 2005, and is incorporated herein by reference. Barajas et al. describe a method for automatically installing a software image onto an information handling system. The method includes reading an order for an information handling system, reading an image manifest, installing an image specified by the image manifest onto the information handling system as installed software, and automatically configuring the installed software.
U.S. Patent Application Publication 2005/0125524 to Babu K. Chandrasekhar et al. (hereinafter, “Chandrasekhar et al.”), titled “CACHE SYSTEM IN FACTORY SERVER FOR SOFTWARE DISSEMINATION” published Jun. 9, 2005, and is incorporated herein by reference. Chandrasekhar et al. describe a method and apparatus for minimizing the size of the cache that is required to store software packages for installation on an information handling system. An analysis is conducted on the individual program files contained in a software application file. In the analysis, the software application file is disassembled into the individual program files and each of the program files is decompressed and stored in temporary file directories. Files that are common to each of the software packages are identified. After the file comparison, the method and apparatus of the Chandrasekhar et al. invention are used to re-group the files to generate a composite program file library that contains all of the program files needed to regenerate the software application files. This composite program file library is then stored on a cache in a factory server used to manufacture information handling systems in a build-to-order process.
U.S. Patent Application Publication 2006/0053419 to Janel G. Barfield et al. (hereinafter, “Barfield et al.”), titled “METHOD AND SYSTEM FOR MODIFYING INSTALLATION SOFTWARE” published Mar. 9, 2006, and is incorporated herein by reference. Barfield et al. describe a method, system and computer program product for modifying installation software in a data processing system. Installation software is stored on a rewritable data storage medium using a file system that allows portions of software stored on the rewritable data storage medium to be modified without modifying other portions of the software stored on the rewritable data storage medium. At least one portion of the stored installation software is modified to provide modified installation software on the rewritable data storage medium. The Barfield et al. invention enables modifications to installation software to be selectively placed on the same data storage medium that stores the installation software.
There is a need for a rapid and secure means to dematerialize files (and provide optional materialization of the dematerialized files) such that file recovery is very difficult but not necessarily impossible. This protects files (sometimes referred to herein as “digital assets”) by making data recovery cost more than the value of the digital assets at risk, such as commercial software programs, music tracks, video, still pictures, and the like. By escalating data recovery efforts from a brief, self-service utility approach to a day-long, expert effort equipped with, for example, a $250,000 suite of tools, piracy is rendered economically infeasible.