This invention relates to a forensic tool for use in retrieval and analysis of evidence stored in computer readable media.
In recent years, personal computers have become a major part of everyday life. They are used for e-mail, to run word processing programs, to analyze numbers, and as tools that can aid in the completion of almost any task. They have become commonplace in business and are effective tools in the home as well. Unfortunately, the migration to personal computers has not been limited to honest individuals. Computers have also become tools that criminals use to perform any number of tasks. As a result, law enforcement agencies have found it necessary to become more and more familiar with computers and related evidence. Because computer data is stored magnetically and on a variety of storage media, computer evidence processing has evolved as a forensic science. Almost all major law enforcement agencies and all military agencies in the United States have developed computer crime units.
As a result of the increased use of personal computers, documentary evidence has been transformed during the past several years from paper documents to computer data stored on floppy diskettes, computer hard disk drives, Zip® disks, Jaz® disks and read/writable CD-ROMs. These high technology, high capacity storage devices have the potential to store the equivalent of thousands or even hundreds of thousands of printed pages. Additionally, the nature of computer technology has created multiple data storage layers in which potential computer evidence resides in a transitory state.
The existence of much of the data contained on a computer hard disk drive is unknown to the computer user whose work session created the data. As a result, such data has the potential of providing useful information for investigators, internal auditors and others who have an interest in computer evidence issues. Such incidental data, which exists on a storage medium as an artifact of the system rather than by an intent of the user, is referred to as "ambient data." The term "ambient data" is used below to refer to any large data object of mixed binary and textual content. The information in the ambient data may provide a truer picture of the computer use than the information of which the user is aware and which the user can easily modify. The investigator can use leads gleaned from the ambient data to search the data in allocated file space.
Primarily these levels of data storage deal with data that is contained in files, previously erased files (or fragments of such files) and file slack (defined below). Regarding data created by the Microsoft Windows operating environment, relevant data or data fragments potentially exist in what is known as the Windows swap file. Each of these ambient data sources of evidence is discussed in more detail below.
File Slack
Computer storage media is typically divided up into storage units called sectors. Each sector typically contains 512 bytes of data. For efficiency in managing large storage media, most computer operating systems group one or more sectors into a larger unit, known as an allocation unit or cluster, and allocate an integral number of clusters to each file. The cluster size is determined by the version of DOS or Windows involved as well as the type of hard disk, floppy diskette or storage media involved.
File Slack or slack space is the area between the end of the file and the end of the last cluster that the operating system has assigned to the file. This area is automatically filled with random data from the computer memory by the operating system. File slack may contain information that the computer user believes has been removed from the computer. There will always be some file slack in the last cluster of a file unless, coincidentally, the file size exactly matches the size of one or more clusters. In such rare cases, no file slack will exist at all. File slack is not part of the actual file. The computer user, therefore, does not usually know about the existence of this storage area and has no ability to evaluate the content without specialized forensic software tools. Such tools typically use the file allocation table and directory to compare the true file size with the space allocated to the file to determine the location and size of the file slack. Information found in file slack is useful in internal audits and computer security reviews.
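The comparison of true file size against allocated space can be illustrated with a short sketch. It assumes the 512-byte sector size given above; the function name `file_slack` and its parameters are hypothetical and are not taken from any particular forensic tool.

```python
from typing import Tuple

SECTOR_SIZE = 512  # bytes per sector, as stated in the text


def file_slack(file_size: int, sectors_per_cluster: int) -> Tuple[int, int]:
    """Return (allocated_bytes, slack_bytes) for a file of file_size bytes."""
    if file_size < 0 or sectors_per_cluster < 1:
        raise ValueError("file_size must be >= 0 and sectors_per_cluster >= 1")
    cluster_size = SECTOR_SIZE * sectors_per_cluster
    # The operating system allocates a whole number of clusters (ceiling division).
    clusters = -(-file_size // cluster_size)
    allocated = clusters * cluster_size
    # Slack is whatever remains between the end of the file and the end
    # of the last allocated cluster.
    return allocated, allocated - file_size
```

For a 1,000-byte file on a volume with 64 sectors per cluster (32,768-byte clusters), `file_slack(1000, 64)` yields one allocated cluster and 31,768 bytes of slack. A file whose size is an exact multiple of the cluster size yields zero slack, matching the rare case described above.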
When DOS (or Windows) closes a file, after either creating or updating it, the computer automatically writes one or more clusters to disk. The file slack is created at this time and random data is dumped from the memory of the computer into file slack (the space from the end of the file to the end of the last cluster assigned to the file). By way of example, the storage of data on a computer hard disk drive typically involves cluster sizes that are larger than cluster sizes associated with data stored on floppy diskettes or zip drives. As a result, file slack can potentially be as large as 32,000 bytes. The random data written to file slack can contain almost anything including e-mail messages, passwords, network logons, etc.
For files stored on floppy diskettes, the cluster size is typically one or two sectors, depending on the storage capacity of the diskette involved. In the case of file slack created on large computer hard disk drives, potentially 25% of the hard disk drive's storage capacity can be occupied by file slack on a `seasoned` computer hard disk drive. This is because modern versions of DOS/Windows assign large cluster sizes when hard disk drives are involved, e.g. 32 k clusters. Normally these huge cluster sizes occur when only one partition is involved on a high capacity computer hard disk drive.
Even when the parent file is deleted, the file slack remains as unallocated storage space until it is overwritten with the content of a new file. Essentially, memory dumps in file slack can remain for years on a floppy diskette or hard disk drive and the computer user is unaware of the existence of the data. It is interesting to note that approximately 8 printed pages of text can be stored in a 32 k cluster and depending on the size of the file involved, file slack can occupy much of this space.
Computer data is relatively fragile and is susceptible to unintentional alteration or erasure. This is especially true regarding file slack because it has some unique and interesting characteristics. As long as the file it is associated with is intact, the file slack remains intact and is relatively safe from alteration. However, if the file is copied from one location to another, the original file slack remains with the original file and new file slack is created and attached to the copied file. Disk defragmentation has no effect on the file slack.
Unallocated Space
When files are deleted using conventional DOS or Windows commands or are automatically deleted by programs such as word processing applications, the data associated with the file is not actually deleted. Although the directory listing of a deleted file is removed and the file allocation table is changed to reflect that the space previously occupied by the file is free, the data itself remains on the computer hard disk drive or floppy diskette until it is eventually overwritten with data from new files. However, the normal process of overwriting previously deleted files can take a long time depending on the size of the storage device involved and the frequency of use. The large volume of stored data associated with previously erased files can contain much information of interest to an investigator. The unallocated space will also contain the file slack that was previously associated with the deleted files.
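As an illustration of how unallocated space might be located, the sketch below scans a file allocation table for entries marked free. It assumes the FAT16 layout specifically (16-bit little-endian entries, a value of zero meaning "free", and data clusters numbered from 2); other FAT variants use different entry widths. The function name `free_clusters` is hypothetical.

```python
import struct
from typing import List

FAT16_FREE = 0x0000  # assumption: FAT16, where a zero entry marks a free cluster


def free_clusters(fat: bytes) -> List[int]:
    """Return the cluster numbers whose FAT16 entry marks them as free."""
    count = len(fat) // 2
    # Each FAT16 entry is an unsigned 16-bit little-endian integer.
    entries = struct.unpack("<%dH" % count, fat[: count * 2])
    # Entries 0 and 1 are reserved; data clusters are numbered from 2.
    return [n for n, value in enumerate(entries)
            if n >= 2 and value == FAT16_FREE]
```

The clusters returned are those that the operating system considers available for reuse, which is where the contents of previously deleted files, and their former file slack, may still reside.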
Windows Swap Files
Windows Swap files are a significant source of potential computer evidence when Windows, Windows for Workgroups, Windows 95 and/or Windows NT operating systems are involved. These files are huge and normally consist of several million bytes of `raw` computer data. Essentially, the Windows Swap file acts as a buffer for use by the operating system as it runs programs, etc. Depending on the version of Windows and the user configuration involved, the files are created dynamically or they are static. Dynamic swap files are automatically created at the beginning of the work session by the operating system and are erased upon termination of the work session by the user. Although a dynamic file is deleted at the end of the Windows session, data from the swap file remains available in the unallocated disk space.
Static swap files are created at the option of the user during the initial work session and remain on the disk after the work session is terminated. The user can configure the system for either type of swap file at their option during system configuration. The size of a typical Windows Swap file can be about 100 megabytes. Because the Windows Swap file acts as a buffer for the operating system, much sensitive information passes through it. Some of the information remains behind in the file when the session is terminated. As a result, this file holds the potential for containing a great deal of useful information for the investigator and/or internal auditor. However, the large file size makes reviewing the swap file extremely time consuming. Evaluation of the content of a swap file typically took several hours or even days.
Temporary Files
Windows and other programs create temporary files that can remain after a computing session and contain data valuable to an investigator. Such files typically have a file extension of .tmp and many are found in the Windows or Windows/system directories.
"Bad" Clusters
The ambient data can be information in sectors that are indicated as unusable in the file allocation table. Most operating systems will indicate that an entire cluster is "bad" or unusable if any part of the cluster is unusable. Some of the sectors that comprise the cluster may still contain valid data that could include information useful to an investigator.
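One way to picture the recovery of valid sectors from a "bad" cluster is to read each sector of the cluster separately and keep whichever reads succeed. The sketch below is illustrative only: `read_sector` is a hypothetical caller-supplied function that returns the bytes of one sector or raises `OSError` when the sector cannot be read, and `salvage_cluster` is likewise an assumed name.

```python
from typing import Callable, Dict


def salvage_cluster(read_sector: Callable[[int], bytes],
                    first_sector: int,
                    sectors_per_cluster: int) -> Dict[int, bytes]:
    """Read each sector of a cluster separately, keeping the ones that succeed."""
    recovered: Dict[int, bytes] = {}
    for sector in range(first_sector, first_sector + sectors_per_cluster):
        try:
            recovered[sector] = read_sector(sector)
        except OSError:
            # This sector is a genuinely unreadable part of the cluster; skip it.
            continue
    return recovered
```

Reading sector by sector, rather than cluster by cluster, means that a single damaged sector does not hide the readable sectors that share its cluster.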
.Dat Files
Windows creates .DAT files, primarily in the Windows directory and subdirectories thereof, that are also a source of ambient data. Other programs also create such files.
Data contained in file slack, unallocated space (erased files), temporary files, .dat files, and the Windows swap file usually contains a significant amount of non-ASCII data which cannot be viewed or printed using conventional, text-viewing software applications, e.g., a word processing application, the DOS Edit program, the Windows Write program, etc. Such data is commonly referred to as binary data and some of the bytes involved may mistakenly be interpreted by standard application programs to be control characters, e.g. line feed, carriage return, form feed, etc. The equivalent of hundreds or even thousands of printed pages of data can be stored in this form on a standard computer hard disk drive. The viewing or printing of such data can prove to be a challenge for the computer investigator without proper forensic software tools. The evaluation and processing of binary data was a tedious and time consuming task. Using conventional forensic processes, the evaluation of file slack, unallocated space and the Windows swap file can be measured in days or even weeks. By way of example, a typical Windows Swap file consists of hundreds of millions of bytes of data. It can take several days to properly analyze just one of these files using conventional means.
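The difficulty of viewing mixed binary and textual data can be illustrated with a minimal sketch that keeps only runs of printable ASCII characters and discards the remaining bytes. This is a generic filtering approach, not the method of any specific forensic product; the function name `extract_text` and the default minimum run length of four characters are assumptions made for the example.

```python
import re
from typing import List


def extract_text(data: bytes, min_length: int = 4) -> List[str]:
    """Return runs of printable ASCII at least min_length characters long."""
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # Bytes 0x20 through 0x7e are the printable ASCII characters; control
    # characters such as line feed and form feed fall outside this range.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]
```

Given the bytes `b"\x00\x01password=secret\x0c\xff\x02ab\x00user@example.com\r\n"`, the function returns the two runs `password=secret` and `user@example.com`; the two-character run `ab` is dropped because it is shorter than the minimum length. Even after such filtering, a large swap file can still leave a great deal of text to review.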
New Technologies, Inc., the assignee of the present invention, provides tools to law enforcement agencies, corporations, and government agencies that capture the ambient data from file slack and unallocated space and remove much of the binary data from it. There still remains, however, an enormous amount of information that can take an investigator many hours to review. Thus, it has been impossible for an investigator to investigate many computers in a short period of time, as may be necessary, for example, in an organization having many computers that must be checked for evidence with minimal disruption of the work environment.