Over the last decade, the use of computers and the Internet has grown exponentially. Indeed, for many individuals, government agencies and private corporations, computers and the Internet are an integral part of daily life and business practice. People can communicate, transfer information, engage in commerce and expand their educational opportunities with little more than a few keystrokes and the click of a mouse. Like revolutionary technologies before them, computer systems, information technology and the Internet carry enormous potential both for advancement and for abuse. Unfortunately, criminals exploit these same technologies to commit crimes and to harm the safety, security, and privacy of society.
Although there are no exact figures on the cost of computer crimes in America, estimates run into the billions of dollars each year. The United States Federal Bureau of Investigation (FBI) has indicated that digital evidence has spread from a few types of investigations, such as hacking and child pornography, to virtually every investigative classification, including fraud, extortion, homicide, identity theft, and so on. While the full scope of the problem has yet to be measured definitively, there is no doubt that the number of crimes involving computers and the Internet is rising dramatically. A survey conducted by the Computer Security Institute in 2007 revealed substantial increases in computer crime. Nearly half (46%) of the companies and government agencies surveyed reported a security incident within the preceding twelve months. The total loss reported by the participants was $66,930,950. The average annual loss per participant was $350,424, compared to $168,000 for the previous year. Unlike more traditional crimes, computer crime is especially difficult to investigate. Other criminal and terrorist acts, and the preparations leading to such acts, increasingly involve the use of computer systems and information technologies as well. These criminal and terrorist activities leave behind a trail of digital evidence. Digital evidence varies widely in format and can include computer files, digital images, sound and video recordings, e-mail, instant messages, phone records, and so on. It is routinely gathered from seized hard drives, file servers, Internet data, mobile digital devices, digital cameras and numerous other digital sources that are growing steadily in sophistication and capacity.
Computer forensics is the practice of acquiring, preserving, analyzing, and reporting on data collected from a computer system, which can include personal computers, server computers, and portable electronic devices such as cellular phones, PDAs and other storage devices. Collecting and analyzing these types of data is usually called digital data identification. The goal of the process is to find evidence that supports or refutes some hypothesis regarding user activity on the system. When identified accurately and in a timely manner by a forensic investigator, digital evidence can provide the invaluable proof that helps convict a criminal or prevent a looming terrorist attack. A delay in identifying suspect data occasionally results in the dismissal of a criminal case because the evidence is not produced in time for prosecution.
The amount of digital evidence is growing rapidly. Not only has the number of crimes involving digital evidence increased dramatically over time, but the total volume of data that is involved has increased at an even faster pace. This is the result of the increased presence of digital devices at crime scenes combined with a heightened awareness of digital evidence by investigators. Given the declining prices of digital storage media and the corresponding increases in sales of storage devices, the volume of digital information that investigators must deal with is likely to continue its meteoric increase.
A typical computer forensic process begins with a determination that the evidence requirements merit a forensic examination. Individuals who are expected to have access to that evidence are then identified. Next, all computer systems used by these individuals that might contain relevant data are located. Forensic images of those systems are taken and analyzed for relevant evidence. Traditionally, a forensic investigator seizes all storage media, creates a drive image or duplicate of each, and then conducts the examination on the drive image or duplicate copy in order to preserve the original evidence. A “drive image” is an exact replica of the contents of a storage device, such as a hard disk, stored on a second storage device, such as a network server or another hard disk. One of the first steps in the examination process is to recover latent data such as deleted files, hidden data and fragments from unallocated file space. Digital forensic analysis tools used today are stand-alone systems that are not coordinated with the systems used by the forensic investigators and Information Technology (IT) staff. Current computer forensic analysis is largely a manual, labor-intensive process. It requires computer forensic investigators who have specialized training, and the cost of the analysis is high: the rate for some computer forensic investigators can be more than $250/hour. The analysis usually takes a long time, from days to weeks. Because it is a manual process, there is potential for human error, resulting in missed data and missed discovery. In addition, in a complex investigation that involves a large number of computer systems, it is difficult to determine which systems to analyze. This may have two undesirable results: expending limited time and resources on systems that hold nothing relevant, or missing systems that contain vital information.
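The imaging step described above can be illustrated with a minimal sketch. It assumes an ordinary file standing in for the storage device, and the function names (`create_image`, `hash_file`, `verify_image`) are hypothetical names chosen for this illustration, not part of any forensic tool. It copies the source in fixed-size chunks and uses a SHA-256 digest to check that the copy is an exact replica; an actual acquisition would additionally rely on write-blocking hardware and dedicated imaging tools.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so large sources fit in memory


def hash_file(path, chunk_size=CHUNK_SIZE):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def create_image(source_path, image_path, chunk_size=CHUNK_SIZE):
    """Copy the source byte for byte into image_path.

    Returns the SHA-256 digest of the bytes read from the source, so the
    image can later be checked against the state of the source at copy time.
    """
    digest = hashlib.sha256()
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        for chunk in iter(lambda: src.read(chunk_size), b""):
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(source_digest, image_path):
    """True if the image still matches the digest recorded at acquisition."""
    return hash_file(image_path) == source_digest


# Demonstration with a temporary file standing in for a seized drive.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "source.bin")
    image = os.path.join(workdir, "image.bin")
    with open(source, "wb") as f:
        f.write(os.urandom(3 * CHUNK_SIZE + 17))  # not a multiple of the chunk size
    recorded_digest = create_image(source, image)
    print("image verified:", verify_image(recorded_digest, image))
```

Recording the digest while the source is being read means the examination can proceed on the image alone, and any later change to the image is detectable by recomputing the digest.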
The tremendous increase in data exacerbates these problems for forensic investigators. The number of pieces of digital media and their increasing size will push the budgets, processing capability and physical storage space available to forensic investigators to their limits. Seized digital evidence is therefore processed in an effort to reduce the volume of digital files that must be reviewed. Presently, however, there is no effective means to quickly sort through this volume of data based on its content and identify documents and files of interest for further detailed examination. Present solutions still require manual review by forensic investigators to identify the specific data needed to prove guilt or innocence.
Government and business entities use sophisticated computer systems to store, track and disseminate information within the entity and to communicate with outside individuals and entities. Information can be stored as files that exist on a computer file system, and can exist in many heterogeneous forms such as plain text documents, formatted documents (e.g. Microsoft Word® documents, Open Document Format documents), spreadsheets, presentations, Portable Document Format documents, images of paper documents, graphics, sound recordings, videos, faxes, email messages, voice messages, web pages, and other stored digital media. Information can also be stored as entries in databases such as a relational database or a document management system. This information is subject to a wide range of user manipulations, such as create, edit, copy, rename, move, delete and backup. Information can also move among the entity computer systems through various communication means, such as emails, attachments, file sharing, shared file systems and push technology. Information can also leave the entity computer systems, either when someone within the entity sends it to an outsider, or when an outsider retrieves it from the entity computer systems by obtaining removable storage media containing the information or through network access protocols such as HTTP, FTP, and peer-to-peer file sharing. All of this creation, manipulation, transfer, and communication of digital information can be part of the legitimate business process. However, abuse of the computer system involves the same processes of creation, manipulation, transfer, and communication of information, albeit in an unauthorized or illegitimate manner. The Computer Security Institute 2007 survey also revealed that insider abuse of network access or email edged out virus incidents as the most prevalent security problem.
While a majority of all computer attacks enter via the Internet, the most significant dollar losses stem from internal intrusions.
The most important asset of many companies is their Intellectual Property (IP). Customer lists, customer credit card lists, copyrighted works including computer code, confidential product designs, proprietary information such as new products in development, and trade secrets are all forms of IP that can be used against the company by its competitors. Common risks for a corporation include theft of trade secrets and other privileged information, theft of customer or partner information, and disclosure of confidential or otherwise valuable information (designs, formulas, etc.).
Corporations may also incur liability or exposure to risk when unauthorized content, such as child pornographic material or pirated copies of media or software, is stored in their computer systems. An organization must know which of its assets require protection, and the real and perceived threats against them.
Current information security builds layers of firewalls and content security at the network perimeter, and utilizes permissions and identity management to control access by trusted insiders to digital assets, such as business transactions, data warehouses and files. This structure lulls business managers into a false sense of security. Many employees are restricted in their access to sensitive data, but access control is usually not easily fine-tuned to accommodate the ever-changing assignments and business needs of all the employees. Moreover, Information Technology (IT) employees necessarily have access to sensitive data and processes in order to perform their function. Indeed, IT employees are the custodians and authors of those objects. This may place them in a position to reveal information to others that will damage the company, or to directly sabotage the company's operations in various ways. IT employees who are disgruntled, angry, or seeking profit may attempt to steal sensitive digital information, which could lead to substantial losses for the organization. A laid-off employee is a prime source of potential leakage of such information.
Content-security tools based on HTTP/SMTP proxies are used against viruses and spam. However, these tools were not designed for intrusion prevention. They do not inspect internal traffic; they scan only authorized e-mail channels. They rely on file-specific content recognition and have scalability and maintenance issues. Where content security tools do not fit the threat, they are ineffective. Relying on permissions and identity management is like running a retail store that screens customers coming in but does not put magnetic tags on the clothes to prevent a customer from wearing an expensive hat on the way out.
Hash analysis is a method that can be used for comparing the content of digital evidence. A cryptographic one-way hash (or “hash” for short) is a way to calculate a digital fingerprint: a very large number that, in practice, uniquely identifies a digital file. A hash is a function calculated on the bits that make up a file. Therefore, two files with different names but exactly the same contents will produce the same hash. However, using hash systems to identify conclusive or known suspect files faces several challenges. By design of the hash function, a small difference in the input file, even a single bit, will generate a significantly different output hash. The difference between two hash values does not reflect the level of similarity of the input files. Hash analysis therefore cannot be used to identify files that have been altered, whether minimally or substantially. Nor can it identify derivative files, that is, files that contain common content but are arranged or formatted differently, or that contain more or less other content. For the same reason, hash analysis is not effective against multimedia files (image, video, and sound). As a consequence, an individual using these files to commit crimes may escape hash-based detection and prosecution.
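The behavior described above can be reproduced with a short sketch. It assumes SHA-256 as the hash function (the text does not name a particular one), and the helper names `fingerprint`, `flip_bit` and `differing_bits` are hypothetical, introduced only for this illustration. It shows that identical contents yield identical hashes, while flipping a single input bit yields an unrelated hash.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with exactly one bit inverted."""
    altered = bytearray(data)
    altered[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(altered)


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Sample content standing in for a known suspect file (illustrative data only).
original = b"Quarterly customer list: Alice, Bob, Carol\n" * 100
renamed_copy = bytes(original)          # same bits, as if saved under another name
one_bit_off = flip_bit(original, 0)     # differs from the original by a single bit

print("renamed copy matches:", fingerprint(renamed_copy) == fingerprint(original))
print("altered copy matches:", fingerprint(one_bit_off) == fingerprint(original))
print("digest bits changed :", differing_bits(fingerprint(original),
                                              fingerprint(one_bit_off)), "of 256")
```

The number of changed digest bits says nothing about how close the two inputs are: a one-bit edit and a complete rewrite are indistinguishable from the digests alone, which is why hash matching cannot flag altered or derivative files.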
It would be beneficial and desirable to integrate newer, advanced technologies to automate the detection and classification of suspect files and to identify related altered or derivative files. This would allow forensic investigators to focus on identifying relevant data during the forensic process and would address many of the problems of efficiency, cost and delay facing digital forensic examinations today. There is also a need for a technology to scan and manage digital data on a computer system based on the content of the data. There is a further need for a solution that allows government agencies and corporations to automatically monitor and prevent unauthorized use or exchange of classified or proprietary data.