1. Field of the Invention
The invention relates to methods and systems for performing scanning operations on data. More particularly, the invention relates to a grid-based method and system for performing such operations.
2. Description of the Related Art
As modern enterprise environments trend towards a paperless workplace, electronic data is often created at a high rate. This electronic data takes a variety of forms, which may include emails, documents, spreadsheets, images, databases, etc. Businesses need to store all of this electronic data effectively and securely, in ways that are time- and cost-effective. However, problems arise with these tasks due to the sheer volume of electronic data created and stored within a modern business.
For example, some electronic files which enter a business' computing environment may need to be scanned before or shortly after they are stored, and scanning a large number of files can consume substantial computing resources. One common reason to scan a file is to search for computer viruses or other malicious software code which can corrupt other data or harm a business' computing infrastructure. As the prevalence and sophistication of computer viruses and other forms of harmful software have increased, virus scanners have become an indispensable tool for businesses.
Typically, scanners are implemented either as real-time “filters” or as off-line “batch” processes. The filters, sometimes implemented as file system filter drivers, are software products that insert themselves into the I/O processing path of the operating system. Filters intercept certain types of file I/O requests and check the file contents for known virus signatures, suspicious characteristics, or suspicious patterns of activity. When such suspicious patterns are detected, the filter blocks the completion of the I/O request and takes some protective action, such as deleting or quarantining the suspect file.
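By way of illustration only, the filter behavior described above can be sketched in user space as follows. This is a simplified sketch, not a file system filter driver: the names `FILTER_SIGNATURES`, `BlockedIOError`, `contains_signature`, and `filtered_write` are hypothetical, the signatures are placeholder byte strings, and quarantining is modeled as writing the suspect data to a separate directory.

```python
from pathlib import Path

# Hypothetical signature database: byte patterns standing in for
# "known virus signatures". Real scanners use far richer detection logic.
FILTER_SIGNATURES = [b"EVIL_PAYLOAD", b"\xde\xad\xbe\xef"]


class BlockedIOError(Exception):
    """Raised when the filter refuses to complete an I/O request."""


def contains_signature(data: bytes) -> bool:
    """Return True if any known signature occurs in the data."""
    return any(sig in data for sig in FILTER_SIGNATURES)


def filtered_write(path: Path, data: bytes, quarantine_dir: Path) -> None:
    """Stand-in for an intercepted write request.

    Clean data is written to its destination. Suspect data is diverted
    to the quarantine directory and the request is blocked.
    """
    if contains_signature(data):
        quarantine_dir.mkdir(parents=True, exist_ok=True)
        (quarantine_dir / (path.name + ".quarantined")).write_bytes(data)
        raise BlockedIOError(f"write to {path} blocked: signature match")
    path.write_bytes(data)
```

Because the check runs inline with every intercepted request, any additional inspection work in a routine such as `contains_signature` adds directly to the latency of that request.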
As virus authors apply more sophisticated techniques, such as self-mutating or encrypted code, the filter logic required to detect such viruses becomes more and more complex, demanding more processing time and memory from the computer system to inspect the files. This can adversely affect the performance of the system and, in some cases, force a user to downgrade the level of protection in order to keep the system at a usable level of responsiveness.
Batch scanners take a different approach to scanning computer data for viruses. Rather than scanning files as certain I/O requests are made, batch scanners systematically traverse the file system in search of malicious software code. While they do not interfere with other applications directly, i.e., by increasing the latency of I/O requests, batch scanners can place a large processing load on the system. For this reason, they are typically run at night or during off-hours, when the computer system is not actively in use. In some cases, because batch scanners run intermittently, viruses may have hours or even days to propagate between scans. Filters may also suffer from this drawback, as new virus types may emerge and infect the system before the filter's database of virus signatures has been updated to meet the threat.
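By way of illustration only, a batch scanner of the kind described above can be sketched as a single pass over a directory tree. The names `BATCH_SIGNATURES` and `batch_scan` are hypothetical, and the signature is a placeholder byte string. The sketch reads each file fully into memory, which is an assumption made for brevity rather than a recommended design.

```python
import os
from pathlib import Path

# Hypothetical placeholder for a database of known virus signatures.
BATCH_SIGNATURES = [b"EVIL_PAYLOAD"]


def batch_scan(root):
    """Walk the tree under root and return paths of files matching a signature."""
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = Path(dirpath) / name
            try:
                data = path.read_bytes()  # whole-file read, for brevity only
            except OSError:
                continue  # unreadable files are skipped in this sketch
            if any(sig in data for sig in BATCH_SIGNATURES):
                suspects.append(path)
    return suspects
```

Every file under the root is read on each pass, which is why a scan of this kind concentrates its processing load into one period and is commonly scheduled for off-hours.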
It can be difficult to scale traditional methods of scanning computer files, whether for viruses or some other reason, to meet the needs of large file systems and active servers because both methods consume substantial resources from the host operating system. Filters can add significant latency to each I/O request, slowing the system down incrementally, whereas batch scanners can create a period of peak activity which noticeably degrades the performance of other applications.