1. Field of the Invention
This invention relates to the field of data processing systems. More particularly, this invention relates to data processing systems in which it is desired to scan a plurality of computer files to identify one or more predetermined characteristics indicative of a computer file having some predefined properties.
2. Description of the Prior Art
It is known to provide anti-virus computer programs and E-mail and data filtering programs. Anti-virus programs may operate in an on-access mode or an on-demand mode. The on-access mode initiates a scan of a file when an access request to that file is made. The on-demand mode initiates a scan of all files on a specified volume or volumes either on a user request or on a scheduled request.
An anti-virus scan of a file consists of scanning that file for computer viruses, worms, Trojans or other undesired content. This is done by comparing the file with a library of data that defines content to be detected.
In a similar manner, content filtering programs scan files and incoming or outgoing messages for undesired content. This may happen when the messages are flowing in or out, or alternatively, it may take place in an on-demand way. That is to say, a scan of a complete volume of data or messages is initiated by a user or as a scheduled event.
A problem found with on-demand scans is the ever-increasing time needed to perform them. This is due to an increasing amount of data to be scanned along with a growing number of computer viruses and other undesired forms of content for which it is desired to scan. In general an on-demand scan is performed at slack times, such as during the night or at a weekend, to avoid overloading the server. However, given the increasing time required for these scans, the situation can arise in which these periods of time are not sufficient to allow an on-demand scan to run to completion. This can result in such scans being terminated early, which decreases the security and usefulness of such systems.
A further problem that may arise due to the length of time required to scan all the files on a particular system is that new viruses, for example, may be discovered mid-way through a long scan. Thus, new data defining the properties to be scanned for is available mid-way through the scan but is not used, so that the latter part of the scan is not as complete as it could be, there being data available that is not scanned for.
Viewed from one aspect the present invention provides a computer program product comprising a computer program operable to control a computer to scan a plurality of computer files for predefined properties, said computer program comprising: computer file request logic operable to control said computer to issue computer file requests for computer files to be scanned; scanning logic operable to control said computer to scan said requested computer files for predefined properties in dependence upon property defining data defining said predefined properties; update checking logic operable to control said computer to periodically check for an update request to update said property defining data; update applying logic operable to control said computer to stop said computer file requests and to update said property defining data in response to said update request, and, on completion of said update, to resume operation of said computer file request logic such that subsequently requested files are scanned against said updated property defining data; wherein, when all of said plurality of computer files have been requested, said computer file request logic is operable to request a first computer file again.
The present invention addresses the problem of the storage of ever increasing amounts of data leading to scans taking longer and longer. It does this by scanning the files in a circular manner such that when all files have been scanned the scanner automatically starts the process again at the first file. Any new files created during a scan will therefore take their place with the other files in the list of files to be scanned and, given the circular nature of the scan, will themselves in time be scanned.
Additionally, if, for example, a new virus is discovered mid-way through a scan, the present computer program product comprises updating facilities enabling the scan to be stopped while the data file containing property defining data is updated. The scan is then resumed at the next computer data file and all subsequently scanned files use the updated information. This means that any new property defining data that becomes available mid-way through a scan can be added to the property defining data mid-scan so that the latter part of the files are scanned for this data too. Furthermore, because a completed scan automatically starts again from the beginning, a scan of the early files against this new data will start as soon as the present scan of all the files has completed.
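By way of illustration only, the circular scan with a mid-scan update might be sketched as follows. This is a minimal sketch under stated assumptions and not a definitive implementation: the function name circular_scan, the representation of files as (name, content) pairs, the substring match standing in for the scanning logic, and the pending_updates mapping standing in for an update request are all hypothetical.

```python
from itertools import cycle


def circular_scan(files, signatures, pending_updates, max_requests):
    """Scan files in a loop, applying any pending update between file requests.

    files: list of (name, content) pairs (hypothetical representation).
    signatures: initial property defining data, modelled here as substrings.
    pending_updates: maps a request index to new signatures that become
        available just before that request (stand-in for an update request).
    max_requests: bound on the number of requests so the example terminates.
    Returns a list of (name, matched signatures) in request order.
    """
    results = []
    active = set(signatures)
    # cycle() returns to the first file once every file has been requested.
    for index, (name, content) in enumerate(cycle(files)):
        if index >= max_requests:
            break
        # File requests are held here while the property defining data is updated.
        if index in pending_updates:
            active |= pending_updates[index]
        matches = sorted(s for s in active if s in content)
        results.append((name, matches))
    return results


files = [("a.txt", "hello EVIL1"), ("b.txt", "EVIL2 payload"), ("c.txt", "clean")]
# The EVIL2 signature only becomes available before the third request (index 2).
log = circular_scan(files, {"EVIL1"}, {2: {"EVIL2"}}, max_requests=5)
```

In this sketch b.txt is first requested before the update and so is not flagged; it is flagged when the scan wraps around and requests it a second time.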
The update checking logic checks for an update periodically. Thus, in some embodiments it checks after every file, while in others it does so after a certain number of files, or after a set period of time. This period of time may be constant throughout the scan or may vary depending on, say, time of day, or number of files already scanned. The period of time may be a set value written into the computer logic or it may be a value that is input by a user.
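One possible way of deciding when an update check is due, by file count, by elapsed time, or by both, is sketched below. The class name UpdateCheckPolicy and its parameters every_n_files, every_seconds and clock are assumptions introduced for illustration; the clock is injectable only so that the behaviour can be exercised without waiting.

```python
import time


class UpdateCheckPolicy:
    """Decide when to check for an update: after N files, after T seconds, or both."""

    def __init__(self, every_n_files=None, every_seconds=None, clock=time.monotonic):
        self.every_n_files = every_n_files
        self.every_seconds = every_seconds
        self.clock = clock
        self.files_since_check = 0
        self.last_check = clock()

    def file_scanned(self):
        """Record one scanned file and report whether an update check is now due."""
        self.files_since_check += 1
        due_by_count = (
            self.every_n_files is not None
            and self.files_since_check >= self.every_n_files
        )
        due_by_time = (
            self.every_seconds is not None
            and self.clock() - self.last_check >= self.every_seconds
        )
        if due_by_count or due_by_time:
            # Reset both counters so the next period starts from this check.
            self.files_since_check = 0
            self.last_check = self.clock()
            return True
        return False
```

Setting every_n_files=1 corresponds to checking after every file; a period that varies with the time of day could be obtained by changing every_seconds between checks.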
In some embodiments, the computer files scanned are those stored on a particular drive or in a particular directory although preferably all of the computer files stored on the computer are scanned.
In some embodiments, said computer program comprises at least one priority code, said priority code determining an amount of said computer's resources to be allocated to said computer program. Preferably, said at least one priority code is time dependent and comprises a high priority code during non-working periods and a low or zero priority code during normal working time. Thus, during working hours when the computer is being used for other things the scan is given a low priority and therefore, does not take up a lot, if any, of the processing time, whereas overnight, for example, when the computer is not being used for other tasks it has a high priority and can use a much greater proportion of the processing time to scan the files more quickly.
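A time dependent priority code could, for instance, be selected as in the following sketch. The function name scan_priority, the "high" and "low" labels, and the 09:00 to 17:00 working period are assumptions made purely for illustration; weekends and the mapping of the label onto an operating system scheduling priority are not modelled.

```python
from datetime import time as clock_time


def scan_priority(now, work_start=clock_time(9, 0), work_end=clock_time(17, 0)):
    """Return 'low' during the assumed working period and 'high' outside it."""
    return "low" if work_start <= now < work_end else "high"
```

For example, scan_priority(clock_time(23, 0)) selects the high priority code for an overnight scan.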
Although said computer file request logic can request files non-sequentially or in parallel, in preferred embodiments said computer file request logic is operable to issue sequential computer file requests for computer files to be scanned.
In preferred embodiments said computer file request logic is operable, in response to an addition of computer files to said plurality of computer files, to issue a request for said newly added computer files. Thus, if new files are added to the system mid-scan these are placed at a high position in the queue so that they are scanned soon. This is important as any new file being added to a system carries a risk of virus infection with it.
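The placing of newly added files at a high position in a circular queue can be sketched as below. The class ScanQueue and its method names are hypothetical, and placing a new file at the very front of the queue is only one of several possible interpretations of a "high position".

```python
from collections import deque


class ScanQueue:
    """Circular queue of file names in which new files jump to the front."""

    def __init__(self, files):
        self._queue = deque(files)

    def add_new_file(self, name):
        # A newly added file is requested next, ahead of the existing files.
        self._queue.appendleft(name)

    def next_file(self):
        # Take the file at the front and re-queue it at the back, so that
        # every file is requested again on the next pass.
        name = self._queue.popleft()
        self._queue.append(name)
        return name
```

Once a new file has been requested it takes its place in the normal rotation with the other files.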
Preferably, said computer program comprises storage logic, said storage logic being operable to control the computer to store computer file identifying data identifying said last requested computer file.
By storing data identifying the last requested computer file, this data can be accessed if the program is stopped for a length of time sufficient for the operating system to have forgotten which file was last sent.
Preferably, on resumption of operation of said computer program following a stoppage, said computer file request logic is operable to check said requested computer file against said stored computer file identifying data and, if said requested computer file is not a computer file subsequent to a computer file identified by said stored computer file identifying data, to discard said computer file without implementing said scanning logic and to request a subsequent file.
The above helps ensure that scanning starts again at the point at which it was stopped, even in the case that the operating system has forgotten where it was. This helps produce an efficient use of scanning time, as all files are scanned in turn. Thus, if a scan is not completed in a particular downtime of the computer it can be restarted at that position at the start of the next downtime. This helps the scanning resources of the computer to be used efficiently. Furthermore, the process of checking each file against the computer file identifying data and not scanning it if it is not the desired file can be performed quickly, so the scan can be restarted without much loss of time.
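The resumption behaviour described above can be sketched as follows, assuming that files are re-enumerated from the start in the same order after a stoppage. The function name resume_scan and its parameters are hypothetical. In this simple form, if the stored file no longer exists in the enumeration then no file is scanned on that pass; a fuller implementation would need to handle that case.

```python
def resume_scan(enumerated_files, last_requested, scan):
    """Discard files up to and including last_requested, then scan the rest.

    enumerated_files: file names in the order in which they are requested.
    last_requested: stored identifier of the last requested file, or None
        if no scan was previously interrupted.
    scan: callable applied to each file that is actually scanned.
    Returns the list of file names that were scanned.
    """
    skipping = last_requested is not None
    scanned = []
    for name in enumerated_files:
        if skipping:
            # Cheap identifier comparison only; the scanning logic is not invoked.
            if name == last_requested:
                skipping = False
            continue
        scan(name)
        scanned.append(name)
    return scanned
```

The discarded files cost only a comparison each, which is why the restart can be quick relative to scanning them again.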
Advantageously, said computer program further comprises stop condition checking logic operable to control said computer to periodically check for a stop condition and to end said computer file scan on detection of said stop condition.
This enables the scanning program to be stopped. The stop condition may appear automatically in response to certain conditions, such as a certain loading of the computer, or it may be input by a user. Thus, if other work is to be done on the computer the scanning program may be stopped to avoid it taking up computer power required for other processes.
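A periodic stop condition check might look like the sketch below, in which a threading.Event stands in for the stop condition, whether raised by a load monitor or by a user. The function name scan_until_stopped and the check_every parameter are assumptions for illustration. The function returns the last requested file so that it could be stored for a later resumption.

```python
import threading


def scan_until_stopped(files, scan, stop_event, check_every=1):
    """Scan files in order, checking the stop condition every check_every files.

    Returns the name of the last requested file, or None if no file was requested.
    """
    last_requested = None
    for count, name in enumerate(files, start=1):
        scan(name)
        last_requested = name
        # The stop condition is only examined periodically, not continuously.
        if count % check_every == 0 and stop_event.is_set():
            break
    return last_requested
```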
Further aspects of the present invention are set out in the appended claims.
The above, and other objects, features and advantages of this invention will be apparent from the following detailed description of illustrative embodiments which is to be read in connection with the accompanying drawings.