The invention will be illustrated in conjunction with NTFS (New Technology File System). NTFS is present in Windows NT and in every later operating system manufactured by Microsoft Corporation. In NTFS, the attributes of a file 2 within a computer 10 (see FIG. 1) are stored in alternate data streams. A file 2 can be composed of many streams; in NTFS, any data stream can have multiple alternate data streams associated therewith.
NTFS 5.0 and above supports multiple hard links 1. A “hard link 1” is a pointer to the file 2 that comprises, at a minimum, the file name; it may also comprise the full path name, including the file name. Since there can be multiple hard links 1, any file 2 can have multiple file names even though there is only one physical version of the file 2. Two or more of these file names can point to the same file 2 data while residing in the same directory or in different directories. FIG. 1 shows an example in which there are three hard links 1 to file 2. Modifying file 2 via any one of the hard links 1 changes the underlying data pointed to by all of the hard links 1.
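The shared-data behavior described above can be illustrated with a short sketch. It assumes a file system that supports hard links (NTFS, or a typical UNIX file system) and uses Python's standard `os.link`; the directory and file names are hypothetical. One file is given three names, two in the same directory and one in another, the data is rewritten through one name, and the change is read back through a different name.

```python
import os
import tempfile


def demo_shared_data() -> tuple[str, int]:
    """Give one file three names, write via one name, read via another.

    Returns the contents seen through the third name and the link count.
    All paths are hypothetical and live in a throwaway temporary directory.
    """
    with tempfile.TemporaryDirectory() as root:
        dir_a = os.path.join(root, "dir_a")
        dir_b = os.path.join(root, "dir_b")
        os.mkdir(dir_a)
        os.mkdir(dir_b)

        link1 = os.path.join(dir_a, "report.txt")  # first name for the file
        link2 = os.path.join(dir_b, "other.txt")   # second name, different directory
        link3 = os.path.join(dir_a, "alias.txt")   # third name, same directory

        with open(link1, "w") as f:
            f.write("original")
        os.link(link1, link2)
        os.link(link1, link3)

        # Rewrite the data through the second name ...
        with open(link2, "w") as f:
            f.write("modified")

        # ... and observe the change through the third name.
        with open(link3) as f:
            content = f.read()
        return content, os.stat(link1).st_nlink


print(demo_shared_data())  # ('modified', 3)
```

Because all three names refer to one physical file, there is only one copy of the data to modify, and the link count reported by `os.stat` is 3.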
NTFS considers every file name to be a hard link 1 to the file in question, but most files 2 have just one hard link 1. An NTFS file 2 is deleted only when all hard links 1 to it have been removed, i.e., when the last hard link 1 is removed. Thus, a first hard link 1 (1) could be created for file 2 in a first directory, a second hard link 1 (2) could be created for the file 2 in another directory, and the first hard link 1 (1) could then be deleted; the second hard link 1 (2), and hence file 2, would still exist.
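The deletion rule can be traced step by step with the following sketch, again assuming a file system with hard-link support and using hypothetical file names. It mirrors the scenario above: a first link in one directory, a second link in another, removal of the first link, and finally removal of the last link.

```python
import os
import tempfile


def demo_last_link_removal() -> dict[str, object]:
    """Show that a file survives until its last hard link is removed."""
    result: dict[str, object] = {}
    with tempfile.TemporaryDirectory() as root:
        first_dir = os.path.join(root, "first")
        other_dir = os.path.join(root, "other")
        os.mkdir(first_dir)
        os.mkdir(other_dir)

        link1 = os.path.join(first_dir, "data.bin")  # plays the role of hard link 1 (1)
        link2 = os.path.join(other_dir, "data.bin")  # plays the role of hard link 1 (2)

        with open(link1, "wb") as f:
            f.write(b"payload")
        os.link(link1, link2)
        result["links_before"] = os.stat(link2).st_nlink

        # Delete only the first name; the data is still reachable via link2.
        os.remove(link1)
        result["first_exists"] = os.path.exists(link1)
        with open(link2, "rb") as f:
            result["data_after"] = f.read()
        result["links_after"] = os.stat(link2).st_nlink

        # Removing the last remaining name is what actually deletes the file.
        os.remove(link2)
        result["second_exists"] = os.path.exists(link2)
    return result
```

With the sketch above, `links_before` is 2, the data is still readable after the first name is removed (`links_after` is 1), and the file disappears only after the second name is removed.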
Other operating systems also support hard links, e.g., UNIX (in which they are called hard links) and OS/2 (in which they are called shadows).
Antivirus scanners often make optimization decisions based upon the path or extension of a file 2. For example, if an ostensibly temporary file having the extension .tmp is opened, the antivirus scanner may decline to scan the contents of the file 2, because the antivirus scanner does not consider a .tmp file to be executable. However, if that .tmp file name is actually a hard link 1 (2) to an existing .exe file 2, modifying the .tmp file also modifies the .exe file. In this scenario, the antivirus scanner could be lulled into not scanning a file 2 for the presence of malicious code when it should be scanning the file 2, which can result in computer 10 being harmed by the malicious code. As used herein, “malicious code” means any computer code that enters the computer 10 without an authorized user's knowledge and/or without an authorized user's consent. Thus, “malicious code” can include viruses, worms, and Trojan horses. As used herein, the term “antivirus scanner” is used in a broad sense, to cover scanners that detect all types of malicious code, including worms and Trojan horses as well as viruses.
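The weakness described above can be made concrete with a hypothetical scan-decision policy. The extension list and both functions below are assumptions made for illustration only; they are not taken from any real antivirus product, and the link-count check is a crude mitigation rather than the method of the present invention. The naive policy skips any name ending in .tmp, so a .tmp hard link to an .exe file escapes scanning, while the second policy trusts the extension only when the name is the file's sole hard link.

```python
import os
import tempfile

# Hypothetical set of extensions this illustrative scanner treats as non-executable.
NON_EXECUTABLE_EXTENSIONS = {".tmp", ".txt", ".log"}


def naive_should_scan(path: str) -> bool:
    """Decide purely on the extension of the name that was opened."""
    _, ext = os.path.splitext(path)
    return ext.lower() not in NON_EXECUTABLE_EXTENSIONS


def link_aware_should_scan(path: str) -> bool:
    """Trust a 'safe' extension only if this name is the file's only hard link."""
    if naive_should_scan(path):
        return True
    # st_nlink > 1 means other names (possibly ending in .exe) share this data.
    return os.stat(path).st_nlink > 1


def demo_tmp_alias() -> tuple[bool, bool]:
    """Create an .exe file plus a .tmp hard link to it, then ask both policies."""
    with tempfile.TemporaryDirectory() as root:
        exe = os.path.join(root, "program.exe")  # hypothetical executable
        tmp = os.path.join(root, "scratch.tmp")  # innocuous-looking second name
        with open(exe, "wb") as f:
            f.write(b"MZ")  # placeholder bytes only
        os.link(exe, tmp)
        return naive_should_scan(tmp), link_aware_should_scan(tmp)


print(demo_tmp_alias())  # (False, True)
```

The link count reveals only that other names exist, not which names they are or what extensions they carry, so a policy like `link_aware_should_scan` must fall back to scanning whenever the count exceeds one.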
An NTFS file 2 records the number of hard links 1 to the file 2 but does not otherwise identify them. Thus, when a user accesses file 2 via the second hard link 1 (2), the user knows that two other hard links 1 (1) and 1 (3) exist but is not told what they are. The only way for the user to find the other hard links 1 (1) and 1 (3) is to note the serial number included in each NTFS file 2 and then to search all of the files 2 in the computer 10 for that serial number. That is an extremely time-consuming and cumbersome operation, and it is the problem addressed by the present invention.
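The brute-force search described above can be sketched as follows. The sketch assumes that the pair (`st_dev`, `st_ino`) returned by Python's `os.stat` identifies the underlying file; on Windows, Python documents these fields as the volume serial number and the file index. The function and file names are hypothetical. Every file under the search root must be examined, which is why the operation is so slow on a full volume.

```python
import os
import tempfile


def find_all_hard_links(known_path: str, search_root: str) -> list[str]:
    """Walk every directory under search_root and return all names whose
    (device, file ID) pair matches that of known_path."""
    target = os.stat(known_path)
    key = (target.st_dev, target.st_ino)
    matches: list[str] = []
    for dirpath, _dirnames, filenames in os.walk(search_root):
        for name in filenames:
            candidate = os.path.join(dirpath, name)
            try:
                st = os.stat(candidate, follow_symlinks=False)
            except OSError:
                continue  # vanished or unreadable entry; skip it
            if (st.st_dev, st.st_ino) == key:
                matches.append(candidate)
                if len(matches) == target.st_nlink:
                    return sorted(matches)  # every link found; stop early
    return sorted(matches)


def demo_find_links() -> list[str]:
    """Build three hard links plus an unrelated look-alike, then search."""
    with tempfile.TemporaryDirectory() as root:
        for sub in ("a", "b", "c"):
            os.mkdir(os.path.join(root, sub))
        original = os.path.join(root, "a", "one.dat")
        with open(original, "w") as f:
            f.write("x")
        os.link(original, os.path.join(root, "b", "two.dat"))
        os.link(original, os.path.join(root, "c", "three.dat"))
        # Same contents but a different file: must not be reported.
        with open(os.path.join(root, "b", "unrelated.dat"), "w") as f:
            f.write("x")
        found = find_all_hard_links(os.path.join(root, "b", "two.dat"), root)
        return [os.path.relpath(p, root) for p in found]
```

Starting from the second name, the search reports all three names (including the one already known) and ignores the look-alike file. The early exit helps only when all links happen to lie early in the walk; in the worst case the entire volume is traversed.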
Bolosky et al., “Single Instance Storage in Windows 2000”, downloaded from the Internet on Oct. 11, 2002 at http://research.Microsoft.com/sn/Farsite/WSS2000.pdf, discloses the creation of backpointer tables in cases where a single file has multiple hard or symbolic links pointing to it. Unlike in the present invention, where the contents of file 2 do not change, in the reference an attempt to modify the contents of the target file results in a new version of the target file being created.