In the course of performing work in various systems (e.g., storage systems, computer systems, etc.), it is often desirable to identify files that are related to each other. Identifying related files and assigning them to corresponding groups can be complex, particularly in large systems containing many files, since the files of one group may span multiple directories, one directory may contain files of multiple groups, and files may be shared by several groups.
In some cases, the identification of related files can be performed manually, which is tedious and time-consuming. Other conventional techniques identify relationships between files based on temporal locality, that is, on the observation that related files are usually accessed close to each other in time. In some scenarios, temporal locality techniques may capture spurious, coincidental relationships between files, such as when a user listens to music while authoring a document.
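As a rough illustration of the temporal locality idea described above, the following minimal sketch counts pairs of files that are accessed within a fixed time window of each other. The function name `co_access_counts`, the `(timestamp, path)` event format, the five-second window, and the file names are all assumptions made for this example and are not taken from any particular conventional system.

```python
from collections import Counter


def co_access_counts(events, window_seconds=5.0):
    """Count unordered file pairs accessed within window_seconds of each other.

    events is an iterable of (timestamp_in_seconds, path) tuples (assumed format).
    """
    events = sorted(events)  # order by timestamp
    counts = Counter()
    start = 0  # index of the oldest event still inside the window
    for i, (t_i, path_i) in enumerate(events):
        # Drop events that are too old to pair with the current access.
        while events[start][0] < t_i - window_seconds:
            start += 1
        for _, path_j in events[start:i]:
            if path_j != path_i:
                counts[frozenset((path_i, path_j))] += 1
    return counts


# Hypothetical access trace: a document, its figure, and an unrelated song
# are opened within seconds of each other; a fourth file is opened much later.
trace = [
    (0.0, "report.doc"),
    (1.0, "figures.png"),
    (2.0, "song.mp3"),
    (100.0, "notes.txt"),
]
counts = co_access_counts(trace)
```

In this hypothetical trace, `report.doc` and `song.mp3` are counted as a co-accessed pair even though they are unrelated, which is the kind of coincidental relationship noted above.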
Another conventional technique is based on the ability to track reads and writes to files within the same process, as well as to track inter-process communications. However, this latter technique relies on relatively detailed system call information that may be available on local client machines, but which may not be available in other types of systems. As a result, this technique cannot be applied in contexts where such detailed information is not available.