File prospecting is the open-ended exploration by a user for file resources within the file system(s) of an organization, where the exact names, locations or content of the target files are unknown. Users within a given organization may prospect legitimately as part of their normal work activities, but such open-ended browsing through the organization's files can also be done illicitly during activities such as data theft or destruction. File prospecting may also occur during the reconnaissance phase of targeted advanced persistent threat (“APT”) attacks, in which an unauthorized party gains access to a computer system and operates covertly for a period of time for malicious purposes. Detecting file prospecting is thus important for preventing and remediating data loss and persistent threats. In particular, detecting malicious file prospecting activity can prevent significant damage to a company.
Currently, no general-purpose techniques exist for detecting file prospecting, much less for distinguishing between legitimate and malicious uses thereof. Monitoring file accesses by users involves an unlabeled, noisy mixture of normal activity, including legitimate prospecting through the organization's file resources by various authorized parties, as well as the possibility of abnormal prospecting activity. This being the case, ground truth labels for user access activities are lacking, and it is not practicable to attempt to create them for all file system access activity by all users within an organization. Thus, conventional supervised learning techniques, which depend on such labels, are not suited for use in the detection of file prospecting.
Conventional file access management systems detect when a specific user accesses a file or folder which that user does not typically access. However, these systems require long windows of time (months or even years) to establish which file system objects each user typically accesses. Even then, such a system does not identify file prospecting or open-ended browsing, but instead only an anomalous access of a given file system object. Other conventional systems monitor the frequency with which specific files or folders are accessed, and trigger alerts if given thresholds are exceeded. Still other systems flag accesses to files whose file or path names contain given keywords. However, none of these systems is able to identify file prospecting activity, much less distinguish between patterns of access associated with legitimate versus malicious file prospecting.
It would be desirable to address these issues.