A modern organization typically maintains a data storage system to store and deliver sensitive information concerning various significant business aspects of the organization. Sensitive information may include data on customers (or patients), contracts, deliveries, supplies, employees, manufacturing, or the like. In addition, sensitive information may include intellectual property (IP) of an organization such as software code developed by employees of the organization, documents describing inventions conceived by employees of the organization, etc.
Organizations invest significant effort in installing DLP components, especially on important machines where confidential data is generated, but they may not be able to protect every computer in the enterprise. Reasons include the large number of different platforms or operating systems (OS), machine outages, quick and dynamic provisioning of virtual machines, and the lack of clear, individual accounting for test and lab machines. DLP technologies apply configurable rules to identify objects, such as files, that contain sensitive data and should not be found outside of a particular enterprise or a specific set of host computers or storage devices. Even when these technologies are deployed, it is possible for sensitive objects to ‘leak’. Occasionally, leakage is deliberate and malicious, but often it is accidental. For example, in today's global marketplace environment, a user of a computing system transmits data, knowingly or unknowingly, to a growing number of entities outside the computer network of an organization or enterprise. Previously, the number of entities was very limited, and communication took place within a very safe environment. For example, each person in an enterprise would have just a single desktop computer with a limited number of installed software applications that behaved predictably. More recently, communications between entities have become complex and difficult for a human to monitor.
Some applications for data loss protection may include reverse name lookup support in a file system. For example, given an inode number (ino), the reverse name lookup may return the complete path of a file. An inode is a data structure on a file system that stores information, sometimes referred to as metadata, about a file, a directory, or another file system object. The inode, however, typically does not contain the actual data or the name of the file. Each file is associated with an inode, which may be identified by an integer number referred to as the i-number, inode number, or ino. Inodes may store information about files and folders, such as file ownership, access mode permissions, and file types. Generally, the inode number indexes a table of inodes in a known location on a device, and from the inode number, the file system driver portion of the kernel can access the contents of the inode, including the location of the file's data, which allows access to the file. Because inodes usually contain only file metadata and not file names, a file system driver must search a directory for a particular file name and then convert the file name to the corresponding inode. The reverse direction works the same way: to convert an inode number back to a file name, directory entries must be searched for that inode number.
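The relationship between names, directory entries, and inodes can be illustrated with a short Python sketch. It is not taken from any particular file system driver; the function names `describe_inode` and `forward_lookup` are hypothetical, and the sketch assumes a POSIX-like system where `os.stat` reports a meaningful inode number.

```python
import os
import stat
import tempfile

def describe_inode(path):
    """Return (ino, metadata) for path. The metadata carries no file name."""
    st = os.stat(path)
    meta = {
        "uid": st.st_uid,                     # file ownership
        "mode": stat.filemode(st.st_mode),    # access mode permissions
        "is_dir": stat.S_ISDIR(st.st_mode),   # file type
        "size": st.st_size,
    }
    return st.st_ino, meta

def forward_lookup(directory, name):
    """Scan the entries of one directory for `name` and return its inode
    number, or None if no entry has that name (name -> inode)."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name == name:
                # Read the inode number via stat so the value is
                # consistent with os.stat on every file system.
                return entry.stat(follow_symlinks=False).st_ino
    return None

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "note.txt")
        with open(path, "w") as f:
            f.write("hello")
        ino, meta = describe_inode(path)
        print(ino, meta)
        print(forward_lookup(d, "note.txt") == ino)
```

The metadata dictionary deliberately has no name field: the name lives only in the directory entry, which is why going from an inode number back to a name requires searching directory entries.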
Conventional ways of calculating the complete path from an inode number (ino) typically result in many disk accesses. One conventional method starts from the root and recursively searches for the inode number in the directory entries (dentries) of all directories and sub-directories, appending each directory (dir) name to the resultant path and removing the name again if the inode number is not found in that directory. This performs a reverse lookup by means of a forward lookup, which leads to a very large number of disk accesses. Some file systems, like the Veritas File System (VxFS), improve on this by storing the parent directory's inode number in the on-disk inode, which reduces the disk accesses needed to find the parent directory and removes the need to do a forward lookup for the reverse lookup operation. But a large number of disk accesses is usually still needed to search all the data blocks of the directory for a dentry with the given inode number. The following example provides some arithmetic to illustrate the number of disk accesses using the conventional method. This example assumes an average dentry size of 32 bytes, of which 16 bytes is the average size of the file name. The block size is 4K = 4096 bytes. Hence, a block can hold up to 2^7 = 128 dentries. Now, if a directory contains 10 million files, then the number of data blocks required for the directory would be about 78,125 (10,000,000 / 128), or on the order of 100,000 (100K). In the worst case, searching for an inode number in the directory would require approximately 100,000+1+1 disk accesses. The best case would be 1+1+1 = 3, where the inode number is found in the very first data block of the directory, which has very low probability. Disk accesses increase drastically if there are multiple such directories in which millions of files are stored. Searching for an inode number in the dentries in all the data blocks of a directory may therefore be a bottleneck for reverse name lookup. The cost grows further when the reverse name lookup is for an inode number that has many hard links.
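The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the two extra accesses per lookup (the "+1+1" term) are taken from the text as a fixed overhead without assuming what each one represents.

```python
import math

def dentries_per_block(block_size=4096, dentry_size=32):
    """Number of directory entries that fit in one data block."""
    return block_size // dentry_size

def directory_blocks(num_files, block_size=4096, dentry_size=32):
    """Data blocks needed to hold one dentry per file."""
    per_block = dentries_per_block(block_size, dentry_size)
    return math.ceil(num_files / per_block)

def lookup_accesses(blocks_scanned, overhead=2):
    """Disk accesses for one search: blocks scanned plus a fixed overhead.
    The overhead of 2 mirrors the '+1+1' in the text (an assumption about
    how the example is counted, not a measured value)."""
    return blocks_scanned + overhead

if __name__ == "__main__":
    blocks = directory_blocks(10_000_000)
    print(dentries_per_block())       # 128
    print(blocks)                     # 78125
    print(lookup_accesses(blocks))    # worst case: 78127
    print(lookup_accesses(1))         # best case: 3
```

The exact block count is 78,125, which the example rounds up to roughly 100,000; either way the worst case is tens of thousands of disk accesses for a single directory, against three in the best case.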
A hard link is a directory entry that associates a name with a file on a file system. By contrast, a soft link on such file systems is not a link to a file itself, but to a file name. Some conventional solutions give only the path name of the first hard link. Other conventional solutions allow the path names of all the hard links to be looked up from a given inode number, but still use a very lengthy method for calculating the path names, such as the one described above.
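A minimal Python sketch can show both points: several hard links share one inode number, and a brute-force reverse lookup has to walk the whole tree to recover every path name. The helper names `paths_for_inode` and `demo` are hypothetical, and the sketch assumes a POSIX-like file system that supports `os.link` and a tree that lives on a single file system (inode numbers are only unique within one file system).

```python
import os
import tempfile

def paths_for_inode(root, ino):
    """Brute-force reverse lookup: visit every directory under `root` and
    collect each file path whose inode number equals `ino`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.lstat(full).st_ino == ino:
                matches.append(full)
    return sorted(matches)

def demo():
    """Create one file with two hard links and look up both names."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "a"))
        original = os.path.join(root, "a", "report.txt")
        with open(original, "w") as f:
            f.write("data")
        alias = os.path.join(root, "alias.txt")
        os.link(original, alias)          # second name for the same inode
        st = os.stat(original)
        found = paths_for_inode(root, st.st_ino)
        return [os.path.relpath(p, root) for p in found], st.st_nlink

if __name__ == "__main__":
    paths, link_count = demo()
    print(paths, link_count)
```

Every directory in the tree is scanned even though only two entries match, which is the cost that makes this style of lookup lengthy when directories hold millions of files.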