Simulating a file system using artificial data is a well-known method for predicting, studying and assessing file system performance. The generation of artificial data for a simulated file system is also known as “generating a file system model” or “modeling a file system.” File system simulation is a way to recreate content similar to a “real” working file system in order to simulate changes to and interaction with the file system model (i.e., file system events) without interfering with the working file system. For example, when access to file system data is restricted, a file system model of that restricted file system may be generated for studying, testing and development purposes. Analyzing these file system models can reveal how effectively file system storage is used and how file system data is generated and altered over time. Therefore, in order to produce more relevant analyses, it is important to have an accurate file system model that resembles the original file system as much as possible.
Having an accurate file system model is especially important for companies that design, develop and market file system hardware and software. These companies need to generate accurate file system models of their customers' data in order to create optimal file system hardware and software that best suit their customers' needs. This requires knowing the size of a customer's current file system(s), what its projected needs will be, what kind of file system events occur over time, etc. One will appreciate that file system events may include a number of operations and interactions with a file system. File system events commonly include changes made to files, directories and other file system objects. These changes may be the result of manual user actions or the result of operations performed by software applications that interact with the file system.
File system events can be categorized into several general types. A first type of file system event is known as the “file creation/deletion semantic.” This type of file system event includes situations where a file system object is created in or removed from the file system. A second type of file system event is an “overstrike semantic.” This type of file system event is more common with large files of many gigabytes. Over time, these files tend to stay in the same location on the file system, but they may incorporate slight changes to their content. As a general principle, however, the content of these files is largely static. A third type of file system event is the “insert/cut semantic.” The insert/cut semantic is more often associated with small files. These events include edits and changes to files created using word processors and text editors. One will appreciate that the presence, absence or frequency of these file system events varies from customer to customer.
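The three event types described above can be illustrated with a minimal sketch. The sketch assumes a toy in-memory model in which each file is a byte string keyed by its path. The class name `ModelFileSystem` and its method names are hypothetical and chosen only for illustration. The sketch further assumes that an overstrike overwrites bytes in place without changing the file size, and that an insert or cut shifts the content that follows the edit. These are one plausible reading of the semantics, not a definitive one.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ModelFileSystem:
    """Toy file system model (hypothetical): maps a path to its byte content."""

    files: Dict[str, bytes] = field(default_factory=dict)

    # File creation/deletion semantic: an object appears in or leaves the model.
    def create(self, path: str, content: bytes = b"") -> None:
        self.files[path] = content

    def delete(self, path: str) -> None:
        del self.files[path]

    # Overstrike semantic (assumed): bytes are overwritten in place,
    # so the file keeps its size and most of its content stays the same.
    def overstrike(self, path: str, offset: int, data: bytes) -> None:
        old = self.files[path]
        if offset < 0 or offset + len(data) > len(old):
            raise ValueError("overstrike must stay within the existing file")
        self.files[path] = old[:offset] + data + old[offset + len(data):]

    # Insert/cut semantic (assumed): content is added or removed,
    # so the file size changes and later bytes shift.
    def insert(self, path: str, offset: int, data: bytes) -> None:
        old = self.files[path]
        self.files[path] = old[:offset] + data + old[offset:]

    def cut(self, path: str, offset: int, length: int) -> None:
        old = self.files[path]
        self.files[path] = old[:offset] + old[offset + length:]
```

In this sketch, a simulated customer workload would be a sequence of calls to these five methods, with the mix of calls reflecting how often each event type occurs for that customer.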
Since each customer's file system will experience a different combination of file system events, it is important to understand how each customer's file system is used. The customer's file system can be manually observed, or more commonly, a logger program can be installed that will record all file system events. However, customers are unlikely to provide full access to their data or permit surveillance of their data. Therefore, file system models are generated using information gathered from the customer. This information may be used to set the parameters and scope of the file system model. Alternatively, the file system company can make assumptions about its customers' file systems based upon previous experience, or it can generate a file system model using generic parameters. However, these techniques are not very precise, and as a result, the file system model may not properly simulate how the customer actually uses its file system. As a further result, the file system company may end up providing file system hardware and software that do not exactly match the customer's needs.
What is therefore needed is a less invasive way to determine a customer's most common file system events in order to generate a more accurate file system model. Specifically, what is needed is a way to determine the type of file system events that commonly occur in the customer's file system, and how often these events occur.