1. Technical Field
The present invention relates in general to a system and method for automatic signature generation for content recognition. More particularly, the present invention relates to a system and method for using a selection algorithm to select particular files and directories in a file system and automatically generating a signature based upon the selected files and directories.
2. Description of the Related Art
Computer networks may include hundreds of devices such as servers and clients. Each of these computer devices includes a file system that stores content, such as program files and non-program files. File system content may include word processing programs, spreadsheet programs, database management programs, documentation files, web content, collections of spreadsheets, and program source code files. For asset management, systems management, and configuration management purposes, a system administrator is required to track the content of each of these computer devices.
Tracking the content that resides on a multitude of computer devices, especially those in large computer networks, may be virtually impossible without an automated method. One automated approach is to use a “signature” to detect content that resides on computer devices. A signature includes information about files that correspond to a software program or a non-program content set. A matching algorithm compares the signature with the set of files that reside on a computer device's file system. If the matching algorithm returns a positive result, the computer device is logged as having, on its file system, the software program or non-program content set that corresponds to the signature.
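The matching step can be illustrated with a minimal Python sketch of a conventional signature consisting of one file name and one file size. The names `FileSignature`, `matches`, and `inventory`, and the modeling of a device's file system as a mapping from file names to sizes in bytes, are hypothetical assumptions for illustration only; they do not describe any particular product.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class FileSignature:
    """Hypothetical conventional signature: a single file name and its expected size."""
    product: str
    file_name: str
    file_size: int


def matches(signature: FileSignature, file_system: Dict[str, int]) -> bool:
    """Return True only if the named file exists with exactly the expected size.

    The result is all-or-nothing: there is no notion of a partial match.
    """
    return file_system.get(signature.file_name) == signature.file_size


def inventory(signatures: Iterable[FileSignature], file_system: Dict[str, int]) -> List[str]:
    """Log (return) the product of every signature that matches the device's file system."""
    return [s.product for s in signatures if matches(s, file_system)]


# Assumed example device: file name -> size in bytes.
device = {"editor.exe": 1048576, "readme.txt": 2048}
signatures = [
    FileSignature("Editor 1.0", "editor.exe", 1048576),
    FileSignature("Editor 1.1", "editor.exe", 1050000),
]
print(inventory(signatures, device))  # ['Editor 1.0']
```

Because the comparison requires an exact name and size, a copy of `editor.exe` whose size changed after a patch would produce no match at all under this sketch.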
Some signatures consist of a single file name and file size, which may not accurately characterize a software application or another set of files, such as web content or documentation. One challenge is that existing signatures may be overly sensitive to variations in file name and size caused by applied patches and/or installation options. Another challenge is that existing signatures do not identify partial matches: a signature either matches a file system or it does not.
Yet another challenge is that when a new version of an application is released, a new signature must be created in order to detect the new application version in a file system. Creating such a signature requires simultaneously examining the new version and each of its previous versions to identify a file name and size that reliably distinguish the new version from the old versions, which is difficult and time-consuming.
What is needed, therefore, is a system and method to automatically generate signatures for use in identifying content that resides on computer devices.