There is tremendous variety in how software and hardware products are named and versioned across the IT industry. The same product might be named differently on different platforms, or by different IT inventory sources. For example, “Microsoft Windows 2000 SP1 for 64-bit” and “Win2K Service Pack 1, x64” are names (banners) that refer to the same product and version. While that might be clear to humans (for example, to IT personnel), it is not at all obvious to the machines (software applications) that process that data. A standardized (normalized) representation of product names and versions is a key enabler for inventory management, compliance, and security automation. Specific examples of that need include the following:
Product inventory management: when managing a product inventory for an organization, it should be possible to determine whether products reported by two different discovery tools are the same. If the names are not standardized, different names are not necessarily an indication of different products. As a result, the inventory might contain spurious duplicate entries for the same product.
Compliance: compliance checks may involve comparing the list of products and versions installed in an organization to an approved list of products and versions. The success of that comparison relies on the standardization of product and version names.
Vulnerability detection: a version of a product installed on a host can be identified as vulnerable by comparing it to an inventory of vulnerabilities in which the affected product and its versions are indicated. To enable that, the indicated affected products and the information on products installed on the host should be represented in the same normalized way. This also applies to checking whether hosts in an organization are affected by a newly published vulnerability (a threat alert). The term host refers to a computer or device (physical or virtual) on which software can be installed.
Malware resilience analysis: certain products are more prone to malware attack or propagation than others. Identifying the installation of these products within an organization is important for security management purposes. To successfully compare the specifications of products associated with malware with the actual products installed on hosts in the organization, a normalized representation of product names is required.
CPE (Common Platform Enumeration) [1, 2] is an open standard, led by mitre.org, that defines a structured naming scheme for IT products. The Naming Specification part of the standard defines the logical structure of the names. Product names according to CPE are broken into attributes, such as part, vendor, product, version, update, language, sw_edition, target_sw, and target_hw.
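To make the attribute structure concrete, the following minimal sketch splits a CPE 2.3 formatted string into its named attributes. The function name parse_cpe23 and the sample name in the usage note are illustrative assumptions and are not part of the methods described in this application; the attribute order follows the CPE 2.3 formatted-string binding.

```python
import re
from typing import Dict

# Attribute order of the CPE 2.3 formatted-string binding.
CPE23_FIELDS = (
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
)


def parse_cpe23(name: str) -> Dict[str, str]:
    """Split a CPE 2.3 formatted string into a dict of attributes.

    Illustrative helper (hypothetical name). It splits on colons that are
    not preceded by a backslash, so an escaped colon inside a value is kept.
    It does not validate the values themselves.
    """
    parts = re.split(r"(?<!\\):", name)
    if len(parts) != 13 or parts[0] != "cpe" or parts[1] != "2.3":
        raise ValueError(f"not a full CPE 2.3 formatted string: {name!r}")
    return dict(zip(CPE23_FIELDS, parts[2:]))
```

For example, parse_cpe23("cpe:2.3:o:microsoft:windows_7:-:sp1:*:*:*:*:x64:*") yields a mapping in which part is "o" (operating system), vendor is "microsoft", product is "windows_7", update is "sp1", and target_hw is "x64". In CPE, "*" stands for any value and "-" for not applicable.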
NIST (the US National Institute of Standards and Technology) maintains a dictionary (repository) of CPE names that covers the more common products and platforms [3].
While CPE could be a good choice for standardizing the names and versions of IT products, it does not on its own resolve the above problems. Some of the issues and challenges are listed below:
The standard has not yet been adopted by product, operating system, and platform vendors. That means that the products installed on a host are named in a non-standardized format. Typically, operating systems or discovery tools use two or three “banners” (free-text strings) to describe the vendor, product (product title), and version of products installed on hosts, rather than using the CPE standard. There is no published method for automatically transforming that representation into a CPE name.
There is no single CPE dictionary (not even the official dictionary) that holds the names and versions of all products. In fact, when a vendor releases a new product, edition, version, or update, it has no process or commitment for assigning or obtaining a CPE name for it. That means that the content of a CPE dictionary is inherently partial. A version or edition of a product that an organization has on its hosts might be missing from the CPE dictionaries.
The CPE standard supports multiple dictionaries. Each dictionary has to adopt the same CPE naming format, but can decide on the content of the names. For example, one CPE dictionary might represent Windows 7 as “cpe:2.3:o:microsoft:windows:7:” (i.e., 7 is the version), while another dictionary might represent it as “cpe:2.3:o:microsoft:windows_7:-” (i.e., windows_7 is the product name, and the version field is set to “-”, meaning not applicable). In principle, the inventory in the organization might be managed according to one CPE dictionary (e.g., NIST), while a security service (such as a threat alerts service) might use a different dictionary for specifying the products affected by newly published vulnerabilities. That situation is an obstacle to successful automation of vulnerability detection, for example.
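The dictionary mismatch described above can be illustrated with a short sketch. The type ProductKey and the functions naive_match and fold_version are hypothetical names introduced only for this illustration; the folding rule is a deliberately simplistic assumption and is not the method described in this application.

```python
from typing import NamedTuple


class ProductKey(NamedTuple):
    """Vendor, product, and version attributes taken from a CPE name."""
    vendor: str
    product: str
    version: str


# The same operating system as two hypothetical dictionaries might name it.
dict_a = ProductKey("microsoft", "windows", "7")
dict_b = ProductKey("microsoft", "windows_7", "-")


def naive_match(a: ProductKey, b: ProductKey) -> bool:
    """Attribute-by-attribute equality, as a simple lookup would apply."""
    return a == b


def fold_version(key: ProductKey) -> ProductKey:
    """Hypothetical reconciliation rule: move the version into the product.

    Keys whose version is "-" (not applicable) or "*" (any) are returned
    unchanged. This is an assumption made for illustration only; it would
    mis-handle products whose dictionaries keep the version separate.
    """
    if key.version in ("-", "*"):
        return key
    return ProductKey(key.vendor, f"{key.product}_{key.version}", "-")
```

Here naive_match(dict_a, dict_b) is False even though both keys denote the same product, while fold_version(dict_a) equals dict_b. A rule this simple works only for this one naming pattern, which is why a more general translation between dictionaries is needed.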
ISO/IEC 19770-2 provides a standard for software identification (SWID) tags [6]. Software publishers (vendors) could use the standard to tag their software products, thereby enabling accurate identification of these products.
While SWID tags might contribute a lot to standardization, they do not solve the above problems on their own:
So far, only a limited set of vendors has adopted the standard. Therefore, in many cases (if not most), discovery tools will not have SWID tags for the products they discover.
The set of attributes (tags) used by SWID is different from the set of attributes used by CPE. Currently there is no good mechanism for translating between these standards. The gap between the standards might be reduced in the future, but that might be a long process.
The methods described in this application provide a way of translating unstructured (or insufficiently structured) banners into a desired standard (normalized) format, and of converting names specified according to one standard into another.