As the use of the Internet and the amount of information available on the Internet have expanded, the ability to track and monitor information available over the Internet related to a particular subject or associated with a particular entity has been negatively impacted. The vast amount of information on the Internet makes monitoring websites nearly impossible, as it is difficult to quickly and efficiently compare the large amount of information contained within the many websites that may be associated with an entity. These challenges extend to the enterprise environment, in which an enterprise bears the burden of monitoring thousands of web documents accessed throughout an enterprise network, including enterprise websites. In an enterprise system having thousands of electronic documents (e.g., documents provided via a website), compliance and security of the enterprise network and the enterprise website become difficult to manage.
Among the many network security challenges in any type of computing system (e.g., an enterprise system or a cloud computing system) is that web documents for a website may compromise the security of the system. Electronic documents for a website may include or implement one or more web components designed to support a web-based feature, such as content management. For example, web documents may be designed to support a web framework for managing the content provided for the web document. The web framework may be defined by multiple, different types of web components. Examples of web frameworks may include content management systems such as WordPress®, Drupal®, Joomla®, and Concrete5®. A web component in an electronic document for a website may undergo several versions through its lifetime. The versions of a website can correspond to changes in a web component caused by version changes in a web framework that uses the web component. These version changes may be difficult to track for a large website. An entity managing a website that hosts many web documents may desire to consolidate different web frameworks, or even different versions of the same web framework. Consistency in web frameworks across a website may enable users who manage the website (e.g., an administrator or an operations analyst) to better manage its security and operations. By limiting and identifying web framework usage, the security and operation of a website can be improved. Some web components and/or web frameworks may have or expose security vulnerabilities that can go undetected unless they are discovered in the website. Some websites may implement multiple different web frameworks, each of which may have shared or conflicting vulnerabilities.
Some vulnerabilities exposed by an older version of a web framework may enable malicious third parties to hide malicious code on websites served under an entity's domain names without the entity knowing that any changes have occurred or that such domains have been taken over by malicious code. As such, it is difficult to ensure that bad actors are not altering, misappropriating, and/or otherwise compromising or exploiting data, including in ways that interfere with privacy or damage an entity's intangible business assets, such as intellectual property and goodwill.
Accordingly, businesses are challenged to find ways to accurately and periodically identify and detect changes in a web framework and/or the web components of documents hosted by a website. Detecting changes in a web document, in particular web framework changes, is paramount to maintaining the security of a network, such as a network within an enterprise system. Many web frameworks provide a publicly accessible file in an administrative directory that contains the exact version installed. Using targeted (or active) crawling, this file can be downloaded and the version of the framework determined with little trouble. However, this method may not be reliable. First, the file may not be accessible via the public Internet; removing or protecting this file from being read by external visitors to the site is a common practice. Second, some frameworks may not provide such files at all. Moreover, these files are not typically requested through normal crawling or browsing. Requesting them, whether they exist or not, is likely to raise the suspicion of the system administrator and security team, and may be considered obtrusive.