1. The Field of the Invention
This invention relates to systems, methods, and computer program products related to analysis of websites.
2. Background and Relevant Art
Websites are becoming increasingly common and important for organizations to convey information to their clients and/or customers. From the client or customer perspective, however, the ability to navigate a particular website, and the intuitiveness thereof, can vary widely from one website to the next. To aid such navigation, organizations will often provide a “site map,” which effectively provides an index of web pages that can be found in the website. The organization might further break the index down by alphabetical listing or by topic in order to provide the greatest ease of use. This way, if a user has difficulty finding a particular web page of interest using the ordinary menu items provided through the website, the user may be able to find the web page of interest by looking through the corresponding site map.
Unfortunately, site maps can be difficult for an organization to generate and maintain. Often, generation of a site map requires personnel not only to review how various web pages in the website are related, but also to prepare an accurate index page with all of the appropriate links. The links to various web pages, however, are not particularly static, and so an organization may need to continually review its index page to ensure that the links on the page are fresh and accurate. Such efforts can be particularly important as organizations move more and more to a format that uses automatically generated web pages.
Although some automated mechanisms for generating a site map exist, such mechanisms suffer from a number of difficulties. For example, if a page fails to load properly, or leads to another web page that requires human input before continuing, the system may stop its progression and thus provide an inaccurate or incomplete map. In some cases, the website owner may not even be aware of the incorrect site map, and thus takes the site map at face value.
For similar reasons, these types of errors highlight the inaccuracy of assessments of website “health.” For example, many organizations also now spend considerable resources to “optimize” their websites for maximum discovery and/or use by intended users or customers. Optimization best practices often involve the use of certain page tags, such as metatags and/or “tracking pixels,” in the web page source code, as well as functional code that, when executed, records helpful information about a given web page and how the customer or client uses the web page, such as the web page name, access date, and user actions. In some instances, this information is sent to third-party entities or vendors for various purposes, such as tracking, analytics, advertising, and the like. Conventional mechanisms for determining website health involve merely scanning the web page source code for the presence of expected metatags, tracking pixels, or links to expected executables.
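By way of illustration only, a conventional source-code scan of the kind described above might resemble the following minimal sketch. The sketch assumes Python and its standard `html.parser` module; the names `TagScanner` and `scan_source`, and the expected metatag names and vendor hosts passed to them, are hypothetical and are not drawn from any particular product.

```python
from html.parser import HTMLParser


class TagScanner(HTMLParser):
    """Collects metatag names and image/script source URLs from static HTML text."""

    def __init__(self):
        super().__init__()
        self.meta_names = set()
        self.resource_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name"):
            self.meta_names.add(attrs["name"].lower())
        elif tag in ("img", "script") and attrs.get("src"):
            # Tracking pixels are typically <img> elements; executables are <script src=...>.
            self.resource_urls.append(attrs["src"])


def scan_source(html, expected_meta, expected_hosts):
    """Report which expected metatags and vendor hosts are absent from the source text.

    This checks presence only; it does not execute any code on the page.
    """
    scanner = TagScanner()
    scanner.feed(html)
    return {
        "missing_meta": sorted(set(expected_meta) - scanner.meta_names),
        "missing_hosts": sorted(
            host for host in expected_hosts
            if not any(host in url for url in scanner.resource_urls)
        ),
    }
```

Such a scan reports only whether the expected text appears in the page source; it says nothing about whether the referenced code loads or runs.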
Such mechanisms, however, are prone to providing website owners with an incomplete report about website health, or otherwise indicating that the expected code is present without the added information of whether the code works as intended. For example, simply scanning the text (HTML) source code of a web page does not indicate that the source code (e.g., embedded JavaScript routines) will execute appropriately, or indicate that the source code meets performance or other requirements. For instance, mere scanning for the presence of expected page tags (metatags, tracking pixels, etc.) might not indicate whether the page tags conform to vendor requirements, or whether functional code, when executed, will generate requests that contain valid parameters and/or parameter values. In addition, scanning the web page source code text may miss dynamic content, i.e., the content of other executable code that is generated by or linked to the web page and stored at (or accessed from) another location.
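To make the gap concrete, the following minimal sketch shows the kind of check that a text-only scan cannot perform: inspecting the parameters of a request actually generated when a page tag executes. The sketch assumes Python and its standard `urllib.parse` module, and assumes the request URL has already been captured by some other means; the function name `validate_request` and the parameter rules shown are hypothetical and do not reflect any actual vendor requirement.

```python
from urllib.parse import urlsplit, parse_qs


def validate_request(url, required_params):
    """Return a list of problems found in the query string of a captured request URL.

    required_params maps each required parameter name to a function that
    returns True when a value is acceptable (a hypothetical rule format).
    """
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    problems = []
    for name, is_valid in required_params.items():
        values = query.get(name)
        if values is None:
            problems.append(f"missing parameter: {name}")
        elif not all(is_valid(value) for value in values):
            problems.append(f"invalid value for parameter: {name}")
    return problems
```

For example, given a captured request `https://pixel.example.com/p.gif?page=home&ts=abc` and rules requiring a non-empty `page`, a numeric `ts`, and a non-empty `uid`, the function would report an invalid `ts` value and a missing `uid` parameter, neither of which is visible from the static page source alone.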
Accordingly, there are a number of difficulties with website auditing and review that can be addressed.