The internet is a communications network which connects millions of computers around the world. Many organisations host web sites that can be accessed via this communications network. Each web site can contain a variety of web objects, including plain text, hypertext, images, audio, video, and other multimedia information. In particular, hypertext documents written in Hypertext Markup Language (HTML) and similar languages are referred to as web pages. Many web objects—in particular, web pages—contain links to other web objects, including objects on other web sites. The collection of all web sites and the links between them are known as the World-Wide Web (WWW).
Web sites are accessed using a web browser. To access a web object, a user simply needs to input into a browser the “address” of the object, which is specified by a Uniform Resource Identifier (URI), the most common form of which is a Uniform Resource Locator (URL).
A user of a web site often has needs which are different from the needs of the host of the web site. For example, a user of a corporate web site may be looking for an interactive, interesting and efficient way of accessing information, while the web site host may be interested in raising revenue through sales on the web site, imparting particular product information to the user, or increasing brand recognition with the user. Web site designers therefore face the challenge of meeting the objectives of both the web site host and the users. Web site designers often strive to design a site such that a user can enter the site and quickly retrieve the information that they require, and preferably be provided with other information which is related to the retrieved information. If the web site is poorly designed, or suffers from problems of structural quality and integrity, the user may give up and seek an alternative way of finding the information they require.
Designing and managing a corporate web site can be particularly difficult as such sites are typically large and complex. Information contained in web sites may quickly become out of date, and is frequently added to, changed, or removed. Coordinating these changes can be difficult, as many people can be involved in the process, and many managers and technical staff may not fully understand the structure and organisation of their corporate web site. The web site is usually understood as a complex collection of web pages and links. However, the linking structure of a web site is critical, as it directly affects how well a user can make use of the information provided by the organisation.
A well-designed and well-maintained linking structure that is free from errors has the following benefits:
it helps a user to identify the navigation choices available to them at each web page;
it allows a user to quickly find and read relevant information in the web site; and
it allows the organisation to lead a user to information that they would like the user to see, such as an order page or a corporate mission statement.
Unfortunately, large web sites often have a poorly-designed and poorly-maintained linking structure, and may exhibit any of the following common problems:
broken links (links to objects which no longer exist or have changed URL);
isolated (unreachable) objects;
objects which are out-of-date;
non-returning links (links which lead the user into a dead end);
objects which are hard to reach (a large number of links must be traversed);
important web objects (such as order forms) which are only linked from a few web pages; and
inconsistent linking styles and techniques.
These problems reduce the effectiveness of the web site and leave the user with a poor impression of the organisation hosting the web site.
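Several of the structural problems listed above can be detected mechanically once the objects and links of a site are known. The following is a minimal sketch only, not a method described in this specification. It assumes the site has already been scanned into a list of known objects and a list of (source, destination) link pairs, and the names audit_links, home and max_depth are hypothetical. It treats a link as broken when its destination is not a known object, an object as isolated when it cannot be reached from the home page, a dead end as a reachable object with no outgoing links, and an object as hard to reach when more than max_depth links must be traversed.

```python
from collections import deque


def audit_links(objects, links, home, max_depth=3):
    """Report common linking problems (illustrative sketch only).

    objects   -- iterable of identifiers for objects that exist on the site
    links     -- iterable of (source, destination) pairs
    home      -- identifier of the entry page (assumed to be in objects)
    max_depth -- hypothetical threshold for "hard to reach"
    """
    objects = set(objects)
    links = list(links)

    # A link is broken if its destination is not a known object.
    broken = sorted((s, d) for s, d in links if d not in objects)

    # Keep only links whose two ends both exist.
    outgoing = {o: set() for o in objects}
    for s, d in links:
        if s in objects and d in objects:
            outgoing[s].add(d)

    # Breadth-first search from the home page gives each reachable
    # object the minimum number of links needed to reach it.
    depth = {home: 0}
    queue = deque([home])
    while queue:
        node = queue.popleft()
        for nxt in outgoing.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)

    return {
        "broken_links": broken,
        "isolated": sorted(objects - set(depth)),
        "dead_ends": sorted(o for o in depth if not outgoing.get(o)),
        "hard_to_reach": sorted(o for o, d in depth.items() if d > max_depth),
    }
```

Out-of-date objects and inconsistent linking styles are not covered by this sketch, as detecting them requires information beyond the link pairs themselves.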
An intranet is a web site hosted by an organisation for internal use only, such as for employees. Many organisations have intranet web sites that are much larger than their public web site, as the internal need for information within the organisation may be more urgent than the need to provide information to the public. As a result, the inconsistencies that occur in a typical internet web site are of even more concern within intranet web sites.
Web maps have been developed as a tool to help a user understand the structure of a web site. A web map can represent how a user might navigate between pages in the web site. In this context, web maps are also referred to as navigation maps or access paths.
A web map is a representation of one or more web sites, or parts thereof. A web map can be created by scanning one or more web sites, examining the web objects encountered, and recording the linking structure associated with these objects. Each web object may be represented in a web map by a node, which can be an icon, symbol, shape or text. Often, the node is labelled with the filename, title, or URL of the associated web object. One or more links between a pair of web objects can be represented in a web map by an “edge”, which is drawn as a line between the associated nodes. Typically, an arrowhead is placed on the line to indicate the “direction” of the link, from the object containing the link (called the source of the link) to the object referenced by the URL in the link (called the destination of the link). If there are links running in both directions between a pair of web objects, the corresponding edge may have arrowheads on both ends.
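The node and edge representation described above can be sketched as a small data structure. This is an illustrative sketch under stated assumptions, not the representation used by any particular web mapping tool: the function name build_web_map and the direction labels "forward", "backward" and "both" are hypothetical, and the input is assumed to be a list of (source, destination) pairs already recorded by a scan. Multiple links between the same pair of objects collapse into a single edge, and links in both directions yield an edge marked "both", corresponding to arrowheads on both ends.

```python
from collections import defaultdict


def build_web_map(links):
    """Build nodes and edges from (source, destination) link pairs.

    Returns (nodes, edges), where edges maps an ordered pair (a, b),
    with a <= b, to "forward" (a links to b), "backward" (b links to a)
    or "both" (links run in both directions).
    """
    nodes = set()
    directions = defaultdict(set)
    for src, dst in links:
        nodes.update((src, dst))
        a, b = sorted((src, dst))
        # Record which way this link runs relative to the ordered pair.
        directions[(a, b)].add("forward" if (src, dst) == (a, b) else "backward")

    edges = {}
    for pair, seen in directions.items():
        edges[pair] = "both" if len(seen) == 2 else next(iter(seen))
    return nodes, edges


# Example with hypothetical page names.
nodes, edges = build_web_map([
    ("/index.html", "/products.html"),
    ("/products.html", "/index.html"),
    ("/index.html", "/order.html"),
])
```

A drawing layer could then render each entry of edges as one line, choosing the arrowheads from the direction label.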
Definitions
A web site includes a collection of web objects which may provide information via an internal network (an intranet web site) or to the general public via the internet (an internet web site). A web site may also have facilities for obtaining information from its users. Note that a web site may be hosted on more than one physical machine. For example, a corporation called Acme may have two machines whose internet addresses are, respectively, www.customers.acme.com and www.products.acme.com, and there may be many links between web objects on the two machines. The organisation may consider both machines to constitute a single logical web site.
A web object is any document that can exist on a web site. This includes, but is not limited to, plain text, hypertext, images, audio, video and other multimedia objects, executable applications, and database information. A web object may be a static file or database entry residing on a machine hosting the web site, or it may be dynamically generated by the web site as needed.
A directory structure of a web site is the physical arrangement of web objects on the machine or machines hosting the web site. The term directory structure is used herein not only in the context of web objects stored in traditional file systems, but also for web objects stored in relational databases, object-oriented databases, and any other structured storage of web objects.
A linking structure of a web site is the arrangement of web objects and links which form a web site.
A web map is a representation of the linking structure of one or more web sites or parts thereof.
A “directory distance” between two web objects is the length of the shortest path between the two objects in the directory structure of the web site. Internet host names can be treated as virtual directories for this purpose. For example, “www.abc.com” and “ftp.abc.com” would be in the same virtual directory, namely “.abc.com”, and “.com” is a common grandparent directory.
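One possible way to compute a directory distance is sketched below. It rests on several assumptions that are not stated in this specification: the directory structure is approximated from the URL alone, host name labels are read right to left as virtual directories (so that “www.abc.com” sits under “.abc.com”, which sits under “.com”), path segments follow the host labels, and the length of a path is counted as the number of steps up to the deepest common ancestor plus the number of steps down from it. The function names directory_path and directory_distance are hypothetical.

```python
from urllib.parse import urlsplit


def directory_path(url):
    """Return the list of (virtual) directory components for a URL.

    Assumes the URL includes a scheme, e.g. "http://www.abc.com/docs/a.html".
    """
    parts = urlsplit(url)
    # Host labels are reversed so the most general label comes first.
    host_labels = parts.hostname.split(".")[::-1] if parts.hostname else []
    path_segments = [segment for segment in parts.path.split("/") if segment]
    return host_labels + path_segments


def directory_distance(url_a, url_b):
    """Steps up from url_a to the common ancestor, then down to url_b."""
    a, b = directory_path(url_a), directory_path(url_b)
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)
```

Under these assumptions, directory_distance("http://www.abc.com/", "http://ftp.abc.com/") is 2: one step up from “www” to “.abc.com” and one step down to “ftp”. Where web objects are stored in a database rather than a file system, the component list would need to come from that storage structure instead of the URL.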
A “link distance” between two web objects is the length of the shortest path between the two objects in the linking structure of the web site.
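A link distance as defined above can be computed with a breadth-first search over the linking structure. The sketch below assumes that links are followed only from source to destination and that the linking structure is available as a list of (source, destination) pairs; the definition itself does not say whether link direction matters, so this is one interpretation. The function name link_distance is hypothetical.

```python
from collections import deque


def link_distance(links, start, goal):
    """Return the minimum number of links from start to goal, or None.

    links -- iterable of (source, destination) pairs
    """
    adjacency = {}
    for src, dst in links:
        adjacency.setdefault(src, []).append(dst)

    if start == goal:
        return 0

    # Breadth-first search visits objects in order of increasing
    # distance, so the first time goal is seen the path is shortest.
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # goal cannot be reached from start
```

To ignore link direction instead, each pair could be added to the adjacency table in both orders before searching.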
Throughout this specification, unless the context requires otherwise, the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element or integer or group of elements or integers but not the exclusion of any other element or integer or group of elements or integers.