The present invention relates generally to the World Wide Web, and more particularly to web browsers or other such programs.
The World Wide Web (“WWW”) is well known today. Users of client computers with web browsers request web pages by specifying a URL, either by typing the URL into an address field or selecting a link for a URL. Typically, the user requests the web page using HTTP. Web pages are often a user interface to an application hosted by a server and contain information, such as product information, related to the application. Such web pages often include links to access other web pages or invoke services of the application. Other web pages are merely informational, and do not provide an operational interface to any application.
In response to a user request for a web page (where the user request specifies a URL), the user's web browser obtains from a domain name server (“DNS”) the IP address of the server that hosts the application represented by the URL. The web browser then forwards the request for the web page to the server and application at that IP address. In response, the requested application on the server returns the requested web page, together with a return code of “2xx” indicating that the requested web page was successfully located and returned.
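The request flow described above can be sketched in Python as follows. This is a minimal illustration only, assuming plain HTTP (not HTTPS) and using only the standard library; the function names `split_url`, `resolve` and `fetch` are hypothetical and are not part of any browser.

```python
import http.client
import socket
from urllib.parse import urlsplit


def split_url(url):
    """Break a plain-HTTP URL into (host, port, path-with-query)."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80          # assumption: plain HTTP only
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return host, port, path


def resolve(host, port):
    """Ask the resolver (DNS) for an IP address of the named server."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return infos[0][4][0]


def fetch(url):
    """Resolve the host, send a GET to its IP address, return (status, body)."""
    host, port, path = split_url(url)
    ip_address = resolve(host, port)
    conn = http.client.HTTPConnection(ip_address, port, timeout=10)
    try:
        # The Host header tells the server which application/site is wanted.
        conn.request("GET", path, headers={"Host": host})
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

A call such as `fetch("http://example.com/")` would return the numeric return code and the page body; a return code in the range 200 to 299 corresponds to the “2xx” case described above.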
Occasionally, the server or application addressed by the user-specified URL redirects the user request to another server and/or application which returns a “redirected” web page (not the one represented by the user-specified URL). For example, a user may request a web page for a presumed URL based on a generic product name, and a proxy server listed in a domain name server as responsible for that URL will redirect that request to another server and application which provides information or services relating to the generic product. As another example, a vulnerability scanning tool can simulate requests made by a person outside of a firewall or an enterprise being tested to determine if the person can access sensitive web pages from an application within the firewall or enterprise being tested. If so, this represents a security “hole”. If not, the server and application addressed by the tool will redirect the request to a default URL which returns a “Sorry, Page Not Found” web page and a return code other than “2xx” to indicate that the returned page was not the one requested. As another example, if a hyperlink requested by a user is not functioning, the server executing the connection may redirect the user to a web page indicating a request was not successful. As another example, a load balancer or network dispatcher which receives a user request for a URL that does not exist or is not accessible may redirect the user request to a default “page not found” web page.
When a server returns a web page other than the one requested by the user, i.e. other than the one represented by the user-specified URL, the server typically provides with the web page a return code other than “2xx”. The industry standard RFC 2068 defines five classifications for HTTP return codes: A “1xx” return code means that the server to which the request was sent is processing the request. A “2xx” return code means that the request was successfully received, understood and accepted. A “3xx” return code means that the request was redirected, and further action must be taken to complete the request, such as waiting or selecting another link on a redirection web page. A “4xx” return code means that the request contains a client error, such as bad syntax, and cannot be fulfilled. A “5xx” return code means a server error, i.e. the server failed to fulfill an apparently valid request. However, the application which returns the redirected web page may be programmed to return a different return code for a variety of reasons. For example, the application which returns the redirected web page with the misleading return code may want to abstract or conceal the fact that the user request was redirected, or may have a valid security reason for concealing the redirection. In many cases, nothing else in the redirected web page indicates that the web page is redirected. In other cases, the application which returns the redirected web page may be programmed to return a web page, such as the one illustrated in FIG. 1, which does not include the proper “404” return code but clearly states in text that the requested web page was not found. Even though this web page indicates in text that the web page was not found, if the requester is a program tool looking for a return code, then the program tool will not recognize this web page as redirected.
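The five return code classifications described above can be expressed as a simple lookup on the first digit of the code. The following Python sketch is illustrative only; the function name `classify_return_code` and the class labels are assumptions chosen for this example.

```python
# Map the leading digit of an HTTP return code to its classification.
RETURN_CODE_CLASSES = {
    1: "informational (request is being processed)",
    2: "success (request received, understood and accepted)",
    3: "redirection (further action needed to complete the request)",
    4: "client error (e.g. bad syntax; request cannot be fulfilled)",
    5: "server error (server failed to fulfill a valid request)",
}


def classify_return_code(code):
    """Return the classification for a three-digit HTTP return code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP return code: {code}")
    return RETURN_CODE_CLASSES[code // 100]
```

Under this sketch, a “404” code falls in the client error class, while any code from 200 to 299 falls in the success class regardless of what the returned page says in its text.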
There are various situations where the user needs to know that the web page which was returned was not the one requested. In some of these cases the “user” is a person, and in other cases the “user” is a program executing on the user's workstation. For example, consider when the foregoing vulnerability scanning tool simulates requests made by a person, and the application addressed by the tool recognizes the tool as not authorized to access the web page. In such a case, instead of returning the sensitive web page, the addressed application redirects the request to a default application which returns a “Sorry, Page Not Found” web page. If the default application does not return the industry standard “404” code (representing “page not found”), the vulnerability scanning tool may interpret the redirected web page as the one requested by the tool, and conclude that the sensitive web page was returned and a “hole” exists in the security system.
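The misinterpretation described above can be illustrated with a short Python sketch of a tool that relies on the return code alone. The `Response` type, the function `naive_verdict` and the sample page contents are hypothetical and are used here only to make the failure concrete; no actual scanning tool is being reproduced.

```python
from dataclasses import dataclass


@dataclass
class Response:
    status: int   # HTTP return code supplied with the page
    body: str     # text of the returned web page


def naive_verdict(response):
    """Judge a probe of a sensitive URL using only the return code."""
    if 200 <= response.status <= 299:
        # The tool assumes the sensitive page itself was returned.
        return "hole"
    return "protected"


# Sensitive page genuinely returned: a real security hole.
sensitive = Response(200, "<html>Payroll records</html>")
# Redirected page with the industry standard code: correctly judged.
proper = Response(404, "<html>Sorry, Page Not Found</html>")
# Redirected page with a misleading code: wrongly reported as a hole.
misleading = Response(200, "<html>Sorry, Page Not Found</html>")
```

Because `naive_verdict` never examines `body`, the third response is treated exactly like the first, even though its text states that the page was not found.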
FIG. 2 illustrates another type of known redirected web page. The user-requested application has redirected the user request to a known “time-out” application (because the user's connection to the user-requested application has timed out). The time-out application returns a redirected web page requesting the user to log on again, instead of the web page that interfaces to the requested application. Even though this is a redirected web page, the “time-out” application included a “252” (non-redirected) status return code in the URL field. Consequently, if the “user” is the foregoing vulnerability scanning tool, the tool may interpret this web page as the one originally requested by the tool, determine that the web page was returned after the expiration of the time-out period, and conclude that a “hole” exists in the security system.
An object of the present invention is to enable a web browser or associated program to better detect when a user request for a web page has been redirected.