The Internet, which started in the late 1960s, is a vast computer network consisting of many smaller networks that span the entire globe. The Internet has grown exponentially, and millions of worldwide users ranging from individuals to corporations now use permanent and dial-up connections to access the Internet on a daily basis. The computers or networks of computers connected within the Internet, known as “hosts”, allow public access to databases containing information in nearly every field of expertise and are supported by entities ranging from universities and government to many commercial organizations.
The information on the Internet is made available to the public through “servers”. A server is a system running on an Internet host for making available files or documents contained within that host. Such files are typically stored on magnetic storage devices, such as tape drives or fixed disks, local to the host. An Internet server may distribute information to any computer that requests the files on a host. The computer making such a request is known as the “client”, which may be an Internet-connected workstation, bulletin board system or home personal computer (PC).
The World-Wide Web (Web) is a method of accessing information on the Internet which allows a user to navigate the Internet resources intuitively, without IP addresses or other technical knowledge. The Web dispenses with command-line utilities which typically require a user to transmit sets of commands to communicate with an Internet server. Instead, the Web is made up of hundreds of thousands of interconnected “pages”, or documents, which can be displayed on a computer monitor. The Web pages are provided by hosts running special servers. Software which runs these Web servers is relatively simple and is available on a wide range of computer platforms including PCs. Equally available is a form of client software, known as a Web “browser”, which is used to display Web pages as well as traditional non-Web files on the client system. Today, the number of Internet hosts which provide Web servers is increasing at a rate of more than 300 per month.
Each Web page may contain pictures and sounds in addition to text. Hidden behind certain text, pictures or sounds are connections, known as “hypertext links” (“links”), to other pages within the same server or even on other computers within the Internet. For example, links may be visually displayed as words or phrases that may be underlined or displayed in a second color. Each link is directed to a web page by using a special name called a URL (Uniform Resource Locator). URLs enable a Web browser to go directly to any file held on any Web server. A user may also specify a known URL by writing it directly into the command line on a Web page to jump to another Web page.
A document designed to be accessed and read over the web is called a web page. Each web page must have an address in a recognized format, the URL (Uniform Resource Locator), that enables computers all over the world to access it. Each web page has a unique URL. A web page typically contains both text and images. Because image files are large, even when compressed, it could take a long time to retrieve a web page, especially when a voice-quality phone line is used to connect to the Internet. Consequently, it is important to design a browser which is able to reduce the amount of time needed to display a web page.
As previously mentioned, a URL is a Uniform Resource Locator, a standard way developed to specify the location of a resource available electronically. URLs make it possible to direct both people and software applications to a variety of information, available from a number of different Internet protocols. Most commonly, a user will encounter URLs when using a World Wide Web (WWW) client, as that medium uses URLs to link WWW pages together. In a WWW browser's “location” box, the item that generally starts with “http:” is a URL. Files available over protocols besides HTTP, such as FTP and Gopher, can be referenced by URLs. Even Telnet sessions to remote hosts on the Internet and someone's Internet e-mail address can be referred to by a URL.
A URL is similar to a person's complete mailing address: it specifies all the information necessary for someone to address an envelope to that person. However, URLs are much more than that, since they can refer to a variety of very different types of resources. A more fitting analogy would be a system for specifying a user's mailing address, the user's telephone number, or the location of a book a user has just read from the public library. All of this information would be in one format. URLs have a very specific syntax and all URLs follow the format.
The URL naming system consists of three parts: the transfer format, the host name of the machine that holds the file, and the path to the file. An example of a URL may be: “http://” concatenated with “www.college.univ.edu/Adir/Bdir/Cdir/page.html”, where “http” represents the transfer protocol; a colon and two forward slashes (://) are used to separate the transfer format from the host name; “www.college.univ.edu” is the host name in which “www” denotes that the file being requested is a Web page; “/Adir/Bdir/Cdir” is a set of directory names in a tree structure, or a path, on the host machine; and “page.html” is the file name with an indication that the file is written in HTML. In short, a URL is a very convenient and succinct way to direct people and applications to a file or other electronic resource.
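As a rough illustration of the three-part naming scheme described above, the following Python sketch splits the example URL into its transfer format, host name, and path using the standard library. The variable names are chosen here for illustration only and are not part of any described system.

```python
import posixpath
from urllib.parse import urlsplit

# The example URL from the text above.
url = "http://www.college.univ.edu/Adir/Bdir/Cdir/page.html"

parts = urlsplit(url)

# Part 1: the transfer format (everything before "://").
transfer_format = parts.scheme

# Part 2: the host name of the machine that holds the file.
host_name = parts.netloc

# Part 3: the path, split into its directory tree and the file name.
directory_path, file_name = posixpath.split(parts.path)

print(transfer_format)  # http
print(host_name)        # www.college.univ.edu
print(directory_path)   # /Adir/Bdir/Cdir
print(file_name)        # page.html
```

Note that the directory names on the host are recoverable from the URL by anyone who can see it, with no special tools.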
The Internet maintains an open structure in which exchanges of information are made cost-free and without restriction. The free access format inherent to the Internet, however, presents difficulties for those information providers requiring control over their Internet servers. Consider, for example, a research organization that may want to make certain technical information available on its Internet server to a large group of colleagues around the globe, but the information must be kept confidential. Without means for identifying each client, the organization would not be able to provide information on the network on a confidential or preferential basis. In another situation, a company may want to provide highly specific service tips over its Internet server only to customers having service contracts or accounts.
Access control by an Internet server is difficult for at least two reasons. First, when a client sends a request for a file on a remote Internet server, that message is routed or relayed by a web of computers connected through the Internet until it reaches its destination host. The client does not necessarily know how its message reaches the server. At the same time, the server makes responses without ever knowing exactly who the client is or what its IP address is. While the server may be programmed to trace its clients, the task of tracing is often difficult, if not impossible. Second, to prevent unwanted intrusion into private local area networks (LANs), system administrators implement various data-flow control mechanisms, such as Internet “firewalls”, within their networks. An Internet firewall allows a user to reach the Internet anonymously while preventing intruders from the outside world from accessing the user's LAN.
For various historical and technical reasons, Internet URLs are designed in such a way that the directory structure of a web server is exposed to the outside world. This exposure allows hackers an inside look at the system. In many cases, the operator of a public web server does not want this type of information exposed, as it opens the door to possible intrusion.
It is considerably harder to configure servers to permit users to access appropriate resources without exposing the web server to possible intrusions. Using an alias system can become very cumbersome and difficult to manage. Alternatively, trying to obscure information by creating a complicated layout of the file system can add to the problems of system management and content updates.
There are known weaknesses in many operating systems that allow shell meta-characters (|, &, etc.) to break the application, thereby exposing the web server. This may sound far-fetched, but it happens most of the time. Some of the conventional methods for transferring information via a computing network are defined as:
Get Method: Normally a URL string is used to pass information (parameters) from page to page. On the receiving end, the URL string is parsed and appropriate actions are taken. One way to pass the parameters is by method=get. With this method, the parameters are displayed in the URL, e.g., “http://” concatenated with “www.ibm.com/gold/ParseForm?username=abc&password=abc 123”.
Post Method: This method partially hides the parameters. Sensitive information can be kept out of the URL by using method=post instead of method=get. To be precise, however, Post does not hide the information; it sends the information in the body of the HTTP request, apart from the URL. It is still possible to intercept the request.
SSL: In the case of the Post method, the Post data is hidden from external users, but the action portion of the form is still visible. In the user's browser, the Post data is available by viewing the source of the web page. In the case of the Get method, nothing is hidden except the contents of the web resources; the URL remains visible to all. While Post and SSL hide information passed from page to page, they do not encrypt the URL, so the structure of the web server remains exposed.
Redirecting: Another possible solution would be to redirect to the requested resource. Again, while this could be useful, it would still allow users to view the URL.
Aliasing: Aliasing URLs to point to different directories can be useful in obscuring some amount of information about a web server, but aliasing only works to a point. Sub-directories would still be visible at lower levels and directory structures would still be exposed to external users. Aliasing also requires a large amount of manual configuration work. It is not easily reconfigured, and additions are time consuming and not dynamic.
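The difference between the Get and Post methods, and the limit of aliasing, can be sketched in a few lines of Python. This is an illustrative sketch only: the parameter values, the ALIASES table, the directory names in it, and the resolve_alias helper are all hypothetical, and the requests are only constructed, never sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Form action taken from the Get Method example above; parameter values are hypothetical.
ACTION = "http://www.ibm.com/gold/ParseForm"
params = {"username": "abc", "password": "abc123"}

# Get: the parameters become part of the URL itself.
get_request = Request(ACTION + "?" + urlencode(params), method="GET")

# Post: the same parameters travel in the request body, apart from the URL.
post_request = Request(ACTION, data=urlencode(params).encode("ascii"), method="POST")

print(get_request.full_url)   # parameters visible in the URL
print(post_request.full_url)  # parameters absent, but the path is still visible
print(post_request.data)      # parameters carried in the body

# Hypothetical alias table: a public URL prefix mapped to a real directory.
ALIASES = {"/docs": "/srv/www/internal/manuals"}

def resolve_alias(url_path, aliases):
    """Map a requested URL path to a file-system path, or None if no alias matches."""
    for prefix, real_dir in aliases.items():
        if url_path == prefix or url_path.startswith(prefix + "/"):
            # Only the prefix is replaced; everything below it passes through unchanged.
            return real_dir + url_path[len(prefix):]
    return None

print(resolve_alias("/docs/Adir/page.html", ALIASES))
```

In this sketch the alias conceals only the top-level directory. The sub-directory and file name in the requested path map one-to-one onto the file system, and every new alias has to be added to the table by hand.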
Although these methods can adequately transmit information via a computing network, none of these methods can adequately protect against unauthorized access to the messages involved in the transmission. In addition, even though network security schemes exist that can protect access to the contents of messages, it is believed that no current system attempts to protect access to the Uniform Resource Locators (URLs) that are part of each message transmission. Access to the URLs can enable unauthorized persons to learn important information about the directory structures of resources on a network. This information could enable one to cause substantial harm to resources connected to the network and ultimately to the entire network. Therefore, there remains a need for a solution that provides a dynamic, easily configurable system, which can be used to encrypt or otherwise hide the internal structure of a web server. With such a system, URLs could be changed without changes to the web server or to the file system of the web server.