An internet is a group of networks and individual computers communicating via a common protocol. The Internet is a world-wide "network of networks" that uses the TCP/IP (Transmission Control Protocol/Internet Protocol) protocol suite for communications. TCP/IP is a set of protocols and programs used to interconnect computer networks and to route traffic among different types of computers. These protocols describe allowable data formats, error handling, message passing, and communication standards. Computer systems that use TCP/IP speak a common language, regardless of hardware or operating system differences.
Part of the popularity of the TCP/IP protocol suite is due to its ability to be implemented on top of a variety of communications channels and lower-level protocols such as T1, X.25, Ethernet, and RS-232-controlled serial lines. Most sites use Ethernet on their local area networks to connect hosts and client systems, and then connect that network via a T1 line to a regional network (i.e., a regional TCP/IP backbone) that connects to other organizational networks and backbones. Sites customarily have one connection to the Internet, but large sites often have two or more connections. Modem speeds are increasing as new communications standards are approved, so versions of TCP/IP that operate over the switched telephone network are becoming more popular. Many sites and individuals use PPP (Point-to-Point Protocol) and SLIP (Serial Line IP) to connect networks and workstations to other networks using the switched telephone network.
Many large networks conform to these protocols, including the Internet. Thousands of computers at universities, government agencies, and corporations are connected to a network that follows the TCP/IP protocols. Any machine on the Internet can communicate with any other. Machines on the Internet are referred to as hosts or nodes and are defined by their Internet (or IP) address.
The Internet was created initially to help foster communication among government-sponsored researchers. Throughout the 1980's, the Internet grew steadily to include educational institutions, government agencies, commercial organizations, and international organizations. In the 1990's, the Internet has undergone phenomenal growth, with connections increasing faster than any other network ever created (including the telephone network). Many millions of users are now connected to the Internet, with roughly half being business users. The Internet is being used as the basis for the National Information Infrastructure (NII).
Many organizations are in the process of connecting to the Internet to take advantage of Internet services and resources. Businesses and agencies are now using the Internet or considering Internet access for a variety of purposes, including exchanging e-mail, distributing agency information to the public, and conducting research. Many organizations are connecting their existing internal local area networks to the Internet so that local area network workstations can have direct access to Internet services.
Internet connectivity can offer enormous advantages; however, security needs to be a major consideration when planning an Internet connection. There are significant security risks associated with the Internet that often are not obvious to new (and existing) users. In particular, intruder activity, as well as vulnerabilities that could assist intruder activity, is widespread. Intruder activity is difficult to predict and at times can be difficult to discover and correct. Many organizations already have lost productive time and money in dealing with intruder activity; some organizations have had their reputations suffer as a result of intruder activity at their sites being publicized.
A firewall system is one technique that has proven highly effective for improving the overall level of site security. A firewall system is a collection of systems and routers logically placed at a site's central connection to a network. A firewall system can be a router, a personal computer, a host, or a collection of hosts, set up specifically to shield a site or intranet from protocols and services that can be abused from hosts outside the intranet. A firewall system is usually located at a higher-level gateway, such as a site's connection to the Internet; however, firewall systems can be located at lower-level gateways to provide protection for some smaller collection of hosts or intranets. A firewall forces all network connections to pass through the gateway, where they can be examined and evaluated, and provides other services such as advanced authentication measures to replace simple passwords. The firewall may then restrict access to or from selected systems, or block certain services, or provide other security features. The main purpose of a firewall system is to control access to or from a protected network (i.e., a site). It implements a network access policy by forcing connections to pass through the firewall, where they can be examined and evaluated.
The general reasoning behind firewall usage is that, without a firewall, an intranet's systems expose themselves to inherently insecure services and to probes and attacks from hosts elsewhere on the network. A firewall provides numerous advantages to sites by helping to increase protection from vulnerable services, controlled access to site systems, concentrated security, enhanced privacy, logging and statistics on network use, and misuse policy enforcement. In a firewall-less environment, network security relies totally on host security and all hosts must, in a sense, cooperate to achieve a uniformly high level of security. The larger the intranet, the less manageable it is to maintain all hosts at the same level of security. As mistakes and lapses in security become more common, break-ins occur, not as the result of complex attacks, but because of simple errors in configuration and inadequate passwords.
A firewall provides the means for implementing and enforcing a network access policy. In effect, a firewall provides access control to users and services. One problem is that intranets with a significant number of deployed clients often overwhelm the throughput capacity of the firewall.
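The idea of a network access policy enforced at a single gateway can be illustrated with a minimal sketch. The rule format, the first-match evaluation order, the default-deny fallback, and the addresses and ports below are all assumptions made for illustration; they are not taken from any particular firewall product.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str            # "allow" or "deny"
    source: str            # CIDR block the rule applies to (hypothetical values)
    port: Optional[int]    # destination port, or None to match any port


def evaluate(rules, src_ip, dst_port, default="deny"):
    """Return the action of the first rule that matches the connection."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        in_network = addr in ipaddress.ip_network(rule.source)
        port_matches = rule.port is None or rule.port == dst_port
        if in_network and port_matches:
            return rule.action
    # Nothing matched: fall back to the default (deny, in this sketch).
    return default


# Hypothetical policy: trust the internal 10.0.0.0/8 range, let anyone
# reach the web server on port 80, and explicitly refuse telnet (port 23).
POLICY = [
    Rule("allow", "10.0.0.0/8", None),
    Rule("allow", "0.0.0.0/0", 80),
    Rule("deny", "0.0.0.0/0", 23),
]
```

Because every connection is checked against the same ordered list, the policy lives in one place rather than on each host, which is the concentration of security described above.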
A proxy server in general is a process that provides a cache of items available on other servers which are presumably slower or more expensive to access.
More specifically, a caching proxy server is used for a World-Wide Web server which accepts uniform resource locators (URLs) with a particular prefix. When it receives a request for such a URL, it strips off the prefix and looks for the resulting URL in its local cache. If the document is found, the proxy returns it immediately; otherwise it fetches the document from the remote server on the Internet, saves a copy in the cache, and returns it to the requester. The cache will usually have an expiry algorithm which flushes documents according to their age, size, and access history. Caching proxy servers are often implemented to alleviate the problem of firewalls or proxy gateway servers that are overwhelmed by requests.
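The lookup sequence described above (strip the prefix, check the cache, fetch on a miss, store a copy) can be sketched as follows. This is a simplified illustration: the class name, the injected fetch callable, and the age-only expiry are assumptions, and a real cache would also weigh document size and access history.

```python
import time


class CachingProxy:
    """Illustrative caching proxy keyed by the URL left after prefix removal."""

    def __init__(self, prefix, fetch, max_age=60.0, clock=time.monotonic):
        self.prefix = prefix      # URL prefix this proxy answers to
        self.fetch = fetch        # callable(url) -> document from the remote server
        self.max_age = max_age    # seconds before a cached copy expires
        self.clock = clock        # injectable clock, handy for testing
        self.cache = {}           # url -> (time stored, document)

    def get(self, proxied_url):
        if not proxied_url.startswith(self.prefix):
            raise ValueError("URL does not carry the proxy prefix")
        url = proxied_url[len(self.prefix):]       # strip off the prefix
        now = self.clock()
        entry = self.cache.get(url)
        if entry is not None and now - entry[0] <= self.max_age:
            return entry[1]                        # cache hit: return immediately
        document = self.fetch(url)                 # miss or expired: go to origin
        self.cache[url] = (now, document)          # save a copy for later requests
        return document
```

A second request for the same URL within max_age seconds is served from the cache without calling fetch again, which is how such a proxy takes load off an overwhelmed gateway.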
In comparison, a proxy gateway server is a computer and associated software which will pass on a request for a URL from an Internet browser to an outside server and return the results. This provides clients on intranets a trusted agent that can access the Internet on their behalf. The proxy gateway is transparent to the client. A proxy gateway is a server that simply forwards requests from clients or other proxies on to another server or proxy. The second server or proxy sees the original proxy as just another HTTP client. When the proxy receives a response to the forwarded request, it simply returns it to the client. An Internet gateway proxy services HTTP requests by translating them into protocols other than HTTP. The reply sent from the remote server to the gateway is likewise translated into HTTP before being forwarded to the user agent.
In HTTP messaging, proxies use the Via header field of the message to indicate the intermediate steps between the user agent and the server (on requests) and between the origin server and the client (on responses). The header is intended to trace transport problems and to avoid request loops.
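Assuming the header in question is the Via general-header field of HTTP/1.1, loop avoidance can be sketched roughly as below. The function name and the simplified parsing are illustrative assumptions: each comma-separated entry is treated as "version received-by", and comments or pseudonyms that a full implementation would have to handle are ignored.

```python
def forward_via(existing_via, my_name, my_aliases=(), version="1.1"):
    """Append this proxy to a Via value, refusing requests that already passed through it."""
    # Every name this proxy is known by, compared case-insensitively.
    names = {name.lower() for name in (my_name, *my_aliases)}
    hops = [hop.strip() for hop in existing_via.split(",") if hop.strip()]
    for hop in hops:
        parts = hop.split()
        received_by = parts[1].lower() if len(parts) > 1 else ""
        if received_by in names:
            # The request has already been through this proxy once.
            raise RuntimeError("request loop detected at " + my_name)
    hops.append(version + " " + my_name)
    return ", ".join(hops)
```

For example, a proxy known as "proxy-a" receiving a request whose Via value is "1.0 fred" would forward it with "1.0 fred, 1.1 proxy-a", and would reject a later copy of the same request that came back carrying its own name.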
Furthermore, agents can be implemented on clients or servers which are connected to the Internet. In the client-server model, an agent is the part of the system that performs information preparation and exchange on behalf of a client or server. Especially in the phrase "intelligent agent," it implies an automatic process which can communicate with other agents to perform some collective task on behalf of one or more humans. The term agent is used to describe an automatic computer process that performs an action, such as information preparation or exchange, with no human intervention involved. Examples of Internet agents are brokers, wanderers, spiders, worms and viruses. Agents can facilitate work and coordinate tasks among machines and other agents.
Data is transmitted across the Internet in packets. A packet is a logical grouping of information that includes a header, control information, and a body that usually contains user data. A message may be segmented into a number of packets. When sent from one user to another via the Internet, individual packets may go by different routes and the packets are reassembled into the original message before reaching the destination. This contrasts with circuit switching, in which the two users are actually connected by an end-to-end circuit, as in the telephone network. Packet is the common name for a layer 3 PDU (Protocol Data Unit: a unit of information at any given level of the 7-layer OSI protocol stack; layer 3 PDUs are often called packets, while layer 2 PDUs are often called frames). IP datagrams are often called packets.
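Segmentation and reassembly can be illustrated with a small sketch. The Packet fields used here (a sequence number and a total count) are a hypothetical header invented for the example and are not the actual IP header layout; the point is only that packets arriving in any order can be put back together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    seq: int      # position of this packet within the message (hypothetical header field)
    total: int    # number of packets the message was split into
    body: bytes   # the slice of user data carried by this packet


def segment(message: bytes, size: int):
    """Split a message into packets carrying at most `size` bytes each."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)] or [b""]
    return [Packet(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]


def reassemble(packets):
    """Rebuild the original message from packets received in any order."""
    ordered = sorted(packets, key=lambda p: p.seq)
    if not ordered or len(ordered) != ordered[0].total:
        raise ValueError("cannot reassemble: packets are missing")
    return b"".join(p.body for p in ordered)
```

Reversing or shuffling the list returned by segment before passing it to reassemble still yields the original message, mirroring packets that travel by different routes.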
Packets can be intercepted at any point within a packet-based network unless special security measures are in place. Capturing packets in this manner is known as network "snooping," "packet sniffing," or "promiscuous monitoring." A sniffer is a hardware or software device that can intercept and capture electronic messages that are not addressed to it, but rather to another address on the network. Sniffers are network monitoring tools that can capture data packets and decode them to show protocol data.
Sniffers typically have the capability of capturing every packet on a network and of decoding all seven layers of the OSI protocol model: the physical layer, the datalink layer, the network layer, the transport layer, the session layer, the presentation layer, and the application layer. Capture frame selection can be based on several different filters such as protocol content at lower levels, node addresses, destination class, and pattern matching.
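Frame selection by filter can be sketched as follows, assuming the captured frames are Ethernet II frames (6-byte destination address, 6-byte source address, 2-byte type field). The function names and the three filter parameters are illustrative assumptions; actually capturing frames from a live interface is outside the scope of this sketch.

```python
import struct


def parse_ethernet(frame: bytes):
    """Decode the 14-byte Ethernet II header at the front of a captured frame."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {
        "dst": dst.hex(":"),        # destination node address
        "src": src.hex(":"),        # source node address
        "ethertype": ethertype,     # lower-level protocol identifier
        "payload": frame[14:],      # everything after the header
    }


def matches(frame, node=None, ethertype=None, pattern=None):
    """Return True when the frame passes every filter that was supplied."""
    header = parse_ethernet(frame)
    if node is not None and node not in (header["src"], header["dst"]):
        return False                # node-address filter
    if ethertype is not None and header["ethertype"] != ethertype:
        return False                # protocol filter
    if pattern is not None and pattern not in header["payload"]:
        return False                # byte-pattern filter on the payload
    return True
```

Filters left as None are skipped, so a call such as matches(frame, ethertype=0x0800) selects every IPv4 frame regardless of addresses or content.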
Network sniffers typically display network traffic information and performance statistics in real time, and in user-selectable formats. Numeric station addresses are translated to symbolic names or manufacturer ID names. Network activities measured include buffer use, frames accepted, and Kbytes accepted. Counters for activities specific to particular networks may be implemented. Network activity is expressed as frames/second, Kbytes/second, or percentage of network bandwidth utilization. Data collected by a sniffer may be output to a printer or stored to disk in either print-file or spreadsheet format.
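The three ways of expressing network activity can be derived from the same counters, as in the rough sketch below. The function name is hypothetical, and the sketch assumes that a Kbyte is 1000 bytes and that utilization is measured against the nominal link speed in bits per second.

```python
def activity_rates(frames, bytes_seen, interval_s, link_bps):
    """Convert raw counters for one sampling interval into display rates."""
    return {
        "frames_per_s": frames / interval_s,
        # Assumption: 1 Kbyte = 1000 bytes.
        "kbytes_per_s": bytes_seen / 1000 / interval_s,
        # Bits observed as a percentage of what the link could have carried.
        "utilization_pct": 100 * bytes_seen * 8 / (interval_s * link_bps),
    }
```

For instance, 1,250,000 bytes in 1,000 frames observed over 10 seconds on a 10 Mbit/s link corresponds to 100 frames/second, 125 Kbytes/second, and 10 percent utilization.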
In addition to the problem of firewalls overwhelmed by requests, distribution of internal communication via an organization's intranet is often cumbersome and inconvenient for the users of client computers and the administrators of the intranet.
The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. HTTP has been in use by the World-Wide Web since 1990. An HTTP client submits requests to an HTTP server. The server responds by returning a response code and any appropriate data indicated by the original request. HTTP allows an open-ended set of methods that indicate the purpose of a request. A method indicates the operation that the client requests the server to perform. HTTP builds on the discipline of reference provided by the Uniform Resource Identifier (URI), as a location (URL) or name (URN), for indicating the resource to which a method is to be applied. Messages are passed in a format similar to that used by Internet mail as defined by the Multipurpose Internet Mail Extensions (MIME). HTTP is also used as a generic protocol for communication between user agents and proxies/gateways to other Internet systems, including those supported by the SMTP, NNTP, FTP, Gopher, and WAIS protocols. In this way, HTTP allows basic hypermedia access to resources available from diverse applications.
While the set of HTTP methods is open-ended, the HTTP methods GET and HEAD must be supported by all servers that are HTTP-compliant. All other methods are optional. The GET method retrieves information identified by the Request-URI.
The Request-URI is a Uniform Resource Identifier and identifies the resource upon which to apply the request.

    Request-URI = "*" | absoluteURI | abs_path
The three options for Request-URI are dependent on the nature of the request.
(1) The asterisk "*" means that the request does not apply to a particular resource, but to the server itself, and is only allowed when the method used does not necessarily apply to a resource. One example would be:

    OPTIONS * HTTP/1.1

(2) The absolute URI form is required when the request is being made to a proxy. The proxy is requested to forward the request or service it from a valid cache, and return the response. Note that the proxy may forward the request on to another proxy or directly to the server specified by the absolute URI. In order to avoid request loops, a proxy must be able to recognize all of its server names, including any aliases, local variations, and the numeric IP address. An example request would be:

    GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.1

(3) The absolute path ("abs_path") of the URI must be transmitted where a resource on an origin server or gateway is identified.
All HTTP-compliant servers must accept the absolute URI form in requests, even though HTTP-compliant clients will only generate them in requests to proxies.
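The three Request-URI forms can be told apart with a simple check on the request line, sketched below. The function name and the returned labels are illustrative assumptions, and the abs_path request line in the usage note is a hypothetical companion to the absolute URI example given earlier.

```python
from urllib.parse import urlsplit


def classify_request_uri(request_line):
    """Label the Request-URI of a request line as one of the three forms."""
    method, uri, version = request_line.split()
    if uri == "*":
        return "asterisk"        # applies to the server itself
    if uri.startswith("/"):
        return "abs_path"        # resource on an origin server or gateway
    parts = urlsplit(uri)
    if parts.scheme and parts.netloc:
        return "absoluteURI"     # form required in requests to a proxy
    raise ValueError("unrecognized Request-URI: " + uri)
```

With this sketch, "OPTIONS * HTTP/1.1" is classified as the asterisk form, "GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.1" as the absolute URI form, and "GET /pub/WWW/TheProject.html HTTP/1.1" as the abs_path form.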
The semantics of the GET method change to a "conditional GET" if the request message includes an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range header field as described by IETF publication at http://ds.internic.net/rfc/rfc2068.txt. A conditional GET method requests that data be transferred only under the circumstances described by the conditional header field(s). The conditional GET method is intended to reduce unnecessary network usage by allowing cached entities to be refreshed without requiring multiple requests or transferring data already held by the client.
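A server-side view of one conditional GET variation can be sketched as follows. Only If-Modified-Since is handled; the other conditional header fields are left out, and the function name, the plain dictionary of headers, and the (status, body) return value are assumptions made for the example.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def respond_to_get(headers, last_modified, body):
    """Return (status, body) for a GET, honoring If-Modified-Since when present.

    `last_modified` is expected to be a timezone-aware datetime.
    """
    since = headers.get("If-Modified-Since")
    if since is not None:
        try:
            threshold = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            threshold = None          # unparsable date: treat as an ordinary GET
        if threshold is not None:
            if threshold.tzinfo is None:
                # Dates without a usable zone are assumed to be in UTC.
                threshold = threshold.replace(tzinfo=timezone.utc)
            if last_modified <= threshold:
                return 304, b""       # Not Modified: the client's copy is current
    return 200, body                  # send the full entity
```

When the resource has not changed since the date supplied by the client, only the 304 status travels back, which is the saving in network usage that the conditional GET is intended to provide.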
The above method of specifying GETs is problematic because of the complexity introduced by the many variations of the GET method.