Network Architecture
Client-Server
Client-server is a computing architecture which separates a client from a server, and is almost always implemented over a computer network. Each client or server connected to a network can also be referred to as a node. The most basic type of client-server architecture employs only two types of nodes: clients and servers. This type of architecture is sometimes referred to as two-tier. It allows devices to share files and resources.
Each instance of the client software can send data requests to one or more connected servers. In turn, the servers can accept these requests, process them, and return the requested information to the client.
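The request and response exchange described above can be sketched with standard TCP sockets. This is a minimal illustration only: the resource name "motd", its contents, and the one-line request format are assumptions made for the example, and the server handles a single client before exiting.

```python
import socket
import threading


def serve_once(listener, resources):
    """Server side: accept one client, read a resource name, send a reply."""
    conn, _ = listener.accept()
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        name = reader.readline().strip()
        reply = resources.get(name, "NOT FOUND")
        conn.sendall(reply.encode("utf-8"))


def request(address, name):
    """Client side: connect, send one request line, read until the server closes."""
    with socket.create_connection(address, timeout=5) as sock:
        sock.sendall((name + "\n").encode("utf-8"))
        chunks = []
        while True:
            data = sock.recv(1024)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")


def demo():
    """Run one request/response exchange over the loopback interface."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    resources = {"motd": "hello from the server"}  # hypothetical data
    worker = threading.Thread(
        target=serve_once, args=(listener, resources), daemon=True
    )
    worker.start()
    try:
        return request(listener.getsockname(), "motd")
    finally:
        worker.join(timeout=5)
        listener.close()
```

Calling demo() starts the server node in a background thread and lets the client node fetch one resource from it.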
Peer-to-Peer
A peer-to-peer (P2P) computer network relies on the diverse connectivity between participants in a network and on their cumulative bandwidth, rather than on conventional centralized resources, where a relatively small number of servers provide the core value of a service or application. Peer-to-peer networks are typically used for connecting nodes via largely ad hoc connections.
A pure peer-to-peer network does not have the notion of clients or servers, but only equal peer nodes that simultaneously function as both “clients” and “servers” to the other nodes on the network. This model of network arrangement differs from the client-server model, where communication is usually to and from a central server. A typical example of a non-peer-to-peer file transfer is an FTP server, where the client and server programs are quite distinct: the clients initiate the downloads and uploads, and the servers react to and satisfy these requests.
P2P File Transfer
First P2P-Generation: Server-Client
The first generation of peer-to-peer file sharing networks had a centralized file list. In the centralized peer-to-peer model, a user sends a search query to the centralized server describing what they are looking for. The server then sends back a list of peers that have the data and facilitates the connection and download.
The first file-sharing programs were characterized by queries to a server, which either held the data ready for download itself or referred the user to other peers and nodes from which the data could be downloaded. Two examples were Napster (today using a pay system) and eDonkey2000 in the server version.
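The centralized file list described above can be sketched as a small in-memory index. The class name, method names, and peer address strings are hypothetical and chosen only for this illustration; the index stores who has which file, while the file data itself stays on the peers.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central server index: file name -> peers sharing it."""

    def __init__(self):
        self._peers_by_file = defaultdict(set)

    def register(self, peer, filenames):
        """Record that a peer is sharing the given files."""
        for name in filenames:
            self._peers_by_file[name.lower()].add(peer)

    def unregister(self, peer):
        """Remove a departing peer from every file entry."""
        for peers in self._peers_by_file.values():
            peers.discard(peer)

    def search(self, query):
        """Return {file name: sorted peers} for names containing the query."""
        needle = query.lower()
        return {
            name: sorted(peers)
            for name, peers in self._peers_by_file.items()
            if needle in name and peers
        }
```

A searching user would then contact one of the returned peers directly to download the file.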
Second P2P-Generation: Decentralization
Justin Frankel of Nullsoft set out to create a network without a central index server, and Gnutella was the result. Unfortunately, the Gnutella model, in which all nodes are equal, quickly ran into bottlenecks as the network grew with the influx of former Napster users. FastTrack solved this problem by having some nodes be ‘more equal than others’.
By electing some higher-capacity nodes to be indexing nodes, with lower capacity nodes branching off from them, FastTrack allowed for a network that could scale to a much larger size.
Gnutella quickly adopted this model, and most current peer-to-peer networks implement this design, as it allows for large and efficient networks without central servers.
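The idea of electing higher-capacity nodes as indexing nodes can be sketched as follows. The actual election rules of FastTrack and Gnutella are not described here, so the capacity scores, the fixed number of indexing nodes, and the round-robin assignment of the remaining nodes are all assumptions made for the example.

```python
def assign_to_indexing_nodes(capacities, indexing_count):
    """Pick the highest-capacity nodes as indexing nodes and attach the rest.

    capacities: dict mapping node name -> capacity score (hypothetical units).
    Returns a dict mapping each indexing node -> list of attached leaf nodes.
    """
    if not 1 <= indexing_count <= len(capacities):
        raise ValueError("indexing_count must be between 1 and the node count")
    # Highest capacity first; ties are broken by name for a stable result.
    ranked = sorted(capacities, key=lambda node: (-capacities[node], node))
    indexing_nodes = ranked[:indexing_count]
    leaves = ranked[indexing_count:]
    assignment = {node: [] for node in indexing_nodes}
    for position, leaf in enumerate(leaves):
        parent = indexing_nodes[position % len(indexing_nodes)]
        assignment[parent].append(leaf)
    return assignment
```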
Also included in the second generation are distributed hash tables (DHTs), which help solve the scalability problem by electing various nodes to index certain hashes (which are used to identify files), allowing for fast and efficient searching for any instances of a file on the network. This is not without drawbacks; perhaps most significantly, DHTs do not directly support keyword searching (as opposed to exact-match searching).
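The assignment of hashes to indexing nodes can be illustrated with the XOR distance metric used by Kademlia. This is a simplified sketch that assumes every node identifier is already known locally; a real DHT finds the closest nodes through iterative network lookups.

```python
import hashlib


def key_id(data: bytes) -> int:
    """Map file content (or a file name) to a 160-bit integer key via SHA-1."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def closest_nodes(node_ids, key, count=1):
    """Return the `count` node IDs with the smallest XOR distance to the key."""
    return sorted(node_ids, key=lambda node_id: node_id ^ key)[:count]
```

Because the key is derived from an exact hash, a lookup finds a file only when its hash is known, which is why keyword searching is not directly supported.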
Well-known examples are Gnutella, Kazaa, and eMule with Kademlia, although Kazaa still has a central server for logging in.
Third P2P-Generation: Indirect and Encrypted
The third generation of peer-to-peer networks has anonymity features built in. Examples of anonymous networks are ANts P2P, RShare, Freenet, I2P, GNUnet and Entropy.
A degree of anonymity is realized by routing traffic through other users' clients, which have the function of network nodes. This makes it harder for someone to identify who is downloading or who is offering files. Most of these programs also have strong encryption to resist traffic sniffing.
Friend-to-friend networks only allow already-known users (also known as “friends”) to connect to the user's computer; each node can then forward requests and files anonymously between its own “friends” nodes.
Third-generation networks have not reached mass usage for file sharing because most current implementations incur too much overhead in their anonymity features, making them slow or hard to use.
BitTorrent
BitTorrent is a peer-to-peer (P2P) file sharing communications protocol. It is a method of distributing large amounts of data widely without the original distributor incurring the entire cost of hardware, hosting, and bandwidth resources. Instead, when data is distributed using the BitTorrent protocol, each recipient supplies pieces of the data to newer recipients, reducing the cost and burden on any given individual source, providing redundancy against system problems, and reducing dependence on the original distributor.
A BitTorrent client is any program that implements the BitTorrent protocol. Each client is capable of preparing, requesting, and transmitting any type of computer file over a network, using the protocol. A peer is any computer running an instance of a client.
To share a file or group of files, a peer first creates a “torrent.” This small file contains metadata about the files to be shared and about the tracker, the computer that coordinates the file distribution. Peers that want to download the file first obtain a torrent file for it, and connect to the specified tracker, which tells them from which other peers to download the pieces of the file.
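The metadata in a torrent can be sketched as a plain dictionary. Real torrent files are bencoded and store the piece hashes as one concatenated byte string; this simplified sketch keeps a list of hexadecimal SHA-1 digests instead, and the tracker URL and the piece length of 16384 bytes are assumptions made for the example.

```python
import hashlib


def make_metainfo(name, data, tracker_url, piece_length=16384):
    """Split the data into fixed-size pieces and record a SHA-1 hash per piece."""
    pieces = [
        data[start:start + piece_length]
        for start in range(0, len(data), piece_length)
    ]
    return {
        "announce": tracker_url,  # the tracker that coordinates distribution
        "info": {
            "name": name,
            "length": len(data),
            "piece length": piece_length,
            "pieces": [hashlib.sha1(piece).hexdigest() for piece in pieces],
        },
    }


def verify_piece(metainfo, index, piece):
    """Check a downloaded piece against the hash recorded in the metadata."""
    return hashlib.sha1(piece).hexdigest() == metainfo["info"]["pieces"][index]
```

The per-piece hashes let a downloading peer check each piece independently, whichever peer supplied it.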
Though both ultimately transfer files over a network, a BitTorrent download differs from a classic full-file HTTP request in several fundamental ways:
First, BitTorrent makes many small P2P requests over different TCP sockets, while web browsers typically make a single HTTP GET request over a single TCP socket. Second, BitTorrent downloads pieces in random order or using a “rarest-first” approach that ensures high availability, while HTTP downloads in a sequential manner.
Taken together, these differences allow BitTorrent to achieve much lower cost, much higher redundancy, and much greater resistance to abuse or to “flash crowds” than a regular HTTP server. However, this protection comes at a cost: downloads can take time to rise to full speed because it may take time for enough peer connections to be established, and it takes time for a node to receive sufficient data to become an effective uploader. As such, a typical BitTorrent download will gradually rise to very high speeds, and then slowly fall back down toward the end of the download. This contrasts with an HTTP server that, while more vulnerable to overload and abuse, rises to full speed very quickly and maintains this speed throughout.
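The “rarest-first” approach mentioned above can be sketched as a simple ordering of the pieces still needed. The data structures are assumptions made for the example: each peer is represented by the set of piece indices it currently holds.

```python
from collections import Counter


def rarest_first(needed, peer_pieces):
    """Order the needed pieces so that the least-available ones come first.

    needed: set of piece indices still missing locally.
    peer_pieces: dict mapping peer name -> set of piece indices that peer has.
    Pieces that no connected peer holds are left out of the result.
    """
    availability = Counter()
    for pieces in peer_pieces.values():
        availability.update(pieces & needed)
    candidates = [piece for piece in needed if availability[piece] > 0]
    # Fewest holders first; ties are broken by piece index.
    return sorted(candidates, key=lambda piece: (availability[piece], piece))
```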
File Transfer Protocols
File Transfer Protocol (FTP)
FTP is used to transfer data from one computer to another over the Internet, or through a network. Specifically, FTP is a commonly used protocol for exchanging files over any network that supports the TCP/IP protocol (such as the Internet or an intranet). There are two computers involved in an FTP transfer: a server and a client. The FTP server, running FTP server software, listens on the network for connection requests from other computers. The client computer, running FTP client software, initiates a connection to the server. Once connected, the client can perform a number of file manipulation operations, such as uploading files to the server, downloading files from the server, and renaming or deleting files on the server.
User Datagram Protocol (UDP)
UDP is one of the core protocols of the Internet protocol suite. Using UDP, programs on networked computers can send short messages, sometimes known as datagrams, to one another (using datagram sockets). UDP is sometimes called the Universal Datagram Protocol, although its formal name is the User Datagram Protocol.
UDP does not guarantee reliability or ordering in the way that TCP does. Datagrams may arrive out of order, appear duplicated, or go missing without notice. Avoiding the overhead of checking whether every packet actually arrived makes UDP faster and more efficient, at least for applications that do not need guaranteed delivery.
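Because UDP itself does not repair these problems, an application that needs ordering must handle it on its own. The following sketch assumes that the sender prefixes every datagram with a sequence number starting at zero, which is an application-level convention and not part of UDP.

```python
def reassemble(datagrams, expected_count):
    """Order received datagrams, drop duplicates, and report gaps.

    datagrams: iterable of (sequence_number, payload) in arrival order.
    Returns (payloads in sequence order, list of missing sequence numbers).
    """
    received = {}
    for sequence_number, payload in datagrams:
        # Keep the first copy of each datagram and ignore later duplicates.
        received.setdefault(sequence_number, payload)
    missing = [n for n in range(expected_count) if n not in received]
    ordered = [received[n] for n in range(expected_count) if n in received]
    return ordered, missing
```

An application that does not need guaranteed delivery can simply ignore the list of missing numbers; one that does can request those datagrams again.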
Hypertext Transfer Protocol (HTTP)
HTTP is a communications protocol used to transfer or convey information on intranets and the World Wide Web. HTTP is a request/response protocol between a client and a server. The client making an HTTP request—such as a web browser, spider, or other end-user tool—is referred to as the user agent. The responding server—which stores or creates resources such as HTML files and images—is called the origin server. In between the user agent and origin server may be several intermediaries, such as proxies, gateways, and tunnels. HTTP is not constrained to using TCP/IP and its supporting layers, although this is its most popular application on the Internet.
Typically, an HTTP client initiates a request by establishing a Transmission Control Protocol (TCP) connection to a particular port on a host. An HTTP server listening on that port waits for the client to send a request message.
Upon receiving the request, the server sends back a status line, such as “HTTP/1.1 200 OK”, and a message of its own, the body of which is perhaps the requested file, an error message, or some other information.
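The request message and the status line described above can be sketched as plain byte strings. The host name and path are placeholders, and the parser is deliberately minimal: it assumes a complete response is already in memory and does not handle chunked transfer encoding.

```python
def build_get_request(host, path="/"):
    """Build the bytes of a minimal HTTP/1.1 GET request."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def parse_response(raw):
    """Split a raw response into version, status code, reason, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ", 2)  # e.g. ["HTTP/1.1", "200", "OK"]
    version, status = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, status, reason, headers, body
```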
Resources to be accessed by HTTP are identified using Uniform Resource Identifiers (URIs), or more specifically Uniform Resource Locators (URLs), using the http or https URI schemes.
A CURIE (compact URI) is an abbreviated URI expressed in CURIE syntax, and may be found in both XML and non-XML grammars. An example CURIE is “[curl:EA83BZ99]” excluding the quotation marks.
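How a URL breaks down into its parts, and how a CURIE expands into a full URI, can be sketched as follows. The prefix table that maps "curl" to a base URI is purely hypothetical; in practice the mapping is defined by the document or grammar in which the CURIE appears.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split an http or https URL into the parts a client needs to connect."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("not an http or https URL: " + url)
    default_port = 443 if parts.scheme == "https" else 80
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or default_port,
        "path": parts.path or "/",
        "query": parts.query,
    }


def expand_curie(curie, prefixes):
    """Expand "[prefix:reference]" or "prefix:reference" into a full URI."""
    text = curie.strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]  # strip the optional enclosing brackets
    prefix, separator, reference = text.partition(":")
    if not separator or prefix not in prefixes:
        raise ValueError("unknown or missing CURIE prefix: " + curie)
    return prefixes[prefix] + reference
```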
Network Address Translation (NAT)
NAT devices allow internal networks to communicate with external networks using a limited number of external IP addresses by changing the source address of outgoing requests and listening for replies. This leaves the internal network ill-suited to act as a server, as the NAT device has no way of determining the internal host for which incoming packets are destined. On the Internet, this problem has not generally been relevant to home users behind NAT devices, as they either do not need to act as servers or can use static NAT mappings to correlate incoming requests to internal hosts. However, applications such as P2P file sharing (for example, BitTorrent or Gnutella clients) or VoIP networks (for example, Skype) require clients to act like servers, thereby posing a problem for users behind NAT devices, as incoming requests cannot be correlated to the proper internal host.
A possible solution to this problem is to use NAT traversal techniques based on protocols such as STUN (Simple Traversal of UDP through NATs) or ICE (Interactive Connectivity Establishment), or proprietary approaches in a session border controller. NAT traversal is possible in both TCP- and UDP-based applications, but the UDP-based technique is simpler, more widely understood, and more compatible with legacy NATs. In either case, the high-level protocol must be designed with NAT traversal in mind, and it does not work reliably across symmetric NATs or other poorly behaved legacy NATs.
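The address rewriting performed by a NAT device, and the reason unsolicited incoming packets cannot be delivered, can be illustrated with a small translation table. This sketch assumes a port-translating NAT with a single external address; the class name, method names, and the starting port of 40000 are hypothetical.

```python
class Nat:
    """Hypothetical port-translating NAT with one external IP address."""

    def __init__(self, external_ip, first_port=40000):
        self.external_ip = external_ip
        self._next_port = first_port
        self._outbound = {}  # (internal ip, internal port) -> external port
        self._inbound = {}   # external port -> (internal ip, internal port)

    def translate_outgoing(self, src_ip, src_port):
        """Rewrite the source of an outgoing packet and remember the mapping."""
        key = (src_ip, src_port)
        if key not in self._outbound:
            self._outbound[key] = self._next_port
            self._inbound[self._next_port] = key
            self._next_port += 1
        return self.external_ip, self._outbound[key]

    def translate_incoming(self, dst_port):
        """Return the internal destination, or None if no mapping exists."""
        return self._inbound.get(dst_port)

    def add_static_mapping(self, external_port, internal_ip, internal_port):
        """Manually forward an external port to an internal host."""
        self._inbound[external_port] = (internal_ip, internal_port)
```

A reply to an outgoing request finds its mapping, whereas a connection attempt from an unknown peer returns None unless a static mapping was configured beforehand.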
Instant Messaging
Instant messaging (IM) is a form of real-time communication between two or more people based on typed text. The text is conveyed via computers connected over a network such as the Internet. Files may also be transferred to users via an IM client.
IM is built around the concept of real-time, synchronous messaging. For example, one user sends a message that is intended to reach another user immediately.
Social Graph
A Social Network or Social Graph (e.g. Facebook) is a social structure made of nodes (which are generally individuals or organizations) that are tied by one or more specific types of interdependency, such as values, visions, ideas, financial exchange, friends, kinship, dislike, conflict, trade, web links, sexual relations, disease transmission (epidemiology), or airline routes.
Really Simple Syndication (RSS)
RSS is a family of Web feed formats used to publish frequently updated content such as blog entries, news headlines or podcasts. An RSS document, which is called a “feed”, “web feed”, or “channel”, contains either a summary of content from an associated web site or the full text. RSS makes it possible for people to keep up with their favorite web sites in an automated manner that's easier than checking them manually.
Media RSS (MRSS) is an RSS module used for syndicating multimedia files (audio, video, and image) in RSS feeds. It was designed in 2004 by Yahoo! and the Media RSS community, and adds several enhancements to RSS enclosures.
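Reading a feed can be sketched with the standard XML parser. The sample feed, its titles, and its URLs are invented for the illustration; the namespace URI is the one used by the Media RSS module.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = {"media": "http://search.yahoo.com/mrss/"}

# A hypothetical feed used only for this sketch.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Example Channel</title>
    <link>http://example.com/</link>
    <description>Sample feed</description>
    <item>
      <title>Episode 1</title>
      <link>http://example.com/episodes/1</link>
      <media:content url="http://example.com/media/1.mp4" type="video/mp4"/>
    </item>
    <item>
      <title>Text-only post</title>
      <link>http://example.com/posts/2</link>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return the channel title and a list of items with any media URLs."""
    channel = ET.fromstring(xml_text).find("channel")
    items = []
    for item in channel.findall("item"):
        media = [
            content.get("url")
            for content in item.findall("media:content", MEDIA_NS)
        ]
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "media": media,
        })
    return channel.findtext("title", default=""), items
```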
Presence
In computer and telecommunications networks, presence information is a status indicator that conveys the ability and willingness of a potential communication partner, for example a user, to communicate. A user's client provides presence information (presence state) via a network connection to a presence service, where it is stored in what constitutes the user's personal availability record (called a presentity) and can be made available for distribution to other users (called watchers) to convey the user's availability for communication. Presence information has wide application in many communication services and is one of the innovations driving the popularity of instant messaging and recent implementations of voice over IP clients.
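The relationship between a presence service, the stored availability records, and the watchers can be sketched as a small publish and subscribe structure. The class name, method names, and state strings are hypothetical, and the watchers are modeled as in-process callbacks instead of network connections.

```python
class PresenceService:
    """Hypothetical presence service holding one availability record per user."""

    def __init__(self):
        self._records = {}   # user -> latest presence state
        self._watchers = {}  # user -> list of watcher callbacks

    def subscribe(self, user, callback):
        """Register a watcher; it immediately receives the current state, if any."""
        self._watchers.setdefault(user, []).append(callback)
        if user in self._records:
            callback(user, self._records[user])

    def publish(self, user, state):
        """Store the user's new presence state and notify all watchers."""
        self._records[user] = state
        for callback in self._watchers.get(user, []):
            callback(user, state)
```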
Universal Plug and Play
Universal Plug and Play (UPnP) is a set of computer network protocols promulgated by the UPnP Forum. The goals of UPnP are to allow devices to connect seamlessly and to simplify the implementation of networks in the home (data sharing, communications, and entertainment) and corporate environments. UPnP achieves this by defining and publishing UPnP device control protocols built upon open, Internet-based communication standards.
The term UPnP is derived from Plug-and-play, a technology for dynamically attaching devices directly to a computer. UPnP enables communication between any two devices under the command of any control device on the network (LAN).
The UPnP architecture supports zero-configuration, “invisible networking” and automatic discovery for many device categories from a range of vendors; any device can dynamically join a network, obtain an IP address, announce its name, convey its capabilities upon request, and learn about the presence and capabilities of other devices. DHCP and DNS servers are optional and are only used if they are available on the network. Devices can leave the network automatically without leaving any unwanted state information behind.
The foundation for UPnP networking is IP addressing. Each device must have a Dynamic Host Configuration Protocol (DHCP) client and search for a DHCP server when the device is first connected to the network. If no DHCP server is available, that is, if the network is unmanaged, the device must assign itself an address. If, during the DHCP transaction, the device obtains a domain name, for example through a DNS server or via DNS forwarding, the device should use that name in subsequent network operations; otherwise, the device should use its IP address.
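The addressing decision described above can be sketched as follows. The DHCP exchange itself is not implemented: the sketch assumes the outcome is passed in as an address string or None. For self-assignment it uses the IPv4 link-local range 169.254.0.0/16, skipping the first and last 256 addresses, which are reserved; the conflict check against a set of addresses already in use stands in for the network probing a real device would perform.

```python
import ipaddress
import random

LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def choose_address(dhcp_offer, in_use=(), rng=random, max_attempts=100):
    """Use the DHCP-assigned address if there is one, else self-assign."""
    if dhcp_offer is not None:
        return ipaddress.ip_address(dhcp_offer)
    base = LINK_LOCAL.network_address
    for _ in range(max_attempts):
        # Offsets 256..65279 cover 169.254.1.0 through 169.254.254.255.
        candidate = base + rng.randint(256, 65279)
        if candidate not in in_use:
            return candidate
    raise RuntimeError("no free link-local address found")


def network_identity(address, domain_name=None):
    """Prefer a domain name obtained during DHCP, else use the IP address."""
    return domain_name if domain_name else str(address)
```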
Broadcatching
Broadcatching is the downloading of digital content that has been made available over the Internet using RSS syndication.
The general idea is to use an automated mechanism to aggregate various web feeds and download content for viewing or presentation purposes.
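The aggregation step can be sketched as a function that collects enclosure URLs from several feeds and keeps only those not yet downloaded. The function names are hypothetical, the feeds are assumed to be already fetched as XML text, and the download itself is left out.

```python
import xml.etree.ElementTree as ET


def enclosure_urls(feed_xml):
    """Return the URLs of all enclosure elements in one feed."""
    root = ET.fromstring(feed_xml)
    return [
        enclosure.get("url")
        for enclosure in root.iter("enclosure")
        if enclosure.get("url")
    ]


def plan_downloads(feeds, already_downloaded):
    """Aggregate several feeds into an ordered list of new URLs to fetch."""
    seen = set(already_downloaded)
    queue = []
    for feed_xml in feeds:
        for url in enclosure_urls(feed_xml):
            if url not in seen:
                seen.add(url)  # avoid queuing the same content twice
                queue.append(url)
    return queue
```

Running plan_downloads periodically against the same set of feeds yields only the content published since the previous run, provided the caller records what it has fetched.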
There remains a need for efficient peer-to-peer file sharing.