Humans are capable of using the World Wide Web to carry out tasks such as finding the Finnish word for “car”, reserving a library book, or searching for the cheapest DVD and buying it. However, a computer cannot accomplish the same tasks without human direction because web pages are designed to be read by people, not machines. The semantic web is a vision of information that is understandable by computers, so that they can perform more of the tedious work involved in finding, sharing and combining information on the web.
For example, a computer might be instructed to list the prices of flat screen HDTVs larger than 40 inches with 1080p resolution at shops in the nearest town that are open until 8 pm on Tuesday evenings. Today, this task requires search engines that are individually tailored to every website being searched. The semantic web provides a common standard for websites to publish the relevant information in a more readily machine-processable and integratable form.
Tim Berners-Lee originally expressed the vision of the semantic web as follows: “I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web—the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize”.
Semantic publishing will benefit greatly from the semantic web. In particular, the semantic web is expected to revolutionize scientific publishing, for example through real-time publishing and sharing of experimental data on the Internet. This simple but radical idea is now being explored by the W3C HCLS group's Scientific Publishing Task Force.
Tim Berners-Lee has further stated: “People keep asking what Web 3.0 is. I think maybe when you've got an overlay of scalable vector graphics—everything rippling and folding and looking misty—on Web 2.0 and access to a semantic Web integrated across a huge space of data, you'll have access to an unbelievable data resource”.
The semantic web is an evolving extension of the World Wide Web in which web content can be expressed not only in natural language, but also in a form that can be read and used by software agents, thus permitting them to find, share and integrate information more easily. It derives from W3C director Tim Berners-Lee's vision of the Web as a universal medium for data, information, and knowledge exchange.
Currently, navigation across the World Wide Web relies primarily on locational information, typically a Uniform Resource Locator (URL) or a Uniform Resource Identifier (URI). Thus, for example, some of the previous section of text came from Wikipedia, and had a URL of:
http://en.wikipedia.org/wiki/Semantic_Web
That Wikipedia URL was obtained by using the Google search engine with the search string “Semantic Web”. Search engines like Google have electronic spiders that crawl the World Wide Web looking for web pages and index the pages they find. Links to these web pages are then provided to search engine users based on proprietary algorithms that take into account the proximity and frequency of the search terms and how often the different web pages containing those search terms are accessed. Notably, though, the searching is almost entirely context free. The relationship between search terms means nothing to the search engine; the search string is rather an attempt by the user to find some set of words that is more or less unique to the objects that the user seeks.
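The context-free nature of keyword search can be illustrated with a minimal sketch of an inverted index. This is not how any particular search engine works; the page URLs and texts below are hypothetical, and ranking is omitted entirely. The point is only that the index records which words occur on which pages, so the relationship between the search terms plays no role.

```python
from collections import defaultdict
from typing import Dict, Set

def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the set of page URLs that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the pages containing every query term, ignoring term order."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Hypothetical pages, for illustration only.
PAGES = {
    "http://example.org/a": "the semantic web is an extension of the web",
    "http://example.org/b": "a spider spins a web",
}
index = build_index(PAGES)
hits = search(index, "semantic web")
```

Because only word membership is recorded, the queries "semantic web" and "web semantic" return exactly the same set of pages.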
Once a search engine has provided a user with a URL or other alphanumeric location identifier, a translation is made between that URL and a computer-understandable routing address, typically utilizing a Domain Name System (DNS) server. On the Internet, the DNS associates various sorts of information with so-called domain names; most importantly, it serves as the “phone book” for the Internet: it translates human-readable computer hostnames, e.g. en.wikipedia.org, into the IP addresses that networking equipment needs for delivering information. It also stores other information such as the list of mail exchange servers that accept e-mail for a given domain. In providing a worldwide keyword-based redirection service, DNS is an essential component of contemporary Internet use.
The most basic use of DNS is to translate hostnames to IP addresses. It is in very simple terms like a phone book. For example, if you want to know the internet address of en.wikipedia.org, DNS can be used to tell you it's 66.230.200.100. DNS also has other important uses.
Pre-eminently, the DNS makes it possible to assign Internet destinations to the human organization or concern they represent, independently of the physical routing hierarchy represented by the numerical IP address. Because of this, hyperlinks and Internet contact information can remain the same, whatever the current IP routing arrangements may be, and can take a human-readable form (such as “wikipedia.org”) which is rather easier to remember than an IP address (such as 66.230.200.100). People take advantage of this when they recite meaningful URLs and e-mail addresses without caring how the machine will actually locate them.
The DNS also distributes the responsibility for assigning domain names and mapping them to IP networks by allowing an authoritative server for each domain to keep track of its own changes, avoiding the need for a central registrar to be continually consulted and updated.
The domain name space consists of a tree of domain names. Each node or leaf in the tree has one or more resource records, which hold information associated with the domain name. The tree sub-divides into zones. A zone consists of a collection of connected nodes authoritatively served by an authoritative DNS name server. (Note that a single name server can host several zones.)
When a system administrator wants to let another administrator control a part of the domain name space within his or her zone of authority, he or she can delegate control to the other administrator. This splits a part of the old zone off into a new zone, which comes under the authority of the second administrator's name servers. The old zone is then no longer authoritative for anything that falls under the authority of the new zone.
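The effect of delegation can be sketched as a longest-suffix match: the zone responsible for a name is the most specific zone whose apex is a suffix of that name. The zone table and server names below are hypothetical, and a real name server stores considerably more than this.

```python
from typing import Dict, Optional

# Hypothetical zone table: zone apex -> authoritative name server.
ZONES: Dict[str, str] = {
    "example.org": "ns1.example.org",
    "lab.example.org": "ns1.lab.example.org",  # child zone split off by delegation
}

def authoritative_zone(name: str, zones: Dict[str, str] = ZONES) -> Optional[str]:
    """Return the most specific zone containing `name`, or None if there is none."""
    labels = name.lower().rstrip(".").split(".")
    # Try the full name first, then drop one leading label at a time.
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in zones:
            return candidate
    return None
```

With this table, `www.lab.example.org` falls under the delegated zone `lab.example.org`, while `www.example.org` remains under the parent zone `example.org`.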
A resolver looks up the information associated with nodes. A resolver knows how to communicate with name servers by sending DNS requests, and heeding DNS responses. Resolving usually entails iterating through several name servers to find the needed information.
Some resolvers function simplistically and can only communicate with a single name server. These simple resolvers rely on a recursing name server to perform the work of finding information for them.
Users generally do not communicate directly with a DNS resolver. Instead DNS resolution takes place transparently in client applications such as web browsers, mail clients, and other Internet applications. When a request is made which necessitates a DNS lookup, such programs send a resolution request to the local DNS resolver in the operating system which in turn handles the communications required.
The DNS resolver will almost invariably have a cache containing recent lookups. If the cache can provide the answer to the request, the resolver will return the value in the cache to the program that made the request. If the cache does not contain the answer, the resolver will send the request to a designated DNS server or servers. In the case of most home users, the Internet service provider to which the machine connects will usually supply this DNS server: such a user will either have configured that server's address manually or allowed DHCP to set it; however, where systems administrators have configured systems to use their own DNS servers, their DNS resolvers often point to separately maintained name servers of the organization. In any event, the name server thus queried will follow the process outlined above, until it either successfully finds a result or does not. It then returns its results to the DNS resolver; assuming it has found a result, the resolver duly caches that result for future use, and hands the result back to the software which initiated the request.
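The cache-then-forward behavior described above can be sketched as follows. This is a simplified illustration, not the resolver of any particular operating system: the `CachingResolver` class, the fixed time-to-live, and the `fake_upstream` function standing in for the designated DNS server are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, List, Tuple

class CachingResolver:
    """Toy resolver: answer from the cache if possible, else ask upstream."""

    def __init__(self, upstream: Callable[[str], List[str]],
                 ttl_seconds: float = 300.0) -> None:
        self._upstream = upstream
        self._ttl = ttl_seconds  # assumed fixed lifetime for cached answers
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def resolve(self, hostname: str) -> List[str]:
        key = hostname.lower().rstrip(".")
        now = time.monotonic()
        entry = self._cache.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                  # cache hit
        addresses = self._upstream(key)      # cache miss: forward the request
        if addresses:
            # Only successful lookups are cached in this sketch.
            self._cache[key] = (now + self._ttl, addresses)
        return addresses

# Hypothetical stand-in for the designated DNS server.
FAKE_RECORDS = {"en.wikipedia.org": ["66.230.200.100"]}
upstream_calls: List[str] = []

def fake_upstream(hostname: str) -> List[str]:
    upstream_calls.append(hostname)
    return FAKE_RECORDS.get(hostname, [])

resolver = CachingResolver(fake_upstream)
first = resolver.resolve("en.wikipedia.org")   # miss: upstream is queried
second = resolver.resolve("en.wikipedia.org")  # hit: served from the cache
```

After the two calls, `upstream_calls` holds a single entry, since the second request never leaves the resolver.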
The system outlined above provides a somewhat simplified scenario. The DNS includes several other functions: Hostnames and IP addresses do not necessarily match on a one-to-one basis. Many hostnames may correspond to a single IP address: combined with virtual hosting, this allows a single machine to serve many web sites. Alternatively a single hostname may correspond to many IP addresses: this can facilitate fault tolerance and load distribution, and also allows a site to move physical location seamlessly.
There are many uses of DNS besides translating names to IP addresses. For instance, mail transfer agents use DNS to find out where to deliver e-mail for a particular address. The domain-to-mail-exchanger mapping provided by MX records accommodates another layer of fault tolerance and load distribution on top of the name-to-IP-address mapping.
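As a rough illustration of how MX records support fault tolerance, each record carries a preference value, and a sending mail transfer agent tries the exchangers with the lowest value first, falling back to the others if delivery fails. The records below are hypothetical, and the `delivery_order` helper is only a sketch of the ordering step.

```python
from typing import List, Tuple

def delivery_order(mx_records: List[Tuple[int, str]]) -> List[str]:
    """Order mail exchangers by ascending preference (lowest is tried first)."""
    return [host for _, host in sorted(mx_records)]

# Hypothetical MX records for one domain: (preference, mail exchanger).
MX_RECORDS = [
    (20, "backup-mail.example.org"),
    (10, "mail.example.org"),
]
order = delivery_order(MX_RECORDS)
```

Here `mail.example.org` is tried first and `backup-mail.example.org` serves as the fallback.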
Sender Policy Framework and DomainKeys, rather than defining their own record types, were designed to take advantage of an existing DNS record type, the TXT record.
To provide resilience in the event of computer failure, multiple DNS servers provide coverage of each domain. In particular, more than thirteen root servers exist worldwide. DNS programs or operating systems have the IP addresses of these servers built in.
The DNS uses TCP and UDP on port 53 to serve requests. Almost all DNS queries consist of a single UDP request from the client followed by a single UDP reply from the server. TCP typically comes into play only when the response data size exceeds 512 bytes, or for such tasks as zone transfer. Some operating systems such as HP-UX are known to have resolver implementations that use TCP for all queries, even when UDP would suffice.
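To make the single-UDP-request pattern concrete, the following sketch builds the bytes of a minimal DNS query for an address (A) record, following the standard DNS message layout of a 12-byte header followed by one question. The `build_dns_query` helper and the fixed query ID are assumptions for illustration, and the sketch only constructs the message without sending it.

```python
import struct

def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query message asking for the A record of `hostname`."""
    # Header: ID, flags (recursion desired), one question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE = 1 (A record), QCLASS = 1 (Internet).
    question = qname + struct.pack("!HH", 1, 1)
    return header + question

query = build_dns_query("en.wikipedia.org")
```

A client would typically send these bytes in one UDP datagram to port 53 of its configured DNS server, for example with `socket.sendto`. The query is far smaller than the 512-byte size at which TCP comes into play.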
The typical result of a DNS lookup is a machine-understandable address for one or more machines somewhere in the world. Most often, this is an Internet Protocol (IP) address, typically presented in IP version 4 (IPv4) as four numbers, each number separated from the others by a period (“.”). However, the underlying IPv4 address is actually a thirty-two bit value, with each of the four numbers in the human-readable form representing an eight bit number as a decimal integer. Because the 32 bit IPv4 address space is insufficient for the expected growth of the WWW, IP version 6 (IPv6) has been designed to have a 128 bit address space. Nevertheless, as far as DNS is concerned, IPv6 addresses are functionally equivalent to IPv4 addresses: both are the machine-readable addresses used to access a remote system.
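The relationship between the four-number notation and the underlying 32 bit value can be worked through in a few lines. The `dotted_quad_to_int` helper is written for this illustration only; Python's standard `ipaddress` module performs the same conversion.

```python
import ipaddress

def dotted_quad_to_int(address: str) -> int:
    """Pack the four decimal numbers of an IPv4 address into one 32 bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {address!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # shift in eight bits per number
    return value

packed = dotted_quad_to_int("66.230.200.100")
# Cross-check against the standard library's conversion.
same = packed == int(ipaddress.IPv4Address("66.230.200.100"))
```

Each of the four numbers contributes eight bits, so the result always fits in 32 bits; the corresponding IPv6 value would occupy 128 bits.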
IP is a low-level protocol utilized to route between two systems over the Internet and within intranets. Logically situated above IP is typically either the User Datagram Protocol (UDP) or the Transmission Control Protocol (TCP). Both TCP and UDP utilize “ports” to communicate between different applications on the various systems. Some of the better known port numbers are 53 for DNS and 80 for HTTP (i.e., web traffic).
It is the responsibility of these applications to finish the interpretation and routing of URLs and URIs. Thus, in the example given above of: http://en.wikipedia.org/wiki/Semantic_Web, the “http” determines the destination application. This is typically translated into port 80 for “http”. DNS will then return an IP address for “en.wikipedia.org”, which is currently 66.230.200.100 (IP V4). Finally, the application listening to port 80 at IP address 66.230.200.100 will interpret the remainder of the URL, in this case: “wiki/Semantic_Web”, to provide the requested information about the semantic web in Wikipedia.
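The decomposition of the example URL can be sketched with Python's standard `urllib.parse` module. The `decompose` helper and the one-entry `DEFAULT_PORTS` table are assumptions for illustration; the hostname would still have to be passed to DNS to obtain the IP address.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Scheme-to-port mapping; only the case discussed in the text is listed.
DEFAULT_PORTS = {"http": 80}

def decompose(url: str) -> Tuple[str, Optional[str], Optional[int], str]:
    """Split a URL into scheme, hostname, port, and the path left to the application."""
    parts = urlsplit(url)
    # Use an explicit port if the URL has one, else the scheme's default.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path

scheme, host, port, path = decompose("http://en.wikipedia.org/wiki/Semantic_Web")
```

For the example URL this yields the scheme `http`, the hostname `en.wikipedia.org`, port 80, and the path `/wiki/Semantic_Web`, which is the portion interpreted by the application listening on that port.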
Note that addressing across the Internet today is essentially done without semantics or context. This was done intentionally, since it provides a very simple, universal method of accessing content of any imaginable type. Nevertheless, it is proving to be inadequate to the growing complexity of the information on the World Wide Web. Past attempts to utilize semantic information to identify and access desired information across the World Wide Web have not yet been successful. In particular, a comprehensive integrated solution to the Semantic Web would be highly desirable.