Search has become a common way of finding information stored on the Internet, on a user's computer system, or on other storage resources (e.g., databases, file systems, and so forth). A common user interface for search tools includes a text control in which a user enters a search query string (e.g., “strawberry festival”) and a button for initiating the search. The search tool then uses a previously created index (e.g., created by crawling the web or indexing files on the user's computer system) to match terms or phrases in the query string with words stored in the index. More advanced search tools may map text in the user's query string to other text, such as other forms of words (e.g., “running” vs. “ran”) and synonyms (e.g., “stocks” vs. “equities”), and identify documents or text that match in the index. The search tools then provide the user with a matching list of search results, which may include documents, links to web pages, or other data sources with contents that match the query string in some way.
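The index-matching step described above can be sketched with a toy inverted index. This is a minimal sketch under stated assumptions: whitespace tokenization, and two small hand-written tables. The hypothetical `NORMALIZE` table stands in for stemming or lemmatization, and the hypothetical `SYNONYMS` table stands in for synonym mapping. Real search tools use far more elaborate linguistic pipelines.

```python
from collections import defaultdict

# Hypothetical normalization tables (illustrative only): NORMALIZE stands in for
# stemming/lemmatization, SYNONYMS for synonym mapping.
NORMALIZE = {"ran": "run", "running": "run", "runs": "run"}
SYNONYMS = {"equities": "stocks"}


def normalize(token: str) -> str:
    """Lowercase a token, strip punctuation, map word forms to a base form, then map synonyms."""
    token = token.lower().strip(".,!?\"'")
    token = NORMALIZE.get(token, token)
    return SYNONYMS.get(token, token)


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index mapping each normalized term to the IDs of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            term = normalize(token)
            if term:
                index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every normalized query term (AND semantics)."""
    terms = [t for t in (normalize(q) for q in query.split()) if t]
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


docs = {
    "d1": "Running shoes for the strawberry festival",
    "d2": "Equities rallied today",
    "d3": "Strawberry jam recipes",
}
index = build_index(docs)
print(sorted(search(index, "strawberry festival")))  # ['d1']
print(sorted(search(index, "ran")))                  # ['d1'] ("ran" and "Running" both map to "run")
print(sorted(search(index, "stocks")))               # ['d2'] ("Equities" maps to "stocks")
```

The sketch shows that matching is purely lexical. A document is returned only if its normalized words overlap the query's normalized words.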
Most search engines receive user input in the form of keystrokes, which assumes a degree of knowledge and typing skill. For novice computer users, a lack of keyboarding skills can make searching more difficult and, at times, more frustrating. Individuals with physical or cognitive impairments may find keyboarding more difficult still. Lastly, individuals with a limited vocabulary face great difficulty in performing successful searches. With modern search tools, a user simply cannot search for something without knowing the words that describe it. Moreover, even a user who knows the right words in English may not find relevant resources in other languages, such as Chinese documents on the same topic. Thus, knowledge stays partitioned by language barriers.
The current method of search involves matching digital content to the search terms a searcher enters. Search engines, such as Google, have indexed billions of web sites. These indexes include information gathered from URLs, Hypertext Markup Language (HTML) title information, HTML meta tags, image names, accessibility tags, and the content of the web pages themselves. Meta tags are terms that a webmaster embeds in the head section of a page's HTML. Because meta tags are not standardized, webmasters write creative (and sometimes manipulative) meta tags to gain higher positioning in the search results. This practice is part of what has become known as Search Engine Optimization (SEO). To counteract such false signals and keep result positioning balanced, search engine companies constantly modify their algorithms. Another way to gain higher positioning is to stuff the body of documents with popular search terms, even when those terms are unrelated to the meaning of the document. In all, webmasters make every effort to raise their clients' search result positions, since higher positions increase traffic and thus add to the value of the web site.
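As an illustration of where title and meta-tag signals live in a page, the following sketch uses Python's standard `html.parser` module to pull the `<title>` text and the `<meta name="keywords">` content out of a page's head. The class name `HeadSignalParser` and the sample page are hypothetical. This is not a description of how any particular search engine extracts or weighs these signals.

```python
from html.parser import HTMLParser


class HeadSignalParser(HTMLParser):
    """Hypothetical helper: collects <title> text and <meta name="keywords"> terms."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.keywords: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; attribute values may be None.
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords.extend(k.strip() for k in content.split(",") if k.strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is kept; body text is ignored in this sketch.
        if self._in_title:
            self.title += data


# Sample page whose keywords mix relevant terms with unrelated popular ones.
page = """<html><head><title>Strawberry Festival</title>
<meta name="keywords" content="strawberry, festival, cheap flights, celebrity news">
</head><body>Join us for the strawberry festival.</body></html>"""

parser = HeadSignalParser()
parser.feed(page)
print(parser.title)     # Strawberry Festival
print(parser.keywords)  # ['strawberry', 'festival', 'cheap flights', 'celebrity news']
```

Nothing in the markup ties "cheap flights" or "celebrity news" to the page's actual subject. This is the kind of unverified signal the paragraph above describes.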
The World Wide Web Consortium (W3C) has for many years proposed the adoption of semantic tagging to define the subject of web content, with the goal of improving the quality and accuracy of search results. These semantic tags are intended to be machine readable, for example by web crawlers. To achieve this, the tags are expected to conform to the structural fundamentals of semantic tagging. For example, a semantic tag must be located within a structure that “tells” the computer that it is a semantic tag and that it applies to a particular ontology, and the tag must then appropriately define the meaning of the referenced information.
Efforts toward creating the Semantic Web strive to improve search results so that they more closely match the searcher's intention rather than merely the searcher's search terms. Many different approaches are being developed today, each strongly favoring its own ontology of terms for defining the meaning of particular information. This is similar to the meta tags of the current web, described above. The main difference is the effort to standardize the terms used to describe the semantic value of the content.
The creators of ontologies bring their own biases and subjectivity and thus will produce ontologies that may or may not be universally accepted. As the field expands, there will soon be countless ontologies, making it more difficult to determine the best ontology for each domain. Predictably, each ontology will have its own limitations.
Unfortunately, current search tools have several drawbacks that make them unsuitable for some tasks. For example, the search process described above presupposes that users know what they are searching for, or at least some terms included in the documents in which they are interested. Because of this assumption, search tools are not well suited to discovering new information, even within topics the user can identify. A user interested in astronomy, for instance, may easily find discoveries and information already well known to the user but have a much harder time finding sources of new discoveries and information. In some cases, a user may not even know the vocabulary common to a field, making keyword-based searching practically useless. This happens, for example, when the user wants information in a language other than the user's native language or in an unfamiliar field of study that uses specialized terminology (e.g., medicine or law).
In addition, current search tools provide a user interface that assumes text entry is easy and convenient for the user. This is frequently not the case, particularly on mobile devices (e.g., mobile phones), which are an increasingly common means of accessing information. Moreover, current search tools are poor at disambiguating terms. For example, the query “cranberries” may refer to the fruit, a sweater color, or the musical group “The Cranberries.”
Searches using current tools, such as Google or even internal corporate search tools, can return thousands, if not millions, of results. Many, if not most, of these results are unrelated to the searcher's objective. This is because the current method focuses on matching the combination of keystrokes the user typed. The fact that these keystrokes are found in a particular document only suggests that there is a match. The frequency of such matches, or the proximity of the search terms to one another in the document, merely raises the document's ranking as an indication that it might be an appropriate match. However, this approach fails to zero in on the true intention of the searcher and the semantic meaning of the searcher's intended search.
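The frequency-and-proximity ranking described above can be illustrated with a toy scoring function. This is a minimal sketch, assuming a hypothetical `keyword_score` that adds raw query-term counts to a bonus of `1 / gap` for each adjacent pair of query terms, where `gap` is the closest distance in tokens between the two terms. Production ranking functions are far more sophisticated. This one only shows why lexical signals do not capture intent.

```python
def keyword_score(doc: str, query: str) -> float:
    """Hypothetical toy score: query-term frequency plus a proximity bonus."""
    tokens = [t.lower().strip(".,!?") for t in doc.split()]
    terms = [t.lower() for t in query.split()]

    # Frequency component: total occurrences of the query terms in the document.
    freq = sum(tokens.count(t) for t in terms)

    # Token positions of each query term in the document.
    positions = {t: [i for i, tok in enumerate(tokens) if tok == t] for t in terms}

    # Proximity component: closer adjacent query terms earn a larger bonus.
    bonus = 0.0
    for a, b in zip(terms, terms[1:]):
        if positions[a] and positions[b]:
            gap = min(abs(i - j) for i in positions[a] for j in positions[b])
            bonus += 1.0 / max(gap, 1)
    return freq + bonus


docs = {
    "d1": "The strawberry festival opens Saturday",
    "d2": "Festival parking. Strawberry sales end at noon.",
    "d3": "Strawberry strawberry strawberry strawberry",
}
query = "strawberry festival"
ranked = sorted(docs, key=lambda d: keyword_score(docs[d], query), reverse=True)
print(ranked)  # ['d3', 'd1', 'd2']
```

In this sketch the document that merely repeats one query term (d3, score 4.0) outranks the one actually about the festival (d1, score 3.0). The score reflects keystroke matches, not the searcher's meaning.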
The newest approach, semantic search, is also failing. First, for this approach to work, webmasters must add semantic tags based on an ontology to every document on the web. Second, the ontologies will have to be agreed upon universally. Third, most web content is not maintained, and the sheer number of documents makes this extra effort impossible to implement universally. Therefore, the vast corpus of existing documents will remain out of scope for the current semantic search approach. The effort to update billions of pages of information is a daunting obstacle to implementing the current vision of the Semantic Web.