Traditional online search engines and information retrieval techniques focus on supporting general queries, typically specified by a set of keywords. The objective of such a query is to retrieve the specific information that satisfies it. The documents themselves are often indexed by keywords or collections of keywords, so that retrieval reduces to simple Boolean searching of the keywords entered by the user. These techniques form the basis of modern, highly scalable Internet search engines.
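By way of illustration only, keyword indexing with Boolean searching can be sketched as a minimal inverted index. The function names `build_index`, `boolean_and` and `boolean_or` and the sample documents are hypothetical, and whitespace tokenisation is assumed for simplicity; practical search engines add stemming, ranking and many other refinements.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased keyword to the set of document IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():  # naive whitespace tokenisation
            index[word].add(doc_id)
    return index


def boolean_and(index: dict[str, set[str]], keywords: list[str]) -> set[str]:
    """Return the IDs of documents containing every keyword (Boolean AND)."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    if not postings:
        return set()
    return set.intersection(*postings)


def boolean_or(index: dict[str, set[str]], keywords: list[str]) -> set[str]:
    """Return the IDs of documents containing at least one keyword (Boolean OR)."""
    result: set[str] = set()
    for k in keywords:
        result |= index.get(k.lower(), set())
    return result


# Hypothetical example collection.
docs = {
    "d1": "brake pad for ford focus",
    "d2": "ford focus headlight",
    "d3": "brake disc toyota",
}
index = build_index(docs)
print(boolean_and(index, ["ford", "brake"]))  # {'d1'}
```

Note that the index records only which words occur in which documents; nothing about the structure or meaning of the surrounding text is retained.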
The effectiveness of information retrieval on such search engines is commonly measured in terms of recall and precision. Recall is the proportion of all relevant documents in the collection that are retrieved by the search engine, whereas precision is the proportion of the documents returned by the search engine that are actually relevant. Search engines based on plain text use statistical models of word association and relevance within individual documents and across document collections, but do not necessarily make use of the implicit semantic structure within a document.
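As a worked illustration of these two measures, the following sketch computes precision and recall from a set of retrieved document IDs and a set of relevant document IDs. The function name `precision_recall` and the sample IDs are hypothetical, and returning 0.0 when a set is empty is an assumed convention rather than part of the standard definitions.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    Empty denominators yield 0.0 (an assumed convention).
    """
    hits = len(retrieved & relevant)  # relevant documents actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 3 relevant documents exist, 2 overlap.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d3", "d5"}
p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.5 0.6666666666666666
```

Returning more documents can only keep recall the same or raise it, but typically lowers precision, which is why both measures are reported together.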
One known strategy for increasing the effectiveness of existing search engines is parametric searching, which is typically used as an interface to a database query. Here, recall and precision can be significantly increased compared with plain-text retrieval techniques. However, a problem with parametric searching is that the search page typically needs to be programmed separately for each domain. For example, parametric searching is commonly used in car-part search engines, in which parameters such as manufacturer, make, model and year can be specified, allowing the user to identify keywords that can then be searched by means of conventional Boolean searching. Additionally, as the schema of the information within such a domain changes, these changes must be propagated to the user interface; to date, this has typically limited parametric searching to use within a single domain.
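By way of illustration only, a domain-specific parametric search of the car-part kind described above might be sketched as follows. The `CarPart` record, the `SEARCH_PARAMETERS` tuple, the `parametric_search` function and the sample catalogue are all hypothetical. The sketch shows why such a search is tied to one domain: the permitted parameters are hard-coded to one schema, so any schema change requires the code (and the search page built on it) to change as well.

```python
from dataclasses import dataclass


# Hypothetical schema for a car-part catalogue.
@dataclass(frozen=True)
class CarPart:
    part_id: str
    manufacturer: str  # maker of the part
    make: str          # make of the vehicle
    model: str
    year: int
    description: str


# The search page exposes exactly these fields; they are fixed to this domain.
SEARCH_PARAMETERS = ("manufacturer", "make", "model", "year")


def parametric_search(catalogue, keywords=(), **params):
    """Filter by exact parameter values, then apply Boolean AND over keywords."""
    unknown = set(params) - set(SEARCH_PARAMETERS)
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    results = []
    for part in catalogue:
        # Every specified parameter must match the record exactly.
        if any(getattr(part, name) != value for name, value in params.items()):
            continue
        # Remaining constraint: conventional Boolean AND over description words.
        words = set(part.description.lower().split())
        if all(k.lower() in words for k in keywords):
            results.append(part)
    return results


catalogue = [
    CarPart("p1", "Bosch", "Ford", "Focus", 2015, "front brake pad set"),
    CarPart("p2", "Brembo", "Ford", "Focus", 2015, "front brake disc"),
    CarPart("p3", "Bosch", "Toyota", "Corolla", 2018, "front brake pad set"),
]
hits = parametric_search(catalogue, keywords=["pad"], make="Ford", year=2015)
print([p.part_id for p in hits])  # ['p1']
```

A query on a parameter outside the hard-coded schema, such as `colour="red"`, raises `ValueError`, mirroring the need to reprogram the search page whenever the domain schema changes.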
Overall, current software search engines, whether for local-system or Internet use, are highly limited in their applications and functionality. As the number of available documents and the access to information continue to increase with faster and more powerful computers, search engines are required to sift through such information more accurately in order to pinpoint the material that the user actually requires.
An important aspect of the present invention is the applicant's appreciation that using document structure and semantics to contextualise the words found in each document allows the development of a strategy that significantly increases the effectiveness of a search. This is made possible by the growing shift of documents from unstructured plain text, through semi-structured documents, to fully structured documents whose underlying semantic meaning is tied to ontologies, dictionaries of meaning and schematic control.