1. Field of the Invention
This invention relates to the field of database searching, and more particularly to the field of language-based database searching using language-sensitive contextual searching.
2. Introduction
Information retrieval (IR) is the science of searching for information in documents, searching for documents themselves, searching for metadata that describes documents, or searching within databases, whether relational stand-alone databases or hypertextually-networked databases such as the World Wide Web. Automated IR systems are used to reduce information overload. Many universities and public libraries use IR systems to provide access to books, journals, and other documents. Web search engines such as Google, Yahoo! Search, or Live Search (formerly MSN Search) are the most visible IR applications.
An information retrieval process begins when a user enters a query into the system. Queries are formal statements of information needs, for example search strings in web search engines. In information retrieval, a query does not uniquely identify a single object in the collection. Instead, several objects may match the query, perhaps with different degrees of relevance.
An object is an entity which keeps or stores information in a database. User queries are matched to objects stored in the database. Depending on the application the data objects may be, for example, text documents, images or videos. Often the documents themselves are not kept or stored directly in the IR system, but are instead represented in the system by document surrogates.
Most IR systems compute a numeric score of how well each object in the database matches the query, and rank the objects according to this value. The top-ranking objects are then shown to the user. The process may then be iterated if the user wishes to refine the query.
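By way of illustration only, the score-and-rank step described above might be sketched as follows. The scoring rule used here (a simple count of query-term occurrences) and the names `score`, `rank`, and `top_k` are assumptions chosen for brevity; practical IR systems typically use more elaborate weighting schemes.

```python
from collections import Counter


def score(query, document):
    """Hypothetical scoring rule: total occurrences of the query terms in the document."""
    terms = query.lower().split()
    counts = Counter(document.lower().split())
    return sum(counts[term] for term in terms)


def rank(query, documents, top_k=3):
    """Return up to top_k (doc_id, score) pairs, highest score first.

    `documents` maps a document identifier to its text. Objects that score
    zero are treated as non-matching and are left out of the result.
    """
    scored = [(score(query, text), doc_id) for doc_id, text in documents.items()]
    matching = [(s, doc_id) for s, doc_id in scored if s > 0]
    # Sort by descending score; break ties by identifier for a stable order.
    matching.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, s) for s, doc_id in matching[:top_k]]


if __name__ == "__main__":
    collection = {
        "a": "language search in a database",
        "b": "search search engines",
        "c": "cooking recipes",
    }
    print(rank("search language", collection))
```

In this sketch a refined query is handled simply by calling `rank` again with the new query string, which corresponds to the iteration mentioned above.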
As globalization increases, many sources are not only being produced in numerous languages, but some sources may even contain numerous languages within them. Determining which language or languages are used in a source can be a daunting task. As such, several methods of determining a source's language have been devised. Methods range from identifying specific short words in the sources, to comparing strings of letters with a reference, to identifying symbols used only in certain languages. Using one or several of these techniques, computers have a high success rate in determining the language or languages of a document.
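As a minimal sketch of two of the techniques mentioned above, the following example combines a check for symbols used only in certain languages with a count of specific short words. The word lists, the function name `guess_language`, and the choice of languages are illustrative assumptions, not a complete or authoritative language identifier.

```python
import re

# Hypothetical, deliberately tiny lists of common short words per language.
SHORT_WORDS = {
    "english": {"the", "and", "of", "is", "in", "to"},
    "french": {"le", "la", "et", "les", "des", "est"},
    "german": {"der", "die", "und", "das", "ist", "nicht"},
}


def guess_language(text):
    """Guess the language of `text`, or return None if there is no evidence."""
    # Symbol test: Hiragana and Katakana (U+3040 to U+30FF) indicate Japanese.
    if any("\u3040" <= ch <= "\u30ff" for ch in text):
        return "japanese"

    # Short-word test: count how many tokens appear in each language's list.
    words = re.findall(r"[^\W\d_]+", text.lower())
    hits = {
        language: sum(1 for word in words if word in short_words)
        for language, short_words in SHORT_WORDS.items()
    }
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


if __name__ == "__main__":
    print(guess_language("the cat is in the garden"))
    print(guess_language("le chat est dans la maison et les jardins"))
```

A source containing several languages could be handled under the same assumptions by applying `guess_language` to each sentence or paragraph separately rather than to the source as a whole.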
While present search engines may be able to search for words in other languages, there is a need for an improved information retrieval system that is able to search for terms in one language and determine if a context or a document containing the result is in a second language.