1. Technical Field
The present invention relates to database query systems. In particular, it relates to database query systems which reduce sensitivity to keyword selection by automatically translating keywords when data is inserted into the database and automatically translating keywords when a query is made to the database.
2. Background Art
Database query systems serve many industries and many purposes, including billing, sales, personnel records, medical and legal data, parts, and inventory. Early database systems typically required highly skilled personnel both to create original data entries and to construct queries to search the database. As the cost of computing decreased and database applications were extended to more industries and distributed to larger and more diverse groups of users, the training level and language background of the personnel using database systems became increasingly unpredictable. With the advent of the Internet and wide area networks, collaborating users in different parts of the country, or even the world, may use different terminology or languages. All these factors present challenges to designing database systems that can be built and queried from all over the world.
While personnel who enter data would typically have some minimal knowledge of the database system, the end users who query the database system may have very low levels of training or none at all. As a result, end users may have difficulty locating an item of data even if it is available. In addition, some systems now allow users from different regions and of different training levels to update existing entries in a database or to add new entries. The use of database systems by untrained users can degrade the effectiveness of the system because entries and/or queries may be made using unusual, obscure or mis-descriptive keywords, which result in missed matches during a query.
The problems caused by poor keyword selection are magnified in database systems which service users in widely spread locations and/or locations that use different languages or dialects. For example, a user who is accessing data in a database that was created in another language, or even in the same language with regional differences, is far more likely to use a keyword that is familiar in the user's own culture but is not the keyword used by the originator of the information. While that user may merely get unsatisfactory results during a query, if that user is formulating an entry for the database, then the use of an inappropriate keyword creates a problem when other users search for that entry.
Even if a user is very skilled and highly trained, language varies from one part of the country to another, and from one country to another. For example, an American seeking to buy a home in England with two bathrooms would have to know that bathrooms are called water closets there in order to search successfully. Otherwise, the query would result in no matches due to the language difference, even though the house had the desired feature.
As database applications are made available to wider and more diverse groups of users, the ability to enter and query the data will become increasingly dependent on the system's ability to find data regardless of the keywords selected. Commercial databases now serve international and global markets. With the emergence of the Internet as an important commercial communications resource, it has become even more desirable to have a system that could translate keywords such that a search would successfully locate a data item even when colloquial keywords were used during the original data entry or during subsequent database queries. Further, it would be even more desirable to have a system which would generate an effective search argument even when both the original data entry and the subsequent database query use regional, ineffective or obscure keywords.
Prior art search systems have tried to increase the number of matches by allowing a user to enter a partial keyword. This type of search is called stem searching. For example, the stem "bath" would return all records with bath, baths, bathroom, etc. Unfortunately, stem searching does not overcome language differences: a search on the stem "bath" would not locate a record that was entered using "water closet", so prior art stem-based systems would miss the match.
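The limitation of stem searching described above can be illustrated with a minimal sketch. The function name `stem_search` and the sample listings are hypothetical and are not taken from any particular prior art system; the sketch assumes simple whitespace-separated records and prefix matching on each word.

```python
def stem_search(records, stem):
    """Return the records in which at least one word begins with the stem."""
    stem = stem.lower()
    return [
        record
        for record in records
        if any(word.startswith(stem) for word in record.lower().split())
    ]


# Hypothetical listings used only for illustration.
listings = [
    "house with two bathrooms",
    "cottage with one bath",
    "flat with water closet",
]

matches = stem_search(listings, "bath")
# The third listing is missed because "water closet" shares no stem with "bath".
```

In this sketch the first two listings are returned, while the listing entered with the regional term is not, even though it describes the same feature.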
Another common prior art technique is to use a thesaurus to generate multiple queries for each keyword that the user enters. Thus, a user desiring to locate an apartment in a real estate system may enter the keyword "apartment", but the system will search for apartment, apartments, condominium, condominiums, condo, flat, flats, quarters, duplex, studio, etc. The result is multiple searches for a single query, which makes searching the database inefficient.
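The cost of the thesaurus technique can be sketched as follows. The names `THESAURUS`, `expand_query` and `thesaurus_search`, the abbreviated synonym list, and the sample records are all assumptions made for illustration; the sketch simply counts one pass over the records per expanded term.

```python
# Hypothetical, abbreviated thesaurus used only for illustration.
THESAURUS = {
    "apartment": ["apartments", "condominium", "condo", "flat", "flats", "studio"],
}


def expand_query(keyword):
    """Return the keyword followed by its thesaurus entries, if any."""
    keyword = keyword.lower()
    return [keyword] + THESAURUS.get(keyword, [])


def thesaurus_search(records, keyword):
    """Search once per expanded term; return the hits and the number of passes."""
    hits = []
    passes = 0
    for term in expand_query(keyword):
        passes += 1  # each expanded term costs a separate pass over the records
        for record in records:
            if term in record.lower().split() and record not in hits:
                hits.append(record)
    return hits, passes


records = ["sunny flat near station", "studio downtown", "farmhouse with barn"]
hits, passes = thesaurus_search(records, "apartment")
```

With this small thesaurus, one user keyword already causes seven passes over the records; a fuller synonym list would increase the number of searches further.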
While earlier database applications may have had untrained users on the query side, they would usually have users with some training, even if minimal, on the data input side. Now, however, systems are available in which data is subject to poor keyword selection from both directions: data input and data query. For example, a number of database systems have been developed to provide information related to the buying, selling or renting of particular types of property. Real estate listing services, for instance, have cropped up on the Internet which allow anyone to write their own "ad".
Database query systems of this type (i.e., systems which are used to sell or trade property) are particularly susceptible to keyword related problems. The reason for this is that users of these systems, who may be experts on the particular types of property they trade in, may be inexperienced with computers or even computer illiterate. Their lack of computer skill results in a poor choice of keywords because they do not understand the implications of a particular keyword selection. The problems related to keyword selection are magnified in this type of system because the same users who enter poorly formed keywords in queries are often permitted to enter data into the database for use by other users. The problem is compounded by users who may or may not be experienced with computers, but who have no knowledge of the jargon used in the subject of the database. Of course, this problem is even further complicated when users who speak or use different dialects or languages attempt to enter data into or query the database.
A real estate listing system is most often used by individuals in a relatively confined geographic area. Such users may rely on colloquial expressions and be unconcerned with other users who may have different jargon or language. Therefore, the problems caused by the use of dialectal keywords in a system with a geographically confined audience are minimal. However, systems similar to a real estate listing system, such as those used on an international or global basis to trade property or commodities, are more prone to errors due to keyword selection. For example, high quality or luxury items such as yachts, aircraft, exotic cars, businesses or luxury estate properties are marketed over wide, even global, markets. Wide geographic markets not only create problems due to multilingualism in the user population, but also create problems due to regional dialect or slang variations in a single language. It would be desirable to have a database system capable of use over wide areas or multilingual areas which could insulate users from keyword selection problems caused by language differences.
By way of example, airplanes have substantial values which tend to limit their sale in a local market. The most effective way to market commodities such as airplanes is by reaching an international market through a globally accessible database system in which information can be exchanged between buyers and sellers, and their agents. Those skilled in the art will recognize that while any size and price of airplane could be listed in a global database, the desirability of a larger audience or international database increases as the price of the item increases.
The following is an example of the problems language and keyword selection may cause. In a listing database for yachts, a yacht owner in Hawaii may list a boat located in Baja, Mexico with a yacht broker in San Diego, Calif. A buyer in Switzerland may use a broker in Turkey to search for a desired yacht type. As can be seen, the opportunity for selecting dialectal, slang, local jargon, obscure or ineffective keywords in such a situation greatly increases the chances that the yacht will not be found when the buyer's broker searches the database. Therefore, it would be desirable to have a system which could insulate all of the parties from missed matches due to data entry and query keyword differences, as well as insulating the system from performance problems which result from inefficiently searching the database with an excessive number of keywords.
Prior computer systems directed to these markets have primarily used textual descriptions which are exposed to all of the keyword problems discussed above. In addition, prior systems have not provided a structured database which would more easily define data entries in terms of the unique and myriad structural and equipment combinations available to complex properties such as yachts or airplanes. It would be desirable to have a structured database in which the data entry and query processes could be dynamically altered to accommodate variances in equipment descriptions.
While addressing the basic desirability of using computerized database systems to manage information, the prior art has failed to provide a system which reduces keyword related errors by making both the data entry and the data query applications independent of the keywords used to search the database. In particular, the prior art has not provided a system which accepts any keyword entered and, for data entry, dynamically selects and substitutes keywords from a restricted list of keywords such that a uniform searchable field is provided for query. Nor has the prior art provided a system which then substitutes keywords from the same restricted list for the data query, such that the search locates the desired data entry even though the data entry and the data query portions of database access use different keywords. Furthermore, the prior art has not provided a dialect independent query system that allows complete sentences to be created and retrieved, in a grammatically correct form, utilizing a dynamic and configurable list of keywords.
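The kind of system said to be missing from the prior art can be pictured with a minimal sketch in which the same restricted keyword list is applied both when data is entered and when a query is made. Everything below is an assumption made for illustration and is not taken from the specification: the `CANONICAL` mapping, the `translate` function, the `ListingDatabase` class, and the choice to pass unknown keywords through unchanged.

```python
# Hypothetical restricted list: each user keyword maps to one canonical keyword.
CANONICAL = {
    "bathroom": "BATHROOM",
    "bath": "BATHROOM",
    "water closet": "BATHROOM",
    "apartment": "APARTMENT",
    "flat": "APARTMENT",
    "condo": "APARTMENT",
}


def translate(keyword):
    """Map a user keyword to its canonical form (unknown keywords pass through)."""
    key = " ".join(keyword.lower().split())  # normalize case and spacing
    return CANONICAL.get(key, key)


class ListingDatabase:
    """Toy index keyed by canonical keyword rather than by the user's wording."""

    def __init__(self):
        self._index = {}  # canonical keyword -> list of listing identifiers

    def insert(self, listing_id, keywords):
        # Keywords are translated at data entry time.
        for keyword in keywords:
            self._index.setdefault(translate(keyword), []).append(listing_id)

    def query(self, keyword):
        # The query keyword is translated with the same list, so a single
        # lookup suffices instead of one search per synonym.
        return list(self._index.get(translate(keyword), []))


db = ListingDatabase()
db.insert("L1", ["water closet", "flat"])
db.insert("L2", ["bathroom"])
result = db.query("bath")
```

In this sketch the query for "bath" locates both listings, including the one entered as "water closet", because both sides of the database access are reduced to the same canonical keyword before any comparison is made.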