1. Field of the Invention
The present invention generally relates to data processing and, more particularly, to searching text fields or other types of data fields using automatically expanded search terms.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems. A relational database management system is a computer database management system (DBMS) that uses relational techniques for storing and retrieving data. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network. An object-oriented programming database is one that is congruent with the data defined in object classes and subclasses.
Regardless of the particular architecture, in a DBMS, a requesting entity (e.g., an application or the operating system) demands access to a specified database by issuing a database access request. Such requests may include, for instance, simple catalog lookup requests or transactions and combinations of transactions that operate to read, change and add specified records in the database. These requests are made using high-level query languages such as the Structured Query Language (SQL). Illustratively, SQL is used to make interactive queries for getting information from and updating a database such as International Business Machines' (IBM) DB2, Microsoft's SQL Server, and database products from Oracle, Sybase, and Computer Associates. The term “query” denominates a set of commands for retrieving data from a stored database. Queries take the form of a command language that lets programmers and programs select, insert, update, find out the location of data, and so forth.
One example of a query used extensively is a command for searching data, or a request to return data stored in the database, given a specific search term. One of the problems faced by data searching applications is that of finding a concept using search criteria. The common query condition for searching a field containing a specific term is based on a specific value (e.g., diagnosis=“colon cancer”), rather than a concept. Value searches work well in cases where the data is relational in nature, for example, where a discrete set of known values is stored in discrete rows and columns within a tabular format. This type of searching breaks down, however, when the information of interest is stored in a free text or open format, such as a textual document or a text field of a database.
For example, a doctor's notes may have several paragraphs of written information for each time a patient visits the doctor. Searching this data with the above condition will not work, as the notes will likely contain additional text, rather than only the specific value. In some cases, wildcards may be used to specify that the field should be returned if a specified value is found anywhere within the field. However, this approach may still present problems. For example, if the specified value spans the end of a line, in many data formats, the value may not be found.
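The limitations described above may be illustrated with a minimal sketch, assuming a hypothetical table named visits with a free text column named notes, held in an in-memory SQLite database. The table, the column, the sample notes, and the helper search_notes are assumptions made for illustration only and are not taken from any particular system. The sketch compares an equality condition with a wildcard (LIKE) condition against the same notes.

```python
import sqlite3

# Hypothetical sample data: one discrete value and two free-text notes.
NOTES = [
    "colon cancer",
    "Patient reports fatigue. Biopsy confirms colon cancer; referred to oncology.",
    "Patient reports fatigue. Biopsy confirms colon\ncancer; referred to oncology.",
]


def search_notes(notes, term, exact=False):
    """Return the ids of notes that satisfy an equality or wildcard condition."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE visits (id INTEGER PRIMARY KEY, notes TEXT)")
        conn.executemany(
            "INSERT INTO visits (id, notes) VALUES (?, ?)",
            enumerate(notes, start=1),
        )
        if exact:
            # Value search: the whole field must equal the term.
            sql, arg = "SELECT id FROM visits WHERE notes = ? ORDER BY id", term
        else:
            # Wildcard search: the term may appear anywhere in the field.
            sql, arg = "SELECT id FROM visits WHERE notes LIKE ? ORDER BY id", f"%{term}%"
        return [row[0] for row in conn.execute(sql, (arg,))]
    finally:
        conn.close()


print(search_notes(NOTES, "colon cancer", exact=True))  # equality condition
print(search_notes(NOTES, "colon cancer"))              # wildcard condition
```

In this sketch, the equality condition returns only the first note, and the wildcard condition returns the first and second notes. The third note is missed by both conditions because a line break, rather than a space, separates “colon” from “cancer”.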
Another, possibly more significant problem is that, in free text information, the preparers of the information are free to describe things as they see fit, using their preferred terms. As an example, if a doctor has chosen to write in his notes “cancer of the colon”, “colon malignancy”, or any other descriptive phrase, the field would not be found by searching for “colon cancer”. This is unfortunate, as a user searching for information regarding colon cancer would likely be interested in retrieving and reviewing these notes.
Accordingly, there is a need for an improved and more flexible method for searching fields, such as text fields, preferably one that allows conditions based on a single specified search term to be expanded to include a set of conceptually related expanding search terms.
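The kind of expansion contemplated by this need may be sketched as follows. This is a minimal illustration under stated assumptions and is not a description of any particular implementation: the mapping RELATED_TERMS and the helpers normalize, expand_term, and search are hypothetical names, and the related phrases are simply those used in the example above. Whitespace is collapsed before matching so that a phrase spanning the end of a line can still be found.

```python
import re

# Hypothetical mapping from a search term to conceptually related terms.
RELATED_TERMS = {
    "colon cancer": ["cancer of the colon", "colon malignancy"],
}

# Hypothetical free-text notes, including one phrase split across a line break.
NOTES = [
    "Biopsy confirms cancer of the colon.",
    "Imaging suggests colon\nmalignancy.",
    "Patient treated for a sprained ankle.",
]


def normalize(text):
    """Collapse runs of whitespace (including line breaks) and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def expand_term(term, related=RELATED_TERMS):
    """Return the term followed by any related terms known for it."""
    key = normalize(term)
    return [key] + [normalize(t) for t in related.get(key, [])]


def search(notes, term, related=RELATED_TERMS):
    """Return indexes of notes containing the term or any related term."""
    terms = expand_term(term, related)
    hits = []
    for index, note in enumerate(notes):
        text = normalize(note)
        if any(t in text for t in terms):
            hits.append(index)
    return hits


print(search(NOTES, "colon cancer"))              # with expansion
print(search(NOTES, "colon cancer", related={}))  # without expansion
```

With the expansion applied, the first two notes are returned even though neither contains the literal phrase “colon cancer”. With an empty mapping, no notes are returned.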