In many industries, a variety of terms are used as labels for products, parts, ingredients, procedures, milestones, and other items commonly referenced in the industry or within a particular company. Often, such terms are applied inconsistently, whether through subtle permutations, the use of a more specific or more general term, or errors. A thesaurus of terms can be beneficial in determining equivalent and related terms. Such a thesaurus can be queried to find an equivalent term so that consistent term usage can be applied across the industry or company.
For example, clinical studies are often undertaken during preparation of a new consumer product. Such studies are used to determine adverse effects, effectiveness, marketability, duration, and other aspects of the new product. In the health and pharmaceutical industries, clinical studies are often mandated and scrutinized under federal and state regulations prior to the release of a new pharmaceutical or medical product. Typically, a large quantity of clinical data is generated by such studies. The clinical data is provided by a number of different sources involved with the study. The data may come from human test subjects, physician reports, drug dispensary logs, laboratory test results, and other sources. The clinical data is then entered and analyzed, typically from a text format.
Often, inconsistent terminology is used to refer to the same clinical term. Different permutations, prefixes, suffixes, and generalities may be used by different sources of the clinical data. Further, the data entry process is prone to errors such as typographical and spelling errors. Therefore, the same clinical term may appear in a variety of formats throughout the clinical data produced by the study. The time required to classify the inconsistent terms and allow proper analysis can be significant, thereby delaying the study and, consequently, the release of the new product. The cost associated with such a delay can be substantial.
Terms associated with a particular domain often define a hierarchy of relations between the terms. Different terms may be related as equivalents, or as more general or more specific indications. For example, in a domain such as a manufacturing operation, each part may comprise a number of other parts. These, in turn, each include component parts of their own. In this manner, a hierarchy of terms is established from the finished product as the most general to the raw components as the most specific.
A thesaurus of terms employed in an industry, enterprise, or company allows terms common to that industry, enterprise, or company to be associated with other related or equivalent terms. Associations between the terms are organized in a hierarchy. Such associations indicate relations between the terms, including more general terms, more specific terms, and equivalent terms. By querying the hierarchy of terms, a particular term can be classified as corresponding to another term. Such classification facilitates consistent term usage throughout the various contexts in which the term is employed. These contexts may include research reports, product literature, marketing literature, technical specifications, and corporate policies.
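The hierarchy query described above can be illustrated with a minimal in-memory sketch. Everything in it is an assumption made for illustration rather than the structure of the described system: the class name `Thesaurus`, its method names, and the sample terms are all hypothetical.

```python
from collections import defaultdict


class Thesaurus:
    """Toy in-memory thesaurus (hypothetical names, illustration only)."""

    def __init__(self):
        self.broader = {}                 # term -> its more general term
        self.narrower = defaultdict(set)  # term -> its more specific terms
        self.equivalent = {}              # variant term -> canonical term

    def add_broader(self, term, general):
        """Record that `general` is the more general term for `term`."""
        self.broader[term] = general
        self.narrower[general].add(term)

    def add_equivalent(self, variant, canonical):
        """Record that `variant` is equivalent to `canonical`."""
        self.equivalent[variant] = canonical

    def canonical(self, term):
        """Return the equivalent canonical term, or the term itself."""
        return self.equivalent.get(term, term)

    def ancestors(self, term):
        """Return the chain of more general terms, nearest first."""
        chain = []
        current = self.canonical(term)
        seen = {current}
        while current in self.broader:
            current = self.broader[current]
            if current in seen:  # guard against accidental cycles
                break
            seen.add(current)
            chain.append(current)
        return chain


# Sample data is illustrative, not taken from any real dictionary.
thesaurus = Thesaurus()
thesaurus.add_broader("migraine", "headache")
thesaurus.add_broader("headache", "nervous system disorder")
thesaurus.add_equivalent("cephalalgia", "headache")

print(thesaurus.canonical("cephalalgia"))
print(thesaurus.ancestors("migraine"))
```

In this sketch, an equivalent term is resolved first and the walk toward more general terms starts from the resolved term, so variants and their canonical form share the same ancestors.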
In one embodiment, a system and method provide for accessing and updating a thesaurus database of clinical terms employed in conjunction with a clinical study. The system and method classify the terms derived from raw clinical data to effect a consistent clinical term classification throughout the clinical study.
A study term can be extracted from raw clinical data and presented to determine a corresponding match in the thesaurus database of clinical terms.
A table of relations can be maintained to associate each clinical term with one or more related, or derived, clinical terms in the thesaurus. A clinical term can then be mapped to one or more derived terms as indicated by the relations. The derived terms can be further processed to select a preferred term from among them. A set of rules can be defined to indicate which relations are allowed, such as one-to-many and many-to-many. The relations are verified against the set of rules prior to activation in the thesaurus database.
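One way to picture verification against a rule set before activation is the sketch below. It assumes a hypothetical `RULES` mapping from relation type to allowed cardinality and a hypothetical tuple layout of (relation type, source term, target term); neither is specified by the description above. Under a one-to-many rule it rejects any target claimed by more than one source, and it rejects relation types that have no rule at all.

```python
from collections import defaultdict

# Hypothetical rule set: relation type -> allowed cardinality.
RULES = {
    "narrower": "one-to-many",   # one general term, many specific terms
    "related": "many-to-many",   # no cardinality restriction
}


def verify(relations):
    """Split candidate relations into (accepted, rejected) lists.

    Each relation is a (relation_type, source_term, target_term) tuple.
    """
    # Collect the distinct sources that point at each (type, target) pair.
    sources = defaultdict(set)
    for rtype, source, target in relations:
        sources[(rtype, target)].add(source)

    accepted, rejected = [], []
    for rel in relations:
        rtype, source, target = rel
        rule = RULES.get(rtype)
        if rule is None:
            rejected.append(rel)  # relation type has no rule
        elif rule == "one-to-many" and len(sources[(rtype, target)]) > 1:
            rejected.append(rel)  # target is claimed by several sources
        else:
            accepted.append(rel)
    return accepted, rejected


def derived_terms(active, term):
    """Return the sorted derived terms for `term` among active relations."""
    return sorted(target for _, source, target in active if source == term)


# Illustrative candidate relations awaiting activation.
candidates = [
    ("narrower", "headache", "migraine"),
    ("narrower", "headache", "tension headache"),
    ("narrower", "pain", "migraine"),      # conflicts with the first row
    ("related", "nausea", "vomiting"),
    ("synonym", "emesis", "vomiting"),     # no rule for "synonym"
]
accepted, rejected = verify(candidates)
print(derived_terms(accepted, "headache"))
```

Rejecting both conflicting rows, rather than keeping whichever arrived first, is a design choice of this sketch: it leaves the resolution to a reviewer instead of depending on load order.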
An omission manager can be operated to find a near-matching candidate term if the clinical term is not found in the thesaurus database. A plurality of different clinical terms can therefore be classified as corresponding to a common clinical term as indicated by the relations.
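The text does not say how near matches are found. As a stand-in, the sketch below uses `difflib.get_close_matches` from the Python standard library, which ranks candidates by a string similarity ratio. The function name `find_candidates`, the term list, and the cutoff of 0.8 are assumptions chosen for illustration.

```python
import difflib

# Illustrative thesaurus contents; a real system would query the database.
KNOWN_TERMS = ["headache", "migraine", "nausea", "vomiting"]


def find_candidates(study_term, known_terms=KNOWN_TERMS, limit=3, cutoff=0.8):
    """Return an exact match if present, otherwise near-matching candidates.

    The similarity measure is difflib's ratio, used here only as an
    assumed substitute for whatever matching the omission manager applies.
    """
    normalized = study_term.strip().lower()
    if normalized in known_terms:
        return [normalized]
    return difflib.get_close_matches(
        normalized, known_terms, n=limit, cutoff=cutoff
    )


print(find_candidates("Headach"))    # truncated entry
print(find_candidates("vomitting"))  # spelling error
print(find_candidates("xyzzy"))      # no plausible candidate
```

A high cutoff keeps unrelated terms out of the candidate list, at the cost of missing heavily garbled entries; an empty result signals that the term needs manual classification.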
Subsequent processing and analysis of clinical data can be facilitated by reference to the common clinical term rather than requiring consideration of all of the plurality of clinical terms. Further, a database state can be associated with the clinical terms and relations. The database state indicates a discrete point which can be recreated on demand to accommodate successive database changes such as revisions of terms and vendor dictionaries. In one embodiment, a timestamp value is associated with the clinical terms and the relations to accommodate periodic updates and changes to the terms and relations.
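A timestamped record can be sketched as follows, assuming each mapping carries a validity interval so that the state at any past moment can be rebuilt. The interval scheme, the class names `TermVersion` and `VersionedThesaurus`, and the sample terms and dates are assumptions; the description above only states that a timestamp value is associated with terms and relations.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class TermVersion:
    """One timestamped mapping of a term to its preferred term."""
    term: str
    preferred: str
    start: datetime
    end: Optional[datetime] = None  # None means the version is current


class VersionedThesaurus:
    """Hypothetical store that never overwrites, only closes old versions."""

    def __init__(self):
        self.versions = []

    def set_preferred(self, term, preferred, when):
        """Close the current version of `term`, if any, and open a new one."""
        for version in self.versions:
            if version.term == term and version.end is None:
                version.end = when
        self.versions.append(TermVersion(term, preferred, when))

    def as_of(self, when):
        """Recreate the term -> preferred mapping as it stood at `when`."""
        return {
            v.term: v.preferred
            for v in self.versions
            if v.start <= when and (v.end is None or when < v.end)
        }


# Illustrative history: a later dictionary revision changes the mapping.
store = VersionedThesaurus()
store.set_preferred("cephalalgia", "headache", datetime(2001, 1, 1))
store.set_preferred("cephalalgia", "cephalgia", datetime(2002, 1, 1))

print(store.as_of(datetime(2001, 6, 1)))
print(store.as_of(datetime(2003, 1, 1)))
```

Because old versions are closed rather than deleted, an analysis run against an earlier state can be repeated after the thesaurus has been revised.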
In this manner, a unified thesaurus database model can be used to represent an arbitrary hierarchy of related terms.