Information retrieval systems are known in the art. Such systems generally offer users a variety of means of expressing user intentions through queries. These include text search, parametric search, structured queries, selection from alternatives (i.e., browsing or navigation), and range specification. In general, the systems offer users a means of expressing queries using either a structured language (e.g., a language like SQL) or an informal input mechanism (e.g., English keyword search). When the input mechanism is informal, problems of ambiguity may arise from the language itself. But even when the input mechanism is formal, the user may not always succeed in expressing his or her intention in the formal query language.
Information retrieval systems may use a variety of techniques to determine what information seems most relevant to a user's query. For some queries, the choice of technique is not particularly important: for example, if the user enters a query that is the exact title of a document, most techniques will retrieve that document as the most relevant result. For other queries, the choice of technique can be very significant, as different techniques may differ considerably in the results they return. Unfortunately, it is not always clear how to select the best technique for a particular query.
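The following minimal sketch illustrates this point and is not drawn from any particular system. It assumes a hypothetical three-document collection (`DOCS`) and two deliberately simple, hypothetical scoring functions (`overlap_score` and `tf_score`). Both rank the same document first when the query is an exact title, but they disagree on a shorter query.

```python
from collections import Counter

# Hypothetical toy collection, used only for illustration.
DOCS = {
    "d1": "heart attack treatment guidelines for adults",
    "d2": "treatment of heart disease, heart failure and heart rhythm",
    "d3": "attack of acute asthma",
}


def tokenize(text):
    """Lowercase the text, split on whitespace, and strip simple punctuation."""
    tokens = (t.strip(".,:;").lower() for t in text.split())
    return [t for t in tokens if t]


def overlap_score(query, doc):
    """Technique A: number of distinct query terms present in the document."""
    return len(set(tokenize(query)) & set(tokenize(doc)))


def tf_score(query, doc):
    """Technique B: sum of raw term frequencies of the query terms."""
    counts = Counter(tokenize(doc))
    return sum(counts[t] for t in set(tokenize(query)))


def rank(query, scorer):
    """Return document ids ordered by descending score, ties broken by id."""
    return sorted(DOCS, key=lambda d: (-scorer(query, DOCS[d]), d))


if __name__ == "__main__":
    title = DOCS["d1"]
    # Exact-title query: both techniques place d1 first.
    print(rank(title, overlap_score), rank(title, tf_score))
    # Shorter query: the techniques disagree on the top result.
    print(rank("heart attack", overlap_score), rank("heart attack", tf_score))
```

In this toy collection, the term-frequency technique favors the document that repeats "heart" several times, while the overlap technique favors the document that contains both query terms. Neither ranking is inherently correct, which is why the selection of technique matters for such queries.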
Over the past decade, the biomedical community has witnessed a surge in the volume of clinically relevant information converted to electronic form. The American Recovery and Reinvestment Act of 2009, with its mandate to digitize health records, ensures that this flow of information will continue to accelerate. Biomedical assay databases, both publicly available and proprietary, boasting many gigabytes of data, have become commonplace. To harness the potential of these resources to impact clinical decision-making and therapeutic design, computational tools are needed for simultaneously retrieving source documents (electronic health records, assay data, etc.) related to a given medical concept from a variety of such sources. While many institutions have developed proprietary in-house solutions for storing this data, there do not appear to be any standardized systems available for indexing and intelligently searching these databases based on medical concepts.
The National Library of Medicine's Unified Medical Language System (UMLS) (Bodenreider 2004) provides a hierarchical organization of a wide-ranging set of medical and biological concepts, and includes thesauri for mapping concepts across languages and dialects.