Over the years, many healthcare technology companies have built clinical data repositories using mainstream relational database technologies, such as Oracle, Sybase or DB2. These technologies yield data repositories that typically include many thousands of tables representing data relationships and some business logic. Due to the “relational” design of these technologies, pieces of data are forced into highly normalized, relational data models. The enforced relational structure is often optimized for a specific transactional model or logic, and does not lend itself to novel or flexible approaches to data analytics. For example, relational data structures are poorly suited to large-scale storage or analysis of unstructured data, such as free-text reports or image files. Moreover, restructuring these data schemas for alternative uses can be time-consuming, or simply not feasible, for large, mature relational repositories with trillions of data points. Limitations such as these have become increasingly important as more and more devices in healthcare settings are configured to produce data.
As one example, Structured Query Language (SQL) query builders allow users to join tables and build SQL expressions with complex conditions and logic. Setting up such queries or expressions can be very time-consuming and difficult, particularly when the data being analyzed is spread across many different relational database technologies. Moreover, even if a user is trained well enough to construct a syntactically correct SQL expression, the odds that the query will yield precisely the desired cohort of subjects may be low. The complexity of such queries can make it very difficult, for example, to identify a patient cohort by assessing search criteria on a patient-by-patient basis (“population-based” searching) rather than on an encounter-by-encounter basis (“encounter-based” searching). Thus, conventional analytics tools using relational database technologies are generally inaccessible to casual or untrained users.
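The distinction between population-based and encounter-based searching can be illustrated with a minimal sketch. The `diagnoses` table, its columns, the sample rows, and the function names below are all hypothetical and are not drawn from any particular repository; the two ICD-9 codes serve only as example search criteria. The sketch uses Python's built-in `sqlite3` module and shows that the same criteria can yield different cohorts depending on whether they are assessed per encounter or per patient.

```python
import sqlite3

# Hypothetical rows: (patient_id, encounter_id, icd9_code). Illustration only.
ROWS = [
    (1, 10, "250.00"),  # patient 1: both codes in the same encounter
    (1, 10, "401.9"),
    (2, 20, "250.00"),  # patient 2: the two codes fall in different encounters
    (2, 21, "401.9"),
    (3, 30, "250.00"),  # patient 3: only one of the two codes
]
CODES = ["250.00", "401.9"]  # example search criteria (assumed for the sketch)


def build_demo_db():
    """Create an in-memory database with a hypothetical diagnoses table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE diagnoses ("
        "patient_id INTEGER, encounter_id INTEGER, icd9_code TEXT)"
    )
    conn.executemany("INSERT INTO diagnoses VALUES (?, ?, ?)", ROWS)
    return conn


def _cohort(conn, codes, group_by):
    """Return sorted patient ids whose group contains every code in `codes`.

    `group_by` is only ever one of two fixed strings supplied by the
    functions below, so it is safe to interpolate into the SQL text.
    """
    placeholders = ", ".join("?" for _ in codes)
    sql = (
        "SELECT DISTINCT patient_id FROM diagnoses "
        f"WHERE icd9_code IN ({placeholders}) "
        f"GROUP BY {group_by} "
        "HAVING COUNT(DISTINCT icd9_code) = ?"
    )
    rows = conn.execute(sql, (*codes, len(codes))).fetchall()
    return sorted(r[0] for r in rows)


def encounter_based(conn, codes):
    """All codes must appear within a single encounter."""
    return _cohort(conn, codes, "patient_id, encounter_id")


def population_based(conn, codes):
    """All codes must appear somewhere in the patient's history."""
    return _cohort(conn, codes, "patient_id")


if __name__ == "__main__":
    conn = build_demo_db()
    print(encounter_based(conn, CODES))   # [1]
    print(population_based(conn, CODES))  # [1, 2]
```

In this sketch the only difference between the two searches is the `GROUP BY` clause, yet patient 2 is included or excluded depending on which is used. In a repository with thousands of tables, the corresponding difference would typically be buried in far more complex joins.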
Conventional clinical analytics techniques and tools can also require that a user invest a great deal of time and effort up front to identify which disease codes (e.g., ICD-9 codes) correspond to the desired cohort definition. For example, the user may need to spend hours or days with staff members or other individuals who have special expertise or experience in order to identify the appropriate codes. Once again, this can reduce ease of use and/or require additional user training.