Traditional databases allow users to find pieces of information that are relevant to an entity. Although millions or billions of records may describe that entity, the records are generally not linked together unless formal foreign key relationships exist. Many large collections of records have informal, unreliable or non-existent foreign key relationships, which makes it extremely challenging to bring together all of the records relevant to a single entity.
The traditional approach is to pre-link all of the data within such a collection, so that finding one record leads directly to the full set of related records. This approach has two distinct problems.
First, pre-linking a large collection of records is an intensive process that takes considerable time. This imposes a significant lag before new records can be integrated into the linked collection, adversely affecting the timeliness of the data in that collection.
Second, pre-linking is by definition restricted to the model used to perform it, which drastically reduces the ability of a user of the system to vary the parameters that determine how strongly or weakly records are linked. Pre-linking is also limited to the data available at the time of the pre-linking step.
Another approach is to avoid any pre-linking of the data, and instead to link in real time, or “link-on-the-fly,” in response to a user query. This approach allows new records to participate in the collection immediately, avoiding any issues of timeliness. It also allows a wide variety of models to be applied to the linking, using varying algorithms and parameters. The traditional disadvantage of this approach has been the difficulty of running such a data-intensive query while achieving acceptable interactive response times. This can be overcome by placing the collection in an in-memory database with embedded analytics.
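The link-on-the-fly idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the sample records, the field names, the `similarity` scoring model and its weights, and the `link_on_the_fly` function are hypothetical and are not taken from any particular system. A plain Python list stands in for the in-memory collection, and the standard-library `difflib.SequenceMatcher` stands in for whatever linking model a real system might embed.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory collection; no foreign keys connect these records.
RECORDS = [
    {"id": 1, "name": "Jonathan Smith", "city": "Dayton"},
    {"id": 2, "name": "Jon Smith", "city": "Dayton"},
    {"id": 3, "name": "Jonathan Smyth", "city": "Dayton"},
    {"id": 4, "name": "Maria Lopez", "city": "Miami"},
]


def similarity(a, b):
    """Score two records in [0, 1]. Illustrative model with assumed weights."""
    name_score = SequenceMatcher(
        None, a["name"].lower(), b["name"].lower()
    ).ratio()
    city_score = 1.0 if a["city"] == b["city"] else 0.0
    return 0.8 * name_score + 0.2 * city_score


def link_on_the_fly(records, seed_id, threshold):
    """Link records to the seed at query time; nothing is pre-computed."""
    seed = next(r for r in records if r["id"] == seed_id)
    return [r["id"] for r in records if similarity(seed, r) >= threshold]


if __name__ == "__main__":
    # The same data can be linked strictly or loosely by changing one parameter.
    print("strict:", link_on_the_fly(RECORDS, seed_id=1, threshold=0.9))
    print("loose: ", link_on_the_fly(RECORDS, seed_id=1, threshold=0.75))

    # A newly arrived record participates in the very next query.
    RECORDS.append({"id": 5, "name": "Jonathan Smith", "city": "Dayton"})
    print("after insert:", link_on_the_fly(RECORDS, seed_id=1, threshold=0.9))
```

Because the linking happens inside the query, the threshold (or the whole scoring function) can be changed per request, and a record appended to the collection is considered immediately. The cost is that every query scans and scores the collection, which is why the approach depends on fast in-memory access.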
There is therefore a need in the art for a flexible database architecture capable of supporting multiple customized analytics modules, designed to process data in real time without having to change how the data is managed, prepared and stored within the system.