Computing platforms that ingest and process healthcare data face a number of challenges. For example, there has been a dramatic increase in the number of computer application solutions that use healthcare data to generate outcome data relevant to clinicians and patients. Locating the processing nodes that execute these solutions close to where the healthcare data is ingested and stored may be infeasible as healthcare data sets expand into the petabyte range. Co-locating the processing nodes with the underlying healthcare data may also be infeasible due to physical size constraints of the data centers that host the nodes and/or limited rack availability at these data centers. As a result, processing nodes that subscribe to certain sets of healthcare data may not always be located at the data center where the healthcare data is received and stored.
This scenario may create a number of different problems. For example, a computing solution that utilizes a defined set of healthcare data from a healthcare data source may be located at a first data center, and another solution that requires the same set of healthcare data may be located at a second, geographically disparate data center. In this case, a crawler would need to pull the set of healthcare data from the healthcare data source twice, with one upload occurring at the first data center and a second upload occurring at the second data center. This process consumes valuable processing resources and Internet bandwidth at the healthcare data source. It is also duplicative and increases data center hosting costs. In another example, a new computing solution may be deployed at a data center, but the healthcare data needed by this new solution may be located at a different data center. In a typical case, the healthcare data would have to be re-extracted from the data source, which once again consumes computing resources at the data source and increases data center hosting costs.
Another challenge faced by healthcare operating platforms is the loss of healthcare data due to, for example, a natural or man-made disaster occurring at the data center hosting the data. Because modern-day medicine relies heavily on computer applications to aid decision making, loss of data hosted at a data center can significantly impair the healthcare delivery process. This problem becomes even more critical when the lost data is no longer available from its source.