In distributed computing using multiple autonomous computers, various programming models such as MapReduce are used to increase processing speed and/or reduce processing time for computational problems. Such frameworks can be used for processing parallel problems across a large dataset using large numbers of computers or nodes, and are primarily employed to increase processing power by making use of a large number of relatively inexpensive computers. A limiting characteristic of distributed computing frameworks such as MapReduce is that they typically assume consistent access to data resources.
According to some embodiments, a distributed computing framework is described which can be used to solve computational problems where there is potentially inconsistent access to the data resources, such as where the various data resources are controlled according to different policies governing their access and/or use.
According to some embodiments, a method for distributed computing over distributed digital data resources having differing associated rules is described. The method includes distributing a computing task that uses a plurality of distributed digital data resources by dividing the computing task into a plurality of sub-tasks to be performed by a plurality of distributed worker nodes including a first worker node having access to a first digital data resource, and a second worker node having access to a second digital data resource. The first digital data resource is associated with a first set of rules that correspond to conditions for accessing (and/or computations that can operate on) the first digital data resource. Similarly, the second digital data resource is associated with a second set of rules that correspond to conditions for accessing (and/or computations that can operate on) the second digital data resource. The conditions for accessing the first and second digital data resources can be different from each other. According to some embodiments, the method also includes performing the plurality of sub-tasks using the plurality of worker nodes on the plurality of digital data resources, each of the worker nodes thereby generating a partial result; and collecting and combining the partial results thereby forming a final result for the computing task.
According to some embodiments, the rules associated with the data resources are determined at least in part by one or more stakeholders in the data resources, and the stakeholder(s) can subsequently alter the rules governing their access and use. According to some embodiments, the distribution of computational tasks is performed by an entity that may have the ability to request that computations be performed at multiple worker nodes, but that does not have direct access to the data resources managed by those worker nodes. In some cases, the worker nodes only have access to one or some of the data resources, which may be located in geographically separate locations, such as different towns, regions, or countries, in which regional or national policies governing access to and use of the data may vary. According to some embodiments, the rules are selected from a domain of possible rules that is not determined by the distributing entity.
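By way of illustration only, the steps of dividing a task, performing sub-tasks under per-resource rules, and combining partial results might be sketched in Python as follows. All names here (WorkerNode, allowed_purposes, min_group_size, distribute) are hypothetical, and the rule model (a set of permitted purposes plus a minimum group size) is an assumption chosen for brevity rather than something drawn from the embodiments.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class WorkerNode:
    """Hypothetical worker node holding one local data resource and its rules."""
    name: str
    records: list[int]            # the node's local digital data resource
    allowed_purposes: set[str]    # assumed rule: purposes for which access is permitted
    min_group_size: int = 1       # assumed rule: smallest group that may be aggregated

    def run(self, purpose: str, sub_task: Callable) -> Optional[tuple]:
        # Each node enforces its own rules before touching its data.
        if purpose not in self.allowed_purposes:
            return None
        if len(self.records) < self.min_group_size:
            return None
        return sub_task(self.records)


def distribute(nodes, purpose, sub_task, combine):
    """Send the same sub-task to every node and combine the permitted partial results."""
    partials = [node.run(purpose, sub_task) for node in nodes]
    permitted = [p for p in partials if p is not None]
    return combine(permitted)


def count_and_sum(records):
    # Sub-task: a partial result that can be merged across nodes.
    return (len(records), sum(records))


def combine_mean(partials):
    # Final result: overall mean from the (count, sum) partial results.
    n = sum(count for count, _ in partials)
    total = sum(subtotal for _, subtotal in partials)
    return total / n if n else None


nodes = [
    WorkerNode("clinic_a", [4, 6, 8], {"research"}),
    WorkerNode("clinic_b", [10, 12], {"research", "billing"}, min_group_size=2),
    WorkerNode("clinic_c", [100], {"billing"}),
]
research_mean = distribute(nodes, "research", count_and_sum, combine_mean)
```

In this sketch the distributing function never reads the records itself; it receives only the partial results that each node's rules permit, so a node whose rules deny the request simply contributes nothing to the final result.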
According to some embodiments, rules are associated with data resources, governing access to and/or other use of the data resources. In other embodiments, rules can also (or alternatively) be associated with computations that operate upon the data resource in order to provide a specific view of the data resource. The computations may also be associated with a particular user or group of users in order to limit the user's or group's access to information contained in the data resource by requiring that at least one computation be applied to the digital data resource before revealing the information to the user or group. According to some embodiments, the association between the rules and the computations is made by creating a digitally signed document comprising a pairing of a unique representation of the digital data resource and a unique representation of the computations to be associated with the data resource. For example, in some embodiments, techniques are used such as those described in commonly assigned U.S. patent application Ser. No. 12/773,501, Policy Determined Accuracy of Transmitted Information, published as U.S. Patent Publication No. 2011/0277036 (“the '501 application”), which is hereby incorporated by reference in its entirety. In other embodiments, other techniques are used.
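By way of illustration only, the pairing of a unique representation of a data resource with a unique representation of a computation might be sketched in Python as follows. This sketch assumes SHA-256 digests as the unique representations and, for brevity, uses an HMAC over a shared key as a stand-in for a true digital signature; a practical system would more likely use an asymmetric signature scheme. The function names (digest, bind, verify) are hypothetical and are not taken from the '501 application.

```python
import hashlib
import hmac
import json


def digest(data: bytes) -> str:
    # Assumed "unique representation": a SHA-256 digest of the bytes.
    return hashlib.sha256(data).hexdigest()


def bind(resource: bytes, computation_src: bytes, key: bytes) -> dict:
    """Build a document pairing the resource digest with the computation digest."""
    doc = {"resource": digest(resource), "computation": digest(computation_src)}
    payload = json.dumps(doc, sort_keys=True).encode()
    # HMAC stands in for a digital signature in this sketch.
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return dict(doc, signature=signature)


def verify(doc: dict, resource: bytes, computation_src: bytes, key: bytes) -> bool:
    """Check that the document binds exactly this resource to exactly this computation."""
    expected = bind(resource, computation_src, key)
    return (
        doc.get("resource") == expected["resource"]
        and doc.get("computation") == expected["computation"]
        and hmac.compare_digest(doc.get("signature", ""), expected["signature"])
    )


key = b"stakeholder-secret"                      # hypothetical signing key
resource = b"ACGTTGCA"                           # hypothetical data resource
computation = b"def view(seq): return len(seq)"  # hypothetical computation source
binding = bind(resource, computation, key)
```

A worker node holding such a document could call verify before running a computation on the resource and decline to proceed if either digest or the signature fails to match.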
According to some embodiments, the digital data resources include medical information stored in medical facilities, and at least some of the rules correspond to access conditions that protect patient privacy. According to some embodiments, the rules are set in part by the patients. According to some embodiments, the medical information may include some or all of genomic data, proteomic data, microbiomic data, and/or any other type of *-omic, medical, and/or healthcare-related data.
According to some embodiments, a method for distributed computing over distributed digital medical data resources having differing associated rules is described. The method includes distributing an executable (or interpretable) computer program or specification designed to operate on genomic and/or other medical data to a plurality of distributed worker nodes, including at least a first worker node having at least partial access to a first set of genomic and/or other medical data, and a second worker node having at least partial access to a second set of genomic and/or other medical data. The first set of data is associated with a first set of rules that correspond to one or more conditions for accessing the first set of data and/or computations that can operate on the first set of data. The second set of data is associated with a second set of rules that correspond to one or more conditions for accessing the second set of data. At least some of the one or more conditions for accessing the first and second sets of data from the first and second sets of rules differ from each other.