The present invention relates generally to verification of complex workflows and, more particularly, to evaluating the quality of a complex workflow, such as one arising in research and development, through the subdivision of the complex workflow into verifiable modules, each of which is verified by internal assessment or by community-based assessment.
A complex workflow consists of a number of inputs in the form of (but not limited to) data, signals, or material, and a set of processing steps that yield a number of desired outputs in the form of signals, materials, or data. These outputs depend on both the inputs and the processing steps in complex and nontrivial ways. Hence, the quality or appropriateness of the overall workflow design cannot be assessed using simple metrics based on the final output alone: if the final output is not what is desired, it is very difficult to determine which step or steps in the complex workflow are at the root of the failure to produce the expected outputs.
Industrial research processes can be described by complex workflows that lead from simple hypotheses to a final product. Workflows are composed of interdependent atomic modules that perform specific research tasks based on the results of other modules.
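By way of illustration only, a workflow of interdependent modules may be represented as a dependency graph. The following Python sketch assumes hypothetical module names (none of which appear in the cited methodology) and shows how such a representation allows the modules to be ordered for verification and how the modules affected by a failing module may be identified.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow (illustrative names only): each module maps to the
# set of modules whose results it depends on.
workflow = {
    "hypothesis": set(),
    "data_generation": {"hypothesis"},
    "signature_extraction": {"data_generation"},
    "network_construction": {"data_generation"},
    "target_selection": {"signature_extraction", "network_construction"},
}


def verification_order(dependencies):
    """Order modules so that each one appears after every module it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


def downstream_of(dependencies, module):
    """Return every module that directly or indirectly uses `module`'s results."""
    affected = set()
    frontier = [module]
    while frontier:
        current = frontier.pop()
        for name, needs in dependencies.items():
            if current in needs and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected


if __name__ == "__main__":
    print(verification_order(workflow))
    print(sorted(downstream_of(workflow, "data_generation")))
```

In this sketch, a failure localized to one module identifies exactly which downstream results are in doubt, which is the kind of information that an assessment based only on the final output cannot provide.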
Stolovitzky et al. proposed a methodology for the verification of such research pipelines that consists of a series of challenges posed at each of the constituent modules. As part of this methodology, a trusted third party uses a list of known input-output pairs to validate the methods used at each research module by comparing the module output to the gold standard. See P. Meyer, J. Hoeng, J. J. Rice, R. Norel, J. Sprengel, K. Stolle, T. Bonk, S. Corthesy, A. Royyuru, M. C. Peitsch, and G. Stolovitzky, “Industrial methodology for process verification in research (IMPROVER): toward systems biology verification,” Bioinformatics, vol. 28, no. 9, pp. 1193-1201, May 2012.
Research tasks can often be cast as binary classification problems, for example in gene network construction, drug sensitivity signatures, or therapeutic target discovery. The challenge in this case is to correctly predict the class labels of a set of test samples for which the true labels are known, the “gold standard.” The present invention describes a way to perform this verification task even when a gold standard is not available.
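By way of illustration only, the gold-standard comparison described above may be sketched in Python for the binary classification case. The function name and the particular metrics reported (accuracy, sensitivity, specificity, balanced accuracy) are assumptions chosen for the example; the cited methodology is not stated here to prescribe any specific scoring metric.

```python
def score_against_gold_standard(predicted, gold):
    """Compare predicted binary labels (0 or 1) with gold-standard labels.

    Both arguments are equal-length sequences, one entry per test sample.
    The metrics returned are illustrative choices, not a prescribed scoring.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("at least one test sample is required")

    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p == 1 and g == 1)  # true positives
    tn = sum(1 for p, g in pairs if p == 0 and g == 0)  # true negatives
    fp = sum(1 for p, g in pairs if p == 1 and g == 0)  # false positives
    fn = sum(1 for p, g in pairs if p == 0 and g == 1)  # false negatives

    # Guard against a gold standard that contains only one class.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


if __name__ == "__main__":
    gold_labels = [1, 1, 0, 0, 1, 0]       # held by the trusted third party
    module_output = [1, 0, 0, 1, 1, 0]     # labels predicted by the module
    print(score_against_gold_standard(module_output, gold_labels))
```

A scoring of this kind requires the true labels; it is therefore not applicable as written when no gold standard is available.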