The present disclosure generally relates to authoring systems and methods, and more particularly, to a system and method for automatically extracting Bayesian networks from text.
Bayesian belief networks (BBNs) are graphical models that encode probabilistic dependence among random variables in a computationally tractable way, where the random variables are represented by nodes in a directed acyclic graph, and dependence relations by arcs. In addition to the graph, a BBN includes a quantitative layer that encodes for each variable the probability distribution of that variable conditional on its parents in the graphs. When all variables are discrete random variables, those probability distributions are encoded as conditional probability tables (CPT).
BBNs have the advantage that expert knowledge can be encoded and represented as a set of probabilities, rather than as fixed rules, as in an expert system. They can compute the probabilities of both single variables and combinations of variables, and allow automatic updating of probability estimates in the presence of new data.
BBNs are widely applied within a variety of contexts, including engineering, computer science, medicine and bioinformatics. For example, they have been used in clinical decision support systems for years, and systems applying BBNs to diagnostics include:
the Iliad system (http://www.openclinical.org/aisp_iliad.html), which now covers 1500 diagnoses based on thousands of findings,
the Dxplain system (http://lcs.mgh.harvard.edu/projects/dxplain.html), which has a database of crude probabilities for 4,900 clinical manifestations that are associated with over 2,200 unique diseases, and
SimulConsult (http://www.simulconsult.com/), which is widely used by pediatric neurologists in the US, and by the end of 2010 covered some 2,600 diseases in neurology and genetics, taking its data from peer-reviewed medical literature.
BBNs can be constructed using structured data, or when this is unavailable, through literature review or expert elicitation. Typically this task requires a considerable amount of human effort, which may be impractical on a large scale. As a consequence, there is a clear need for a system which allows an automated extraction of the relevant probabilistic information, including both dependence and independence information and quantitative probabilistic information, as well as offering authoring capabilities, to facilitate the building of BBNs.