The present invention relates to the field of digital computer systems and, more specifically, to a method for extracting information from text data.
The number of scientific publications is growing exponentially, and search engines such as PubMed make huge amounts of information available in the form of unstructured written language. As of January 2017, PubMed comprises more than 26 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central (PMC) and publisher web sites. The numbers remain high even when focusing on specific fields of biomedical research, such as prostate cancer. For instance, a simple query for prostate cancer-related papers on PMC can return a list of over 180,000 publications. To fully exploit this rich corpus of written knowledge, new techniques are required that can automatically extract facts and knowledge, manipulate the data, and produce summarized representations that capture specific information.