Field
The present specification generally relates to methods for identifying and organizing issues discussed within a corpus of documents and, more particularly, to methods for extracting and organizing such issues identified in the document corpus into a structured issue network of interconnected normalized issues.
Technical Background
Documents within a corpus are often linked together by citations. For example, legal documents and scientific articles often cite previous works to support a particular rule, proposition, or finding. In the legal corpus context, an author of a judicial opinion often cites previous cases in support of his or her own legal statement or rule. In turn, these cited cases have themselves cited and/or been cited by other cases in support of the proposition in question (and so on). Therefore, selected documents within the corpus are intrinsically linked together around particular issues, and these links can be manifested in the form of citation networks.
Researchers often search the corpus for documents that discuss a particular issue or topic. They use the citations to move forward and backward within the corpus to find additional relevant documents. However, documents, such as legal documents, may discuss many different topics or legal issues. Further, a document may be cited for many different reasons, and two citations pointing to the same document may cite it for different reasons. Currently, the researcher cannot determine from the citation alone the particular issue or topic for which a citing document cites a cited document. The researcher must therefore sift through the many different cited documents. Further, issues themselves may be linked together by citation, and a researcher may not be aware that particular issues are related. Without an understanding of how particular issues are connected or otherwise related, the researcher may not perform a thorough and complete investigation into the original issue or research topic.
Accordingly, a need exists for alternative methods of extracting and organizing normalized issues within a corpus of documents into an issue network describing the interconnectedness of normalized issues within the corpus of documents.