US 12,169,516 B1
System and method for extracting citations from documents and constructing enriched citation databases
Xihao Xie, Plano, TX (US); and Yang Chen, Dallas, TX (US)
Filed by Litigiven LLC, Dallas, TX (US)
Filed on Sep. 26, 2023, as Appl. No. 18/475,162.
Int. Cl. G06F 16/38 (2019.01); G06F 16/335 (2019.01)
CPC G06F 16/382 (2019.01) [G06F 16/335 (2019.01)]
18 Claims
 
1. A method for identifying citations from documents and constructing enriched citation databases, the method comprising:
obtaining, by a processing device, a document comprising texts of a natural language;
constructing pre-processing filters comprising a first set of regular expressions matching non-citation text patterns;
applying the pre-processing filters to the document to generate a pre-processed document by removing the non-citation text patterns from the document;
constructing citation filters comprising a second set of regular expressions, wherein each of the second set of regular expressions matches at least one of a corresponding citation or a context associated with the citation, and a regular expression in the first set of regular expressions or in the second set of regular expressions is one of an atomic regular expression or a compound regular expression defined by one or more atomic regular expressions or other compound regular expressions;
applying the citation filters to the pre-processed document to identify one or more citations and corresponding contexts that match at least one of the second set of regular expressions; and
storing the one or more citations and corresponding contexts in an enriched citation database.
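
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: every pattern, function name, the fixed-width context window, and the SQLite table layout below are hypothetical choices made for illustration. The sketch builds a compound regular expression from atomic ones, removes non-citation patterns, matches citations together with surrounding text, and stores the results.

```python
import re
import sqlite3

# Atomic regular expressions (hypothetical, modeled on U.S. case-reporter citations).
VOLUME = r"\d{1,4}"
REPORTER = r"(?:U\.S\.|S\. ?Ct\.|F\.(?:2d|3d|4th)?)"
PAGE = r"\d{1,5}"
YEAR = r"\(\d{4}\)"

# Compound regular expression defined by the atomic expressions above.
CASE_CITATION = rf"{VOLUME}\s+{REPORTER}\s+{PAGE}(?:\s+{YEAR})?"

# First set: pre-processing filters matching non-citation text (assumed examples).
PRE_FILTERS = [
    re.compile(r"^\s*Page \d+ of \d+\s*$", re.MULTILINE),  # running page footers
    re.compile(r"\[\*+\d+\]"),                             # star-pagination markers
]

# Second set: citation filters.
CITATION_FILTERS = [re.compile(CASE_CITATION)]


def preprocess(text):
    """Remove every non-citation pattern from the document text."""
    for pattern in PRE_FILTERS:
        text = pattern.sub("", text)
    return text


def extract(text, window=40):
    """Return (citation, context) pairs; context is an assumed fixed-width window."""
    results = []
    for pattern in CITATION_FILTERS:
        for match in pattern.finditer(text):
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            results.append((match.group(0), text[start:end].strip()))
    return results


def store(rows, conn):
    """Persist citations and contexts in a simple (hypothetical) table."""
    conn.execute("CREATE TABLE IF NOT EXISTS citations (citation TEXT, context TEXT)")
    conn.executemany("INSERT INTO citations VALUES (?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    doc = (
        "Page 1 of 3\n"
        "The court relied on Brown v. Board of Education, 347 U.S. 483 (1954), "
        "in its holding.[*12] See also 410 U.S. 113."
    )
    conn = sqlite3.connect(":memory:")
    store(extract(preprocess(doc)), conn)
    for row in conn.execute("SELECT citation FROM citations"):
        print(row[0])
```

Composing the compound pattern from named atomic strings keeps each piece testable on its own, which is one plausible reason to distinguish atomic from compound expressions. The claim also allows a regular expression to match the context directly; the fixed window here is only a simplification.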