US 12,169,526 B2
Generating and presenting a text-based graph object
Paul Pangilinan Del Villar, Bothell, WA (US); Xiaofei Zeng, Redmond, WA (US); and Mingyang Xu, Redmond, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Sep. 24, 2021, as Appl. No. 17/484,670.
Prior Publication US 2023/0112763 A1, Apr. 13, 2023
Int. Cl. G06F 16/906 (2019.01); G06F 16/901 (2019.01)
CPC G06F 16/906 (2019.01) [G06F 16/9024 (2019.01)]
19 Claims
 
1. A method, comprising:
extracting a plurality of key concepts from a collection of digital content items, wherein extracting the plurality of key concepts includes:
applying a first model to text content of the collection of digital content items to identify a first set of terms from the text content, the first model comprising a rule-based model including rules for identifying certain types of terms within the text content of the collection of digital content items; and
applying a second model to the text content to identify a set of candidate terms from the first set of terms, the second model comprising a machine learning model trained to identify one or more key topics within a given text based on the given text and one or more terms within the given text indicated as one or more certain types of terms;
receiving the set of candidate terms associated with a domain of interest by way of a graph query application over a network;
applying a zero-shot classification model to the plurality of key concepts and the set of candidate terms to determine, for each key concept from the plurality of key concepts, a candidate term from the set of candidate terms associated with a respective key concept; and
generating a correlation graph object for the collection of digital content items, the correlation graph object including:
a plurality of nodes associated with respective key concepts from the plurality of key concepts, each node including an indication of a candidate term from the set of candidate terms associated with a corresponding key concept; and
a plurality of edges connecting the plurality of nodes, the plurality of edges being associated with pairs of key concepts corresponding to nodes connected by the respective edges, each edge of the plurality of edges including a correlation value based on a plurality of pre-calculated segment correlation values for associated segments of time, the plurality of pre-calculated segment correlation values based on frequency of co-occurrence of a respective pair of key concepts within subsets of the collection of digital content items associated with the respective segments of time.
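Illustrative sketch (not part of the claim): the edge element of claim 1 can be read as a two-step computation, in which a correlation value is first pre-calculated for each pair of key concepts within each segment of time, and the edge value is then derived from the pre-calculated values of the segments of interest. The claim does not state how co-occurrence frequency is converted into a correlation value or how segment values are combined. The sketch below assumes a Jaccard-style ratio within each segment and a plain mean across the selected segments. The function names, the segment labels, and the sample data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def segment_correlations(items):
    """Pre-calculate a correlation value per concept pair and time segment.

    `items` is an iterable of (segment, concepts) tuples, one per content item.
    Assumption: within a segment, correlation = number of items containing
    both concepts / number of items containing either concept.
    """
    single = defaultdict(lambda: defaultdict(int))  # segment -> concept -> count
    joint = defaultdict(lambda: defaultdict(int))   # segment -> (a, b) -> count
    for segment, concepts in items:
        unique = sorted(set(concepts))  # sorted so each pair has one canonical order
        for concept in unique:
            single[segment][concept] += 1
        for pair in combinations(unique, 2):
            joint[segment][pair] += 1

    per_segment = defaultdict(dict)  # (a, b) -> segment -> correlation value
    for segment, pairs in joint.items():
        for (a, b), both in pairs.items():
            either = single[segment][a] + single[segment][b] - both
            per_segment[(a, b)][segment] = both / either
    return per_segment


def build_graph(items, concept_to_term, segments):
    """Assemble nodes and edges for the selected time segments.

    Assumption: the edge value is the mean of the pre-calculated segment
    values, with 0.0 for segments in which the pair never co-occurs.
    """
    per_segment = segment_correlations(items)
    # Each node carries the candidate term assigned to its key concept.
    nodes = {c: {"candidate_term": t} for c, t in concept_to_term.items()}
    edges = {
        pair: sum(values.get(s, 0.0) for s in segments) / len(segments)
        for pair, values in per_segment.items()
    }
    return {"nodes": nodes, "edges": edges}


# Hypothetical sample: four content items spread over two weekly segments.
items = [
    ("2021-W01", ["battery", "charger"]),
    ("2021-W01", ["battery", "screen"]),
    ("2021-W02", ["battery", "charger"]),
    ("2021-W02", ["charger"]),
]
# Hypothetical output of the classification step (key concept -> candidate term).
concept_to_term = {
    "battery": "hardware",
    "charger": "accessories",
    "screen": "hardware",
}
graph = build_graph(items, concept_to_term, segments=["2021-W01", "2021-W02"])
```

Because the per-segment values are computed independently of the requested window, a different time range could be served by averaging a different subset of the same pre-calculated values rather than rescanning the content items.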