CPC G06F 40/20 (2020.01) [G06F 16/3347 (2019.01); G06F 18/24147 (2023.01); G06F 40/40 (2020.01); G06F 40/47 (2020.01); G06F 40/58 (2020.01); G06N 20/00 (2019.01); G06F 40/30 (2020.01)] | 20 Claims |
1. A system comprising:
one or more processors; and
one or more machine-readable storage media having instructions stored thereon that, in response to being executed by the one or more processors, cause the system to perform operations comprising:
receiving a user query that includes unlabeled content;
embedding, using a machine learning model, the unlabeled content, wherein the embedding generates an unlabeled vector;
determining, from a plurality of labeled vectors stored in a vector index, a first set of labeled vectors that match the unlabeled vector, wherein the first set of labeled vectors are generated from a set of labeled content stored in a database;
calculating, based on a confidence score for at least one labeled vector in the first set of labeled vectors, a propagation score for the unlabeled vector, wherein the propagation score indicates whether a new label should be assigned to the unlabeled content;
assigning, based on the propagation score, the new label to the unlabeled content, wherein the new label is selected from the first set of labeled vectors;
storing the newly labeled content in the database;
retrieving, based on the first set of labeled vectors and mappings between the vector index and the database storing labeled content, a set of labeled content from the database; and
responding, using the set of labeled content, to the user query.
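
The operations recited in claim 1 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the claim: the bag-of-words `embed` function stands in for the machine learning model, the in-memory `LabelStore` stands in for both the vector index and the database, and the propagation score is computed as a similarity-weighted confidence share. The names `LabelStore`, `propagation`, `answer`, `min_sim`, and `threshold` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def embed(text):
    """Toy stand-in for the ML model: unit-length bag-of-words vector (sparse dict)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * v.get(tok, 0.0) for tok, w in u.items())


class LabelStore:
    """Hypothetical store: a database of labeled content plus a vector index."""

    def __init__(self):
        self.database = {}  # record id -> {"content", "label", "confidence"}
        self.index = []     # (labeled vector, record id): the index-to-database mapping

    def add(self, content, label, confidence):
        rec_id = len(self.database)
        self.database[rec_id] = {
            "content": content, "label": label, "confidence": confidence,
        }
        self.index.append((embed(content), rec_id))
        return rec_id

    def match(self, vector, k=3, min_sim=0.3):
        """Return up to k (similarity, record id) pairs at or above min_sim."""
        scored = [(cosine(vector, v), rid) for v, rid in self.index]
        scored = [pair for pair in scored if pair[0] >= min_sim]
        return sorted(scored, reverse=True)[:k]


def propagation(store, matches):
    """Assumed scoring rule: best label's similarity-weighted confidence,
    divided by the total similarity of all matches."""
    weights = defaultdict(float)
    for sim, rid in matches:
        record = store.database[rid]
        weights[record["label"]] += sim * record["confidence"]
    if not weights:
        return None, 0.0
    label = max(weights, key=weights.get)
    total_sim = sum(sim for sim, _ in matches)
    return label, weights[label] / total_sim


def answer(store, query, threshold=0.6):
    vector = embed(query)                        # embed the unlabeled content
    matches = store.match(vector)                # first set of labeled vectors
    label, score = propagation(store, matches)   # propagation score
    assigned = label is not None and score >= threshold
    if assigned:
        store.add(query, label, score)           # store the newly labeled content
    # Follow the index-to-database mapping to retrieve labeled content.
    results = [store.database[rid]["content"] for _, rid in matches]
    return {"new_label": label if assigned else None, "score": score, "results": results}


store = LabelStore()
store.add("reset my password", "account", 0.9)
store.add("forgot password login", "account", 0.8)
store.add("refund for my order", "billing", 0.9)
response = answer(store, "password reset")
print(response["new_label"], response["results"])
```

In this sketch the query is stored under the propagated label only when the score clears `threshold`; the retrieved content is returned to the caller either way. A production system would replace the linear scan in `match` with an approximate nearest-neighbor index.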