CPC G06F 16/7844 (2019.01) [G06F 16/739 (2019.01); G06F 40/169 (2020.01); G06F 40/30 (2020.01)] | 24 Claims |
1. A computer-implemented method that when executed on data processing hardware causes the data processing hardware to perform operations comprising:
receiving a content feed comprising audio data, the audio data corresponding to speech utterances;
processing the content feed to generate a semantically-rich, structured document, the structured document comprising a transcription of the speech utterances, the transcription comprising a plurality of words each aligned with a corresponding audio segment of the audio data that indicates a time when the word was recognized in the audio data;
during playback of the content feed:
receiving a query from a user requesting information contained in the content feed; and
processing, by a large language model, the query and the structured document to generate, as output from the large language model, a natural language response to the query, the natural language response generated as output from the large language model conveying the requested information contained in the content feed; and
providing, for output from a user device associated with the user, the natural language response to the query.
|
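For illustration only, the operations recited in claim 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `AlignedWord`, `StructuredDocument`, `transcript_up_to`, and `answer_query` are hypothetical, the large language model is represented by an arbitrary callable that maps a prompt string to a response string, and restricting the transcript to words recognized up to the current playback position is an assumption the claim does not state.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AlignedWord:
    """One transcribed word and the time (in seconds) it was recognized in the audio."""
    text: str
    start_s: float


@dataclass
class StructuredDocument:
    """Hypothetical structured document: a transcription made of time-aligned words."""
    words: List[AlignedWord]

    def transcript_up_to(self, playback_s: float) -> str:
        # Assumption: only words recognized at or before the playback position are used.
        return " ".join(
            f"[{w.start_s:.1f}s] {w.text}"
            for w in self.words
            if w.start_s <= playback_s
        )


def answer_query(
    doc: StructuredDocument,
    query: str,
    playback_s: float,
    llm: Callable[[str], str],
) -> str:
    """Combine the query and the structured document into a prompt and return the LLM output."""
    prompt = (
        "Answer the question using only the transcript.\n"
        f"Transcript: {doc.transcript_up_to(playback_s)}\n"
        f"Question: {query}\n"
    )
    # `llm` is a stand-in for any large language model client (hypothetical interface).
    return llm(prompt)


if __name__ == "__main__":
    doc = StructuredDocument(
        [AlignedWord("hello", 0.0), AlignedWord("world", 0.5), AlignedWord("later", 9.0)]
    )
    # An echo function stands in for the model so the sketch runs without external services.
    print(answer_query(doc, "What was said?", playback_s=1.0, llm=lambda p: p))
```

Passing the model as a callable keeps the sketch independent of any particular model provider; the returned string corresponds to the natural language response that would be provided for output from the user device.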