US 12,169,522 B2
Structured video documents
Johan Schalkwyk, Scarsdale, NY (US); and Francoise Beaufays, Mountain View, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Mar. 2, 2023, as Appl. No. 18/177,747.
Claims priority of provisional application 63/268,921, filed on Mar. 4, 2022.
Prior Publication US 2023/0281248 A1, Sep. 7, 2023
Int. Cl. G06F 16/783 (2019.01); G06F 16/738 (2019.01); G06F 40/169 (2020.01); G06F 40/30 (2020.01)
CPC G06F 16/7844 (2019.01) [G06F 16/739 (2019.01); G06F 40/169 (2020.01); G06F 40/30 (2020.01)]
24 Claims
 
1. A computer-implemented method that when executed on data processing hardware causes the data processing hardware to perform operations comprising:
receiving a content feed comprising audio data, the audio data corresponding to speech utterances;
processing the content feed to generate a semantically-rich, structured document, the structured document comprising a transcription of the speech utterances, the transcription comprising a plurality of words each aligned with a corresponding audio segment of the audio data that indicates a time when the word was recognized in the audio data;
during playback of the content feed:
receiving a query from a user requesting information contained in the content feed; and
processing, by a large language model, the query and the structured document to generate, as output from the large language model, a natural language response to the query, the natural language response generated as output from the large language model conveying the requested information contained in the content feed; and
providing, for output from a user device associated with the user, the natural language response to the query.
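
The operations recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The names `AlignedWord`, `StructuredDocument`, `build_document`, and `answer_query` are hypothetical, the speech recognizer output is assumed to arrive as (word, start, end) tuples in seconds, and the large language model is represented by an arbitrary callable that maps a prompt string to a response string.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class AlignedWord:
    """One transcribed word aligned with its audio segment (hypothetical type)."""
    text: str
    start_s: float  # time in the audio data at which the word was recognized
    end_s: float    # end of the corresponding audio segment


@dataclass
class StructuredDocument:
    """Transcription whose words each carry a time alignment (hypothetical type)."""
    words: List[AlignedWord]

    def to_prompt_text(self) -> str:
        # Render one word per line, prefixed with its start time, so a
        # language model can relate words to positions in the content feed.
        return "\n".join(f"[{w.start_s:.2f}s] {w.text}" for w in self.words)


def build_document(recognized: List[Tuple[str, float, float]]) -> StructuredDocument:
    # Assumption: an upstream recognizer has already produced
    # (word, start_seconds, end_seconds) tuples for the audio data.
    return StructuredDocument([AlignedWord(t, s, e) for t, s, e in recognized])


def answer_query(doc: StructuredDocument, query: str,
                 llm: Callable[[str], str]) -> str:
    # Combine the structured document and the user's query into one prompt
    # and return whatever natural language response the model produces.
    prompt = (
        "Transcript with word timings:\n"
        f"{doc.to_prompt_text()}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )
    return llm(prompt)


if __name__ == "__main__":
    doc = build_document([("budget", 1.0, 1.4), ("approved", 1.5, 2.1)])
    # Placeholder model: a real system would call a large language model here.
    print(answer_query(doc, "Was the budget approved?", lambda p: "(model output)"))
```

Passing the model as a plain callable keeps the sketch independent of any particular model API; the response returned by `answer_query` is what would be provided for output from the user device.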