The present invention relates to transcribing speech content in a speech communication process and, in particular, to generating a text content summary from speech content.
In a real-time voice communication process, for example, a telephone communication (or teleconference), it may be desirable to record the content of the voice communication. A user may also want to convert the recorded speech content into readable text, for example, as a memo.
Known solutions can convert speech content into text and can customize a summary of the real-time speech content as required by a user. Known content summary generating systems may generate a content summary from speech content in response to an indication issued by a user. Specifically, while listening to the speech content, the user can press a preset indication button on a speech content playing device (for example, a telephone) each time he/she is interested in the segment currently being played. The system can then use a segment of speech content whose play time is close to the time at which the user pressed the indication button to generate the summary. The user may press the indication button multiple times at different time points. Accordingly, the system may use a plurality of segments of speech content to generate a text content summary.
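The known button-based selection described above can be sketched as follows. This is only an illustrative sketch, not the implementation of any particular known system: the `Segment` type, the function names `select_segments` and `build_summary`, and the `window` parameter (how many seconds count as "close" to a button press) are all assumptions introduced for the example, and the segments are assumed to be already transcribed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """A transcribed piece of speech (hypothetical type for this sketch)."""
    start: float  # play time at which the segment begins, in seconds
    end: float    # play time at which the segment ends, in seconds
    text: str     # transcribed text of the segment


def select_segments(segments: List[Segment],
                    press_times: List[float],
                    window: float = 5.0) -> List[Segment]:
    """Return the segments whose play time is close to any button press.

    "Close" is assumed here to mean that the segment overlaps the interval
    [press - window, press + window]; the source does not define it.
    """
    selected = []
    for seg in segments:
        for press in press_times:
            if seg.start <= press + window and seg.end >= press - window:
                selected.append(seg)
                break  # one nearby press is enough to select the segment
    return selected


def build_summary(segments: List[Segment],
                  press_times: List[float],
                  window: float = 5.0) -> str:
    """Join the text of the selected segments, in play order, into a summary."""
    chosen = select_segments(segments, press_times, window)
    return " ".join(seg.text for seg in chosen)
```

In this sketch a button press contributes only a time stamp, so every selected segment enters the summary on equal terms.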
Although each of the plurality of speech segments is selected by pressing the indication button, the segments may differ in importance to the user. The user cannot indicate the relative importance of the segments merely by pressing the indication button. Thus, when selecting speech content for generating a content summary, the system can only treat all the segments as being of the same significance. The text content summary generated in this way might therefore be unsatisfactory to the user.