Voice stream augmented note taking is a process for capturing information from an audio recording and associating that information with user-generated content. In some situations, it can be helpful to provide a user with additional information when the user is reviewing previously taken notes. For example, a user may type notes during a presentation, such as a lecture or a meeting, but may not remember additional details associated with those notes during a later review. A note taker who attempts to include all of those details while listening to the presentation may miss later details while trying to keep up. Conventional techniques, such as shorthand, stenography, and rapid typing, are often difficult to learn and may be impractical for casual conversations.