The subject matter described herein generally relates to systems and methods for audio content navigation.
Individuals are able to read a large amount of textual information in a short time by skimming it for interesting and/or relevant content. When textual content, such as that displayed as part of a web page, is presented to a user, the human mind is able to skim through it and identify key words and phrases in each sentence. For example, the text in large/bold fonts in the following line is what may be used to identify whether the sentence is of importance to the reader:
“When I was walking in the garden yesterday, I saw a snake that passed very close to me.”
Even without any such textual formatting, the human mind is able to catch the keywords and then identify whether the content can be skimmed through or should be read in detail.
Content creation and access in the developing world is mostly focused on audio content. There are various reasons for this, such as accounting for low literacy rates among certain groups of users, accommodating the use of simple/standard devices (for example, voice-only phones), and the like. One clear example of this is the development of the World Wide Telecom Web (WWTW) (or alternatively, the Spoken Web). The WWTW is a web of VoiceSites that contain information in audio form and can be accessed using a regular/standard phone.