The creation of audio content continues to evolve for use in new applications. One such application is the World Wide Telecom Web (WWTW), also referred to as the ‘Telecom Web’ or the ‘Spoken Web.’ The Spoken Web is a network of VoiceSites hosted on the telecom network, wherein each VoiceSite individually comprises a voice-driven application. The Spoken Web system may be viewed as a telecom network parallel to the World Wide Web (WWW) that runs on the Internet infrastructure. A VoiceSite is accessed by calling the number associated with it, called a VoiNumber. A VoiLink is used to link the various VoiceSites to one another. A VoiceSite may be created or updated through a voice-driven interface, such that a user may create a VoiceSite or modify an existing VoiceSite using a cellular phone. The Spoken Web is an ideal solution for a large part of the world where the population does not have access to the devices necessary to access the Internet, but cellular phone penetration is high. As a result, the use of the Spoken Web and the number of VoiceSites continue to increase. Thus, the volume of audio content associated with the Spoken Web continues to expand steadily.
More particularly, the World Wide Telecom Web consists of interconnected voice applications (VoiceSites) that can be accessed by any voice-capable (e.g., landline or cellular) telephone. In the course of an ordinary phone call, the user interacts with a service or other application through speech or DTMF (dual-tone multi-frequency, i.e., the signal to the phone company that is generated when one presses the touch keys of a telephone). Generally, VoiceSites contain an ample amount of user-generated content, mainly in the form of audio lists that have to be browsed linearly via telephone.
As such, in the context of VoiceSites, audio lists tend to get longer over time, with no efficient mechanism available to organize the content, given its audio nature and the length of the list. Categorization can normally be performed before the query is recorded so that the user can select a category, but this presents problems because it is not known in advance what the user will record. Sub-categories need to be created depending on the recorded content. Additionally, meta-information is largely unavailable; usually it amounts to little more than the author's phone number and a timestamp. Speech-to-text systems also tend not to be reliable in the VoiceSites context (or in other related or analogous contexts), resulting in a paucity of information about the content.