Social networking environments are currently visually based systems. For example, users can view posts that include text, images, and videos. A hallmark of social networking environments is that users can also post content, which likewise takes the form of text, images, and videos. Further, the interface to a social networking environment is visual, typically a graphical user interface that connects the user to the social networking environment over the Internet. However, many social network users may be interested in posting content but are not comfortable generating it. For example, in some cases the user may not be comfortable typing more than a few sentences or may not have a keyboard readily available for typing a post. Further, text postings often do not easily convey the emotion or feeling that a user may want to express in a post. In other cases, the user may have a video camera available but may be self-conscious and therefore uncomfortable appearing in video or images. Embodiments described herein provide an audio-based social networking environment to overcome these deficiencies.