Detecting speech and music in audio signals (e.g., audio recordings and the audio tracks of video recordings) is important for audio and video indexing and editing, as well as for many other applications. For example, distinguishing speech signals from ambient noise is a critical function in speech coding systems (e.g., vocoders), speaker identification and verification systems, and hearing aid technologies. Although approaches exist for distinguishing speech or music from silence or other environmental sounds, their performance drops dramatically when speech or music signals are mixed with noise, or when speech and music signals are mixed together. Thus, there is a need for systems and methods capable of noise-resistant detection of speech and music in audio signals.