An acceptable feature representation for any pattern recognition task (e.g., speech recognition) is one that preserves detail in the input signal while remaining stable and invariant to non-informative distortions. Conventional speech features, such as log-mel, perceptual linear predictive (PLP), and relative spectral (RASTA) features, are all designed to be deformation stable, but they remove important higher-order information from the speech signal. Better estimation techniques have been designed to preserve higher-resolution detail, yet even these high-resolution representations are processed with short-term smoothing operators for deformation stability. Designing an appropriate feature representation is therefore known to be challenging.