Field of the Disclosure
Aspects of the present disclosure relate to information processing to achieve formalization and structuring, including audio analysis, manipulation, and representation, and, more particularly, to systems and methods of structured analysis and relationship determination between information value and information quantity as related to harmonically-configured data, including digital media.
Description of Related Art
A generally recognized standard for the unified formalization of audio or other digital data may not be available in the art, though various techniques have been implemented toward that end. For example, some techniques employ simplified representations of a sound signal, primarily in speech recognition, speech synthesis, and the compression of digital data representing music. In one aspect, speech technologies have progressed beyond representing sound signals through the corresponding waveform, and such techniques often function on the basis of words or even entire phrases in the speech data. Such a basis in words/phrases represents a form of information which is closer to the natural perception of the human brain. In contrast, formalization technologies implemented for music representation generally use only a physically perceptive representation of information, i.e., a form close to that of the physical perception of sound by the human ear.
To date, attempts at a more abstract, universal representation of music have remained largely unsuccessful. Such lack of success may be attributed, for example, to the fact that speech information includes a form of primary language and a syntax for its description, engineered with precise mathematics that has been established and fine-tuned over many generations. In contrast, the existing representations of music, for example, those based on notes or sound samples, are relatively primitive as compared to speech information analysis, and any such representations are generally not universally applicable.
In this regard, one of the relatively more informative digital representations of sound currently available may be the PCM format, generally referred to as uncompressed audio. However, even though such a format may be relatively more informative, such informativeness is offset by a relatively large data file size. The large data file size, in turn, may render such a format or representation unsuitable or impracticable, for example, for fast delivery/transmission and/or compact storage. If such attributes are desired, more compact, though likely less informative, representations have been or are being developed that generally employ a popular approach to data reduction, such as used, for example, in MP3, OGG, WMA and other classic psychoacoustic models or representations. However, natural sounds include more redundancy than such typical audio signal representations/models are capable of effectively analyzing. Further, human perception of music is generally far more complex than any existing psychoacoustic model. As such, there exists a need for an improved approach to sound formalization that is capable of representing sound, audio, music, and/or any other harmonics-related digital data in a more compact (i.e., less data intensive), yet more informative manner (i.e., in terms of the completeness of the representation that may be provided).
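The size trade-off described above can be illustrated with simple arithmetic. The following is a minimal sketch only; the specific figures (44.1 kHz, 16-bit, two-channel PCM, a 180-second recording, and a 128 kbit/s coded stream) are assumptions chosen for illustration and do not appear in the present disclosure, and the function names are hypothetical.

```python
def pcm_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size in bytes of uncompressed PCM audio (headers ignored)."""
    return sample_rate_hz * bit_depth * channels * seconds // 8


def coded_bytes(bit_rate_bps, seconds):
    """Size in bytes of a constant-bit-rate coded stream (headers ignored)."""
    return bit_rate_bps * seconds // 8


# Assumed illustrative parameters, not taken from the disclosure.
pcm = pcm_bytes(44_100, 16, 2, 180)   # 3 minutes of 16-bit stereo PCM
coded = coded_bytes(128_000, 180)     # same duration at 128 kbit/s
ratio = pcm / coded

print(pcm, coded, ratio)  # 31752000 2880000 11.025
```

Under these assumptions, the uncompressed representation is roughly eleven times larger than the coded one, which is the kind of gap that may make PCM impracticable for fast transmission or compact storage.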
In efforts to satisfy this demand, more progressive representations/models have been developed which are currently being employed, for example, in MP3-Pro, HE AAC, MP3-PlusV, MPEG-4 SSC, MPEG-4 Structured Audio, and MIDI. MP3-Pro and HE AAC essentially use peculiarities of human perception as the basis for extracting structure elements in an audio signal, without preserving specific phase and similarity search in the signal. Low frequencies are replicated onto high frequencies, without preserving the phase, but retaining the similarity principle and general sound parameters, such as conservation of energy and the chaotic nature of the signal. MP3-PlusV extracts, stores, and generates harmonics, without preserving the phase, and may also be applied for determining the high-frequency part of the signal. MPEG-4 SSC (Sinusoidal Coding) is a method of representing the signal as a set of organized objects, such as harmonics, hits, and noise. However, such a method of extracting those objects from the signal is dissimilar to the perception scheme naturally occurring in the human brain. Therefore, reproduction of the signal from this representation/model may include undesirable artifacts. MPEG-4 Structured Audio attempts to represent sound by a unified algorithm that is capable of generating a variety of sound structures. While this approach may have some potential, the creation of such an algorithm may be problematic due to the required computational resources. The MIDI format usually requires a relatively small data file size, but, similarly to MPEG-4 Structured Audio, is a representation/model that is generally suitable for manual writing of music, and not for representation of naturally occurring sounds and/or already-created audio compositions.
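The replication of low frequencies onto high frequencies, with energy retained and phase discarded, can be sketched in a simplified form. The following is a toy illustration under stated assumptions, not the algorithm used by MP3-Pro or HE AAC: the function name `replicate_low_band`, the single stored energy value per band, and the use of random phases are all hypothetical simplifications.

```python
import math
import random


def replicate_low_band(low_mags, high_band_energy, seed=0):
    """Toy high-band reconstruction (hypothetical helper).

    Copies the low-band magnitude shape upward, rescales it so the
    copied band carries a stored target energy, and assigns random
    phases, since the original phase is assumed not to be preserved.
    """
    rng = random.Random(seed)
    copied_energy = sum(m * m for m in low_mags)
    # Gain that makes the replicated band match the stored energy.
    gain = math.sqrt(high_band_energy / copied_energy) if copied_energy > 0 else 0.0
    high_bins = []
    for m in low_mags:
        phase = rng.uniform(-math.pi, math.pi)
        high_bins.append(complex(gain * m * math.cos(phase),
                                 gain * m * math.sin(phase)))
    return high_bins


# Assumed illustrative values: four low-band magnitudes and a stored
# high-band energy of 2.0.
low = [4.0, 2.0, 1.0, 0.5]
high = replicate_low_band(low, high_band_energy=2.0)
energy = sum(abs(c) ** 2 for c in high)
```

In this sketch, only the relative spectral shape and the total band energy survive; the phase of the reconstructed high band bears no relation to the original signal, which is consistent with the loss of phase information noted above.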
Thus, while existing structured and object-oriented sound representations/models have become more advanced than classic psychoacoustic methods or models in efforts to reduce or eliminate inherent perceptive redundancy in an audio signal, such representations/models may tend to lose the scope of informativeness of the initial signal at a low bit rate (i.e., low information quantity). That is, as the sound representation/model produces a more compact data file size, it may fail to preserve the quality of the original audio signal within an acceptable degree of tolerance. Accordingly, there exists a need for a formalization scheme and arrangement for digital media, such as audio, that is capable of reducing the information quantity or bit rate of a digital information file by appropriate structuring, while retaining an information value within a threshold of, or even greater than, that of the original digital information file.