The disclosure generally relates to the field of data path analysis, and more particularly to determining and mapping lossy data transformations among nodes in a system model.
Compatibility in data formatting and encoding is a significant issue in many processing and networking environments. Consistent, compatible messaging protocols enable reliable communication and interactive processing among different systems and components. Protocol adapters and bridges may be utilized to map between systems and components that utilize different communication protocols. In addition to communication protocol consistency, interconnected systems and components rely on mutually compatible data encoding formats. Fundamentally, a data encoding format determines how non-binary information such as image data, audio data, and text characters is encoded into a binary digital format that may be processed and exchanged between processing entities such as operating systems, application programs, audio or video processors, etc.
Aside from differences in the non-binary constructs to be mapped, data encoding techniques may differ in terms of application. For example, some data encoding techniques may encode raw audio data into audio files suitable to be stored and played (decoded) using a format-compatible audio player/decoder. Analogous data encoding techniques are available for storing and processing display image data. Compression is another category of data encoding that is closely associated with many forms of audio, image, and video encoding. For example, MP3 is a commonly utilized audio compression standard for the transfer, storage, and playback of music. Another distinct form of data encoding is text encoding, which is becoming a more complex issue with the proliferation of distributed, globalized computer and network systems that require integration of multilingual text encoders/decoders.
Encoded data loss (lossiness) is a factor in many digital-to-digital as well as almost all analog-to-digital data transformations. Some level of lossiness is intended by the design of some data encoding methods. For example, the MP3 audio encoding format implements a type of lossy data compression that reduces the amount of data required to represent an audio recording while maintaining enough information to enable an adequate representation of the uncompressed audio. Encoded data loss may also occur incidentally in some text character transformations. For example, Unicode provides a universally applicable set of codepoints for encoding text characters and symbols used by most human languages to enable accurate exchange of text files that may include elements of multilingual text. While Unicode itself provides a universally comprehensive character/symbol to codepoint mapping, other text encoding methods such as ASCII may not cover the same character range. Therefore, an ASCII transcoder that receives Unicode-encoded (e.g., UTF-8) input character strings may lose some character information, particularly in multilingual application contexts. Problems relating to lossiness may arise in some systems, such as systems that include newly integrated components, one or more of which transform data in a manner that results in files or other data objects containing mixed data encodings.
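By way of illustration only, the following Python sketch shows one way the text transcoding loss described above may arise. It assumes a transcoder that substitutes a placeholder for each character outside the ASCII range. The function name transcode_to_ascii, the placeholder character, and the sample string are hypothetical and are not taken from the disclosure.

```python
from typing import List, Tuple


def transcode_to_ascii(utf8_bytes: bytes, placeholder: str = "?") -> Tuple[bytes, List[Tuple[int, str]]]:
    """Hypothetical lossy transcoder: UTF-8 in, ASCII out.

    Returns the ASCII bytes and a list of (character index, lost character)
    pairs for every character that ASCII cannot represent.
    """
    text = utf8_bytes.decode("utf-8")
    out = bytearray()
    lost: List[Tuple[int, str]] = []
    for index, ch in enumerate(text):
        if ord(ch) < 128:
            # Codepoints 0-127 map one-to-one onto ASCII.
            out.append(ord(ch))
        else:
            # The original codepoint is discarded here; this is the loss.
            out.extend(placeholder.encode("ascii"))
            lost.append((index, ch))
    return bytes(out), lost


# Sample multilingual input (assumed for illustration).
source = "naïve café 東京".encode("utf-8")
ascii_bytes, lost = transcode_to_ascii(source)

# The ASCII output cannot be mapped back to the original string.
is_lossy = ascii_bytes.decode("ascii") != source.decode("utf-8")
```

In this sketch, the four non-ASCII characters in the sample input are each replaced by the placeholder, so the output retains the length and the ASCII characters of the input but not the replaced characters. A string consisting only of ASCII characters passes through unchanged and the list of lost characters is empty.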