Steganography is the art of hiding information. The term literally means “covered writing,” and its history and origins reflect this definition. Acrostics, invisible inks, semagrams, and microdots (extremely tiny photographs used by the Germans in World War II) are just a few examples of steganography. Nowadays, steganography is better known for its use in the digital realm. An abundance of computer file formats (used to store such items as text, pictures, sounds, movies, etc.) provides a rich selection of cover objects in which to hide information.
Cryptography and steganography are often compared and contrasted. While they both aim to provide some level of secrecy, “cryptography is about protecting the content of messages, steganography is about concealing their very existence” [2]. Someone accustomed to cryptography terminology usually does not need an exhaustive explanation of steganography terms, because many of the concepts and maxims of steganography are borrowed directly from cryptography. There are stegosystems that implement secret-key steganography or public-key steganography; instead of crypto-keys there are stego-keys; and even Kerckhoffs' principles apply to stegosystems [2]. The basis of steganography, however, does not lie in mathematics and number theory; rather, it lies in the techniques of unnoticeably altering a cover object.
In implementing these techniques, certain tradeoffs must be made. The most common concerns in hiding information are three opposing properties: detectability, bit rate, and robustness. With respect to detectability, the goal is to increase stealth so that it is very difficult to determine whether hidden information exists in the cover object. Bit rate, also referred to as encoding rate, may be calculated as (size of embedded data) / (size of cover object) × 100%; the aim is to maximize the amount of information that can be embedded into the cover object. The goal of robustness is to increase the ability to recover encoded information even if an interloper has manipulated the cover object (this is the focus of watermarking). Because improving one of these properties generally comes at the expense of the others, they are often represented as a triangle of tradeoffs, as shown in FIG. 1 [3].
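As a quick worked example of the encoding-rate formula, the helper below (a hypothetical name, not from any library) computes the rate as a percentage. Both sizes are assumed to be in the same unit, such as bytes.

```python
def encoding_rate(embedded_bytes: int, cover_bytes: int) -> float:
    """Encoding (bit) rate: size of embedded data / size of cover object * 100%."""
    if cover_bytes <= 0:
        raise ValueError("the cover object must be non-empty")
    return embedded_bytes / cover_bytes * 100.0


# Example: 512 bytes of hidden data in a 20,480-byte cover file.
print(f"{encoding_rate(512, 20_480):.2f}%")  # 2.50%
```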
Practically speaking, there are various aspects to consider when implementing a steganography technique with files. First, the modifications to the cover object file must not be so severe that the file no longer functions or serves its purpose; that is, it must always conform to the cover object's file format standard. Another crucial aspect of any steganography algorithm is that a typical user must not notice that the file has changed. In perception-based multimedia files, this means that the overall “sound” and/or “look” of a music, picture, or movie file must not appear to be any different. Lastly, it is desirable that the size of the cover object file not change at all, or change only minimally. All of these factors serve the basic requirement of any steganographic algorithm: to hide information without arousing suspicion.
The Standard MIDI File Format
The Musical Instrument Digital Interface (MIDI) standard was developed in 1983 to standardize the hardware and communication protocols for controlling digital instruments and synthesizers in a booming electronic music industry. As the MIDI format became more popular, a method of saving raw MIDI data was needed—and so the Standard MIDI File (SMF) format came to fruition.
SMFs are not akin to WAV files. A MIDI file is more like a musical score, indicating which instrument should play which note, at what time, and for how long. WAV files, by contrast, contain actual waveform data: many discrete samples of the sound waveform.
The details of the SMF specification can be summarized as follows. Every SMF is composed of a header followed by one or more tracks. The header, among other things, defines the type (format) of the MIDI file and the number of tracks it contains. Each track contains a series of sequential events. These events may specify music playback information (MIDI events), meta-information (meta-events), or system exclusive messages (sysex events). MIDI events include codes that define when a note is turned “on” and “off,” when to perform a pitch bend, what instrument to play, and other music data. Meta-events contain additional information about the music file, including lyrics, copyright notices, track information, key signature, tempo, time signature, and more. MIDI hardware devices use sysex events to send information and perform special functions. Every event is preceded by a delta-time value, literally the amount of time (in ticks) to wait after the previous event. Therefore, in a group of events that occur at the same time, only the first event carries the delay since the preceding event (normally a non-zero delta-time), and the rest of the events have delta-times equal to zero. Most events in a MIDI file are note on and note off events, which indicate the channel the event is for (and hence the instrument), the note's pitch (note number), and how hard the note is played, which usually determines its volume (velocity value). The duration of a note is the sum of the delta-time values of all events after the note on, up to and including the corresponding note off. For additional information on the MIDI specification, consult The Complete MIDI 1.0 Detailed Specification [4].
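To make this structure concrete, here is a minimal Python sketch that reads the header fields and decodes delta-times. It relies on details of the SMF specification not spelled out above: the header chunk begins with the ASCII tag `MThd` and a 4-byte big-endian length, followed by 16-bit format, track-count, and timing-division fields, and delta-times are stored as variable-length quantities (7 bits per byte, with the high bit set on every byte except the last). The function names are illustrative, and the sketch does not parse track events.

```python
import struct


def read_vlq(data: bytes, pos: int) -> tuple[int, int]:
    """Decode a variable-length quantity (e.g. a delta-time) starting at pos.

    Returns (value, position just after the quantity).
    """
    value = 0
    for _ in range(4):  # the SMF spec limits these quantities to 4 bytes
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # high bit clear: this was the last byte
            return value, pos
    raise ValueError("variable-length quantity longer than 4 bytes")


def parse_header(data: bytes) -> dict:
    """Parse the MThd header chunk at the start of a Standard MIDI File."""
    chunk_id, length = struct.unpack(">4sI", data[:8])
    if chunk_id != b"MThd" or length < 6:
        raise ValueError("not a Standard MIDI File header")
    fmt, ntracks, division = struct.unpack(">HHH", data[8:14])
    return {"format": fmt, "tracks": ntracks, "division": division}


def absolute_times(deltas: list[int]) -> list[int]:
    """Convert a track's delta-times into absolute tick times (a running sum)."""
    times, now = [], 0
    for delta in deltas:
        now += delta
        times.append(now)
    return times


header = bytes.fromhex("4d546864 00000006 0001 0002 01e0")
print(parse_header(header))              # {'format': 1, 'tracks': 2, 'division': 480}
print(read_vlq(bytes([0x81, 0x48]), 0))  # (200, 2)
print(absolute_times([96, 0, 0, 48]))    # [96, 96, 96, 144]
```

The last line shows the pattern described above: three events share tick 96 because only the first of them carries a non-zero delta-time.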
Analysis of MIDI Files for Hiding Information
Given some of the characteristics of MIDI files, one may envision some of the potential methods for embedding information within them. Some ideas for MIDI steganography and their tradeoffs follow.
One possible method of embedding information would be to insert additional events that do not affect the sound of the MIDI file when it is played. It would be easy to add events that “do nothing,” such as note on events with a velocity of zero (which the MIDI specification treats as note off events) for notes that are not currently sounding. Meta-events that store text could also be added to carry the hidden data, possibly as raw bytes. Undefined meta-events could be added as well, since the default action defined by the Standard MIDI Files specification is to ignore such meta-events and continue parsing the file. This method is very problematic, however, because the file size would increase significantly: many such events would have to be added to encode a large amount of information. It would also be easily detectable, since undefined events, events that serve no purpose, and text events that store no information about the music are all highly suspicious. Some MIDI writing software even removes superfluous events automatically by design.
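The size cost of inserting events can be estimated from the SMF meta-event layout: a delta-time, the byte 0xFF, a type byte, a variable-length data length, and the data itself. The sketch below assumes the payload is carried as raw bytes in a text meta-event (type 0x01) at a delta-time of 0; the helper names are hypothetical. In a real file, the length field of the enclosing track (MTrk) chunk would also have to grow by the size of each inserted event.

```python
def write_vlq(value: int) -> bytes:
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    if not 0 <= value <= 0x0FFFFFFF:
        raise ValueError("value does not fit in a 4-byte variable-length quantity")
    groups = [value & 0x7F]  # final byte: high bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)  # earlier bytes: high bit set
        value >>= 7
    return bytes(reversed(groups))


def make_meta_event(meta_type: int, payload: bytes, delta: int = 0) -> bytes:
    """Build <delta-time> FF <type> <length> <data> for insertion into a track."""
    return write_vlq(delta) + bytes([0xFF, meta_type]) + write_vlq(len(payload)) + payload


TEXT_EVENT = 0x01  # FF 01: the SMF text meta-event
secret = b"attack at dawn"
event = make_meta_event(TEXT_EVENT, secret)
print(event.hex(" "))
print(f"{len(secret)} hidden bytes add {len(event)} bytes to the track")  # 14 ... 18
```

Each such event adds at least 4 bytes of framing on top of its payload, and the file grows by the full size of every inserted event.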
As described above, many meta-events, such as lyrics, copyright notices, and track names, simply store text. It would be possible to modify these existing text fields to encode information, possibly using well-known text steganography techniques. The text could be replaced with the data to be embedded, or extra text could be added to it (like the white space added by the text steganography application “Snow” [5]). Unfortunately, these text fields are typically fewer than 30 bytes long and are absent from many MIDI files, so these methods would provide a very low encoding rate. They are also easily detectable and, depending on the approach, may increase the file size.
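To illustrate the whitespace idea, the sketch below appends one space (bit 0) or one tab (bit 1) per hidden bit to the end of a text field such as a track name. This simplified scheme is an assumption made for illustration; it is not the encoding Snow actually uses. It also assumes the original text has no trailing whitespace of its own, and the meta-event's length field would have to be updated to match the longer text.

```python
def hide_in_text(text: str, secret: bytes) -> str:
    """Append the secret's bits as trailing whitespace: space = 0, tab = 1."""
    bits = "".join(f"{byte:08b}" for byte in secret)
    return text.rstrip(" \t") + "".join("\t" if bit == "1" else " " for bit in bits)


def reveal_from_text(text: str) -> bytes:
    """Recover bytes hidden by hide_in_text from the trailing whitespace."""
    trailer = text[len(text.rstrip(" \t")):]
    bits = "".join("1" if ch == "\t" else "0" for ch in trailer)
    usable = len(bits) - len(bits) % 8  # ignore any incomplete final byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


track_name = "Acoustic Grand Piano"
stego_name = hide_in_text(track_name, b"hi")
print(repr(stego_name))
print(reveal_from_text(stego_name))       # b'hi'
print(len(stego_name) - len(track_name))  # 16
```

Hiding just 2 bytes lengthens the 20-character name by 16 characters, which shows why the encoding rate of this approach is so low.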
Another possibility for embedding information lies in manipulating the least significant bit (LSB) of certain MIDI event data. LSB encoding is very common among steganographic algorithms because changing the LSB usually does not affect the user's perception of the object. In a MIDI file, the best candidate for this method is the velocity (volume) value of a note on event, which ranges from 0 to 127. If a velocity value is incremented or decremented by 1, the slight change in volume will likely be imperceptible to the average listener. Analysis of the file, however, would reveal an unusually wide variety of velocity values in note on events. This is undesirable because most normal MIDI files, especially those created by music composition software, use a few standard discrete values that correspond to the musical dynamic markings (from softest to loudest) ppp, pp, p, mp, mf, f, ff, and fff. The exact values vary, but two common practices are a logarithmically distributed scale {1, 3, 10, 32, 45, 64, 90, 127} and an evenly distributed scale {1, 16, 32, 48, 64 (as a “middle” volume), 80, 96, 112, 127}. Not all MIDI files are created with specialized software; some are literally recordings of a person playing a MIDI-compatible instrument (usually a keyboard) that sends out MIDI events as the musician plays. MIDI files created in this manner may have significant variation in velocity values, but in practice few publicly released MIDI files are created this way. In conclusion, LSB encoding would leave the size of the MIDI file unchanged and offer good embedding capacity (better than the previously discussed methods), but it would not be stealthy, since any velocity value outside the standard discrete values would be exposed by simple analysis.
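The sketch below shows both sides of this argument on a plain list of note on velocities (a simplification; a real tool would read them from the track data). The hypothetical `embed_lsb` writes one bit into each usable velocity's LSB, skipping velocities 0 and 1 because a note on with velocity 0 is interpreted as a note off. The hypothetical `off_scale_fraction` is a crude detector that measures how many sounding notes fall outside the evenly distributed scale quoted above.

```python
STANDARD_LEVELS = {1, 16, 32, 48, 64, 80, 96, 112, 127}  # evenly distributed scale from the text


def embed_lsb(velocities: list[int], bits: list[int]) -> list[int]:
    """Write one secret bit into the LSB of each usable note on velocity.

    Velocities 0 and 1 are left alone: a note on with velocity 0 acts as a
    note off, so flipping 1 -> 0 would silence the note. Flipping the LSB of
    any value >= 2 keeps it >= 2, so the extractor sees the same usable set.
    """
    out, pending = [], iter(bits)
    for v in velocities:
        if v >= 2:
            bit = next(pending, None)
            if bit is not None:
                v = (v & ~1) | bit
        out.append(v)
    if next(pending, None) is not None:
        raise ValueError("not enough note on events to hold the message")
    return out


def extract_lsb(velocities: list[int], count: int) -> list[int]:
    """Read back `count` bits from the LSBs of the usable velocities."""
    return [v & 1 for v in velocities if v >= 2][:count]


def off_scale_fraction(velocities: list[int], levels: set[int] = STANDARD_LEVELS) -> float:
    """Fraction of sounding notes whose velocity is not on the standard scale."""
    sounding = [v for v in velocities if v > 0]
    if not sounding:
        return 0.0
    return sum(v not in levels for v in sounding) / len(sounding)


cover = [64, 64, 80, 96, 64, 0, 112, 48]
secret_bits = [1, 0, 1, 1, 0, 1, 0]
stego = embed_lsb(cover, secret_bits)
print(stego)                                                          # [65, 64, 81, 97, 64, 0, 113, 48]
print(extract_lsb(stego, len(secret_bits)) == secret_bits)            # True
print(off_scale_fraction(cover), round(off_scale_fraction(stego), 2))  # 0.0 0.57
```

Any bit that changes a velocity pushes it off the scale, so here the detector flags 4 of the 7 sounding notes even though the file size is unchanged.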
The most promising method of hiding information in a MIDI file is changing the order of simultaneous events. The MIDI specification does not say which of several events that occur at the same time during playback should be placed first in the file. It is assumed that all software programs (and hardware devices) that parse MIDI files can handle simultaneous events in any order. Because of this, each group of simultaneous events may be treated as a list that can be rearranged without any effect on the playback or function of the MIDI file, provided that the group's delay (the delta-time carried by its first event) moves to whichever event is placed first. The events themselves are not altered and no events are added, so the file size should not increase (apart from a possible change of a few bytes where reordering affects running status, the SMF convention of omitting a repeated status byte). Although there is no requirement that certain types of events appear before others in the file, reordered lists may look unusual, because most commercially available music composition software organizes simultaneous events in a consistent way (meta-events usually occur before MIDI events, note off events usually occur before note on events, etc.). Lastly, this method of reordering events also has the potential for a high encoding rate, depending on the properties of the MIDI file itself.
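One way to turn an ordering into hidden bits, offered here as an illustration rather than as this document's exact method, is to rank permutations with the factorial number system (a Lehmer code): a group of n distinct simultaneous events has n! possible orderings and can therefore hold floor(log2(n!)) bits. The sketch assumes each event is given as raw bytes with an explicit status byte (no running status) and uses the sorted order of those bytes as the reference order both sides agree on. It also assumes the group contains no note off and note on pair for the same channel and pitch, since swapping such a pair would cut the new note short. All function names are hypothetical.

```python
import math


def capacity_bits(n: int) -> int:
    """Whole bits storable by choosing one ordering of n distinct events: floor(log2(n!))."""
    return math.factorial(n).bit_length() - 1


def rank_to_order(items: list, k: int) -> list:
    """Return the k-th (0-based, lexicographic) permutation of the sorted items."""
    pool = sorted(items)
    if not 0 <= k < math.factorial(len(pool)):
        raise ValueError("k is out of range for this many items")
    order = []
    for remaining in range(len(pool), 0, -1):
        idx, k = divmod(k, math.factorial(remaining - 1))
        order.append(pool.pop(idx))
    return order


def order_to_rank(order: list) -> int:
    """Inverse of rank_to_order: recover k from an observed ordering."""
    pool = sorted(order)
    k = 0
    for pos, item in enumerate(order):
        idx = pool.index(item)
        k += idx * math.factorial(len(order) - 1 - pos)
        pool.pop(idx)
    return k


def reorder_group(group: list[tuple[int, bytes]], k: int) -> list[tuple[int, bytes]]:
    """Reorder one group of simultaneous (delta-time, event bytes) pairs to encode k.

    The group's delay (the first event's delta-time) moves to whichever event
    now comes first; every other event in the group gets a delta-time of 0.
    """
    delay = group[0][0]
    events = [ev for _, ev in group]
    if len(set(events)) != len(events):
        raise ValueError("identical events cannot carry information by their order")
    ordered = rank_to_order(events, k)
    return [(delay if i == 0 else 0, ev) for i, ev in enumerate(ordered)]


# Two note offs and two note ons (channel 1), all 96 ticks after the previous event.
group = [
    (96, bytes([0x80, 60, 0])),
    (0, bytes([0x80, 64, 0])),
    (0, bytes([0x90, 62, 80])),
    (0, bytes([0x90, 65, 80])),
]
print(capacity_bits(len(group)))  # 4
stego = reorder_group(group, 11)
print([(d, ev.hex()) for d, ev in stego])  # [(96, '804000'), (0, '904150'), (0, '903e50'), (0, '803c00')]
print(order_to_rank([ev for _, ev in stego]))  # 11
```

Because capacity grows roughly like log2(n!), files with many large groups of simultaneous events (for example, dense chords) offer the most room, which matches the observation that the encoding rate depends on the properties of the MIDI file itself.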