The present invention relates generally to a steganalytic method and apparatus for detecting and decoding semantically encoded natural language.
Generally, cryptography and steganography relate to the art of information hiding, while cryptanalysis and steganalysis relate to the art of discovering how the information was hidden.
Cryptography, which includes encryption and decryption, involves scrambling a message using a key so that only those who have the key may descramble it. More specifically, encryption systems perform an encryption operation on a plaintext message using an encryption key to produce a ciphertext message that is not understandable but that is still visible. The receiver of a ciphertext message performs a corresponding decryption operation with a decryption system using a decryption key to recover the plaintext message. Cryptanalysis involves identifying the key. Additional background on cryptography is disclosed by Kahn, in “The Codebreakers: the Story of Secret Writing,” second edition, Scribner, New York, 1996 (original ed.: MacMillan, 1967), which is incorporated herein by reference and referred to below as “The Codebreakers”.
In contrast, steganography concerns techniques for hiding messages in such a way that their presence cannot be seen in the plaintext message. Thus, unlike cryptography which produces a visible message that cannot be understood without the appropriate key, steganography produces a visible message that can be understood but contains a hidden meaning within the visible message. Steganalysis is the art of detecting the steganographic meaning of the hidden message. Additional background on steganography and steganalysis is disclosed by Neil F. Johnson, in “Steganalysis”, Chapter 4 of Stefan Katzenbeisser (ed.), Fabien A. P. Petitcolas (ed.), “Information Hiding Techniques for Steganography and Digital Watermarking”, Artech House Books, 2000, which is incorporated herein by reference.
As defined herein, “semantic camouflage” refers to a subset of steganographic techniques, which subset includes those techniques used to hide a covert message in an overt message by performing a natural language transformation. The overt message is one that can be read on the face of the plaintext message. However, with additional information that describes the natural language transformation, the covert message may be uncovered. For example, a group of people may set up a convention for carrying out conversations between themselves where any reference to some illicit substance is to be made using a reference to “flowers”. Their conversations could thereafter involve a discussion concerning the quality, age, variety, and quantity of the illicit substance without ever having to identify it as such in the conversation.
Others have described semantic camouflage. For example, Kahn in The Codebreakers describes “jargon code” that was used by spies during the 20th century World Wars and the efforts of counter-intelligence services to intercept such coded messages through censorship activities. More specifically, Kahn in The Codebreakers at page 519 discloses that: “Censorship defends itself against this ruse by a feel for stilted or heavy-handed language and by a healthy skepticism concerning subject matter.” Such interception of semantically camouflaged messages is based on a (manual) human assessment of whether the message is believable.
Yet others have described semantic camouflage within the field of “forensic linguistics”, which is the study of linguistic evidence that can be brought to bear in legal contexts. One special case is the study of evidence indicating that the participants in some particular conversation might be making use of coded language with the intent to deceive. For example, Shuy in “Discourse Clues to Coded Language in an Impeachment Hearing” (published in G. Guy, C. Feagin, D. Schiffrin and J. Baugh, Eds., “Towards a Social Science of Language: Papers in Honor of William Labov”, Amsterdam: John Benjamins, 1997, pp. 121-138), describes a report by an expert in forensic linguistics on a study conducted as part of a legal trial in order to assess whether a taped conversation showed any evidence of “partial and disguised codes”.
Semantic camouflage is distinct from other steganographic techniques which involve concealing a ciphertext message as innocuous text. An example of such a system is disclosed by M. Chapman and G. Davida, in “Hiding the Hidden: A Software System for Concealing Ciphertext as Innocuous Text,” International Conference on Information and Communications Security, Nov. 11-13, 1997, Beijing, P. R. China, in which an encrypted message is made to look like ordinary language using a number-to-word dictionary that turns visibly encrypted messages into something that looks like ordinary language, so as to escape notice from potential code-breakers.
Also, semantic camouflage is distinct from null ciphers which involve spreading parts of covert messages in discontinuous portions of overt or plaintext messages (e.g., where each part of a message may be spread at a character level, word level, or bit level). For example, a covert message may be spread over elements of a plaintext message using a cipher-key which states that every nth word of the plaintext message belongs to the covert message. While spreading words rather than characters may be easier for encoding and decoding, it has the drawback that potentially sensitive words of the covert message will appear overtly.
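By way of illustration only, the word-level null cipher described above may be sketched as follows. The function name `extract_null_cipher`, the sample sentence, and the choice of n = 3 are assumptions introduced for this example and are not taken from any cited reference.

```python
def extract_null_cipher(overt_text: str, n: int) -> str:
    """Recover a covert message hidden as every nth word of an overt message.

    Illustrative sketch only: words are split on whitespace and counted
    from 1, so n = 3 selects the 3rd, 6th, 9th, ... words.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    words = overt_text.split()
    # Slice starting at the nth word (index n - 1), stepping by n.
    return " ".join(words[n - 1::n])


# Hypothetical overt message; the cipher-key is n = 3.
overt = "Please do meet our friends at the noon lunch"
print(extract_null_cipher(overt, 3))  # prints "meet at lunch"
```

As the sketch suggests, the words of the covert message ("meet", "at", "lunch") all appear overtly in the plaintext message, which is the drawback noted above.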
Semantic camouflage has certain advantages over other steganographic techniques such as null ciphers because of the simplicity with which it may be used to code and decode covert messages. Indeed, coding and decoding is easy enough that it can be used in real-time and thus in day-to-day conversations. Moreover, while coding and decoding rules of semantic camouflage will typically be agreed upon in advance, in some situations it is possible for the encoder to define simple camouflage rules without prior agreement, letting the recipient infer these rules from his knowledge of the situation (e.g., if a bookie calls a long-time client and asks him whether he wants to play, the client may well infer that the word “play” is being used to mean “bet” even if that was not agreed upon in advance).
Thus, while semantic camouflage is related to steganographic techniques such as null ciphers, semantic camouflage involves transforming the meaning of a plaintext message rather than spreading parts of a covert message over it. More specifically, the coding and decoding rules of semantic camouflage are defined using a set of semantic transformations that specify the translation between an overt linguistic unit and its covert meaning (e.g., “bet” is expressed as “play”). Such rules resemble encryption, in that they specify “how to code” rather than “where to hide” (e.g., the null cipher key). But unlike typical cryptographic encryption, which transforms the form of a plaintext message, semantic camouflage involves transforming the meaning of a message. Thus, where usual encryption methods will encode two synonymous words in completely different ways, semantic camouflage is expected to encode synonyms in the same way (e.g., “play” would be used to encode both “bet” and “gamble”).
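By way of illustration only, a set of semantic transformations of the kind described above may be sketched as a simple lookup table. The names `CAMOUFLAGE_RULES`, `encode`, and `candidate_meanings` are hypothetical and introduced solely for this example. The sketch operates on whole lowercase words and makes no attempt to handle inflection, multi-word expressions, or the metaphorical transpositions that a realistic camouflage would involve.

```python
import re

# Hypothetical rule table mapping each covert term to its overt stand-in.
# Synonyms ("bet", "gamble") deliberately share one overt term ("play").
CAMOUFLAGE_RULES = {"bet": "play", "gamble": "play"}


def encode(covert_text, rules=CAMOUFLAGE_RULES):
    """Replace each covert word with its overt stand-in, leaving other words as-is."""
    def swap(match):
        word = match.group(0)
        return rules.get(word.lower(), word)
    # Substitute alphabetic runs only, so punctuation and spacing are preserved.
    return re.sub(r"[A-Za-z]+", swap, covert_text)


def candidate_meanings(overt_word, rules=CAMOUFLAGE_RULES):
    """List the covert terms that a given overt word may stand for."""
    return sorted(c for c, o in rules.items() if o == overt_word.lower())


print(encode("Do you want to bet?"))     # prints "Do you want to play?"
print(encode("Do you want to gamble?"))  # prints "Do you want to play?"
print(candidate_meanings("play"))        # prints ['bet', 'gamble']
```

Because both synonyms map to the same overt term, decoding recovers a covert meaning rather than the exact covert wording, which distinguishes such rules from form-preserving encryption.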
Producing a good semantically camouflaged message is more subtle than simply transforming a series of individual words. Instead, it amounts to adopting one or more suitable metaphors in which events and situations are meaningfully transposed. Complex metaphors are not readily defined on-the-fly because they generally require agreement in advance. In addition, a metaphor may not always be used consistently because new events may occur that cannot be readily described using the agreed-upon metaphor. Because of these difficulties, the use of semantic camouflage is often imperfect.