1. Field
Embodiments of the present invention generally relate to the field of network security techniques. In particular, various embodiments relate to hidden data identification and methods for filtering media files that are embedded with malware, spam or sensitive information.
2. Description of the Related Art
A barcode is an optical, machine-readable representation of data. Linear or one-dimensional (1D) barcodes represent data by varying the width of and spacing between parallel lines or rectangles. Two-dimensional (2D) barcodes use dots, hexagons and other geometric patterns to represent data. A single 2D barcode, such as a matrix barcode or Quick Response (QR) code, may represent more than 1K bytes of data depending upon the version and encoding employed. A QR code that encodes text, music, images, Uniform Resource Locators (URLs) and/or emails can be generated as an image file and transmitted through short message service (SMS) and/or multimedia messaging service (MMS) or via the Internet. Barcode reader utility software running on a computing device, such as a smart phone, may scan a barcode using a camera connected to or integrated within the computing device. The barcode reader then decodes the encoded content and may display it.
Some barcode reader utility software may carry out further operations based on the type of encoded content. For example, when the encoded content includes or represents a URL, the barcode reader utility software may launch a web browser and open the URL directly (via URL redirection, for example, which allows QR codes to send metadata to existing applications on the device running the barcode reader utility software). It is convenient for a smart phone user to open a web page by scanning a barcode instead of typing in the URL manually. As such, QR codes have become more prevalent as part of product/service advertising strategies targeting mobile-phone users via mobile tagging. Personal information or business cards may also be encoded within 2D barcodes (e.g., QR codes) and can be printed out or transmitted through a network.
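By way of illustration only, the content-type-dependent behavior described above can be sketched as follows. The function name `classify_payload` and the category labels are hypothetical and are not taken from any particular barcode reader; the sketch merely assumes that a decoded payload is available as text and that contact data uses the common vCard or MECARD prefixes.

```python
from urllib.parse import urlparse


def classify_payload(text: str) -> str:
    """Return a coarse, hypothetical content category for a decoded barcode payload."""
    stripped = text.strip()
    upper = stripped.upper()

    # Contact data is commonly carried as a vCard or MECARD record.
    if upper.startswith("BEGIN:VCARD") or upper.startswith("MECARD:"):
        return "contact"

    # A mailto: URI indicates an email address or draft message.
    if upper.startswith("MAILTO:"):
        return "email"

    # Treat the payload as a web URL only if it has an http(s) scheme and a host.
    parsed = urlparse(stripped)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"

    # Anything else is handled as plain text.
    return "text"
```

A reader built along these lines might open a web browser only when the category is "url", which is the point at which a decoded barcode can trigger a network request without the user having typed or seen the address.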
Other ways to embed hidden content in media files include digital watermarking and steganography. A digital watermark is a kind of marker covertly embedded in a noise-tolerant signal such as audio or image data. Digital watermarks are perceptible only under certain conditions, i.e., after applying a suitable algorithm, and are otherwise imperceptible to human senses. Both steganography and digital watermarking employ steganographic techniques to embed data covertly in noisy signals such that the embedded data remains imperceptible to human senses. Digital watermarks may be used to verify the authenticity or integrity of the carrier signal or to show the identity of its owner. They are prominently used for tracing copyright infringements and for banknote authentication.
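As a simplified illustration of how data can be hidden in a noise-tolerant signal, the following sketch uses least-significant-bit (LSB) substitution, one well-known steganographic technique. It is an assumption chosen for clarity, not a description of any particular watermarking scheme; the function names `embed_lsb` and `extract_lsb` are hypothetical, and the carrier is modeled as a plain sequence of 8-bit samples (for example, raw pixel or audio values).

```python
def embed_lsb(carrier: bytes, secret: bytes) -> bytes:
    """Hide `secret` in the least significant bit of successive carrier samples."""
    # Expand the secret into individual bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in secret for shift in range(7, -1, -1)]
    if len(bits) > len(carrier):
        raise ValueError("carrier too small for secret")

    out = bytearray(carrier)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the sample, then set it to the secret bit.
        out[i] = (out[i] & 0xFE) | bit
    return bytes(out)


def extract_lsb(carrier: bytes, n_bytes: int) -> bytes:
    """Recover `n_bytes` of hidden data from the carrier's least significant bits."""
    if n_bytes * 8 > len(carrier):
        raise ValueError("carrier too small for requested length")

    out = bytearray()
    for i in range(n_bytes):
        value = 0
        # Reassemble one byte from eight consecutive samples.
        for sample in carrier[i * 8:(i + 1) * 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)
```

Because each sample changes by at most one unit, the modified carrier is typically indistinguishable from the original to a human observer, yet the hidden bytes can be recovered exactly by anyone who knows where to look.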
As media files may carry encoded and hidden data that are imperceptible to human senses, some malware uses these techniques to intrude into user devices or to transfer sensitive information. For example, a malicious website may distribute a barcode that contains its URL to smart phone users and induce the users to scan or decode the barcode. After the smart phone decodes the barcode, the smart phone may launch its web browser and open the malicious website. The website may contain malware that can, among other things, gain access to and/or control of the smart phone, disrupt operation of the smart phone and/or gather sensitive information stored on or entered into the smart phone (e.g., usernames and passwords entered into apps and/or websites).
In view of the foregoing, there exists a need for methods and systems that can resist the spread of media files containing malware or sensitive information embedded therein in a human-imperceptible form.