First published August 2005
by James E. Wingate, CISSP-ISSEP, CISM, IAM
Director, Steganography Analysis & Research Center
Chad W. Davis
Computer Security Engineer
Rapidly evolving computer and networking technology, coupled with a dramatic expansion in communications and information exchange capability within government organizations, public and private corporations, and even our own homes, has made our world smaller. As a society, we are substantially more invested in information technologies than ever before. Use of the Internet and multimedia technologies for communication has become commonplace and is now an integral part of both business and social activity. This has changed how societies across the globe operate.
The rapid evolution of the Internet has also been somewhat of a “double-edged sword.” Not only has it provided a medium for exchanging vast amounts of information and knowledge for the benefit of mankind, it has also provided a new medium for conducting activities detrimental to mankind. No longer confined to the bounds of physical space, criminals, including terrorists, have discovered a virtual world where they can take advantage of the vast expanse of cyber space to conceal their activities from the prying eyes of law enforcement and the intelligence community. In the pre-Internet era, criminals often operated under the cloak of darkness. Now they operate 24×7 under the cloak of cyber space. They have little concern about being detected, arrested, prosecuted, or convicted, because by and large much criminal activity goes unreported. Even when it is reported, law enforcement agencies are already so overwhelmed with child pornography (CP) investigations that they lack the time and resources to investigate other cyber crimes. This fact is not lost on those who would use the Internet for illegal or otherwise nefarious purposes.
To make matters worse, criminals are adapting to evolving law enforcement technologies in the field of computer forensics by finding new ways to conceal their criminal activities. Law enforcement forensic examiners are beginning to discover data hiding applications on seized media that have been used to evade detection by popular computer forensic tools by hiding a digital file inside of another digital file. This technique is called digital steganography.
Steganography, literally meaning “covered writing,” is a means of covert communication that encompasses a variety of techniques used to embed data within a cover medium in such a manner that the very existence of the embedded information is undetectable.
Hundreds of steganography applications, most of them freeware or shareware, are readily available on the Internet to anyone, including criminals and terrorists. Computer security, law enforcement, and intelligence professionals need the capability both to detect the use of digital steganography applications to hide information and to extract the hidden information. Accordingly, there is much current interest in steganalysis, the detection and extraction of information hidden with digital steganography applications.
There are two major schools of thought for conducting steganalysis. One involves an approach known as “blind detection,” and the other is a more analytical approach. This document describes both techniques and how they can be employed together to conduct steganalysis.
The Blind Detection Approach to Steganalysis
The blind detection approach to steganalysis has been around for a number of years. Blind detection attempts to determine whether a message may be hidden in a file without any prior knowledge of the specific steganography application used to hide the information. Several techniques may be employed to inspect suspect files, including various visual, structural, and statistical methods. Visual analysis methods attempt to detect the presence of steganography through visual inspection, either with the naked eye or with the assistance of automated processes. Visual inspection with the naked eye can succeed when steganography is inserted in relatively smooth areas with nearly equal pixel values. Automated computer processes can, for example, decompose an image into its individual bit-planes. A bit-plane consists of the bits at a single position (for example, the least significant bit) of every pixel value in an image. The least significant bit-plane is a typical storage place for information hidden by steganography applications. Any unusual appearance in the display of the least significant bit-plane may indicate the existence of steganography.
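Bit-plane decomposition can be illustrated in a few lines of Python. This is a sketch, not a SARC tool. It assumes an 8-bit grayscale image that has already been loaded as a list of rows of pixel values; loading a real file would require an image library. The sketch renders a chosen plane as text so that regular or unexpected structure in the least significant plane can be inspected.

```python
def bit_plane(image: list[list[int]], k: int) -> list[list[int]]:
    """Return bit-plane k (0 = least significant) of an 8-bit grayscale image as 0/1 values."""
    if not 0 <= k <= 7:
        raise ValueError("k must be between 0 and 7 for 8-bit samples")
    return [[(value >> k) & 1 for value in row] for row in image]


def render(plane: list[list[int]]) -> str:
    """Draw a bit-plane as text: '#' for a 1 bit, '.' for a 0 bit."""
    return "\n".join("".join("#" if bit else "." for bit in row) for row in plane)


# A tiny 2x4 example image (hypothetical values, for illustration only).
image = [
    [200, 201, 200, 201],
    [54, 55, 54, 55],
]
print(render(bit_plane(image, 0)))  # least significant bit-plane
```

In a natural photograph the least significant plane usually looks like noise. Long runs of regular structure there are the kind of "unusual appearance" that warrants a closer look.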
Structural analysis methods attempt to reveal alterations in the format of the data file. For example, a steganography application may append hidden information past an image’s end-of-file marker. Image viewers display an image modified with this appending technique exactly as they would the original carrier file. The two images look the same and their pixel data are identical, because the image’s data bits have not been altered; only the file size and hash values differ. The software that renders the image simply ignores the hidden information embedded past the end-of-file marker. In addition to the manual process of investigating images with a hex editor, several automated methods for conducting structural analysis have been developed.
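A minimal structural check for the JPEG case is sketched below in Python as an illustration only. It does not reproduce any SARC tool. A naive search for the first FF D9 can be fooled, because an embedded EXIF thumbnail carries its own FF D9. The sketch therefore walks the marker segments to the real end-of-image marker and reports any bytes stored after it. It assumes a reasonably well-formed baseline or progressive JPEG and does not handle every corner of the format.

```python
def jpeg_image_end(data: bytes) -> int:
    """Return the offset just past the JPEG end-of-image (EOI, FF D9) marker."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing JPEG start-of-image marker (FF D8)")
    i = 2
    while i + 1 < len(data):
        if data[i] != 0xFF:
            raise ValueError(f"expected a marker at offset {i}")
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte preceding a marker
            i += 1
            continue
        if marker == 0xD9:  # EOI: end of the image data
            return i + 2
        if 0xD0 <= marker <= 0xD7 or marker == 0x01:  # markers without a length field
            i += 2
            continue
        if i + 4 > len(data):
            break
        # Segment length counts its own two length bytes but not the marker.
        length = int.from_bytes(data[i + 2:i + 4], "big")
        i += 2 + length  # skipping by length also skips thumbnails inside APP1
        if marker == 0xDA:  # SOS: skip entropy-coded scan data up to the next real marker
            while i + 1 < len(data) and not (
                data[i] == 0xFF and data[i + 1] != 0x00 and not 0xD0 <= data[i + 1] <= 0xD7
            ):
                i += 1
    raise ValueError("no end-of-image marker found")


def appended_data(data: bytes) -> bytes:
    """Return any bytes stored past the JPEG end-of-image marker (empty if none)."""
    return data[jpeg_image_end(data):]
```

A non-empty result from `appended_data` does not prove steganography, because some cameras and editors append their own metadata. It does identify the region a hex-editor examination should focus on.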
Statistical analysis methods attempt to detect tiny alterations in a file’s statistical behavior caused by steganographic embedding. Statistical analysis of files can be difficult and time consuming, because there are many approaches to embedding and each modifies the carrier file in a different way. For this reason, unified techniques for detecting steganography with statistical methods are difficult to find. Statistics such as means and variances, and tests such as the chi-square test, can measure the amount of redundant information or the deviation from expected file characteristics. Current research in blind detection steganalysis is focused on these statistical methods.
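One concrete form of the chi-square test is the pairs-of-values test described by Westfeld and Pfitzmann for LSB embedding. The Python sketch below assumes raw 8-bit samples, such as the pixel bytes of a 24-bit BMP. It computes only the statistic and the degrees of freedom. Turning these into a probability of embedding requires the chi-square survival function (for example, `scipy.stats.chi2.sf`), which is omitted here so that the block needs no dependencies.

```python
from collections import Counter


def pairs_of_values_chi_square(samples: bytes) -> tuple[float, int]:
    """Chi-square statistic of the pairs-of-values (PoV) test on 8-bit samples.

    Embedding random-looking (e.g. encrypted or compressed) data in the LSBs
    tends to equalize the counts of each value pair (2k, 2k+1). The function
    returns (statistic, degrees_of_freedom). A statistic that is small relative
    to the degrees of freedom suggests the pairs have been equalized.
    """
    counts = Counter(samples)
    statistic = 0.0
    pairs_used = 0
    for k in range(128):
        even, odd = counts[2 * k], counts[2 * k + 1]
        expected = (even + odd) / 2  # expected count of the even value after embedding
        if expected == 0:  # pair never occurs, so it carries no evidence
            continue
        statistic += (even - expected) ** 2 / expected
        pairs_used += 1
    return statistic, max(pairs_used - 1, 0)
```

In practice the test is usually run over growing prefixes of the pixel data. A sequentially embedded message then shows up as a region where the pairs are suspiciously well balanced.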
Complications of Blind Detection
In practice, even if the blind detection technique detects anomalies in suspect files, it is not very likely that the hidden information can be successfully extracted. A clean file may also have characteristics that resemble such an anomaly and trigger a false positive result. It is also important to keep in mind that even if the hidden information can be extracted, it may have been encrypted before being hidden in the carrier file. In that case, the extracted information will be cipher text, which may not be decipherable if a very strong encryption algorithm was used.
The following four complications are possible when implementing blind detection techniques for steganalysis:
1. The suspect file may or may not have any information hidden in it in the first place.
2. The hidden message may have been encrypted before being hidden in the carrier file.
3. Some suspect files may have had noise or irrelevant data encoded in them. This reduces stealth (i.e., it makes the use of steganography easier to detect) but makes analysis very time-consuming.
4. Unless the hidden information can be found, completely recovered, and decrypted (if encrypted), it is often not possible to be sure whether the suspect carrier file contained a hidden message in the first place; all you end up with is a probability that the suspect carrier file may have something hidden within it.
The Analytical Approach to Steganalysis
The analytical approach to steganalysis has been developed within the Steganography Analysis and Research Center (SARC) as a product of extensive research into steganography applications and the techniques they employ to embed hidden information within files.
The premise of this approach is to first determine if any residual file and/or Windows Registry® artifacts from a particular steganography application exist on the suspect media.
– If residual artifacts exist, then the application was probably installed
– If the application was installed, then it was probably used
– If the application was used, then it was used to hide something
And that is exactly what the computer forensics examiner must try to determine. What information was hidden? That may be the key to the investigation that resulted in the computer seizure in the first place.
The analytical approach attempts to determine if there is any evidence that a steganography application ever existed on the suspect media. Searching for files and registry entries that have been identified by the SARC as belonging to a steganography application will identify these residual artifacts.
The goal is to determine which steganography application was used. Determining the application used will shed light on the embedding technique employed by the application and the file types used by the application as carrier files. Armed with that knowledge, the examiner can then focus their efforts on detailed analysis of suspect carrier files and attempt to extract information that may have been hidden in those files.
Process for Analytical Steganalysis
The analytical approach to steganalysis is intended to be an extension of traditional digital forensics methods. For example, traditional methods should be employed to recover all files that may have been deleted prior to beginning the steganalysis aspect of the examination.
Determining Residual File Artifacts
To determine if residual file artifacts of steganography applications exist on the suspect media, the SARC has developed the Steganography Application Fingerprint Database (SAFDB). The SAFDB contains hash values for nearly 15,000 file profiles associated with 230 steganography, watermarking, and other data-hiding applications. The file profiles contain identifying information such as filename, associated application name, and four unique hash values: CRC-32, MD5, SHA-1, and SHA-256. These hash values may be used to determine the presence of a steganography application or artifact of a steganography application on the media being examined.
For a limited time, the SAFDB is available at no charge to authorized law enforcement and intelligence community examiners on the SARC website at http://www.sarc-wv.com. The database is available in formats compatible with most popular forensic tools: EnCase, FTK, HashKeeper, ILook, and ProDiscover. For additional information on SAFDB, please contact the SARC to request the free White Paper on “The Steganography Application Fingerprint Database.”
The first step in the analytical approach is to hash all files on the suspect media. Next, the hash values are compared with the hash values in SAFDB. A match represents a file artifact that may be associated with one or more steganography applications. Each file profile within the SAFDB identifies which steganography application that artifact belongs to.
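The hashing-and-matching step can be sketched with Python's standard `hashlib` and `zlib` modules, which together provide all four hash types listed above. This document does not describe the SAFDB's export formats. The `known_sha256` dictionary, which maps a SHA-256 digest to an application name, is therefore a hypothetical stand-in, and the sketch assumes the suspect media is available as a read-only working copy.

```python
import hashlib
import zlib
from pathlib import Path


def file_hashes(path: Path, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Compute CRC-32, MD5, SHA-1 and SHA-256 of one file in a single pass."""
    digests = {"md5": hashlib.md5(), "sha1": hashlib.sha1(), "sha256": hashlib.sha256()}
    crc = 0
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):  # read in chunks so large files fit in memory
            crc = zlib.crc32(chunk, crc)
            for digest in digests.values():
                digest.update(chunk)
    result = {name: digest.hexdigest() for name, digest in digests.items()}
    result["crc32"] = f"{crc:08x}"
    return result


def match_known_artifacts(root: Path, known_sha256: dict[str, str]) -> list[tuple[Path, str]]:
    """Hash every file under root and return (path, application) for each SHA-256 match.

    known_sha256 is a hypothetical mapping of SHA-256 hex digest -> application name.
    """
    matches = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = file_hashes(path)["sha256"]
        if digest in known_sha256:
            matches.append((path, known_sha256[digest]))
    return matches
```

Matching on SHA-256 alone keeps the example short. Because the profiles also carry CRC-32, MD5, and SHA-1 values, a tool that only produces one of those hashes can match on that hash instead.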
Determining Residual Registry Artifacts
In addition to the hash values of files associated with steganography applications, the SAFDB also contains a set of registry keys and values known to be created or modified as a result of installing a steganography application. This aspect of the analytical approach is not unlike searching for latent fingerprints at a crime scene. Some criminals go to great lengths to cover their tracks by wearing gloves and/or cleaning up a crime scene. Likewise, some cyber criminals will go to great lengths to cover their tracks. After using a steganography application, they may uninstall the application and then delete obvious folders and files associated with the application that weren’t removed by the uninstall operation.
The registry keys and values can be compared to the registry from the suspect computer to determine if a steganography application currently exists, or did exist at one time, on the system. A positive match could lead the investigator to confirm with a high degree of confidence that a particular steganography application has existed on a suspect system.
It is entirely conceivable that a single registry key or value could be the sole fingerprint left behind that could become the key to finding and extracting information hidden with a steganography application deleted from the system after it was used.
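A simplified sketch of the registry comparison follows. It assumes the suspect hive has been exported to a `.reg` text file. It also assumes the fingerprints are available as a dictionary that maps key paths to application names; this format is hypothetical, and the key names used in testing are invented. Registry key names are case-insensitive, so the comparison lowercases them. Only keys are compared, not values, and deletion entries (`[-...]`) are ignored. Exports made with regedit are typically UTF-16 encoded and should be read with `encoding="utf-16"`. Many examiners parse offline hive files directly rather than working from exports.

```python
import re

# Matches "[HKEY_...\Path]" key headers, but not "[-HKEY_...]" deletion entries.
KEY_LINE = re.compile(r"^\[(?!-)(.+)\]\s*$")


def registry_keys_from_export(reg_text: str) -> set[str]:
    """Collect key paths from a regedit export (.reg text), lowercased for comparison."""
    keys = set()
    for line in reg_text.splitlines():
        match = KEY_LINE.match(line.strip())
        if match:
            keys.add(match.group(1).lower())
    return keys


def registry_artifact_hits(reg_text: str, fingerprints: dict[str, str]) -> dict[str, str]:
    """Return {key: application} for each fingerprint key present in the export.

    fingerprints is a hypothetical mapping of registry key path -> application name.
    """
    present = registry_keys_from_export(reg_text)
    return {key: app for key, app in fingerprints.items() if key.lower() in present}
```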
Conducting Analytical Steganalysis
After determining which steganography application(s) may have been used, carrier file types that can be manipulated by those applications should be identified. To determine the potential carrier file types for a steganography application, the examiner should download and experiment with that application. The SARC maintains a physical repository of each steganography application that exists in the SAFDB and may be contacted for assistance if the examiner cannot locate a particular steganography application on the Internet. Copies of commercially licensed versions of steganography applications cannot be provided.
Next, a focused search should be conducted on the suspect media for carrier file types that are manipulated by the particular steganography application. Finally, the suspect carrier files can be subjected to further analysis based on the specific steganographic techniques that can be used on them.
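One way to perform the focused search is to identify carrier types by their leading signature bytes rather than by file extension, because a renamed file keeps its content. The Python sketch below covers JPG and BMP from this document, plus PNG and GIF as further examples. The two-byte BMP signature "BM" is short enough to produce occasional false positives, so hits should be confirmed by parsing the file header.

```python
from pathlib import Path
from typing import Optional

# Leading "magic" bytes for a few common carrier formats.
MAGIC = {
    b"\xff\xd8\xff": "jpg",
    b"BM": "bmp",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def identify(path: Path) -> Optional[str]:
    """Return the carrier type suggested by a file's first bytes, or None."""
    with path.open("rb") as f:
        head = f.read(16)
    for magic, kind in MAGIC.items():
        if head.startswith(magic):
            return kind
    return None


def find_carriers(root: Path, wanted: set[str]) -> list[tuple[Path, str]]:
    """List files under root whose content matches one of the wanted carrier types."""
    hits = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        kind = identify(path)
        if kind in wanted:
            hits.append((path, kind))
    return hits
```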
After determining which steganography technique was employed by the application detected on the suspect media, efforts to extract information hidden with that application can begin. Again, if strong encryption was used prior to hiding the information in the carrier file, then complex cryptanalysis may be necessary to translate the extracted cipher text back into plain text.
Some steganography applications leave behind signatures, specific byte patterns that always appear in a file after hidden information has been embedded. Signature-based steganalysis can be very time consuming because the signature for a specific steganography application must first be identified from a large sample of files that have been embedded using it. In addition, automated processes must be employed to search every potential carrier file for that particular signature.
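Identifying a signature is the hard part. Once a signature is known, scanning candidate carriers for it is straightforward. This Python sketch searches whole files for each byte pattern and reports every offset at which the pattern occurs. The application name and signature passed in are placeholders. Some real signatures are meaningful only at particular offsets (for example, near the end of the file), and a production scanner would check for that.

```python
from pathlib import Path


def find_signature(data: bytes, signature: bytes) -> list[int]:
    """Return every offset at which signature occurs in data (overlapping matches included)."""
    if not signature:
        raise ValueError("signature must not be empty")
    offsets = []
    start = data.find(signature)
    while start != -1:
        offsets.append(start)
        start = data.find(signature, start + 1)
    return offsets


def scan_files(root: Path, signatures: dict[str, bytes]) -> list[tuple[Path, str, int]]:
    """Search every file under root for each application's signature bytes.

    signatures is a hypothetical mapping of application name -> signature bytes.
    """
    hits = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        data = path.read_bytes()
        for app, signature in signatures.items():
            for offset in find_signature(data, signature):
                hits.append((path, app, offset))
    return hits
```

For example, `scan_files(Path("/mnt/evidence"), {"ExampleStego": b"STEGv1"})` would list every hit for a hypothetical application whose signature is the ASCII string `STEGv1`. Reading each file whole keeps the sketch simple. Very large files would be better handled with memory mapping or chunked reads that overlap by the signature length.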
An automated artifact detection tool, StegAlyzerAS (Steganography Analyzer Artifact Scanner), has been developed to detect file and registry artifact matches with SAFDB.
An automated signature-based detection tool that uses a proprietary steganography application signature database, StegAlyzerSS (Steganography Analyzer Signature Scanner), has also been developed.
These products were designed and developed to alleviate the very complex and time consuming efforts that a computer forensics examiner must endure during an investigation involving steganography.
In addition to the StegAlyzer products, computer forensic examiners can also contact the SARC for technical assistance when steganography is detected during an examination of suspect media.
Carrier File Types and Steganographic Techniques
The following sections will demonstrate commonly used steganographic techniques for different carrier file formats. Examples will be given for each file type and steganographic technique, including methods for detecting and extracting hidden information embedded using each technique.
All Files – The Append Technique
A commonly used steganographic technique that can be applied to any type of file is the appending method. This method appends the hidden information past the file’s end-of-file marker. The hidden information can be encrypted, compressed within a zip file, or left in plaintext. The appended information may also contain a signature for the steganography application that embedded it, the size of the hidden information, or the size of the original carrier file.
To illustrate the append steganographic technique, consider the following JPG image: baboon.jpg.
The JPG image format dictates that the byte sequence FF D9 indicates the end of the file. This can be seen by opening the file in a hex editor.
Hex editor view of baboon.jpg
A steganography application that uses the appending technique is used to hide a text file containing the Declaration of Independence in the baboon image. This particular application compresses the Declaration of Independence file into a standard zip file and appends it past the FF D9 end of image marker for JPG images. A standard ZIP file begins with the byte sequence 50 4B (the ASCII characters “PK”), the first two bytes of its local file header signature, 50 4B 03 04. This can also be seen with a hex editor.
Hex editor view of baboon.jpg with embedded Declaration of Independence.txt
This particular steganography application also embeds additional data used for decoding the hidden information. This data includes signature bytes that the steganography application uses to identify that the hidden information was embedded by itself (denoted by the red box in the diagram below), and a hash value representation of the user’s specified password (denoted by the green box).
Hex editor view of information appended by the steganography application
Extracting hidden information that has been embedded using the append steganographic technique involves identifying the end-of-file bytes of the original file and the hidden information that follows. Using a hex editor, the first step in extraction is to remove all of the original carrier file’s bytes. The bytes that remain contain the hidden information, which may be readable plaintext, encrypted, or compressed. If the hidden information is encrypted with a strong cipher, it may be difficult or even impossible to recover the plaintext. If the hidden information is compressed within a ZIP file, its first two bytes will be 50 4B. To recover the decompressed hidden data, first recover all bytes corresponding to the ZIP file and save them as a separate file. Then try to extract the compressed file using WinZip or another decompression tool.
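For the ZIP case, the carving and decompression steps can be sketched with Python's standard `zipfile` module. The sketch assumes, as in the baboon example, that the appended payload is an ordinary ZIP archive. It carves from the first local file header signature (50 4B 03 04) to the end of the file. If those bytes happened to occur by chance inside the carrier's own data, the carve would start too early, so in practice the start offset should be checked against the carrier's end-of-file marker. Any extra bytes the application stores after the archive may also need to be trimmed first.

```python
import io
import zipfile

LOCAL_FILE_HEADER = b"PK\x03\x04"  # 50 4B 03 04


def carve_appended_zip(data: bytes) -> bytes:
    """Return the bytes from the first ZIP local file header to the end of data."""
    offset = data.find(LOCAL_FILE_HEADER)
    if offset == -1:
        raise ValueError("no ZIP local file header (50 4B 03 04) found")
    return data[offset:]


def extract_members(zip_bytes: bytes) -> dict[str, bytes]:
    """Decompress every member of an in-memory ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}
```

Writing the carved bytes to disk (for example, with `Path("carved.zip").write_bytes(...)`) corresponds to saving them as a separate file for WinZip or another tool. A password-protected archive will raise an error on `read` unless the password is supplied.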
The SARC has developed in-house tools for extracting information embedded by the append steganographic technique. If you are interested in steganalysis services, please contact the SARC at (304) 366-9161 or firstname.lastname@example.org for further details.
BMP – The Least Significant Bit Technique
A commonly used steganographic technique that can be applied to BMP graphic files is the Least Significant Bit (LSB) method. As its name implies, the LSB method replaces the least significant bit in the data bytes of the image to embed the hidden information. These bit changes do not cause significant quality degradation in the image, especially for 24-bit BMP files. Some steganography applications use the two least significant bits of each byte to embed the hidden information.
To illustrate the LSB steganographic technique, consider the following BMP image: house.bmp.
The LSB steganographic technique encodes messages in the least significant bit of every byte in an image. By doing so, the value of each pixel is changed slightly, but not enough to make significant visual changes to the image, even when compared to the original. Comparing the original carrier file with the same file that has been manipulated by the LSB technique in a hex editor shows a variance in some byte values. Notice in the figure below that the highlighted byte values differ in value by one.
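The sketch below shows sequential one-bit LSB embedding in Python. It assumes that the cover is the raw pixel data rather than the file header, and that message bits are written most significant bit first, starting at the first byte. Real applications may differ in bit order, starting offset, and byte selection. The sketch shows why each modified byte differs from the original by at most one.

```python
def embed_lsb(cover: bytes, message: bytes) -> bytearray:
    """Hide message in the least significant bit of successive cover bytes.

    Bits of each message byte are written most significant bit first.
    """
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError(f"cover holds {len(cover)} bits, message needs {len(bits)}")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it to the message bit
    return stego
```

Each hidden byte consumes eight cover bytes, so a 24-bit BMP can hide about one byte per 8/3 pixels, which is roughly one eighth of its pixel data size.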
Hex editor comparison
house.bmp (without steganography)
house.bmp (with steganography)
This kind of manual inspection is not practical in most digital forensics investigations, because a clean copy of the carrier file is rarely available alongside the carrier file that contains steganography. A more effective approach to LSB analysis is LSB enhancement. This technique “enhances” image pixel bytes by setting every bit within each byte to the value of that byte’s least significant bit. For example, consider the byte 4B, whose bitwise representation is 01001011. Because its least significant bit is 1, LSB enhancement sets all bits to 1, and the resulting byte value FF replaces the original value 4B. A byte whose least significant bit is 0 becomes 00.
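LSB enhancement can be applied directly to a BMP file's pixel array. The Python sketch below reads the pixel-data offset from bytes 10 to 13 of the BMP file header (a little-endian value). It then maps each following byte to FF or 00 according to its least significant bit, which matches the 4B-to-FF example above. For simplicity it also enhances row-padding bytes. The result is most meaningful for 24-bit BMPs, where each byte is a color component rather than a palette index.

```python
def lsb_enhance_bmp(bmp: bytes) -> bytes:
    """Return a copy of a BMP whose pixel-array bytes are 0xFF or 0x00 according to their LSB."""
    if bmp[:2] != b"BM":
        raise ValueError("not a BMP file")
    pixel_offset = int.from_bytes(bmp[10:14], "little")  # bfOffBits field of the file header
    enhanced = bytes(0xFF if b & 1 else 0x00 for b in bmp[pixel_offset:])
    return bmp[:pixel_offset] + enhanced  # headers are kept so the result still opens as a BMP
```

Writing the returned bytes to a new file and opening it in any image viewer produces enhanced images like the ones compared below.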
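The images below are LSB enhancements of the house image. Notice that the image containing steganography has a lattice pattern at the bottom. This pattern is a strong indication that ASCII text has been embedded using the LSB technique. Every ASCII character has a most significant bit of 0, so when text is embedded one bit per byte, every eighth modified byte has a least significant bit of 0, which produces a regular pattern in the enhanced image. The pattern appears at the bottom because BMP files normally store pixel rows from the bottom of the image upward, so data embedded from the start of the pixel array lands in the bottom rows.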
Enhanced house.bmp (without steganography)
Enhanced house.bmp (with steganography)
The LSB steganographic technique can also be implemented to modify any number of least significant bits. For example, an application may modify the two least significant bits to hide information. The more bits an application modifies, the more the picture quality degrades and the greater the chance of detection by visual attack.
Extracting hidden information that has been embedded using the LSB technique involves determining the number of bits used for encoding. After extracting the encoding bits, they must be reassembled to create the hidden information. Some steganography applications employ various randomization techniques for reassembling the encoded bits. For straightforward embedding, simply regroup each consecutive run of eight extracted bits into one byte of the hidden data.
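The straightforward extraction case can be sketched in Python as follows. The sketch assumes that the hidden data was embedded sequentially from the first pixel byte, most significant bits first, with no randomization and a known length; real applications often record the length in a header of their own. The parameter `bits_per_byte` is limited to 1, 2, or 4 so that each hidden byte is built from a whole number of carrier bytes.

```python
def extract_lsb(stego: bytes, length: int, bits_per_byte: int = 1) -> bytes:
    """Recover `length` hidden bytes from the low `bits_per_byte` bits of each stego byte.

    Assumes sequential embedding from the first byte, most significant bits first.
    """
    if bits_per_byte not in (1, 2, 4):
        raise ValueError("bits_per_byte must divide 8 evenly (1, 2 or 4)")
    mask = (1 << bits_per_byte) - 1
    carriers_per_byte = 8 // bits_per_byte
    needed = length * carriers_per_byte
    if needed > len(stego):
        raise ValueError("stego data too short for the requested length")
    out = bytearray()
    for i in range(0, needed, carriers_per_byte):
        value = 0
        for b in stego[i:i + carriers_per_byte]:
            value = (value << bits_per_byte) | (b & mask)  # append the next group of low bits
        out.append(value)
    return bytes(out)
```

Modifying the k least significant bits changes a byte by at most 2^k - 1. A two-bit scheme can therefore shift a byte value by up to 3, compared with 1 for the one-bit scheme, which is why quality degrades as more bits are used.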
Criminals have always sought ways to conceal their activity in real, or physical, space. The same is true in virtual, or cyber space. Digital steganography represents a particularly significant threat today because of the large number of digital steganography applications freely available on the Internet that can be used to hide any digital file inside of another digital file. Use of these applications, which are both easy to obtain and simple to use, allows criminals to conceal their activities in cyber space.
Thus, steganography presents a significant challenge to law enforcement as well as the intelligence community because detecting hidden information and then extracting that information is very difficult and may be impossible in some cases.
By providing a national repository of steganography application hash values, or fingerprints, and developing tools, techniques, and procedures to detect fingerprints and signatures on suspect media and then find and extract hidden information, the SARC is rapidly evolving into a high-value law enforcement, homeland security, and national security asset in the global war on terrorism and efforts to combat cyber crime.
James E. Wingate, CISSP-ISSEP, CISM, IAM
Director, Steganography Analysis & Research Center
Vice President for West Virginia Operations
320 Adams Street, Suite 105
Fairmont, WV 26554
Office: 304.366.9161 Fax: 304.366.9163
Chad W. Davis
Computer Security Engineer
320 Adams Street, Suite 105
Fairmont, WV 26554
Steganography Analysis and Research Center