First published October 2009
Business Intelligence Associates, Inc.
39 Broadway, NYC, NY 10006
NTFS is structured so that the data that comprises a file can be physically separated from the properties or metadata of the file. One side effect is that when a file is hashed on NTFS, only the content of the file is hashed and not necessarily its properties or metadata. For a forensics investigator who uses the NSRL database to reduce the number of files to review in an investigation, this creates a simple way for someone to hide data: store the information in the properties or metadata of files that are listed in the NSRL.
1. Using the NSRL
It is well known in the forensics community that one of the biggest challenges facing the forensics investigator, today and for the foreseeable future, is the number of files that have to be reviewed. The best tool currently available for dealing with this is a database of “known” files that can be used to identify files and remove them from the review set. The authoritative database for this is the National Software Reference Library (NSRL) maintained by the National Institute of Standards and Technology (NIST). Ideally the NSRL can be used to identify most of the system files and many of the software applications that exist on a hard drive. These files are considered known because the NSRL is created from software received directly from the manufacturer. The use of a hash, whether MD5 or SHA-1, not only identifies these files but also ensures that they have not been tampered with. For example, a user would not be able to hide one of his documents by renaming it to match a system file or by adding content at the end of a known file. A forensics investigator is thus able to remove a large percentage of files from a hard drive and concentrate only on the ones that the user created or edited.
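The filtering step described above can be sketched in a few lines of Python. This is only an illustration, not the NSRL tooling itself: the function names are hypothetical, and the sketch assumes the known hash set has already been loaded into memory as upper-case SHA-1 hex strings.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the upper-case SHA-1 hex digest of a file's content.

    Only the bytes returned by an ordinary read are hashed; nothing
    stored outside the file's content contributes to the digest.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def split_known_unknown(paths, known_hashes):
    """Split paths into (known, unknown) by membership in known_hashes.

    known_hashes is assumed to be a set of upper-case SHA-1 hex strings.
    """
    known, unknown = [], []
    for path in paths:
        if sha1_of_file(path) in known_hashes:
            known.append(path)
        else:
            unknown.append(path)
    return known, unknown
```

Files in the first list would be dropped from the review set; files in the second would be kept for examination.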
2. Steganography
Steganography is the “art and science of writing hidden messages in such a way that no one, apart from the sender and intended recipient, suspects the existence of the message.” The word derives from Greek and literally means “concealed writing.”
3. NTFS Metadata
There are three places where metadata for an NTFS file can exist: the Master File Table (MFT), an Alternate Data Stream (ADS), and inside the file. The metadata contained in the MFT is “envelope” metadata, such as the modified, accessed, and created (MAC) dates. Then there is the metadata that is commonly referred to as Summary Properties or just Properties. This is the metadata that can be found when you right-click on a file and choose the Summary tab, revealing fields such as Title, Subject, and Author. Depending on the file, the Properties are stored either in an ADS or inside the file. They are stored inside the file for file formats that support this, such as Microsoft Office files or PDF.
The hashing gap arises in two of these places: the MFT and an ADS. When the metadata is contained inside the file, any change to it changes the hash of the file, so the problem does not exist.
4. Hiding in Properties
A file in NTFS is composed of streams. Only one of those streams, the unnamed default data stream, holds the actual content of the file, and it is only this stream that is hashed. When a file has Properties, NTFS often stores them in a separate stream that is associated with the file. The association between the file and the stream is recorded in the MFT. As a separate stream on disk, the stream can be as large as any other file, limited only by available disk space.
This stream, or ADS, is not hashed when the file is hashed because it is technically a different file. Thus, a user can put information inside of the Comments field of a known DLL and, if the forensics investigator relies on the NSRL, he will not find it.
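The effect can be demonstrated with a short Python sketch. It relies on the `file:stream` path syntax that Windows accepts on NTFS volumes, so the demonstration function only works there. The stream name `notes` and all function names are hypothetical, and the sketch writes a generic named stream rather than the specific stream that the Properties dialog uses.

```python
import hashlib
import os


def stream_path(file_path, stream_name):
    """Build the NTFS 'file:stream' path that addresses a named stream."""
    return f"{file_path}:{stream_name}"


def content_hash(file_path):
    """Hash only what an ordinary read returns: the default data stream."""
    with open(file_path, "rb") as handle:
        return hashlib.sha1(handle.read()).hexdigest()


def hash_unchanged_after_stream_write(file_path, stream_name="notes",
                                      payload=b"hidden text"):
    """Write a named stream, then report whether the content hash changed.

    Assumption: runs on Windows against a file on an NTFS volume.
    """
    if os.name != "nt":
        raise OSError("named streams require Windows and an NTFS volume")
    before = content_hash(file_path)
    with open(stream_path(file_path, stream_name), "wb") as stream:
        stream.write(payload)
    return content_hash(file_path) == before
```

On an NTFS volume the last function would be expected to return True, since the payload is written to a stream that the hash never reads.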
A user, of course, could come up with more sophisticated techniques, such as spreading the information across the Properties of a number of files that he knows are part of the NSRL, possibly making the information contained in each one seem innocuous or erroneous if found. A simple program could be developed to read and write these Properties in order to form a shadow file system in the Properties of known files.
5. Hiding in the MFT
A more subtle form of the problem exists in the metadata in the MFT. Manipulating the metadata in the MFT does not provide much space to work with in any single file, but a user could easily rewrite the MAC dates of a known file. A very simple program can be written that reads and writes the dates of all known files on a system. This program can be used to write dates that encode information for the user and, across enough files, could store a substantial amount of information. The fact that the dates were nonsensical would go unnoticed by the forensics investigator because the files would never be examined in any detail. It is even theoretically possible to create an encoding scheme that relies on dates that are not visibly erroneous. For instance, the information could be encoded only in the time portion of the dates. This is obviously an issue that extends beyond the use of the NSRL.
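As a minimal sketch of the time-portion idea, the following Python stores a 16-bit value in the seconds-of-day of a file's modified time and leaves the date alone. The scheme and function names are hypothetical, chosen only for illustration. It works because a day has 86,400 seconds, which is more than the 65,536 values of 16 bits. Only the modified time is touched, and the day boundary is computed in UTC.

```python
import os

SECONDS_PER_DAY = 86_400


def encode_in_mtime(path, value):
    """Store a 16-bit value in the UTC time-of-day of the modified time."""
    if not 0 <= value < 65_536:
        raise ValueError("value must fit in 16 bits")
    info = os.stat(path)
    # Keep the existing date; replace only the seconds since midnight UTC.
    day_start = int(info.st_mtime) // SECONDS_PER_DAY * SECONDS_PER_DAY
    os.utime(path, (info.st_atime, day_start + value))


def decode_from_mtime(path):
    """Recover the value stored by encode_in_mtime."""
    return int(os.stat(path).st_mtime) % SECONDS_PER_DAY
```

At two bytes per file, a few thousand known files would hold a few kilobytes, and the resulting timestamps would still fall on plausible dates.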
The principles discussed in this paper can certainly be applied to other hash sets and other file systems. I have limited my discussion to the NSRL and NTFS because they are the most popular and offer clear issues with the way that metadata is stored and its impact on hashing.
The forensics investigator who wishes to overcome the issue of Properties stored in an ADS has to check every known file match for Properties and remove only the files that do not have them. Without a reference for what the Properties of a known file should contain, the investigator cannot determine automatically whether the Properties actually come from the software manufacturer; that can only be established through manual review.
With regard to metadata in the MFT, it is not clear how to overcome the problem.
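The adjusted workflow could be sketched as follows. This is an outline under stated assumptions, not a finished tool: `hash_file` and `list_named_streams` are hypothetical callbacks that the investigator's own tooling would have to supply (the second one returning the names of any streams attached to a file besides its content), and `triage` is an illustrative name.

```python
def triage(paths, known_hashes, hash_file, list_named_streams):
    """Exclude a file only if its hash is known AND it has no extra streams.

    hash_file(path) returns the content hash in the same form as
    known_hashes; list_named_streams(path) returns a (possibly empty)
    list of named streams attached to the file. Both are assumed to be
    provided by the caller.
    """
    excluded, review = [], []
    for path in paths:
        if hash_file(path) not in known_hashes:
            review.append((path, "hash not in known set"))
        elif list_named_streams(path):
            # The content matches a known file, but something else is attached.
            review.append((path, "known hash, named streams present"))
        else:
            excluded.append(path)
    return excluded, review
```

Files that land in the review list with a known hash are the ones that need the manual check of their Properties.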
Dan Mares, “Using File Hashes to Reduce Forensic Analysis”, SC Magazine, May 2002.
For more information on NTFS from a forensics perspective, see Brian Carrier, File System Forensic Analysis.