Windows Search forensics

Analyzing the Windows (Desktop) Search Extensible Storage Engine database

by Joachim Metz
jbmetz@users.sourceforge.net

Summary

While some may curse Windows Vista for all its changes, for us forensic investigators it also introduced some interesting new ‘features’. One of these is the integration of Windows (Desktop) Search into the operating system. Most corporations have been reluctant to adopt Vista; however, more and more Windows XP systems are being replaced by Windows 7 equivalents. Windows 7 also contains Windows Search and enables it by default, and it can actually be challenging to disable. One can therefore conclude that Windows Search is becoming a relevant source of information in the forensic analysis of Windows systems.

What is not widely known is that Windows Search uses the Extensible Storage Engine (ESE) to store its data, the same engine that Microsoft Exchange uses. Because ESE uses a proprietary database format, little information about it is available in the public domain. As a consequence, it is unclear how well different forensic tools support the ESE database format.



Several years after the introduction of Windows Vista and Windows Search, only a handful of forensic analysis tools seem to support the Windows Search database, even though it can be a valuable source of evidence. This paper provides an overview of the ESE database format and the Windows Search database, and of what they might contribute to your investigations.

Background

Although the Extensible Storage Engine (ESE) is a generic database engine, forensic analysis of ESE databases seems to be centered around Exchange. Little information about the forensic investigation of ESE databases in general seems to have been published in the public domain. As far as I can tell, Mark Woan, author of EseDbViewer [WOAN08], was one of the first to publish information about the forensic analysis of ESE databases in general, in 2008.

In early 2009, my searches in some investigations were producing results in Windows.edb files (Windows Search databases) on Windows XP systems. Neither EnCase nor FTK seemed to offer any support for this file, although both claim EDB support. Few other tools seemed to be available to analyze the Windows Search ESE database. However, when investigating Windows Vista systems, the Windows.edb file no longer yielded any relevant results.

This, in addition to my wish to verify some assumptions about the Exchange-related parts of Microsoft Exchange OST files, prompted me to start working on the ESE database format, and in September 2009 I started the libesedb project. Findings from the libesedb project and some from Mark Woan’s EseDbViewer have been integrated into this document.

Table of Contents

1. Overview of the ESE database format
1.1. Database header
1.2. Page based storage
1.3. Database tables and indexes
2. Analysis of a Windows Search database
2.1. Data obfuscation
2.2. Data compression
2.3. Investigative artifacts and usefulness
2.4. The Vista welcome mail
3. Conclusion
Appendix A. References
Appendix B. GNU Free Documentation License

1. Overview of the ESE database format

The Extensible Storage Engine (ESE) database format is mainly known for its use in Microsoft Exchange, e.g. for the priv1.edb file. What is less widely known is that many Microsoft products use this file format, including Active Directory (ntds.dit), Windows (Desktop) Search (Windows.edb) and Windows Mail (WindowsMail.MSMessageStore).

ESE is also known as Jet Blue, in contrast to Jet Red, which refers to the Microsoft Access database format. Microsoft has kept the specification of the ESE database format closed, although the Jet Blue API has been partially documented on MSDN. The information in this document was obtained from information available on the Internet and from reverse engineering the file format. It is maintained in a working document titled: the Extensible Storage Engine (ESE) database (DB) format specification [ESEDB09].

There are three main variants of ESE: one for Exchange 5.5 (ESE97), one for Exchange 2000 and later (ESE98) and one for Windows NT and later (ESENT). Active Directory and Windows Search use the ESENT version.

Basically, an ESE database consists of the following elements:

• database header and a backup
• pages containing:
  • space tree data
  • database table data
  • database index data
  • long value data

The following paragraphs provide an overview of some of these elements.

1.1. Database header

The ESE database starts with a database header. The effective size of the database header is at least 667 bytes; the first 16 bytes of an example header are shown below.

00000000: 5c ca 88 0b ef cd ab 89 20 06 00 00 00 00 00 00 \....... .......

The four bytes at offset 4 (bytes 4 to 7) of the database header contain the signature ‘\xef\xcd\xab\x89’, which is unique to the ESEDB format. Other significant values in the header are the file type, the format version and revision, and the page size. The database header is actually stored in a block the size of a page, which is directly followed by another block containing a backup of the database header. This is one of the data redundancy measures provided in the ESE database format.
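As an illustration, the following C sketch checks the signature and reads the format version from a buffer holding the start of the file. It relies only on what the dump above shows: the signature at offset 4 and a little-endian 32-bit format version at offset 8 (bytes 20 06 00 00, i.e. 0x620, matching the eseutil output further on). The function names are hypothetical and are not part of libesedb or eseutil.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
           ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Returns 1 if buf starts with an ESE database header, 0 otherwise.
 * On success *format_version receives the 32-bit value at offset 8. */
static int esedb_check_header(const uint8_t *buf, size_t size,
                              uint32_t *format_version)
{
    static const uint8_t signature[4] = { 0xef, 0xcd, 0xab, 0x89 };
    size_t i;

    if (buf == NULL || size < 12)
        return 0;
    for (i = 0; i < 4; i++) {
        if (buf[4 + i] != signature[i])
            return 0;
    }
    if (format_version != NULL)
        *format_version = read_le32(buf + 8);
    return 1;
}
```

Read as a little-endian integer, the signature bytes give 0x89abcdef, which is the ‘ulMagic’ value eseutil prints.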

Different versions of Windows NT use different revisions of ESE, e.g. Windows XP uses version 0x620 revision 9, Windows Vista uses version 0x620 revision 12 and Windows 7 uses version 0x620 revision 17. Different revisions can have different methods of storing data, e.g. the Windows 7 version of ESE allows for ‘native’ compression of data; in previous versions applications using ESE needed to do compression themselves, like the (RTF) LZFu compression used by Exchange.

Unless measures are taken to detect and decompress compressed data, linear search and index-based search will fail to find strings that are stored in compressed form. These techniques alone therefore do not suffice for finding all the strings in an ESE database.

The ESE database format is also used for streaming files, e.g. priv1.stm used by Exchange; however, little is known so far about the specifics of these streaming files. ESE uses transaction logs, which in theory could be used to analyze different versions of the data and its mutations. However, such version analysis is currently in its infancy.

ESE comes with a utility called eseutil (or its equivalent, esentutl). Eseutil can be used to print the database header of an ESE database. The following example prints the database header of a Windows Vista Search database (Windows.edb).

eseutil.exe /mh Windows.edb

Initiating FILE DUMP mode...
Database: Windows.edb
File Type: Database
Format ulMagic: 0x89abcdef
Engine ulMagic: 0x89abcdef
Format ulVersion: 0x620,12
Engine ulVersion: 0x620,12
Created ulVersion: 0x620,12

Sometimes you can come across a ‘dirty’ database, i.e. a database that was not cleanly shut down. The following line in the header output indicates that an ESE database is considered ‘dirty’.

State: Dirty Shutdown

A ‘dirty’ database can be repaired using the repair option of eseutil.

eseutil.exe /p Windows.edb

Repairing an ESE database will alter the database file, but might be necessary for tools that cannot open ‘dirty’ databases. Sometimes it is also necessary to repair a ‘dirty’ database before eseutil can perform certain operations on it. Note that a successful repair is not guaranteed. Libesedb [ESEDB09] will try to open the database in its ‘dirty’ state.
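Because repairing alters the file, it can be useful to check the state of a forensic copy read-only first. The following C sketch does that. The state values are the JET_DBINFOMISC dbstate constants from esent.h (2 means dirty shutdown, 3 clean shutdown). The header offset of the state field, 52, is an assumption taken from the libesedb format notes and should be verified against [ESEDB09]; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Database state values as defined for JET_DBINFOMISC.dbstate. */
enum esedb_state {
    ESEDB_STATE_JUST_CREATED = 1,
    ESEDB_STATE_DIRTY_SHUTDOWN = 2,
    ESEDB_STATE_CLEAN_SHUTDOWN = 3,
    ESEDB_STATE_BEING_CONVERTED = 4,
    ESEDB_STATE_FORCE_DETACH = 5
};

/* ASSUMPTION: the state is a 32-bit little-endian value at offset 52
 * of the file header (verify against the libesedb format notes). */
#define ESEDB_STATE_OFFSET 52

/* Returns the raw state value, or -1 if the buffer is too small. */
static int64_t esedb_read_state(const uint8_t *header, size_t size)
{
    const uint8_t *p;

    if (header == NULL || size < ESEDB_STATE_OFFSET + 4)
        return -1;
    p = header + ESEDB_STATE_OFFSET;
    return (int64_t) ((uint32_t) p[0] | ((uint32_t) p[1] << 8) |
                      ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24));
}

/* Returns 1 when the header reports a dirty shutdown, 0 otherwise. */
static int esedb_is_dirty(const uint8_t *header, size_t size)
{
    return esedb_read_state(header, size) == ESEDB_STATE_DIRTY_SHUTDOWN;
}
```

The check only reads the buffer, so it can be run on a copy of Windows.edb before deciding whether a repair is worth the changes it makes.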

1.2. Page based storage

At the lowest level, an ESE database stores its data in pages. The page size is stored in the database header and applies to the entire database. A single page consists of a header, values and an index. A page does not need to be entirely filled, so a page can have ‘page unallocated space’ holding remnant data. This remnant data can be of interest for forensic analysis.
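Since the page size applies to the whole file and the header and its backup each occupy one page-sized block, a page number can be turned into a file offset. The C sketch below assumes that page numbers start at 1 directly after these two blocks, so page N begins at (N + 1) × page size; that numbering is an assumption to verify against [ESEDB09], and the function names are hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION: page numbers start at 1, directly after the two page-sized
 * blocks holding the database header and its backup. */
static uint64_t esedb_page_offset(uint32_t page_number, uint32_t page_size)
{
    return ((uint64_t) page_number + 1) * page_size;
}

/* Number of whole pages after the header and its backup; 0 if the file
 * is not larger than those two blocks. */
static uint64_t esedb_page_count(uint64_t file_size, uint32_t page_size)
{
    uint64_t blocks;

    if (page_size == 0)
        return 0;
    blocks = file_size / page_size;
    return blocks > 2 ? blocks - 2 : 0;
}
```

With these two helpers every page, including its unallocated space, can be read from an image of the file without going through ESE.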

A feature that affects this remnant data is ‘ESE (page) zeroing’, which overwrites unused pages with various byte values. The zeroing can be performed manually, using eseutil, or automatically, during online backup. For Exchange, zeroing during online backup is controlled by the following Registry value.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MSExchangeIS\ParametersSystem\Zero Database During Backup

Currently the actual impact of ESE (page) zeroing for forensic investigations is unknown.

As of Windows Vista Search, a page can contain an error correcting code (ECC). The Microsoft documentation states that this ECC can only recover single-bit errors; the actual ECC method is not documented. In Windows 7, three additional ECCs were added, which probably allow for multi-bit recovery. This is another data redundancy measure provided in the ESE database format. Note that libesedb currently does not correct errors using ECCs.

A page can contain multiple page values. Eseutil can be used to print the page values in a page. The following example prints the values in page 13 of a Windows Vista Search ESE database (Windows.edb).

eseutil.exe /m /p13 Windows.edb

Initiating FILE DUMP mode...
Database: Windows.edb

Page: 13

expected checksum = 0x5c54a3ab36656192
new checksum format
expected ECC checksum = 0x5c54a3ab
expected XOR checksum = 0x36656192

checksum <0x00FE0000, 8>: 6653122505280414098
(0x5c54a3ab36656192)
dbtimeDirtied <0x00FE0008, 8>: 4646
(0x0000000000001226)
pgnoPrev <0x00FE0010, 4>: 0 (0x00000000)
pgnoNext <0x00FE0014, 4>: 14 (0x0000000e)
objidFDP <0x00FE0018, 4>: 2 (0x00000002)
cbFree <0x00FE001C, 2>: 3636 (0x0e34)
cbUncommittedFree <0x00FE001E, 2>: 0 (0x0000)
ibMicFree <0x00FE0020, 2>: 5151 (0x141f)
itagMicFree <0x00FE0022, 2>: 74 (0x004a)
fFlags <0x00FE0024, 4>: 10242 (0x00002802)
Leaf page
Primary page
New record format
New checksum format

TAG 0 cb:0x000d ib:0x0000 offset:0x0028-0x0034 flags:0x0000
TAG 1 cb:0x0037 ib:0x000d offset:0x0035-0x006b flags:0x0004 (c)
TAG 2 cb:0x0033 ib:0x0044 offset:0x006c-0x009e flags:0x0006 (cd)
...
TAG 73 cb:0x0057 ib:0x1025 offset:0x104d-0x10a3 flags:0x0005 (cv)

First, information about the page header is printed, followed by the locations of the page values. Each page value is defined by a tag (or index entry) and controlled by three flags, which are identified by the characters c, d and v. The actual meaning of the flags is undocumented, but the d flag seems to be used for deleted or defunct values. These deleted values are not overwritten and can therefore be interesting from an investigative point of view.
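Because eseutil does not expose the page values themselves, looking at tags and deleted values means parsing pages yourself. The following C sketch is derived only from the page 13 dump above. It reads the 40-byte page header at the offsets eseutil prints (relative to the page start, shown as 0x00FE0000 in the dump), splits the 64-bit checksum into the ECC (high) and XOR (low) halves that eseutil reports, computes a value's page offset as 0x28 + ib (this reproduces every range listed, e.g. TAG 1: 0x28 + 0x0d = 0x35), and decodes the flag bits c (0x4), d (0x2) and v (0x1). Assumptions: the fields are little-endian, this layout only covers the new record and checksum format shown, and cb, ib and the flags are taken as already decoded from the tag array, whose on-disk layout is not covered here. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ESEDB_PAGE_HEADER_SIZE 0x28 /* TAG 0 starts at offset 0x0028 */

/* Tag flag bits as printed by eseutil: (c) 0x4, (d) 0x2, (v) 0x1. */
#define ESEDB_TAG_FLAG_V 0x1
#define ESEDB_TAG_FLAG_D 0x2
#define ESEDB_TAG_FLAG_C 0x4

/* Page header fields, in the order and at the offsets eseutil prints. */
typedef struct {
    uint64_t checksum;          /* 0x00: ECC (high 32) | XOR (low 32) */
    uint64_t dbtime_dirtied;    /* 0x08: dbtimeDirtied */
    uint32_t previous_page;     /* 0x10: pgnoPrev */
    uint32_t next_page;         /* 0x14: pgnoNext */
    uint32_t father_object_id;  /* 0x18: objidFDP */
    uint16_t free_size;         /* 0x1c: cbFree */
    uint16_t uncommitted_free;  /* 0x1e: cbUncommittedFree */
    uint16_t first_free_offset; /* 0x20: ibMicFree */
    uint16_t tag_count;         /* 0x22: itagMicFree */
    uint32_t flags;             /* 0x24: fFlags */
} esedb_page_header;

/* Read an n-byte little-endian unsigned integer (n <= 8). */
static uint64_t read_le(const uint8_t *p, size_t n)
{
    uint64_t value = 0;

    while (n-- > 0)
        value = (value << 8) | p[n];
    return value;
}

/* Fill *h from the start of a page; returns 0 if the buffer is too short. */
static int esedb_parse_page_header(const uint8_t *page, size_t size,
                                   esedb_page_header *h)
{
    if (page == NULL || h == NULL || size < ESEDB_PAGE_HEADER_SIZE)
        return 0;
    h->checksum = read_le(page + 0x00, 8);
    h->dbtime_dirtied = read_le(page + 0x08, 8);
    h->previous_page = (uint32_t) read_le(page + 0x10, 4);
    h->next_page = (uint32_t) read_le(page + 0x14, 4);
    h->father_object_id = (uint32_t) read_le(page + 0x18, 4);
    h->free_size = (uint16_t) read_le(page + 0x1c, 2);
    h->uncommitted_free = (uint16_t) read_le(page + 0x1e, 2);
    h->first_free_offset = (uint16_t) read_le(page + 0x20, 2);
    h->tag_count = (uint16_t) read_le(page + 0x22, 2);
    h->flags = (uint32_t) read_le(page + 0x24, 4);
    return 1;
}

/* Page offset of the first byte of a value, given the tag's ib field. */
static size_t esedb_tag_value_offset(uint16_t ib)
{
    return ESEDB_PAGE_HEADER_SIZE + (size_t) ib;
}

/* Nonzero if the d flag (apparently deleted or defunct) is set. */
static int esedb_tag_is_deleted(uint16_t tag_flags)
{
    return (tag_flags & ESEDB_TAG_FLAG_D) != 0;
}

/* Write the flag letters in the order c, d, v to out. */
static void esedb_tag_flag_string(uint16_t tag_flags, char out[4])
{
    size_t n = 0;

    if (tag_flags & ESEDB_TAG_FLAG_C)
        out[n++] = 'c';
    if (tag_flags & ESEDB_TAG_FLAG_D)
        out[n++] = 'd';
    if (tag_flags & ESEDB_TAG_FLAG_V)
        out[n++] = 'v';
    out[n] = '\0';
}
```

A value then spans esedb_tag_value_offset(ib) up to that offset plus cb minus 1, which for TAG 73 (cb 0x0057, ib 0x1025) gives the 0x104d-0x10a3 range eseutil lists.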

Eseutil does not provide a means to access the data in the page values, except for some database metadata tables, like the catalog and the space tree.

1.3. Database tables and indexes

The definitions of the database tables and indexes are stored in a table referred to as the catalog.

The name of this table is ‘MSysObjects’. Each ESE database contains a catalog and its backup, named ‘MSysObjectsShadow’.

The data of tables and indexes is stored in a hierarchy of pages (or page-tree). These page-trees are traversed by means of (page) keys.

Eseutil can be used to print table information, e.g. the table information of the ‘MSysObjects’ table of a Windows Vista Search ESE database (Windows.edb).

eseutil.exe /mm /tMSysObjects Windows.edb

Initiating FILE DUMP mode...
Database: Windows.edb
******************************* META-DATA DUMP *******************************
Name Type ObjidFDP PgnoFDP
==============================================================================
Windows.edb Db 1 1
MSysObjects Tbl 2 4
Name Idx 4 7
RootObjects Idx 5 10
******************************************************************************

From the output we can learn that the ‘MSysObjects’ table has two corresponding indexes: ‘Name’ and ‘RootObjects’. ‘ObjidFDP’ refers to a unique ‘object’ identifier for each table or index. ‘PgnoFDP’ contains the page number of the Father Data Page (FDP), which basically is the root page of the page-tree.

Eseutil can be used to print all the tables and indexes of the database.

eseutil.exe /mm Windows.edb

The libesedb project comes with a tool called esedbinfo, which similarly prints all of the tables and indexes in the database.

For some tables eseutil will print a line containing ‘Long Values’.

SystemIndex_0A Tbl 21 1125
<Long Values> LV 261 743

Long values are used by ESE to store ‘large’ amounts of data in separate page values, in effect in a separate page-tree. According to [MSDN]:

ESE stores the long value separated if it is larger than 1024 bytes or if the record would not fit on a single database page if stored in record.
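The rule quoted above can be written as a small decision function. This C sketch is only an approximation: ‘would not fit on a single database page’ is reduced to comparing the record size with a caller-supplied amount of usable page space, and the names and that parameter are hypothetical. The practical point for an examiner is that large values live in the separate long value page-tree listed by eseutil, so a tool that only walks a table's own page-tree can miss them.

```c
#include <stddef.h>

/* Threshold from the MSDN statement: values larger than this are stored
 * as separate long values. */
#define ESEDB_LONG_VALUE_THRESHOLD 1024

/* Returns 1 if a value would be stored separately from its record.
 * ASSUMPTION: "fits on a page" is approximated as
 * record_size_with_value <= usable_page_space. */
static int esedb_stored_separately(size_t value_size,
                                   size_t record_size_with_value,
                                   size_t usable_page_space)
{
    if (value_size > ESEDB_LONG_VALUE_THRESHOLD)
        return 1;
    return record_size_with_value > usable_page_space;
}
```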

2. Analysis of a Windows Search database

Windows Search stores its data in a file named:

%Profiles%/All Users/Application Data/Microsoft/Search/Data/Applications/Windows/Windows.edb

Note that ‘%Profiles%’ depends on the Windows version. To access the Windows.edb file, the Windows Search service needs to be stopped and the necessary access rights are required. If the database is in a ‘dirty’ state, it might be necessary to copy the transaction logs as well. According to Mark Woan, author of EseDbViewer, copying the entire Windows Search application directory often does the trick.

Being able to access the ESE database format brings you only a small step closer to the information in a Windows Search database. As far as I know, forensic tools like EnCase or Forensic Toolkit do not support the Windows Search database, although they support some types of ESE databases. Additional specialized investigative tools like Windows Search Index Examiner or EseDbViewer are necessary; EseDbViewer, at least, uses ESE directly. You could also consider writing a tool yourself that uses ESE for a quick-and-dirty export of the values in the tables.

From a forensic point of view, using ESE is not the preferred method, because the engine alters the data; at a minimum, ESE sets the database state to ‘dirty’. However, ignoring possible evidence is not an option either. Another issue is that ESE will not open ‘dirty’ databases.

The approach of exporting data directly from a Windows XP Search database works fairly well. However, when it comes to Windows Vista, you’re out of luck. Most of the columns have changed from a text to a binary format, and the binary data in these columns is no longer readable because it has been compressed and obfuscated. Windows 7 Search uses native ESE compression and has largely switched back to text columns.

One of the more interesting columns, ‘System_Search_AutoSummary’, which contains part of the content of an indexed item, is compressed and obfuscated in the XP, Vista and 7 versions of Windows Search.

2.1. Data obfuscation

According to [TECHNET]:

Index files are lightly obfuscated.

If the obfuscation is removed, meaningful data from documents can be extracted. The data structures of the index files do not lend themselves to easy reconstruction of a complete document. However, someone with enough tenacity and time could reconstruct the text for the majority of a document.

Actually, the obfuscation method is fairly straightforward. It XORs each byte with a bitmask that is derived from an initial 32-bit bitmask, the size of the data and the location of the byte in the data.

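The initial bitmask is created by a 32-bit XOR of the values in the binary form of the Windows NT security identifier (SID):

S-1-5-18

This is the SID of the Local System account; its sub-authority 18 is 0x12 in hexadecimal. The SID is stored as the following byte values: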

01 01 00 00 00 00 00 05 12 00 00 00

This results in a 32-bit bitmask of:

0x05000113

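The data is obfuscated using a method similar to the one below. Because the method is a plain XOR, the same routine both removes and applies the obfuscation.

#include <stddef.h>
#include <stdint.h>

/* Removes (or, since it is an XOR, applies) the Windows Search
 * obfuscation. data must have room for encoded_data_size bytes.
 */
void windows_search_decode(
      uint8_t *data,
      const uint8_t *encoded_data,
      size_t encoded_data_size )
{
    uint32_t bitmask32 = 0x05000113;
    uint8_t bitmask = 0;
    size_t encoded_data_iterator = 0;

    /* Combine the initial bitmask with the size of the data */
    bitmask32 ^= (uint32_t) encoded_data_size;

    for( encoded_data_iterator = 0;
         encoded_data_iterator < encoded_data_size;
         encoded_data_iterator++ )
    {
        /* Select one byte of the 32-bit bitmask based on the position */
        switch( encoded_data_iterator & 0x03 )
        {
            case 3:
                bitmask = (uint8_t) ( ( bitmask32 >> 24 ) & 0xff );
                break;
            case 2:
                bitmask = (uint8_t) ( ( bitmask32 >> 16 ) & 0xff );
                break;
            case 1:
                bitmask = (uint8_t) ( ( bitmask32 >> 8 ) & 0xff );
                break;
            default:
                bitmask = (uint8_t) ( bitmask32 & 0xff );
                break;
        }
        /* Mix in the lower 8 bits of the byte position */
        bitmask ^= (uint8_t) encoded_data_iterator;

        data[ encoded_data_iterator ] =
            encoded_data[ encoded_data_iterator ] ^ bitmask;
    }
}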

2.2. Data compression

Windows Search compresses the data before obfuscating it, using multiple compression methods. All these compression methods and the obfuscation correction are incorporated in the function ‘MSSUncompressText’, stored in a Windows Search specific DLL. The name of the DLL differs per version of Windows Search. A quick-and-dirty approach could be to call this function directly to decompress the binary data.

Some of the obfuscation correction and decompression techniques have been integrated into esedbexport, which is included in the libesedb project [ESEDB09]. For a Windows Search database, esedbexport tries to convert the compressed values it knows about. Note that the libesedb project is still in alpha status, so you might want to validate findings with other tools where possible.

2.3. Investigative artifacts and usefulness

So what makes the Windows Search database so interesting for forensic analysis? For starters, the Windows Search database contains a table named ‘SystemIndex_0A’, which contains a vast number of values about various artifacts found on a Windows system, e.g. files and directories, emails, appointments, attachments, images, audio and video, Microsoft Internet Explorer (MSIE) history, etc.

Better yet, on Windows Vista and 7 Windows Search is activated by default, running as a system service that silently collects this data in the background. Most users will be totally unaware that Windows Search is actually indexing potential evidence; talk about a system ready for investigation.

A Windows Search database can contain metadata and partial content data of deleted files. For now it is unknown how long Windows Search retains its data. From personal experience I can say that a Windows Search database on my test system still contained metadata about a file I thought I had thoroughly erased from that system half a year before.

Windows Search can also index items from other sources, like an Exchange server; yet another location to find (deleted) emails.

2.4. The Vista welcome mail

To give an idea of the values in a Windows Search database, consider the Windows Vista Mail welcome email message.

 

(Please do not reply to this message)

WELCOME TO WINDOWS MAIL

Windows Mail is the successor to Outlook Express

Windows Mail builds on the foundation of Outlook Express, adding a variety of
new features designed to make your e-mail experience more productive and fun,
while helping to reduce risks and annoyances such as phishing and junk e-mail.

GETTING STARTED
If you're upgrading from Outlook Express, Windows Mail can import your
existing account information and e-mail addresses. The first time you start
Windows Mail, you will be prompted to set up an e-mail account. If you skip
this step and want to set up a new account later, click the Tools menu, click
Accounts, and then click Add.

In addition to sending and receiving e-mail, you can use Windows Mail to read
newsgroups, which are Internet discussion forums where groups of people gather
to talk about common interests. To participate in a newsgroup (you can send a
message or just read what other people are talking about), click Microsoft
Communities in the folder pane. You can choose from a variety of newsgroups
devoted to Windows and other Microsoft products.

To get help using Windows Mail, click the Help menu, and then click View Help.
You can also get help from other Windows Mail users in the
microsoft.public.windows.vista.mail newsgroup.

NEW FEATURES

Improved e-mail searching
* To quickly search your messages in Windows Mail, you can type complete or
partial words into the search box. You'll instantly get a list of all of the
messages that contain those words. The list of results will show messages that
contain your search criteria in both the headers and message text of your
mail messages.
* For fast access to search, press CTRL+E to move the cursor into the search
box. Press ESC to clear the search box.
* You can also search your e-mail inbox from Windows by using the search box.
Searching from Windows instead of Windows Mail will produce the same results:
matches are based on both the headers and message text of the mail in your
inbox.

Junk e-mail and phishing filters
* Windows Mail now includes Microsoft SmartScreen technology to help keep
unwanted junk e-mail out of your Inbox. Suspected junk e-mail messages are
automatically moved to the Junk E-mail folder.
* The anti-phishing features in Windows Mail help protect against phishing
messages, which attempt to trick you into revealing personal or financial
information. When Windows Mail detects a possible phishing message, it allows
you to view the message, but it blocks any links or dangerous content that
might be in the message. You can choose to delete a message, or to allow a
message that you know is safe.

Communities
* Windows Mail Communities let you rate the usefulness of newsgroup messages
by clicking the Rate this Post button. This makes it easier and faster to find
helpful, trusted information in busy newsgroups.
* The Communities rating feature uses Windows Live ID to help ensure that the
people who post messages in newsgroups are who they claim to be. (You can
still utilize the Communities feature without using Windows Live ID.)

ABOUT NEWSGROUPS
Windows Mail is about more than just e-mail. You can use Windows Mail to
access Microsoft's Help newsgroups at msnews.microsoft.com by clicking
Microsoft Communities in the folder pane. These newsgroups allow you to ask
questions and read answers from other people who are also using Microsoft
products.

What you should know before you get started

1. Find the appropriate group. You'll find newsgroups covering most Microsoft
products. Picking the appropriate newsgroup is the best way to receive the
information you want. Select folders related to the product that you have
questions about. For example, the group "microsoft.public.powerpoint" would be
the plac

 

As you can see, metadata and part of the content of the welcome email have been stored in the Windows Search database.

3. Conclusion

In short, Windows Search can be a valuable source of investigative information, and as of Vista it is enabled by default.

Windows Search uses the Extensible Storage Engine (ESE) database format to store its data. Although the ESE database format is complex and still evolving, the means to access ESE databases are readily available on a Windows system.

Windows Search uses both compression and obfuscation. Therefore investigative methods like linear and index-based searches will fail unless a tool supports the Windows Search database, which few investigative tools currently seem to do. The compression and obfuscation can easily be taken care of by using Windows Search’s own decompression function. The next time you’re analyzing a Windows system, have a look at the Windows Search database; perhaps it will help you solve your case.

License

Copyright (c) 2010 Joachim Metz. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts and with no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License”.

Appendix A. References

[ESEDB09] Title: Extensible Storage Engine (ESE) database (DB)
Author(s): Joachim Metz
URL: https://libesedb.sourceforge.net/

[MSDN] Title: Microsoft Developer Network
URL: http://msdn.microsoft.com/

[TECHNET] Title: Windows Indexing Features
URL: http://technet.microsoft.com/en-us/library/dd744700%28WS.10%29.aspx#WS_IndexingOutlookandExchange

[WOAN08] Title: EseDbViewer
Author(s): Mark Woan
URL: http://www.woanware.co.uk/esedbviewer

 

Appendix B. GNU Free Documentation License

Copyright (c) 2010 Joachim Metz. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts and with no Back-Cover Texts. A copy of the license is available here.
