Database of Software “Fingerprints” Expands to Include Computer Games

by Richard Press, NIST

One of the largest software libraries in the world just grew larger. The National Software Reference Library (NSRL), which archives copies of the world’s most widely installed software titles, has expanded to include computer game software from three popular PC gaming distribution platforms—Steam, Origin and Blizzard.

The NSRL, which is maintained by computer scientists at the National Institute of Standards and Technology (NIST), allows cybersecurity and forensics experts to keep track of the immense and ever-growing volume of software on the world’s computers, mobile phones and other digital devices. It is the largest publicly known collection of its kind in the world.

The NSRL does not lend out the software in its collection. Instead, NIST runs every file in the NSRL through an algorithm that generates a digital “fingerprint,” also known as a hash: a 40-character string of letters and numbers that uniquely identifies that file. Every quarter, NIST releases an updated list of hashes to the public. The list, which NIST calls the Reference Data Set, or RDS, can be freely downloaded from the agency’s website. The latest RDS contains more than 40 million hashes, including those for the recently added video game files.
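To make the idea of a file fingerprint concrete, here is a minimal Python sketch. It assumes SHA-1 as the hash algorithm (a SHA-1 digest is 160 bits, written as 40 hexadecimal characters); the article does not say which algorithm NIST uses, and the function name `file_fingerprint` is hypothetical.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return a file's SHA-1 digest as a 40-character hexadecimal string.

    SHA-1 is an assumption for illustration. The file is read in
    chunks so that large files never have to fit in memory at once.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example (hypothetical path):
# print(file_fingerprint(Path("C:/Windows/System32/notepad.exe")))
```

Because the digest depends on every byte of the file, changing even a single byte produces a different fingerprint, which is why a matching hash is strong evidence that two files are identical.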

To people who work in the fields of cybersecurity and digital forensics, the world is a vast and ever-rising ocean of digital objects. The RDS allows them to navigate that ocean and quickly find what they’re looking for.

Many crimes today involve some form of digital evidence, and the NSRL helps investigators process that evidence more quickly. If investigators have a seized hard drive or mobile phone, for instance, they can hash all the files on that device, then compare that hash list to NIST’s RDS. Files that match can typically be ignored because they are known software files that are unlikely to contain information relevant to the investigation.
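The filtering workflow in the preceding paragraph can be sketched in a few lines of Python. This is a minimal illustration, not NIST’s tooling: it assumes SHA-1 as the hash algorithm and assumes the known hashes have already been exported to a plain-text file with one hexadecimal digest per line. That file format, and the helper names `sha1_of`, `load_known_hashes` and `unknown_files`, are hypothetical; the published RDS uses its own distribution format.

```python
import hashlib
import os
from pathlib import Path


def sha1_of(path, chunk_size=1 << 20):
    """Hash one file with SHA-1 (an assumed algorithm), reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_known_hashes(list_path):
    """Read a hypothetical plain-text list (one hex digest per line) into a set."""
    known = set()
    with open(list_path, "r", encoding="utf-8") as f:
        for line in f:
            value = line.strip().lower()  # normalize case; skip blank lines
            if value:
                known.add(value)
    return known


def unknown_files(root, known):
    """Yield every file under root whose hash is NOT in the known set.

    Matching files are treated as known software and skipped; whatever
    remains is left for the investigator to examine.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if sha1_of(path) not in known:
                yield path
```

Storing the known hashes in a set makes each lookup fast on average, but holding tens of millions of entries in memory is expensive, so a production tool would more likely query an indexed database.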

“After they filter out all of the known files, they’re left with everything that’s not recognized,” said Doug White, the NIST computer scientist who runs the NSRL. “Those are the files that might be interesting.”

Digital forensic investigators at all levels of government and in private industry rely on the RDS to efficiently manage their caseload.

The NSRL contains operating system software, office software, media players, device drivers—all types of software files that are commonly installed on personal computers. In 2016, the NSRL expanded to include hundreds of thousands of mobile apps, which extended its usefulness to mobile phones.

The recent addition of gaming software to the NSRL reflects the growing popularity of that software category. “We’re not watching what gamers are doing,” White said. “But we need to include gaming software in the NSRL if we want to stay relevant.”

Among the video game titles added to the NSRL are “PlayerUnknown’s Battlegrounds,” “World of Warcraft” and “Mass Effect.”

“These games are insanely popular,” said Eric Trapnell, a NIST computer scientist who helped curate the collection and is a gamer in his spare time. “Some of them have install bases in the millions.”

Many of the titles were donated to the NSRL by Valve Software, which owns the Steam platform; Electronic Arts, which owns Origin; and Activision Blizzard, which owns Blizzard. Other titles were purchased if their install base was large enough to justify the expense. All titles in the NSRL are properly licensed and acquired.

Although the NSRL exists primarily to support cybersecurity and law enforcement efforts, it is also considered a repository of culturally significant digital artifacts. Just as important books, films and audio recordings are preserved at the Library of Congress, the NSRL functions as a national software archive. Historians consider this important because much of modern culture is both produced and consumed using software.

“Think of all the PowerPoints and Word documents that have tremendous historical significance,” said Trevor Owens, head of Digital Content Management at the Library of Congress. “Those documents might be lost if future historians don’t have access to a comprehensive collection of software.” He might also have mentioned digital artworks, maps and interactive media.

An earlier batch of video games was added to the NSRL two years ago, including first editions of “Mario Bros.,” “Asteroids” and “SimCity,” preserving these retro titles and associated artwork for posterity.

While law enforcement professionals and digital culture geeks might seem strange bedfellows, White says he’s not surprised by their shared interest in the software library. “We preserve the software and make the RDS available to the public,” White said. “The more people who find that useful, the better.”

This article was originally published on NIST.gov.