Data Cache May Contain 2,800 Partly Undiscovered Breaches
Organizations Scramble After 80 Million Potentially Breached Records Surface
An analysis of a massive 8.8 GB trove of files containing usernames and plaintext passwords suggests hundreds of services may have experienced data breaches that haven't become public yet.
News site HackRead first reported that the file existed, after it appeared on a well-known hacking forum. Troy Hunt, an Australian breach expert, says that while some of the data appears to come from major breaches such as Dropbox and MySpace, much of it may be new.
All told, the data trove contained 3,000 files, but Hunt believes that 2,844 of the files may contain unique, fresh data from unknown breaches. "It's actually quite bad," he tells Information Security Media Group.
Pretty much everything I ever write or create:
90% awesome feedback
1% angry and pissed off: https://t.co/HEW4NYVkfW
(Troy Hunt, @troyhunt, February 26, 2018)
Millions of New Records
The 2,844 files contain 80.1 million unique email addresses, and Hunt checked how many were in his data breach notification service, Have I Been Pwned. Registered HIBP users receive an email if their email address appears in a breach.
Hunt found that about 63 percent of the unique email addresses contained in the files were already in HIBP's database, leaving tens of millions of new records. "What it boiled down to was there was just a huge amount of data I'd never seen before," he says.
He loaded the data into HIBP and ended up sending 186,000 notifications, which amounts to about 10 percent of the service's 1.85 million subscribers. The breach notifications have caused huge concern among individuals whose email addresses are in the data.
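The figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers quoted in this article; the function and variable names are illustrative and are not part of HIBP.

```python
# Figures quoted in the article.
unique_addresses = 80_100_000   # unique email addresses in the 2,844 files
share_already_known = 0.63      # "about 63 percent" were already in HIBP
notifications_sent = 186_000
subscribers = 1_850_000


def new_address_estimate(total: int, known_share: float) -> int:
    """Estimate how many addresses were not previously in the database."""
    return round(total * (1 - known_share))


def notified_share(sent: int, total_subscribers: int) -> float:
    """Fraction of subscribers who received a notification."""
    return sent / total_subscribers


new_addresses = new_address_estimate(unique_addresses, share_already_known)
share = notified_share(notifications_sent, subscribers)

print(f"Roughly {new_addresses / 1e6:.1f} million addresses were new")
print(f"{share:.1%} of subscribers were notified")
```

Because the 63 percent figure is itself rounded, the count of new addresses (a little under 30 million) is an estimate rather than an exact total.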
"I woke to an exploding inbox," Hunt says.
Who Was Breached?
The 2,844 files are named in a way that sometimes seems to indicate the service from which the data originated. But Hunt has encountered a challenge: While some of the filenames have spellings that are similar to a real service, or a slight variation, others range far afield and it's not clear if any are indeed accurate. Hunt says he's still trying to trace the services from which the data may have been stolen.
And so are a lot of potentially breached organizations. Since releasing his Monday blog post alerting HIBP users to the addition of 80 million emails to his service, Hunt says companies have been reaching out, trying to figure out if they are affected.
Hunt has shared a complete list of the 2,844 filenames, in case users recognize them and might be able to help trace their source.
Figuring out if data contained in data dumps is accurate or where it came from remains a frequent problem for data breach sleuths. Stolen data is traded frequently on the web and mixed together, and many breaches that purport to be from one service actually turn out to be from several different ones that have been mashed together.
"Our data gets abused and manipulated and turned into so many different things by so many different parties, you've really got no idea what's out there, from where and what it's called," Hunt says. "Our stuff is just all over the place."
To get a sense of how accurate or inaccurate the data sets may be, Hunt says he focused on the file labeled "Dropbox." It appeared odd.
Dropbox's breach, which occurred around 2012, leaked 69 million accounts, including usernames and hashed passwords. The hashed passwords came in two variations: a set of SHA1 hashes, but without the added salt, and a second group of bcrypt hashes with a work factor of eight. Neither would be easy for an attacker to reverse to a plaintext password (see Dropbox's Big, Bad, Belated Breach Notification).
The data trove's "Dropbox" file contains 18 million usernames plus plaintext passwords. Hunt says it's unlikely whoever created the list did any password cracking because that would have been far too difficult.
So Hunt looked at some of the bcrypt hashes from the original Dropbox data and then looked at the corresponding plaintext password from the data trove. He used a service that allowed him to convert the plaintext password to a bcrypt hash, a procedure he humorously termed the "poor man's cracking service."
Result: Some of those passwords matched passwords contained in the original Dropbox data, which had been hashed with bcrypt. But Hunt says it's still unlikely that whoever amassed the latest trove of breach data cracked the Dropbox passwords. Instead, whoever compiled the data likely matched passwords that were exposed or cracked in other breaches with the Dropbox usernames, Hunt says.
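The check Hunt describes works because a salted, deliberately slow hash is hard to reverse but easy to test against one guess: hash the candidate password with the same salt and cost setting, then compare the result with the stored hash. The sketch below illustrates that idea only. It is not the service Hunt used, and because Python's standard library has no bcrypt implementation, it substitutes PBKDF2 as a stand-in. The password, salt and iteration count are hypothetical.

```python
import hashlib
import hmac
import os

# Illustrative cost setting for the PBKDF2 stand-in; not a Dropbox value.
ITERATIONS = 100_000


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a slow, salted hash of the password (PBKDF2 standing in for bcrypt)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def candidate_matches(candidate: str, salt: bytes, stored_hash: bytes) -> bool:
    """Re-hash a candidate plaintext with the stored salt and compare the results."""
    return hmac.compare_digest(hash_password(candidate, salt), stored_hash)


# Hypothetical record as it might appear in a leaked set of hashes.
salt = os.urandom(16)
stored = hash_password("correct horse", salt)

# A plaintext taken from another source either reproduces the hash or it doesn't.
print(candidate_matches("correct horse", salt, stored))
print(candidate_matches("wrong guess", salt, stored))
```

Testing one known candidate per account is cheap. Guessing blindly against millions of slow hashes is not, which is consistent with Hunt's view that the plaintext passwords probably came from other breaches rather than from cracking.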
Some other data in the files, even if it was mislabeled in the filenames, overlaps with known breaches, Hunt says. For example, a file named "SGB.net.txt" contains data that appears to come from the breach of Lifeboat, a Minecraft-focused community. The accurate URL for the service is "lbsg.net."
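One way to shortlist candidates for a misspelled filename is approximate string matching against a list of known service domains. The article does not describe how Hunt traces filenames, so the following is a hypothetical sketch only: the `guess_service` helper and the short list of known domains are assumptions made for illustration.

```python
import difflib
from typing import Optional


def guess_service(filename: str, known_domains: list[str],
                  cutoff: float = 0.6) -> Optional[str]:
    """Return the known domain most similar to the filename, or None if none is close."""
    name = filename.lower()
    if name.endswith(".txt"):
        name = name[: -len(".txt")]  # drop the file extension before comparing
    matches = difflib.get_close_matches(name, known_domains, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical list of domains associated with known breaches.
known = ["dropbox.com", "myspace.com", "lbsg.net"]

print(guess_service("SGB.net.txt", known))  # lbsg.net
```

A similar name is only a lead. As the Dropbox example shows, the contents of a file still have to be checked against the suspected source before a match can be confirmed.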
Since HIBP sent out notifications and Hunt published his blog post, some users have confirmed their use of certain services, making it easier to match misnamed files with possible breaches. In the coming days, Hunt expects to confirm new breaches that organizations never publicized or which they may have yet to discover.