A database containing more than 267 million Facebook user IDs, phone numbers, and names was left exposed on the web for anyone to access without a password or any other authentication.


Comparitech partnered with security researcher Bob Diachenko to uncover the Elasticsearch cluster. Based on the evidence, Diachenko believes the trove of data is most likely the result of an illegal scraping operation or Facebook API abuse by criminals in Vietnam.


The information contained in the database could be used to conduct large-scale SMS spam and phishing campaigns, among other threats to end users.


Diachenko immediately notified the internet service provider managing the IP address of the server so that access could be removed. However, Diachenko says the data was also posted to a hacker forum as a download.


Timeline of the exposure


The database was exposed for nearly two weeks before access was removed. Here’s what we know:

  • December 4 – The database was first indexed.
  • December 12 – The data was posted as a download on a hacker forum.
  • December 14 – Diachenko discovered the database and immediately sent an abuse report to the ISP managing the IP address of the server.
  • December 19 – The database is now unavailable.
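
The intervals between these milestones can be worked out with simple date arithmetic. The sketch below is illustrative only: the year (2019) is an assumption, since the timeline lists only days and months, and the last entry is the date the database was noted as unavailable, which may be later than the moment access was actually removed.

```python
from datetime import date

# Assumption: the timeline gives no year; 2019 is used here for illustration.
YEAR = 2019

timeline = [
    (date(YEAR, 12, 4), "database first indexed"),
    (date(YEAR, 12, 12), "data posted on a hacker forum"),
    (date(YEAR, 12, 14), "database discovered and reported to the ISP"),
    (date(YEAR, 12, 19), "database noted as unavailable"),
]


def days_since_first(events):
    """Return (days elapsed since the first event, label) for each event."""
    start = events[0][0]
    return [((day - start).days, label) for day, label in events]


for elapsed, label in days_since_first(timeline):
    print(f"day {elapsed:>2}: {label}")
```

Read this way, the data had already been circulating on a forum for two days by the time the open server was found and reported.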


Typically, when we find exposed personal data like this, we take steps to notify the owner of the database. But because we believe this data belongs to a criminal organization, Diachenko went straight to the ISP.


What data was exposed


In total, 267,140,436 records were exposed. Most of the affected users were from the United States. Diachenko says all of the records appear to be valid. Each contained:

  • A unique Facebook ID
  • A phone number
  • A full name
  • A timestamp


The server included a landing page with a login dashboard and welcome note.


Facebook IDs are unique, public numbers associated with specific accounts, which can be used to discern an account’s username and other profile info.


Facebook scraping


How criminals obtained the user IDs and phone numbers isn’t entirely clear. One possibility is that the data was stolen through Facebook’s developer API, which app developers use to add social context to their applications by accessing users’ profiles, friends lists, groups, photos, and event data. Phone numbers were available to third-party developers until Facebook restricted access in 2018.


Diachenko says Facebook’s API could also have a security hole that would allow criminals to access user IDs and phone numbers even after access was restricted.


Another possibility is that the data was stolen without using the Facebook API at all, and instead scraped from publicly visible profile pages.


“Scraping” describes a process in which automated bots quickly sift through large numbers of web pages, copying data from each one into a database. It’s difficult for Facebook and other social media sites to prevent scraping because they often cannot tell the difference between a legitimate user and a bot. Scraping violates the terms of service of Facebook and most other social networks.
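
The general pattern of parsing each page and copying a field into a database can be sketched with Python’s standard library alone. This is a toy illustration under stated assumptions: the pages are in-memory strings with invented markup and fictional names, no network requests are made, and nothing here reflects Facebook’s actual page structure or the method used in this incident.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical stand-ins for fetched pages; invented markup, fictional data.
PAGES = {
    "/profile/1": '<html><body><h1 class="name">Jane Doe</h1></body></html>',
    "/profile/2": '<html><body><h1 class="name">John Roe</h1></body></html>',
}


class NameParser(HTMLParser):
    """Collects the text of the first <h1 class="name"> element."""

    def __init__(self):
        super().__init__()
        self._in_name = False
        self.name = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs.
        if tag == "h1" and ("class", "name") in attrs:
            self._in_name = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_name = False

    def handle_data(self, data):
        if self._in_name and self.name is None:
            self.name = data.strip()


def scrape(pages):
    """Parse every page and copy the extracted name into an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE profiles (path TEXT PRIMARY KEY, name TEXT)")
    for path, html in pages.items():
        parser = NameParser()
        parser.feed(html)
        parser.close()
        db.execute("INSERT INTO profiles VALUES (?, ?)", (path, parser.name))
    db.commit()
    return db


db = scrape(PAGES)
rows = db.execute("SELECT path, name FROM profiles ORDER BY path").fetchall()
print(rows)
```

Because each page is handled by the same few lines, the loop scales to millions of pages with no extra effort, which is why public profiles are so easy to harvest in bulk.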


Many people set their Facebook profile visibility to public, which makes scraping their profiles trivial.


This isn’t the first time such a database has been exposed. In September 2019, 419 million records across several databases were exposed. These also included phone numbers and Facebook IDs.


Dangers of exposed data


A database this big is likely to be used for phishing and spam, particularly via SMS. Facebook users should be on the lookout for suspicious text messages. Even if the sender knows your name or some basic information about you, be skeptical of any unsolicited messages.


How and why we discovered this data


Comparitech works with Bob Diachenko to uncover unsecured databases and report them to the public. Our aim is to limit access to and abuse of personal data by malicious parties, and to raise awareness among those affected about the potential risks.


Upon discovering exposed data, Diachenko immediately notifies those responsible so the database can be shut down or secured. We then analyze the leak to identify victims, the duration of the exposure, and any potential threats victims might face.