A file containing personal information on 1.5 billion Facebook accounts is reportedly for sale on the net. If this colossal archive really exists, it would undoubtedly be the largest Facebook dataset ever released.
While Facebook is experiencing a historic outage at the beginning of October 2021, another, potentially just as worrying, piece of news is complicating life for the world's largest social network. A report from the Privacy Affairs site says that a database of 1.5 billion Facebook accounts is for sale on a hacker forum.
Note that these two events are not linked to each other.
Here is what emerges from this case.
What are we talking about exactly?
Potentially “the largest amount of Facebook data ever released”, according to Privacy Affairs. The database in question is said to contain the names, emails, location records, phone numbers and gender identity information of more than half of Facebook accounts worldwide. Suffice it to say that the file would be huge. All of this data is now said to be on sale.
It remains unclear how such precise information got out, and whether every field is present for every affected account. Some rows of the database may be incomplete, containing only an email address or a telephone number. Part of the archive may also come from earlier leaks that have affected the social network. In any case, at this scale, it represents a considerable amount of data.
The seller behind the offer claims that all of this information dates from this year. He also says he works for a company that supposedly specializes in extracting data from Facebook. There are traces of this company (which Numerama has decided not to name) on the net, including a YouTube account whose videos detail methods for extracting data from Facebook.
Unlike a classic data leak, which is obtained by accessing an internal database, this data is said to have been amassed by “scraping” Facebook pages.
What is scraping?
“Scraping” is an automated process that collects data from a web page using a script, that is to say a program designed to perform a specific task. In this case, the program automatically harvests whatever information falls within its scope, following the instructions in its lines of code.
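As a rough illustration of the idea, the Python sketch below parses a small, invented HTML snippet and pulls out the fields the script was told to look for. The markup, the class names (profile, name, city) and the ProfileScraper helper are all hypothetical. Nothing here reflects Facebook's actual pages or the seller's tooling, and the example deliberately works on a local string rather than a live site.

```python
from html.parser import HTMLParser

# Hypothetical sample page: the markup and class names are invented for illustration.
SAMPLE_PAGE = """
<html><body>
  <div class="profile">
    <span class="name">Alice Example</span>
    <span class="city">Paris</span>
  </div>
  <div class="profile">
    <span class="name">Bob Example</span>
    <span class="city">Lyon</span>
  </div>
</body></html>
"""


class ProfileScraper(HTMLParser):
    """Collects the text of <span> elements whose class is in `wanted`."""

    def __init__(self, wanted=("name", "city")):
        super().__init__()
        self.wanted = set(wanted)   # the "instructions": which fields to harvest
        self.current_field = None   # field currently being read, if any
        self.records = []           # one dict per profile block found

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "profile":
            # A new profile block starts: open a new record.
            self.records.append({})
        elif tag == "span" and attrs.get("class") in self.wanted and self.records:
            self.current_field = attrs["class"]

    def handle_data(self, data):
        # Only text inside a wanted <span> is kept; everything else is ignored.
        if self.current_field is not None:
            self.records[-1][self.current_field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self.current_field = None


def scrape(html):
    """Return the list of records extracted from an HTML string."""
    parser = ProfileScraper()
    parser.feed(html)
    return parser.records


if __name__ == "__main__":
    for record in scrape(SAMPLE_PAGE):
        print(record)
```

The point of the sketch is how little it takes: the script only reads what the page already displays, and the same few lines can be repeated across as many pages as the site will serve.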
Compared with fraudulent access that exploits an employee's negligence (through a phishing operation, for example) or a flaw that has not been patched, scraping is an accessible technique: it only collects what is visible on the net. Anyone with development skills can build their own scraping tool.
Although this kind of extraction is within the reach of many, its legality is very questionable and it can quickly cross the line. The method consists of retrieving what is publicly accessible, according to the rules defined by the script. However, publicly accessible data is not necessarily public data. It can be personal or even sensitive data, and in that case the law applies, even if people were somewhat careless when they shared their information on a particular site.
Sites' terms of use usually prohibit scraping, and some sites take technical measures to limit the volume of requests sent. But new flaws are constantly being discovered.
The timing is a strange coincidence, and it is tempting to link the two events. But the circulation of this alleged archive containing personal data from 1.5 billion Facebook accounts has nothing to do with the exceptional outage that brought down the net giant's entire network. What is known about that technical malfunction points to a configuration problem, which has nothing to do with scraping.
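The article does not say which technical measures any given site uses. One common approach, sketched here in Python purely as an assumption, is a sliding-window rate limiter that refuses a client once it exceeds a set number of requests in a given period. The class name, the client identifier and the thresholds are all hypothetical.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allows at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # For each client, the timestamps of its recently accepted requests.
        self.history = defaultdict(deque)

    def allow(self, client_id, now):
        """Return True if the request is accepted, False if it is throttled.

        `now` is passed in (in seconds) so the behaviour is easy to test.
        """
        timestamps = self.history[client_id]
        # Forget requests that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False
        timestamps.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
    for t in (0, 1, 2, 3, 60):
        print(t, limiter.allow("client-a", now=t))
```

A limiter like this slows down a single noisy client, but a scraper that spreads its requests over many accounts or addresses can stay under each individual threshold, which is one reason such measures limit scraping rather than stop it.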
Is this leak credible?
Scraping or not, a leak of this magnitude inevitably raises the question of credibility, because we are talking about an archive that would contain one or more pieces of personal data on more than one in seven people living on Earth, although there are also a lot of fake accounts on Facebook. Collecting so many elements from so many profiles would be a tour de force, even with the support and resources of an unscrupulous company.
On the forum where the listing was posted, several members are skeptical of the thread author's unusual claims. Some doubt the actual size of the file. Another claims to have already tried to buy data from the seller (in a separate deal), but never received anything. Nor can it be ruled out that the file is made up of several other Facebook databases already available on the web.
It appears that the database went on sale on September 22. Such a file would therefore have been potentially circulating on the net for about a dozen days without anyone noticing. This doesn’t necessarily mean the case is a hoax, but caution is in order.
Have there been other similar cases?
Scraping predates Facebook. This is therefore (unfortunately) not the first time that a database resulting from a data-harvesting operation has surfaced, whether against Facebook or any other site. But the net giant, by virtue of its size, inevitably attracts far more of this type of activity. It’s the flip side of gathering so many people in one place.
There are plenty of examples. In September 2019, 419 million phone numbers were obtained using this method. In June 2021, 500 million numbers were found on the web, again gleaned from Facebook. More recently, the professional social network LinkedIn has twice found itself in the same situation as Facebook: once in April and once in June.
The simplicity of scraping makes it a very popular “attack” method, much more accessible than operations that would require breaking into the internal network of a social network like Facebook. It is therefore common to see listings like this one appear on shady sites, even if, once again, the size of the database in question today is considerable.
What to do to protect yourself?
By definition, any information that you share publicly on the web can be harvested by scraping tools. The best way to protect yourself is therefore never to share sensitive information publicly on the web, or at least to share as little as possible. It may also be a good idea to make sure that almost nothing on your Facebook profile is public.
Removing every trace of yourself from the net is a radical solution that is hard to apply and, in practice, impossible to fully achieve, but there are other ways to protect yourself against this kind of extraction. Make sure, on all your social networks, that your phone number and your other personal information are not publicly visible. And share this kind of information sparingly on any platform.
Once in the wild, this data is used to build very convincing phishing campaigns, which can be difficult to foil. The more data a malicious actor has about you, the easier it is to impersonate your bank or your insurer, or simply to spam you. So be careful when you receive a phone call or an email that seems a little fishy.
Article written with the collaboration of Julien Lausson.