What you need to know about the sale of 700 million LinkedIn profiles

Yes, a database of 700 million LinkedIn profiles is for sale on a specialized forum. But that doesn’t mean that LinkedIn has suffered a breach or that 92% of its users are affected.

In April 2021, the sale of a database containing the data of 500 million LinkedIn users made the headlines. This information did not come from a flaw in the social network, but from simple large-scale “scraping”, as Cyberguerre detailed at the time. In other words, the seller had gathered data that anyone could see by visiting the profiles of the people concerned, and had not obtained any private data.

Two months after this story, a new media frenzy is emerging, this time over the sale of data from 700 million LinkedIn users. The listing was published on the same forum by the same user, a certain “TomLiner”, who created his account in February 2021. But it is not really a second database; rather, it is an improved version of the first.

Everything starts from an advertisement posted on a well-known forum. // Source: Numerama screenshot

As is customary in this trade, the seller attached a free sample of 1 million profiles to his offer, which Cyberguerre obtained. This extract serves as proof that the goods exist, in an unregulated market where scams are numerous. To buy the entire database, however, you have to contact the seller directly through a messaging app and pay a substantial amount of money. The conclusions of this article are therefore based on the sample, which is assumed to be representative.

Why are we talking about this sale?

The listing for the LinkedIn data was posted on June 22, lost among the dozens of similar listings posted daily on the forum. Five days later, two companies, PrivacySharks and RestorePrivacy, wrote about it on their respective sites. While the first headlined its article with the sale of data, the second spoke of a “data leak”. Their reports were seldom picked up by the American press, but the French media seized on them en masse.

It must be said that the available sample contains a lot of personal data:

  • Email addresses;
  • Names;
  • Phone numbers;
  • Physical addresses (place of work or home);
  • Links to LinkedIn profiles;
  • GPS coordinates linked to several events, such as the creation of the account;
  • The gender of the users;
  • Professional and academic experience, where provided;
  • An estimate of the salary, calculated by LinkedIn under certain conditions;
  • And more.

Note that not every profile contains all of this information, far from it. For good reason: most of this data was entered by users, and much of it is not required to create an account. That’s not all: in principle, users know that they are posting this information publicly. As a result, the database always contains the name attached to a profile, but much more rarely a telephone number.

Faced with the scale of the coverage, LinkedIn published an initial statement: “We want to make it clear that this is not a data breach, and that no private LinkedIn member data has been exposed. Our initial investigation concludes that this data has been ‘scraped’ from LinkedIn and various other sites, and that this includes the same data we discussed in April 2021.”

Can we speak of a “flaw” in LinkedIn?

No, at least not on the basis of the sample. The term “flaw” designates a vulnerability in the system that has been exploited to access protected information. Here, all the information collected appears, at first sight, to be public, entered either by the company or by the user.

RestorePrivacy says cybercriminals abused LinkedIn’s APIs. APIs are web interfaces that make it easier to retrieve information from a site for use elsewhere. In other words, the criminals would have automated the collection using the social network’s own tools. LinkedIn, for its part, believes that they obtained some of the data from another source.

In principle, sites put limits on the number of requests that a single person can send to their APIs in order to avoid this kind of abuse.

Can we speak of a data “leak”?

Not really. A leak would imply that a closed container (a database, for example) has been opened and emptied of its contents. Here, the data was already outside of any container, on users’ public profiles. The seller appears to have simply hoovered it up through “scraping”. Troy Hunt, who runs HaveIBeenPwned, a leading reference site on data breaches, stresses the distinction.

The fact remains that scraping violates the site’s terms of use and, more generally, is a practice at the edge of legality. In addition, the data, even if it is displayed publicly, should in principle not be combined in a single file. For good reason: thanks to aggregation, a criminal can launch mass operations, such as phishing campaigns.

Is the data being sold on the “dark web”?

No. The term “dark web” commonly designates the part of the Internet that is not accessible from a traditional browser (Firefox, Edge, Chrome, Safari, Opera…). This includes, for example, .onion sites, which are accessible only through the Tor network. These web pages manage to escape some of the rules that apply to the traditional internet, making them an ideal refuge for illegal activities, but also for many other things, such as activists organizing to escape censorship.

In our case, the listing was posted on a popular data-selling forum that everyone in the business knows. No special skills or knowledge are needed to access it: just enter the URL or type a few keywords into a search engine.

In short, while the forum can be described as a “black market”, since it facilitates illicit activities, it is not on the “dark web”.

Are “92% of LinkedIn users” really affected?

It’s more complicated than that. In its article, RestorePrivacy makes a simplistic calculation that has been widely repeated: since the database contains 700 million profiles, and LinkedIn publicly states that it has 756 million users, the database would hold the data of 92% of the social network’s users.

On closer inspection, however, some of the data is out of date, since it was collected before 2020. Some of it may also belong to inactive accounts. And that’s not all: if the database aggregates data from several sources, it could contain duplicates. In conclusion, the percentage of active LinkedIn users who are affected should be lower than this estimate.
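The reasoning above can be sketched in a few lines of Python. The first part reproduces the simplistic ratio using the two figures quoted in the article; the second shows, on three made-up records, how merging several scrapes can inflate a profile count. The field name `profile_url`, the sample URLs, and the normalization rule are assumptions for illustration, not the structure of the actual database.

```python
def naive_share(profiles_for_sale, claimed_users):
    """Percentage obtained by dividing one headline figure by the other."""
    return 100 * profiles_for_sale / claimed_users


# Figures quoted in the article: 700 million profiles, 756 million users.
share = naive_share(700_000_000, 756_000_000)
print(f"Naive share: {share:.1f}%")  # about 92.6%, reported as 92%


def count_unique_profiles(records):
    """Count each profile once, ignoring case and trailing slashes in its URL."""
    seen = set()
    for record in records:
        key = record["profile_url"].strip().lower().rstrip("/")
        seen.add(key)
    return len(seen)


# Made-up records standing in for two merged scrapes (hypothetical data).
toy_records = [
    {"profile_url": "https://www.linkedin.com/in/example-a", "source": "scrape-1"},
    {"profile_url": "https://www.linkedin.com/in/Example-A/", "source": "scrape-2"},
    {"profile_url": "https://www.linkedin.com/in/example-b", "source": "scrape-1"},
]
unique = count_unique_profiles(toy_records)
print(f"{len(toy_records)} rows, {unique} unique profiles")  # 3 rows, 2 unique profiles
```

The ratio only holds if every row is a distinct, current, active account. Each duplicate, stale, or inactive entry lowers the true share, but the sample does not allow us to say by how much.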

Is it as bad as it sounds?

In cybersecurity, the amount of data is always impressive, but it is above all the quality of the data that matters.

However, this database contains no very high-value information such as credentials (username and password) or banking details. Such data is sought after because it can be exploited immediately to take over an account or steal money.

On the other hand, the file contains other personal data that is exploitable, although less valuable, such as email addresses or telephone numbers. This information is all the more useful because it is tied to other details, such as the victim’s name, employer or city of residence.

A criminal could therefore use this basic information to send more or less personalized phishing emails. The purpose of these messages is to steal high-value information through fake forms or malware. For example, a cybercriminal could send phishing emails to all employees who work in the banking industry. The aim would be to recover the credentials of company accounts in order to infiltrate the network and deploy ransomware.

These worst-case scenarios must be weighed against how novel the database really is: would cybercriminals really find information in it that they do not already have? Couldn’t these attacks already be launched after a simple visit to users’ LinkedIn pages?
