Privacy czars urge websites to block data scraping

Privacy and information commissioners from 12 jurisdictions including Canada, the U.K., China, and Australia have urged social media companies to do more to prevent threat actors from scraping personal data from their IT systems.

“Social media companies and the operators of websites that host publicly accessible
personal data have obligations under data protection and privacy laws to protect
personal information on their platforms from unlawful data scraping,” the group said in a joint letter issued Thursday.

Not only was the joint letter released to the public, it was also sent directly to Alphabet Inc. (operator of YouTube), ByteDance Ltd. (TikTok), Meta Platforms, Inc. (Instagram,
Facebook and Threads), Microsoft (LinkedIn), Sina Corp (Weibo), and X Corp. (X,
previously Twitter).

It’s not unknown for companies to do some data scraping. One of the best-known examples is Clearview AI, which lifted the images of millions of people to populate its commercial facial recognition database. Several privacy commissioners around the world, including Canada’s, say that’s illegal.

But threat actors eager for large volumes of names, email addresses and other personal information for impersonation, fraud and enabling the hacking of organizations do it too — if the opportunity is there — largely because it’s easier than hacking into organizations’ databases.

One of the most recent examples was revealed this week: In January, someone posted data of 2.6 million users of the Duolingo language learning site for sale on a criminal forum. A company spokesperson told The Record that the data had been scraped, and wasn’t the result of a hack. A hacker claimed on X/Twitter that the data was scraped from an exposed application programming interface (API).

In February, an archive containing data purportedly scraped from 500 million LinkedIn profiles was put up for sale on a popular hacker forum. In January, someone started giving away data on tens of millions of Twitter users allegedly scraped off the site.

In their joint letter, the privacy and information commissioners say data scraping generally involves the automated extraction of data from the web. They issued the call to action because they are seeing increasing incidents involving data scraping, particularly from social media and other websites that host publicly accessible data.

Any online business has data protection obligations with respect to third-party scraping from their sites, the commissioners say. “These obligations will generally apply to personal information whether that information is publicly accessible or not. Mass data scraping of personal information can constitute a reportable data breach in many jurisdictions.

“The commissioners urge organizations to implement multi-layered technical and
procedural controls to mitigate the risks of data scraping.” They said a combination of these controls should be used that is proportionate to the sensitivity of the information, and may include:

  • designating a team and/or specific roles within the organization to identify and implement controls to protect against, monitor for, and respond to scraping activities;
  • ‘rate limiting’ the number of visits per hour or day by one account to other account profiles, and limiting access if unusual activity is detected;
  • monitoring how quickly and aggressively a new account starts looking for other users. If
    abnormally high activity is detected, this could be indicative of unacceptable usage;
  • taking steps to detect scrapers by identifying patterns in ‘bot’ activity. For example, a group of suspicious IP addresses can be detected by monitoring where a platform is being accessed from when the same credentials are used from multiple locations. This would be suspicious where these accesses occur within a short period of time;
  • taking steps to detect bots, such as by using CAPTCHAs, and blocking the IP address where data scraping activity is identified;
  • where data scraping is suspected and/or confirmed, taking appropriate legal action such as the sending of ‘cease and desist’ letters, requiring the deletion of scraped information, obtaining confirmation of the deletion, and other legal action to enforce terms and conditions prohibiting data scraping;
  • in jurisdictions where the data scraping may constitute a data breach, notifying affected
    individuals and privacy regulators as required.
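
Two of the controls listed above, rate limiting profile visits per account and flagging the same credentials used from several locations within a short period, can be sketched in a few lines of Python. This is only an illustrative sketch, not anything prescribed by the commissioners' letter: the class names, thresholds, account identifiers and IP addresses are all hypothetical, and a real platform would keep this state in shared storage rather than in process memory.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` events per key within `window_seconds` (hypothetical helper)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps of allowed events

    def allow(self, key, now):
        events = self._events[key]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False  # over the limit: caller should restrict access
        events.append(now)
        return True


class MultiLocationDetector:
    """Flag an account seen from more than `max_ips` distinct IPs within the window."""

    def __init__(self, max_ips, window_seconds):
        self.max_ips = max_ips
        self.window = window_seconds
        self._logins = defaultdict(deque)  # account -> (timestamp, ip) pairs

    def record(self, account, ip, now):
        logins = self._logins[account]
        logins.append((now, ip))
        # Forget accesses older than the window.
        while logins and now - logins[0][0] >= self.window:
            logins.popleft()
        distinct_ips = {seen_ip for _, seen_ip in logins}
        return len(distinct_ips) > self.max_ips


# Assumed policy: 3 profile views per hour per account (illustrative number only).
limiter = SlidingWindowLimiter(limit=3, window_seconds=3600)
views = [limiter.allow("account-1", t) for t in (0, 10, 20, 30)]
later_view = limiter.allow("account-1", 3600)

# Assumed policy: more than 2 distinct IPs in 10 minutes is suspicious.
detector = MultiLocationDetector(max_ips=2, window_seconds=600)
flags = [
    detector.record("account-1", "192.0.2.1", 0),
    detector.record("account-1", "198.51.100.2", 60),
    detector.record("account-1", "203.0.113.3", 120),
]
```

In this sketch the fourth profile view inside the hour is refused, and the third distinct address within ten minutes trips the detector. Timestamps are passed in explicitly as seconds so the behaviour is easy to reason about; a production system would more likely read a monotonic clock.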

Individuals can protect themselves from data scraping by reading what websites say about how they share personal information, including their privacy policies. That will help guide people on what information they should share with a site when registering or paying for a product or service. Some websites, the privacy commissioners note, let users increase the control they have over how their personal information is shared online.

The letter asks social media companies to show within one month how they comply with the expectations outlined in the joint statement.


Howard Solomon
Currently a freelance writer. Former editor of and Computing Canada. An IT journalist since 1997, Howard has written for several of ITWC's sister publications, including Before arriving at ITWC he served as a staff reporter at the Calgary Herald and the Brampton (Ont.) Daily Times.
