How to deal with content theft

Has your original Web content been mysteriously appearing on other sites? Has something you’ve written recently appeared word-for-word, without attribution or back links, on a Web site other than your own?

Web content theft, whether done manually or by automated tools also known as scrapers, has been a rampant practice on the Internet for years.

We decided to take another look at content theft after one of our readers commented on our recent article about Google’s battle against duplicated content and asked us for advice on how to deal with content snatchers.

Kathy Tremblay, who owns a portable shelter company in Randolph, N.H., that also does some business with Canadians, told us her site’s content had been stolen by a competitor. “I recall about a year ago, while looking at some of my competitors’ pages, I noticed that quite a few of my words and page content was copied – word-for-word – on their site. Talk about duplicating content!”

Tremblay said even though she wrote to the Web site owners and asked them to remove the content, she never heard back from them.

“So how does a Web site owner deal with that? I would not want to be penalized when someone else used my work,” she wrote, alluding to Google’s updated search algorithm, which gives lower search rankings to duplicated content.

Last Friday Tremblay contacted us again and said content snatching troubles her very much. “I feel very hurt and deceived. I recall feeling quite disbelieving and even angry when I discovered it.”

Tremblay also said she was concerned that customers may think she was the one copying the material: “My gut feeling is that when a customer comes to my site and reads the content I’ve written and then goes to my competitor and sees the same content, I believe he feels that we’re not a ‘real’ business.”

The bad news is there is no surefire way of putting a stop to content theft. Once content is viewable on the Internet, there is always a way to obtain it. But there are a number of strategies you can employ to make sure your content maintains an upper hand (search ranking-wise) over the copied content, according to Scott Wilson, search engine optimization (SEO) specialist and CEO of Burlington, Ont.-based RankHigher.ca.

Wilson, whose specialization includes video production, optimization of Google Places and search marketing in Google AdWords, said “organizations bent on stealing content will always find a way to do so.”

The odds are heavily stacked against the original owner of the content because apart from competitors, there are tens of thousands of sites on the Internet that exist to steal content.

“The stealing is done to populate MFAs (made for ads) Web sites. These are sites that make money by serving up online ads,” he said.

The MFA site owners earn money from the clicks the ads get. To attract viewers, the site owners fill their sites with low-quality content that rides on popular topics, or with content stolen from other sites through manual copy-and-paste or automated scrapers, Wilson explained.

How to uncover content snatchers

There are offices that handle cases of content theft, and legal action can be brought against content snatchers.

There are numerous cases of organizations going after content thieves or online plagiarists, but these are typically large corporations that are out to protect their multi-million dollar brand or intellectual property. Very often, Wilson said, small and medium-sized businesses (SMBs) are not prepared to undertake what could be a protracted legal battle.

“A small business operator has to determine if it is worth the time, effort and money to track down the offender and bring them to court,” Wilson said.

On the part of its site dealing with the Digital Millennium Copyright Act, Google said that when it receives notice of alleged copyright infringement, the search engine’s actions could include “removing or disabling access to material claimed to be the subject of infringing activity and/or terminating subscribers.”

If Google does remove or disable access in response to such a notice, Google makes “a good-faith attempt” to contact the owner or administrator of the affected site or content so that they may make a counter notification.

Google also warned that anyone who files a complaint “will be liable for damages (including costs and lawyers’ fees) if you materially misrepresent that a product or activity is infringing your copyrights.”

Google cited one such case in which a company that sent an infringement notification seeking removal of online materials that were protected by the fair use doctrine was ordered to pay such costs and lawyers’ fees. “The company agreed to pay over $100,000…If you are not sure whether material available online infringes your copyright, we suggest that you first contact a lawyer.”

Wilson of RankHigher suggested that businesses can tackle the problem by concentrating on strengthening their original content’s search optimization properties.

The first step, Wilson said, is to determine if your content is being stolen. You can do a search on Google and other search engines using the keywords that you think best describe your text or image content. However, a faster way is to use online tools.

Wilson recommends using Copyscape, a free tool that helps users identify who is publishing and republishing Web site content.

“We frequently hire freelance writers and we use Copyscape to make sure that they are not using duplicate or plagiarized content. Copyscape is the best site I know for this purpose. It is used by many Webmasters, businesses and social media experts,” he said.
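
For readers who want a rough sense of how such a comparison can work, here is a minimal Python sketch. It is not how the tool Wilson mentions operates; it simply assumes you already have the text of your own page and of a suspect page, and measures how many of your five-word sequences reappear on the other page. The function names and the five-word window are illustrative choices, not anything from the article.

```python
import re
from typing import Set, Tuple


def shingles(text: str, size: int = 5) -> Set[Tuple[str, ...]]:
    """Return every run of `size` consecutive words found in the text."""
    # Lowercase and keep only letters, digits and apostrophes so that
    # punctuation and capitalisation changes do not hide a copy.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(original: str, suspect: str, size: int = 5) -> float:
    """Fraction of the original's word runs that also appear in the suspect text."""
    original_runs = shingles(original, size)
    if not original_runs:
        # Text shorter than `size` words gives nothing to compare.
        return 0.0
    shared = original_runs & shingles(suspect, size)
    return len(shared) / len(original_runs)
```

A result close to 1.0 suggests most of your wording appears on the other page, while a result near 0.0 suggests little or no word-for-word overlap. Short common phrases can still match by chance, so a low score on a short text means little either way.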

Make sure Google knows you’re the original author

Google’s search algorithms are geared towards rewarding original content authors and creators of high value content by giving them higher search rankings.

Taking this into account, Wilson said, business owners can protect their site’s content by making sure the search engine identifies their site as the original source of the content.

Site owners should keep three things in mind:

  • Make sure Google can find your site
  • Make sure Google trusts your site
  • Make sure your content is focused

Google Web crawlers are not very good at identifying Flash content, according to Wilson. “Unfortunately many Web designers use Flash when they want to create special features and effects for a site.”

“Don’t get me wrong. I like Flash and I think it’s a great tool. But if you want your site to be easily found by Google, you can also use open sourced tools such as WordPress, Joomla! or Drupal, which Google can easily read,” he said.

When there are several sites that appear to have the same content, Google typically gives the site where the content appeared earlier a higher ranking because this is probably the original source, said Wilson.

One way sites can establish content seniority, he said, is to avoid changing the URL of a site’s content. “The URL has an association indicating when the content was posted online. Any change to that URL will put that date back to zero.

“If URLs need to be altered, a 301 redirect should be added to it to ensure that searchers are redirected to the original URL. This way the content isn’t viewed as new,” said Wilson.
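
As an illustration of the mechanics, here is a minimal Python sketch of a 301 redirect, in which a request for a page’s old address is answered with a permanent redirect to its new one. The two paths are made up for the example, and most sites would configure this in their Web server or content management system rather than in code like this.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Hypothetical mapping from a retired address to its replacement.
MOVED = {
    "/shelters.html": "/products/portable-shelters/",
}


def redirect_target(path: str) -> Optional[str]:
    """Return the new address for a moved path, or None if it has not moved."""
    return MOVED.get(path)


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # self.path is the requested path (including any query string).
        target = redirect_target(self.path)
        if target is None:
            self.send_error(404, "Not Found")
            return
        # Status 301 tells browsers and crawlers the move is permanent.
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the redirecting server locally until interrupted."""
    HTTPServer(("localhost", port), RedirectHandler).serve_forever()
```

Calling `serve()` and then requesting `/shelters.html` on the local server would return a 301 response whose `Location` header points at the new path.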

Some business owners might also want to rethink the situation, according to Wilson. “Having your content appear on another site might not be a totally bad thing.”

He said this could turn out to be a positive situation if the content is properly attributed to you or your site. This could mean additional exposure that you do not need to pay for.

Wilson advised making sure the copied content mentions your name or your business name as the source. Also have the site featuring your content link back to your site; this way you could get some of its readers to come to your site.

Links could be established directly, by having the other site post your name and site address with a link to it, or indirectly, via hyperlinks on the text or image being used.

Tremblay agrees with Wilson. She said she is not averse to another person reprinting or reposting an article she wrote. But she also wants the site using her article to “acknowledge the article’s origin” by attributing the article to her. “I am a writer who works very hard at what I do, and I am 100 per cent against plagiarism – which is exactly what happened.”

Reporting content snatchers

If you are really bent on reporting content theft, Lori’s Web Design says that most Web hosting firms have strict rules regarding copyrighted content. You can try to identify the Web host of the offending site and inform them that their client is using stolen content.

Another option is to collect evidence that your content has been stolen and bring the matter of copyright infringement to Google by filing a DMCA (Digital Millennium Copyright Act) report with the search engine.

Google offers a tool that will guide you through the process of reporting content for removal.
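
When collecting evidence, it may help to keep a consistent record of each copy you find. The Python sketch below is one assumed way to do that: it notes the address of your original, the address of the copy, when you saw it, and a fingerprint of the copied text. The field names are illustrative, and nothing here reflects what Google’s form or a Web host actually asks for.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict


def evidence_record(original_url: str, copy_url: str, copied_text: str) -> Dict[str, str]:
    """Bundle what was copied, where it was found, and when it was observed."""
    return {
        "original_url": original_url,
        "copy_url": copy_url,
        # Time of observation in UTC, in ISO 8601 form.
        "observed_at": datetime.now(timezone.utc).isoformat(),
        # A SHA-256 fingerprint identifies the exact text that was seen.
        "sha256": hashlib.sha256(copied_text.encode("utf-8")).hexdigest(),
        # A short excerpt makes the record readable at a glance.
        "excerpt": copied_text[:200],
    }
```

Saving one such record per copied page, alongside a screenshot, gives you a dated list to refer to when writing to a site owner or Web host.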

Tremblay said she was so frustrated with the lack of response from the site that stole her content that she decided to drop the whole matter. But last Friday she said this attitude changed after she read the ITBusiness.ca article on content farms and Google’s moves against them.

She said she would be more vigilant in protecting her site content: “I am probably going to look into this again to see to what extent the website is still using my words (if they still are), and to see if anyone else out there is using it without my consent.”

Nestor Arellano is a Senior Writer at ITBusiness.ca.