Fighting Scrapers With Your Left Jab

This guest post was submitted by Patrick, who blogs at Piggy Bank Pie Writing Services.

I started 2008 with a post that went viral on StumbleUpon and BloggingZoom. Even Skellie and Caroline Middlebrook took the story back to their respective blogs. For those interested, the post in question was How I Received 850 Visitors Without Using Social Media Sites.

Image by Dave Hogg

But high visibility also comes with a price. Once a few bad guys hear about your blog, they hit you from behind like sore losers, running away with your own content to monetize their sites. The opponents are called scrapers. Let’s challenge them to a five-round match to see who deserves the title. Gentlemen, let’s have a clean fight: play by the rules, no punches below the belt and no hitting behind the head.

Definition Of A Scraper

Using automated tools, scrapers subscribe to your site with the intention of stealing your content right off your syndicated feed. Once you publish a new article, the program fetches your entire post from the RSS feed and publishes a carbon copy on the scraper’s site. If you haven’t taken any precautions, search engine crawlers can index the scraper’s copy before yours, and you may even be penalized for duplicate content.
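If you suspect a page is such a carbon copy, one rough way to check is to measure how much of your post reappears in it word for word. The sketch below is only an illustration, not a tool mentioned in this post: it assumes you have already saved both texts as plain strings, and the function names and the five-word window are arbitrary choices.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences ("shingles") found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one window: treat the whole thing as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(original, suspect, size=5):
    """Fraction of the original's shingles that also appear in the suspect text."""
    original_shingles = shingles(original, size)
    if not original_shingles:
        return 0.0
    shared = original_shingles & shingles(suspect, size)
    return len(shared) / len(original_shingles)
```

A value close to 1.0 suggests the suspect page reproduces nearly all of your post, while a value near 0.0 suggests it merely covers the same topic. Where to draw the line is a judgment call.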

Why Do They Do This?

Ever heard of Made For AdSense? Scrapers need content to feed their contextual ads, such as Google AdSense. Since they are unable to write their own (hey, no hitting below the belt), they steal yours and publish it on their blogs, where AdSense is usually used heavily. Scrapers often target specific keywords, and their goal is to steal articles that help them rank high in search engine results pages. With better ranking comes better traffic and, obviously, more clicks on their ads.

What Can You Do?

I would LOVE to write the ultimate solution for preventing scrapers from breaking the rules. However, it is not that easy, and the process might be time-consuming. But still, if you are ready to jump into the ring, here’s a five-round fight strategy that could potentially bring your opponent down.

Let’s get ready to rumble.

1. License Your Content

The very first step I would recommend is to use a license such as Creative Commons. By licensing your content, you at least inform visitors that articles published on your site are subject to copyright law, and you specify the conditions under which your work can be distributed. You can visit this page to choose the proper license for your work.

2. Add a Link To Your Original Post in Your RSS Feed

Joost de Valk, Shoemoney’s well-known web developer, just wrote a WordPress plugin called RSS Footer that automates the process of adding a link in your RSS feed that points to the original source of your post. Here’s what Matt Cutts, a Google engineer, said recently in an interview about linking to the original source of an article:

“…if the syndicated article has a link to the original source of that article, then it is pretty much guaranteed the original home of that article will always have the higher PageRank, compared to all the syndicated copies. And that just makes it that much easier for us to do duplicate content detection and say: ‘You know what, this is the original article; this is the good one, so go with that.’”

Installing and configuring RSS Footer is a piece of cake; I highly recommend you give it a shot.
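For readers who are not on WordPress, the same idea can be sketched in a few lines of Python. This is not how RSS Footer itself is implemented; it is a minimal illustration that assumes a plain RSS 2.0 feed already loaded as a string, and the function name add_source_footer and the footer wording are made up for the example. It only touches each item's description element, so feeds that carry the full post in other, namespaced elements would need extra handling.

```python
import xml.etree.ElementTree as ET
from html import escape


def add_source_footer(rss_xml, site_name):
    """Append a link back to the original post to every item's description."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        link = item.findtext("link")
        title = item.findtext("title") or "this post"
        description = item.find("description")
        if not link or description is None:
            continue  # Nothing to link to, or nowhere to put the footer.
        footer = '<p>Originally published as <a href="{}">{}</a> on {}.</p>'.format(
            escape(link, quote=True), escape(title), escape(site_name)
        )
        # ElementTree escapes the markup again when the feed is serialized.
        description.text = (description.text or "") + footer
    return ET.tostring(root, encoding="unicode")
```

Because the footer travels inside the feed content, a scraper that republishes the item verbatim also republishes the link pointing back to you.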

3. Report Scrapers To AdSense

Visiting your scraper’s site could help you gain a few points in the fight. Have you ever clicked the Ads by Google link on AdSense ads? This opens a page where you can sign up for both AdWords and AdSense. However, if you look at the bottom of the page, you will notice a link that says Send Google your thoughts on the site or the ads you just saw. The beauty of this link is that it knows where you are coming from (your scraper’s site), and it brings up a questionnaire about the relevance of the ads shown there. Now is the time to throw a left jab:

  • Click Report a Violation?
  • This brings up a question asking if the issue is with the website or ads, select website;
  • You will now be asked which policy is violated, select The site is hosting/distributing my copyrighted content;
  • Finally, use the text box under Add additional information here to explain your story to the referee.

4. Report Scrapers To Google

Now is the time to send your opponent to the floor for a first count of 8. Go to google.com, type your scraper’s domain in the search field and hit the Google Search button. If Google finds your scraper’s site, the site is indexed. Now go to Google’s Report a Spam Result page and proceed as follows:

  • Exact query that shows a problem: Type what you entered in Google’s search box to find your scraper’s site
  • Resulting Google page that shows problem: Enter the complete URL of the Google page returning the search result
  • The specific web page or site that is misbehaving: Type your scraper’s domain name
  • Type(s) of problem (check all that apply): Select Duplicate site or pages
  • Enter your story in the Additional details text box and click the Submit button.

5. Report Scrapers To Their Web Hosting Service

This is the ultimate opportunity to hit with a multi-punch combination. Go to whoishostingthis.com and type your scraper’s domain in the search box. This gives you a link to your scraper’s web hosting company. Once you are on the provider’s home page, look for a contact page. Use online chat, email or a contact form to explain the situation. If you are required to provide a full DMCA notice, I suggest you visit this page to get the DMCA form. If you go through all of this and your scraper gets kicked out by his web hosting service, consider the fight won by unanimous decision.

Summary

While this may not be an instant solution for preventing scrapers from stealing content, it can surely make their lives more difficult. If everything goes well and the scraper gets banned from AdSense, Google and his service provider, well, that’s a technical knockout. Now let’s just hope he’ll be out of the ring once and for all.

Has your content ever been stolen by scrapers? Have you tried some of the above strategies? Do you have other ideas to share? Please join the conversation in the comments.

Comments

  1. Neecy says:

    I thought written words, i.e. books etc., were automatically copyrighted? Or is that only art?

    Also, how can you tell if you have been scraped?

    Thank you

  2. Neecy says:

    DISEASED KANGAROO…THAT is hilarious!

  3. I just started a blog. I have a total of 13 posts on it and have already been scraped by someone or something. This is very annoying and I will follow the items listed above and hopefully the issue will be resolved.

    I wonder if putting a link to the blog homepage in every post will help?

    Thanks for the tips.

    John

  4. Fiar says:

    That’s actually a good strategy to take advantage of the scrapers. Most scrapers will only post snippets, so put the link into the first sentence or two. Actually, chock your posts full of internal links, but always fit one in the first or second sentence, preferably with a keyword you are targeting as the anchor.

    Check out this post on duplicate content and especially the comments for some help. There are some really good tips on using scrapers to your advantage there.

  5. Neecy says:

    Is there an RSS footer for Blogger?

  6. We had exactly this issue with content from Fashionising (http://www.fashionising.com). In the end I custom built our RSS feed, which now publishes only
    * The first half of the article
    * Two different links back to the original article, with a clearly stated “Read the rest here” type by line

  7. Alison says:

    Great article! I just discovered someone had taken one of my blog posts and I’ve implemented 4. and 5. Now I’m off to do 2. Whew, I had no clue how to handle this. Thanks for saving my time and energy with this great list!!!!

  8. p1nk g33k says:

    I only get upset when they try to pass the content off as their own or they don’t link back to my site.

    The first thing that I do is find out if they advertise with AdSense, and then I report them.

    The Blogspot blogs are the worst. And, that’s funny, because they’re owned by Google. You would think that Google would be able to get rid of those blogs the quickest.

  9. Jessica says:

    Thank you SO much for this. Thanks to you, I just got a site deindexed from Google within 2 or 3 days for stealing my content. (They plagiarized my entire post, did not give me credit, and to put the icing on the cake, their site was ranking for the keywords in my article while my own site wasn’t!) I am brand new to internet marketing and had no clue how to handle scrapers until I read this post.

  10. netbook says:

    Wow, great read. I just got scraped and found them ranked better than me. That burns big time. I will take the steps you mention here. Very handy. I have this bookmarked!!