Written on October 20th, 2009 at 01:10 am by Darren Rowse

Stop Scrapers and Spammers Fast

Miscellaneous Blog Tips 98 comments

One of the challenges bloggers face is what to do when others try to use their blog for their own gain, either by taking its content or by spamming its comments section. The more I talk to bloggers about how they deal with these issues, the more I realize how many different approaches there are. Today Seth Waite from Blogussion shares his approach. I’d love to hear yours in the comments below, whether it’s different or the same.

Every blogger quickly learns that blogging takes hard work. Once the “make money fast” hype has worn off and the reality has set in that blogging is a great way to earn an income if you work for it, you are left with a choice.

The choice is whether to stay in blogging or not. Many bloggers decide to stay, but then face another extremely important decision: should I put in the effort to become a great blogger, or keep doing things the easy way and hope things turn out differently for me?

Those who choose to work hard start learning, and eventually find success by networking and earning their way to better blogging. Bloggers who are unwilling to face reality either quit or eventually become spammers, scrapers or beggars.

I am not going to address bloggers who beg for help without working for it, but I do want to talk about spammers and scrapers. Most importantly, I want every hard-working blogger to know how to stop selfish bloggers from disrespectfully using their work to help themselves.

Stopping Spam

The easiest way to stop spammers who are trying to get you to link to their blog or site is to control your comments and trackbacks. Comments are essential to building a great blog community, but they must be moderated so that your actual readers feel comfortable with the discussions on your blog.

Captcha

At first, commenting was easily controlled by requiring commenters to enter their email address in the comment form. Spammers quickly got around this, and now a very easy way to stop them is to add a captcha to your blog comments.

Captcha is already built into Blogger and is easy to add to WordPress and other blogging platforms with plugins. It works by asking the commenter to type in a series of numbers or letters shown in an image before the comment is posted. Other systems ask the commenter to add up numbers or answer another easy question instead. Captcha is a quick and easy way to reduce your blog’s spam, but it can also annoy regular readers.
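The “easy question” variant can be sketched in Python. This is a minimal illustration under stated assumptions, not the code of Blogger or any WordPress plugin: `new_challenge` and `verify` are hypothetical names, and the in-memory `_pending` dictionary stands in for wherever a real blog would keep the expected answer (a server-side session or database), so the answer is never trusted from the form itself.

```python
import random
import secrets

# Hypothetical in-memory store: challenge token -> expected answer.
# A real blog would keep this in a server-side session or database.
_pending = {}


def new_challenge(rng=random):
    """Create a simple sum question and remember its answer under a random token."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    token = secrets.token_hex(8)
    _pending[token] = a + b
    return token, f"What is the sum of {a} and {b}?"


def verify(token, submitted):
    """Accept the comment only if the answer matches; each token works once."""
    expected = _pending.pop(token, None)  # remove it so the answer cannot be replayed
    if expected is None:
        return False
    try:
        return int(submitted.strip()) == expected
    except ValueError:  # non-numeric answers are rejected
        return False


token, question = new_challenge()
print(question)  # shown next to the comment form
```

Making each token single-use (the `pop` call) means a bot cannot solve one question and replay the answer for a batch of comments.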

Plug-ins

For many blog platforms, like WordPress, a simple plug-in will solve most spam problems. The most common spam blocker is Akismet, which is now available for over 20 blogging platforms besides WordPress. Using this plug-in on your blog is simple; the only upkeep is occasionally checking that legitimate comments are not being flagged as spam. Beyond the usual comment protection it provides, it goes further than captchas by also protecting your blog against unwanted trackbacks.
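Behind the scenes, the Akismet plugin sends each new comment to Akismet’s web service, which replies with a spam or not-spam verdict. The sketch below shows roughly what such a call looks like using only Python’s standard library. It assumes the `comment-check` endpoint and field names as described in Akismet’s REST API documentation (`api_key`, `blog`, `user_ip`, `user_agent`, `comment_type`, `comment_author`, `comment_content`); check the current documentation before relying on them. The helper names are hypothetical, and actually sending the request needs a real API key.

```python
import urllib.parse
import urllib.request

# Assumed endpoint, per Akismet's REST API docs; verify before use.
AKISMET_URL = "https://rest.akismet.com/1.1/comment-check"


def build_check_request(api_key, blog_url, user_ip, user_agent, content,
                        author="", comment_type="comment"):
    """Build (but do not send) a form-encoded comment-check POST request."""
    fields = {
        "api_key": api_key,
        "blog": blog_url,
        "user_ip": user_ip,
        "user_agent": user_agent,
        "comment_type": comment_type,  # e.g. "comment" or "trackback"
        "comment_author": author,
        "comment_content": content,
    }
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(AKISMET_URL, data=data, method="POST")


def is_spam(response_body):
    """Interpret the reply: the text 'true' means spam, 'false' means not spam."""
    body = response_body.strip()
    if body not in ("true", "false"):
        raise ValueError(f"unexpected Akismet response: {body!r}")
    return body == "true"


def check_comment(**kwargs):
    """Send the request and return True for spam (needs network access and a key)."""
    with urllib.request.urlopen(build_check_request(**kwargs), timeout=10) as resp:
        return is_spam(resp.read().decode("utf-8"))
```

Passing `comment_type="trackback"` is how the same check would cover trackbacks, assuming the service still accepts that type.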

Stopping Scrapers

Scrapers are bloggers who steal content you produced and republish the entire work on their own blogs and websites. The practice is sadly common and spreads copies of your content around the web. Luckily, most search engines are good at recognizing the original, but scraping is illegal and damaging both to the blogger and to blogging.

  1. Identify: The first step in stopping scrapers is to identify your content and check for copies. An easy way to do this is to use CopyGator or Copyscape to check the originality of your content and find any potential duplicates.
  2. Ask: Once you have found scrapers who have copied your material [note: the duplication should be significant, and the intent should be to pass your content off as their own, not to promote yours], email the owner or comment on the blog or site where the duplicate is found. In most cases the scraper will take it down and apologize for misrepresenting the work. Always try this first so that the blogosphere stays friendly, and young bloggers who may be making an innocent mistake can learn without being accosted.
  3. Block: If they are unresponsive or belligerent, the next step is to use .htaccess to block the scrapers from your blog. This can be a little tricky if you have never done it before, but here is a great link to learn how to stop scrapers [item #9]. Essentially, you are blocking the scrapers from fetching your blog and RSS feed.
  4. Take Action: At this point you have been nice, notified them of their misdeed and blocked their access, and still the content is ripped off and on their site. The next way to get your content removed is to contact the site’s ISP or hosting company. The easiest way to find the host is to enter the site’s web address into the search bar at Who.is; the hosting information will show up with the rest of the site’s details. Once you have the host information, contact the host with a formal letter or email stating exactly what the content is, where it originated and where it has been reproduced. The host will then usually take down the content quickly and give the site owner a chance to explain themselves. Be warned: this is serious for everyone involved, so do not use it lightly. If this does not work, there is one more option: legal action. Whether you can file suit depends on the scraper’s home country and legal system.
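CopyGator and Copyscape don’t publish exactly how they detect copies, so what follows is not their method. It is a minimal sketch of one common technique, word “shingling”, for scoring how much of your post appears on a suspect page whose text you have already saved. The five-word window and any cut-off you choose for “significant” duplication (step 2) are assumptions to tune, and `shingles` and `containment` are hypothetical helper names.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word sequences ("shingles") in the lowercased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(original, suspect, k=5):
    """Fraction of the original post's shingles that also appear on the suspect page."""
    mine = shingles(original, k)
    if not mine:
        return 0.0
    return len(mine & shingles(suspect, k)) / len(mine)


post = "Scrapers are bloggers who steal content you produced and put the entire work on their own blogs and websites."
scraped_page = "Home | Archive. " + post + " Buy our stuff now!"
print(containment(post, scraped_page))  # every shingle of the post is on the page
```

Containment is measured against your post rather than the whole suspect page, so a scraper that wraps your article in its own sidebar text still scores close to 1.0, while a short quoted excerpt with a link back scores much lower.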

Stopping scrapers and spammers will not only protect your work but also help make the internet a better place. Every time a spammer is thwarted, other bloggers win too. So be an internet community builder by taking the proper steps to stop content thieves.

Seth Waite is Editor at Blogussion.com and enjoys helping every blogger reach their blogging goals. To contact Seth directly, just find him on Twitter @Seth1492

What’s Your Approach

From Darren: as mentioned in the introduction to this post, bloggers take many different stances on these issues, particularly when it comes to scrapers. Many take a similar line to Seth, while others are more lenient and take the approach that as long as someone is reading their content somewhere, it doesn’t worry them. What do you do? What tools do you use?


98 Responses to “Stop Scrapers and Spammers Fast”

  • Thanks for linking to WPShout! You say it’s quite a hard thing to do, but follow the instructions and you’ll be fine! Any problems feel free to leave a comment on WPShout :)

  • Thanks so much for this!

    This is an important topic. I have been plagued by scrapers from time to time, and at first I had no idea what to do.

    The first time that it happened I remember being really upset. I wish that I’d had this post, with its practical steps, handy back then.

  • So what about the spammers that get around Captcha? They leave things like “Internet Marketing” as their name and link to their crappy site, but actually leave a semi-relevant comment?

    You know, it’s barely on subject, but just enough not to want to flag or delete it.

  • Yeah, it’s a shame we have to resort to things like this, but it’s true, you have to have “blog protection” in place or you will find your content without your name on it. Thanks for the reminder.

  • I have had content stolen from me several times but have never had to go past step one. Thanks for the additional information in case I ever have to go to the next step.

  • I really hate captcha, as it is an obstacle to commenting. I mostly rely on Akismet and two more plugins, and that helps me the most. Otherwise I find it hard to check my spam messages to pull out the genuine ones….
    Moreover, blocking IPs via .htaccess is something I would not suggest, as you might block an entire network….

  • We’ve found the WP plugin “Cookies for Comments” a nice complement to Akismet. It stops the bots from posting in the first place.

  • Great info!

    This is a problem that I have suffered with in the past!

    Not had a problem recently though!

  • At the moment I am relying on two tools: Akismet and me. Akismet is a great tool for eliminating spam, but it can’t do the whole job for you. The rest is up to me: I go through each comment and check whether it relates to the topic or not. I also manually check the website linked in the comment to see whether it is suitable for a link back.

  • I’ve had several run-ins with scrapers and splogs, and one in particular with whom I’ve yet to reach a resolution. That person is basically passing off my posts as his/her own.

    Some additional tools I use are:

    Google Alerts (http://www.google.com/alerts)

    FairShare (http://www.fairshare.cc/fairshare/) alerts you when a certain percentage of your content is being reused

    tracer (http://tracer.tynt.com/) which embeds a message and link to Creative Commons license when someone pastes text copied from your site

  • As to Darren’s footnote, scrapers hurt bloggers tremendously, often ranking above original content in search engines for inexplicable reasons. So I do what I can to fight being scraped.

    The problem is that

    1) most scrapers are automated rather than some kid copying your posts manually;

    2) most therefore do not post contact information;

    3) most are hosted on overseas servers, making them immune from US DMCA takedown notices.

  • Good info to know.

  • I’ve always been pretty lenient with this stuff. My primary strategy has always been cross linking within the content of my posts, which is now made even easier with any of a handful of plugins that will automate this for your WordPress blog.

    As long as the content links back to me, I really don’t care what someone does with it, and even most of the hardcore scrape-and-spam guys are too lazy to take the links out.

    I am looking at the solutions mentioned in this article to investigate a more proactive stance toward identifying and targeting the really bad stuff that doesn’t even give the credit.

    Ultimately though I suspect that the time spent chasing these guys down and getting your content back from them might be more profitably spent creating more content.

  • Hey Seth.

    Good to see you here. For those who don’t know, Seth is also one of the many knowledgeable people you may run into in the problogger.com forums.

    I saw the first scraper of my material a few days ago, so I got a couple of these steps down, but it is good to know the whole order of what to do. I will try the .htaccess method as there is no contact form on their site.

    Thanks for the information.

  • I have had a couple of cases where a homepage on someone’s “blog” used my headline and excerpt, but when I clicked the “read more” it linked over to my blog.

    I honestly wasn’t really sure how to handle this. They didn’t steal all my work, but they did steal the headline and copy word for word. I just blocked the trackback and moved on with life.

    Could this hurt me with the “duplicate content penalty?”

    Regarding Captcha, I think it is a horrible inconvenience to your readers. It seems like half the time, even if you enter the right code, you “fail,” which leads to losing your comment. I have stopped commenting on and/or reading various blogs because of this.

    I have seen others that are simple, like “what is the sum of 2 and 2”, which is much more reasonable.

    I also think Disqus is a great method to curb spammers.

    When it is all said and done, I think monitoring your comments is the best way to stop spam (mixed with Akismet). You should be proactive about reading your comments anyway, so having spam up for a couple of hours or having to delete 3-5 comments a day isn’t really a huge deal.

  • @Thomas – Tracer sounds like a really awesome deal. Do you know how reliable it is?

  • I get a stack of scrapers on totalapps.net. They generally don’t take images, but just copy and paste the text into their own blog. The odd person takes an image.

    Do they really gain from doing this? I just assumed it was a fact of life and never bothered chasing it up. Perhaps I should?

  • Michael Gray had a related post, but with a positive spin: looking at scraper sites as link-building opportunities. He suggested using the RSS Footer WordPress plugin by Joost de Valk. An interesting take:
    http://www.wolf-howl.com/seo/use-scrapers-to-build-links/

  • I recently installed captcha on my blog to try and keep out the auto-joiners who would then progress on to either comment spam or post their email spam coding into my home page and link to it. Since I installed captcha – no more problems. Great post tho, will take a look at the scrapers section.

  • I use a plugin called AntiLeech. It doesn’t work with all scrapers but it does work for most of them.

    Basically, you tell the plugin what IP is stealing and reproducing your content. Next time this IP comes to steal more content, the plugin will feed it with a text you have previously defined.

    For example, my text to display on scrapers would be something like: “this article was stolen from http://www.iphonedownloadblog.com/ Stop reading this site. It steals content from other blogs”.

    Sometimes, it takes months for the scraper to realize it and during this time, you have fed his site with your links all over the place.

  • I use Akismet for spam, and it really does a good job. Rarely does anything sneak through.

  • Scraping is a big problem with blogs especially technology blogs which are quite large in number.

    I am facing these scraping issues a lot nowadays…
    Also, sometimes people copy the entire post from my blog and give a small “Source” credit with a followed link….

    Well, that’s good enough…. But I have seen that many of these scraped posts rank higher than my original post… this is what pains me, as it hurts my traffic…..

    Sadly, that’s because Google gives more importance to pages with backlinks and stuff…. even if they’re copied.

    So what I do is embed as many hidden links back to my site as possible… for example, a 1px*1px image linking to my homepage, and some of the full stops [.] linking back to the original posts.

    This technique sounds crazy… but I found it does fetch you some backlinks….

    :)

  • I think the level of scraping is an important element to evaluate. If it is one article and they are a small blog, you might want to shoot off a quick email or just forget about it altogether.

    If you find your content is being completely scraped then more serious action should be taken because valuable readers are being taken from your efforts.

  • My tattoo blog content was regularly getting stolen. I installed the WordPress plugins AntiLeech and Simple Trackback Validation. Those plus Spam Karma 2 seem to have put a stop to spammers and scrapers.

  • I like your story, but contacting the ISP doesn’t always work as described. I had someone steal my original work (a story on electronic signatures) and tried to work with GoDaddy to resolve. Their legal dept had 6 requirements to submit a claim and each time I filed they said a different requirement was either not met or not written to their unpublished standards. I would re-write the section to their new needs and they would come back that a different section was now in error.

    It was just a game they played to make it impossible to file legitimate claims. Eventually I gave up, frustrated and committed to never using their services. The spam/scraper site is still running my article without crediting the source.

  • Akismet is really amazing. As my blog gets bigger, the more spam comments I get. Akismet saves me several minutes every day.

    At least there’s a bright side to seeing spam: if you don’t get spam, you’re a nobody. Keep plugging away until spammers find you, then you know you’re well on your way! :)

  • I use Akismet, and I have an account at Tynt. Tynt generates a link back to your site when someone copies and pastes your content.

  • For me, we must balance spam control against reader-friendliness when deciding how to keep spam from invading our blogs’ comment sections.

    I do believe that captcha is not the best solution.

    That is because it increases the time our readers need to leave a comment, especially if the captcha is really hard to read. This can be really discouraging.

    As the ones asking others to do us a favour by leaving comments on our blogs, I think we should make it as easy and quick as possible.

    Along with Akismet, we as bloggers should also spend our own time (not just plugins or other software) controlling the spam.

    Our readers have spent their time leaving comments on our blogs. Don’t you think it is fair to reward them by making their experience more ‘reader-friendly’? Why can’t we spend at least the same amount of time monitoring our comment sections?

    For me, there are always two sides of a coin that should be accounted for and thought about.

    If not, for me, we are just selfish bloggers.

    What do you think, Seth?

  • Darren,

    It is much worse than you think. I’m in Asia, and let me tell you an ugly secret. Your writings have been plagiarized, copied and reproduced in foreign languages without your prior knowledge or permission. Credits were not given to you at all!

    You should visit India, Indonesia and China more often.

  • I don’t understand spammers… that’s such short-term thinking; what they do is really useless.
    Anyway, I try to protect my work and blogs with Akismet and Captcha, as you already mentioned.
    I always block spammers on Twitter, without question… I noticed we can finally retweet your valuable posts.
    Thanks.

  • It hasn’t happened to me…yet…. but if it did, I would visit the site and ask them to remove it or block quote it and link back to my blog.

    Why do folks have to be so gosh darn lazy? :)

  • Great post Seth. I agree with above commenter, best use is Akismet with occasional review.

  • A good way to avoid scraping is to serve only partial post feeds. If the content is good enough, RSS feed readers will visit the site and read the article.

  • I agree that captchas are annoying… logging in to a site in order to comment is even more annoying… but it depends on the blogger, I guess.

    I prefer a close monitoring of the comments because then I can read them and respond if necessary. Akismet is the best way to deal with spam.

  • We’ve had our site scraped, and my nice emails were ignored. The content was finally removed when I sent cease and desist letters, citing the US Digital Millennium Copyright Act (DMCA), to the web host and the ad networks used on the person’s site.

    Once you find who is hosting the site using whois.com, a decent web host will have explicit instructions on how to file a copyright complaint.

    I figured if they’re making money on my content, they wouldn’t do anything until their host or revenue was shutdown. I’ve heard that Google will ban someone from AdSense if they steal content (violating the terms of service).

    Akismet works well for us, and you do have to scan it every so often.

  • Has any of you used Mollom (vs Akismet)? If so, I’d love to get your feedback on that. (Disclaimer: I’m a co-founder of Mollom.)

  • Luckily I have never had my content stolen, and hopefully I won’t, but if I do I will know what to do!

  • Nice information will surely help on my various websites with comment boxes and such :)

    Thanks a lot, I shall be reading a lot of your blog posts from now on ^.^

    Thanks, David Macaulay
    http://threerelics.com/

  • Bad Behavior (http://www.bad-behavior.ioerror.us/) has been key in helping eliminate spam. Get a Project Honey Pot API key (http://www.projecthoneypot.org/); it works with Bad Behavior to prevent spam. This way you don’t need to use one of those nasty captcha scripts.

  • I’m not too bothered by auto scrapers that leave your brand and links intact; IMO these help your site. One such scraper drives a good 200-300 hits a day, so he must be doing something right!

    I think people should ask whether the scraper is harming your site or if this is about a “principle”. I think the former is worth fighting for, but if it’s the latter, you may well have better things to do with your time.

    I’d be a little bothered if someone built a site around my content, but I suspect it would be a waste of time as the big G will know where it saw the content first.

  • Reviewing some of the spam comments we get is always a treat. I’m tempted at times to approve some of the more creative ones.

  • I use Akismet to stop spammers because I do not feel comfortable with captcha. I think many readers find captcha annoying.

    Scrapers are more difficult to deal with. There are really many scrapers out there that use auto-blogging. I think blocking their access is the best way, but blocking them through .htaccess is complicated. We could use a block-IP plugin for it instead.

  • I had to deal with this problem unfortunately quite a few times and while being nice works…. sometimes…. I found that those who steal your content in order to profit understand only one thing – hit them where it hurts, source of their income.

    Filing a DMCA complaint with Google or Yahoo, two commonly used ad networks on those blogs, does wonders.

    Alex

  • An interesting and useful post, and thanks to the folks who have brought up Tynt!

    I wanted to point out that although the people who steal your stuff are the really annoying ones, a lot of your content is leaving your site from your fans. We don’t promote ourselves as an anti-plagiarism tool (although if we can help that way then I am happy) but rather as a way to try to encourage anyone who is distributing content from your site to properly link back to the source content.

    A couple of interesting notes. Of the over 5 billion page views we are monitoring every month, we are tracking over 100 million unique copy events. We find that 2-6% of all page views result in a copy. If we extrapolate Compete’s traffic information for the ProBlogger.net domain for example, we would expect to see ProBlogger readers copying content over 40,000 times every month! Obviously most of these aren’t scrapers, but ProBlogger fans who are liking what they read and spreading the word via email, Facebook, and other means.

  • I have had to send emails quite a few times to bloggers who have re-posted my articles, word for word, without permission or even a reference to me or my site. It seems to be common that new bloggers cut and paste without thinking about how it affects the original writers. Each time I’ve contacted a blogger, I not only ask them to remove the content, but I also include concrete examples of the appropriate ways to link and refer, so I feel like I’m helping and not just demanding.

  • Thanks for copygator! Will try. But who will steal my content anyway :)

  • I dunno, when I see spam on my blog I cry and weep and wail and gnash my teeth. I don’t think it’s any less effective than what you’re proposing.

  • Some good pointers. Captchas are a good one to use. They will at least block all the bots.

  • Captcha has been problematic for me, so I just monitor my comments and moderate the ones older than 14 days. But I don’t get that much spam, so I must not be as popular.

    I have had content from my entire site scraped. The scraper left no contact info and ignored all polite requests. Fortunately, they set up on Blogger so a DMCA report to Google was quick and easy. Google took them down in a day.

    Regarding blocking, the .htaccess trick seems like a cool technical thing to do. One other suggestion is, if they are still linking to images on your host, to change the image but keep the same filename. The new image can have a message like “this site is stealing content from so-and-so.”

    Very useful post!

  • I agree with Thomas on FairShare: it’s free, and all you need to do is submit your feed to get results. I love FairShare because it highlights the passages that were copied.

  • Hey Darren, you need an editor for these things. The title is right, but “scrapper” is used throughout the article instead of “scraper.”

    The stopping spam tips are nothing new at all. He doesn’t even cover what you’re doing now: closing comments after a given period of time. There are so many techniques to combat spam that there have got to be more plugins for WP to implement them (e.g., using CSS to hide honeypot fields, or profiling a comment and creating a spam score based on other criteria such as mouse movement, keypresses, referrer, etc.; none of these should be used alone to reject a comment, but together they can at least flag it for moderation).

  • Akismet is good for stopping spam, but sometimes it falls short in stopping spammers.
    I am facing that problem.
    But CopyGator can do many things for us.

  • You know, for me blogging is more about having fun and meeting terrific people than about spamming, trolling and scraping. It’s sad that this even has to be an issue. Get it done the right way or don’t do it at all.

  • Can spammers and auto-scraping bots get past Captcha? I would not think so. Most people are now aware of Captcha, and although it may be a bit of a pain, they realize its importance.

  • Some really useful info here. I have implemented these tips to stop scrapers on my website and am now waiting to see what gets caught in my spider trap. I am a little confused about how to do the IP blacklist for Apache. I tried to follow the instructions on honeypot.org but got a little lost; I found the module for Apache, but am not sure how to go about it. Hopefully the honeypot and .htaccess rules will help for now.

  • Why would you want to stop scrapers? Is what you say not important enough to warrant additional distribution? Are you on so much of an ego trip that the important thing is people know the source of your ideas, rather than being exposed to the ideas themselves? Are you too stupid to outrank a scraper site? Are you too stupid to MONETIZE being scraped? You know, if the scraper takes the post wholesale, then affiliate links are intact. Link to yourself with good anchor text, and you have a scraper adding perfectly anchored links to you.

    I love having my sites scraped. It’s just one example of how there’s a conspiracy to make me successful.

    Here’s the deal, if you REALLY are offended by people taking your work and distributing it for you, stay off the internet. Apparently you aren’t expressing any ideas that are worthy of widespread attention anyway, since you want to limit your distribution.

  • As a matter of fact, you cannot avoid all spam, so I think Akismet is enough for WordPress; the more plugins you use, the worse your readers’ experience. As for scrapers, I will fight them with whatever I can if they really anger me.

  • Good article. Whenever you find duplicate content, just report it to Google and they’ll ban the site from Google’s search engine.

  • Thanks a lot Darren for this useful post.

    I must say that I truly enjoy reading your blog as I am starting out blogging myself and have found many of your postings truly useful.

    Even postings as far back as 2007 or 2008 are still relevant today.

    Thanks for providing all the insight!

  • I’m not worried about comments, as Akismet and a bit of time spent moderating do the job.

    My main concern is people copying my content, which is why I found the first point pretty useful.

  • Didn’t know about “blocking”. That’s good advice. I’m surprised Google hasn’t come up with a better system to prevent scraping. It’s a major pain for serious bloggers.

  • Thanks for your information. You have given step-by-step points to stop spamming and scraping, like:
    Captcha
    Plug-ins
    Identify
    Ask
    Block
    Take Action
    As bloggers, everyone is bothered by these spammers and scrapers who steal your stuff. Most spammers use social networking sites for spamming.
    I think your information may stop spamming.
    Thanks

  • I use FairShare, which sends me a report and a link to the scraper site via feeds. I have had two unattributed copies, in both cases by beginner bloggers who didn’t know better. Most scraper blogs take just a portion of my post, but they do add the links.

    I don’t understand why they would even take some of my jewelry making posts for a pseudo gardening or even construction site!!

    I also place a link signature at the end of each post.

  • This is a timely subject for me. Akismet catches spam on my blog. I’ve been blogging nearly 2 years, but I didn’t realize until a couple of weeks ago that I need to check the spam regularly to despam comments that shouldn’t have ended up there. A big “Duh!” I know. I’ve blacklisted dozens of IP addresses, porn words, and maybe two thirds of the pharmaceutical drug names known to mankind, so my spam filter is fairly tight as a result. I simply need to check regularly for those good comments, since I get so few.

  • Scraping is the most disruptive of the 2 problems. Spam is now becoming more and more easy to stop, especially with Wordpress plug-ins.

    Scraping is also better, but the challenge is when overseas bloggers take content.

    Most importantly… if you take the easy way out, you will never succeed in any venture.

  • Spammers are becoming a headache for bloggers like us. I really feel bad when someone just puts “great post” in a comment. I now use the Disqus comment form on my blog and moderate comments before they are published. I think every blogger should moderate comments before they are published.

  • First, thanks for the post and the links to Copyscape and CopyGator. I’ve only had this happen to me one time (that I know of), but those sites will be a great resource to make sure it doesn’t happen again. The one time it happened was an interesting lesson.

    I have Google alerts set to anything related to “film music,” “film scores,” my website or my name. Within minutes of my post appearing on another site (I think it had more to do with the timing in relation to Google’s alert schedule, i.e., mid-day), I got a notice and was able to see that the entire post had been copied verbatim.

    Now, I WAS given credit as the author of the original post, but it had been copied and pasted directly on this other site. I emailed the site owner, thanked him for his kind words about the post, but asked him to provide a direct link back to my site. This was for two reasons: 1) for the traffic it would drive to my site, and 2) to alleviate any possible ramifications with Google rankings (I’m not sure if there are any, but I don’t want to take any chances). He did it without any problems and the whole transaction was very smooth. I realize this isn’t exactly a scraper or spammer per se, but it does show that sometimes innocent mistakes are made and can be resolved very easily.

    Hopefully that’s the worst that’ll ever happen with that. :)

  • I am also facing a commenting problem on my blog. A lot of members are coming to post their links and nothing else.
    They don’t share anything related to the post; they are just leaving their links.

  • nice article
    I have previously tried to contact Google and report scrapers but got no response at all, even when they were copying whole documents without changing anything in them.

  • Most scrapers simply copy your whole post (links and all), or grab your rss feed. If you make sure you link back to your blog in the body, or even at the end of the article then many readers will find their way back to you (people love to click).

    Also, if you use affiliate links or you’re promoting your own product, then the scrapers are helping you :-) I’ve made several sales where the clicks came through from a scraper’s site.

    If you can’t stop them, just make sure when they steal your content it benefits you in the end. Beat them at their own game.

  • Block the scum. I use cPanel to block them from my entire site. It saves time, since I have 6 blogs and adding the IP to each of the .htaccess files takes longer. I have reported them to their provider, but I doubt that does any good. Oh, and it’s a no-tolerance policy on my blogs. Spam it once, and you’re done.
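    The blocking described above happens at the server (cPanel or .htaccess), not in application code. Purely as a hedged illustration of the underlying idea, here is a minimal Python sketch that checks a visitor’s address against one shared blocklist of single IPs and ranges, the kind of list you might maintain once for several blogs. The `is_blocked` helper and the addresses are hypothetical (they come from the ranges reserved for documentation); this is not how cPanel or Apache implement blocking.

```python
import ipaddress

# Hypothetical shared blocklist: single addresses and CIDR ranges.
# These are documentation-reserved example addresses, not real offenders.
BLOCKLIST = ["203.0.113.7", "198.51.100.0/24"]

# Parse once; a bare address becomes a single-host network (/32 for IPv4).
BLOCKED_NETWORKS = [ipaddress.ip_network(entry, strict=False) for entry in BLOCKLIST]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked address or range."""
    addr = ipaddress.ip_address(client_ip)
    # Membership is simply False when IPv4 and IPv6 versions differ.
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    print(is_blocked("203.0.113.7"))    # True  (exact address match)
    print(is_blocked("198.51.100.42"))  # True  (inside the /24 range)
    print(is_blocked("192.0.2.1"))      # False (not listed)
```

    Keeping a single list and applying it everywhere is the same time-saver the commenter gets from blocking at the account level instead of editing six separate .htaccess files.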

  • You have to take on content moochers head-on, immediately. I have a zero-tolerance policy for this.

    Commenting is a big problem for me lately. I use Akismet on my personal WP blog, and approve comments on client Typepad accounts now, because of the massive amounts of spam and troll-like activity. Some are spam (especially on some client blogs), but others are spam without the poster realizing it.

    For example, authors leave comments at my personal blog that are nothing but the summary of their book and a link to buy on Amazon. There is no contribution to discussion; just an advertisement. Some authors I’ve talked with don’t realize this is really spam. I delete these. A few publishers have left comments about reviewing their books (usually vanity presses, but not always). If someone leaves a comment and says specifically that they could not find my contact information, then I’m a little more lenient, depending on the request.

    What frustrates me most about comments is publications that don’t monitor them, especially newspapers. There are obvious spammers who use the same handles at different sites and copy and paste the exact same comments at those publications. Yet news publications do nothing to stop this. These are also the folks who usually attack anyone who leaves an opposing comment, no matter how nicely that opposition is written.

    It is also frustrating when sites allow trolls to take over. I’ve since stopped reading a few blogs and other sites because of the lack of troll control. I think it is disrespectful to readers to allow that to continue.

  • There’s one technique that I was surprised not to see here: reporting scrapers to Google AdSense. The majority of the times I’ve found sites running my content without attribution, they had Google AdSense running in the sidebar. These scrapers are trying to make money, so hitting them there hurts a lot more.

    AdSense even has an online form where you can report scrapers. (https://www.google.com/adsense/support/bin/request.py?contact_type=dmca_complaint).

  • Akismet is the best. I am glad you don’t use Captcha. I understand its purpose, but I really hope that blogging does not come to that.

  • As a relative newbie to blogging, I am amazed at the length some spammers will go to post garbage on my blog. Multiple names, email addresses and such. I often wonder why they spend valuable time trolling the internet when they could be doing legitimate business elsewhere. I edit or delete suspicious submissions and I often check who.is for more data.
    As to the topic of scraping, this is news to me and I will go read up on that one. It sounds pretty sad that some people do such things especially when there are other creative options one can explore… I add a copyright mark and my name/signature to my posts and will explore some of the excellent suggestions made here. Who knew?
    Cheers,
    E

  • If the scraper has Google Ads on his site, you can get Google to suspend his account for stealing your content.

    Just click on the ‘Ads by Google’ icon on their site.
    Click on the “Send Google Your Thoughts” link
    There you can check the violation checkbox that says “The site is hosting my copyrighted content”

  • I use the Akismet plugin and it’s really helpful, though I’m still open to new ideas.

  • I use Akismet for comment spam, and I’m very happy with its effectiveness. It has filtered out all spam comments quite effectively.

    My site gets scraped regularly. If I have time, I run Copyscape checks. I hate that I have to do this policing. I’d much rather be blogging or almost anything else. I’m not a famous blogger, but I can imagine that your scraping problems are astronomical if a small blog like mine gets scraped so often.

    Thanks for the info about blocking via .htaccess. I wasn’t aware that could be done.

  • Before reading this article I thought Akismet was not good for spam blocking, but thanks for sharing this information with us. From now on I will use that plugin.

    Great article.

    Alam

  • You can NEVER go wrong with unique content, obviously. It’s something the net and Google are hungry for, yet few sites have it. On the other hand, I don’t mind republishing of articles if the original author gets credit and the article is placed somewhere decent on the site.

  • Thanks for sharing this information. I recently encountered a situation where a company scrapes content from various sites and uses it to put up a “directory” of a city. They charge others $1000 to advertise on this “directory.” In some posts, they put a link to the original site. In others, they don’t.

    I asked for my content to be taken down, pointing out the illegality of using my copyrighted content without permission. They obliged.

    It still infuriates me that they are profiting from plagiarized content. They even have the nerve to put their own privacy policy and copyright notice on their site.

  • My good friend Si Dawson just designed a great Twitter app for getting rid of spammers on Twitter. It’s called Twit Cleaner and gives you a very detailed report of people you need to block or unfollow etc. It’s such an awesome spam solution. http://www.twitcleaner.com :-)

    I think there’s a growing trend of people that want to rid their blogs & social media sites of the spammers once and for all.

    Cheers,
    Sarah ;)

  • Tip: the fastest way to block these spammers is with .htaccess,
    if you know how to use it…

  • CAPTCHA has a fundamental problem: in most cases it is not accessible to people with disabilities. There are certain solutions which might work, but in general a person who is blind cannot enter text from an image, although with a screen reader they are able to solve a challenge question. There is also the audio CAPTCHA, which they can use, but it still does not help deafblind people. CAPTCHA in most of its forms also creates a challenge for people with cognitive disabilities.

    This is not to say don’t use verification, but use it wisely: put yourself in the shoes of people with different disabilities.

  • Captcha features make commenting impossible for blind readers. I use Captcha on my blog, but I’d love an alternative. It’s just not accessible.

  • This was an excellent article, and I love when I see people blogging about it.

    It is so important to add content when commenting on blogs. Time and time again, I’ve seen people post “Great Article” with their link after it.

    They added nothing!!

    Nothing at all!!

    My favorite ways to prevent this are good old-fashioned moderation of each comment and Captcha; both should keep the spammers out. I’m wondering what you’ll have as I push the “submit” button.

    Anyway, powerful article on scrapers as well. I haven’t seen a lot of duplication, but I have seen tools that take blogs, articles, etc. and spin them into a completely different article, just saying the same thing in different words.

    Not that I do this, but the tool worked by reading a few words at a time and rearranging them however you chose, so you could spin 10 or 20 articles from just one.

    The point is simple: don’t scrape, period. It’s wrong. But your advice is great: contact the person first to get their side of the story.

    See you on the net,

    William Whitlow
    http://www.williamwhitlow.com

  • I use Mollom, a great module which can be installed on all major blogging platforms (Drupal, WordPress, Joomla). It has some very innovative features like:
    - multilingual support for identifying spam;
    - CAPTCHAs & text-analysis filters which really work;
    - “crowdsourcing”. Basically all sites protected by Mollom can report comment spam that slipped through the cracks. Mollom combines and correlates this information and learns from it to help prevent future abuse.

    I’ve been using it with great success on http://www.7tutorials.com.
    I no longer need to moderate comments. Only 1-2 spam messages slip through per month, and those are easily deleted.

  • I appreciate the effort you put into posting such useful links for us. Extremely well done, and your choice of language makes them even more interesting. An occasional bit of irony or an elegant turn of phrase is a welcome relief.

  • Spammers are always annoying for me… but I have gotten rid of them with some WordPress plugins.

  • Thanks for the great article! It’s perfect for those of us who are still learning to be effective bloggers.
    Fighting spam sucks, and I’m thankful that WP offers the necessary plugins.
    I partly agree with some of the comments above that scrapers can actually work in your favor.
    To me, it is simply a way to spread your message, and as long as you link your article correctly, it builds external links for you.
    Do I understand this correctly, or am I missing something?
    Hmmm?

  • Thanks for some great tips. I love Akismet, and really love the “Delete All Spam” button. Boom! All gone!

    I didn’t know you could block it with the .htaccess file, so that’s a great tip. Haven’t had to use it (yet), but thanks for that tool to put in my toolbox!

  • Thank you, Seth. This was both frightening and instructive. I feel older and wiser for your words. Best regards, P. :)

  • Excellent info. I ran into a situation a couple of months ago where someone scraped an entire post of mine and put it on their site using exactly the same title and content, and then sent out a Tweet about it using exactly the same title. Rather infuriating. Your post is very helpful for addressing this. Thanks.

  • Aren’t spammers looking for dofollow blogs only? If I make my blog nofollow, will it reduce comment spam? And isn’t a WordPress blog nofollow by default?

  • Crazy blogger: WordPress is nofollow by default, but going nofollow does not mean no comment spam. Most of my blogs are nofollow but still receive a disgusting amount of comment spam attempts, which I suspect are mostly automated. Akismet and manual approval are still the best way to go!

  • Before signing off, I must say that you should start blogging as a hobby rather than to earn money. If the aim of your blogging is money, then you will surely get bored after three or four months when the money doesn’t come.

    So in the initial period simply enjoy blogging and after establishing good readership you can try to earn money.

  • Thanks for the additional information in case I ever have to go to the next step.

  • I have my comments set so that I have to approve every first time commenter. After that, everyone who has been approved can comment freely.

    I use Akismet. I used to use Bad Behavior but turned it off because I was having a problem with MSN not indexing my site (when it was MSN).

    I haven’t seen my content turn up on other blogs yet, except for the occasional blog that posts the title of my post and an excerpt with a link back to my blog. Still, I try to interlink my posts, and I use a WordPress copyright feed plugin that puts a little notice at the bottom of my feed stating that if the person is not reading the post in a feed reader, then the blog they are reading is committing copyright theft. It’s not much, but something is better than nothing.
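    The “approve every first-time commenter” rule above is simple enough to express in a few lines. Here is a minimal Python sketch of that logic, under stated assumptions: the `route_comment` and `approve` helpers and the in-memory set are hypothetical, and commenters are keyed on email address only for simplicity. WordPress’s own Discussion setting requiring a previously approved comment works along these lines, but this is not its code.

```python
# Hypothetical in-memory record of commenters the blog owner has approved.
approved_commenters: set[str] = set()


def _key(email: str) -> str:
    """Normalize an email so 'Reader@Example.com ' matches 'reader@example.com'."""
    return email.strip().lower()


def route_comment(email: str) -> str:
    """Return 'publish' for previously approved commenters, else 'moderate'."""
    return "publish" if _key(email) in approved_commenters else "moderate"


def approve(email: str) -> None:
    """Owner approves a held comment; that commenter can post freely afterwards."""
    approved_commenters.add(_key(email))
```

    In use, a first comment from a new address returns `"moderate"`; once `approve()` is called for that address, later comments from it return `"publish"`. A spammer must therefore get past a human once before anything appears, which is why this pairs well with an automatic filter such as Akismet.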


