Useful info if your content is scraped by bots

Discussion in 'Content Management' started by sabin, Feb 3, 2014.

  1. #1
    Hello,

    I've been meaning to share this for a while; maybe other people will find it useful.

    I've had two websites that were frequently cloned by autoblogs, and cleverly so: the blogs used a technique called "scraping", copying everything and then re-posting it as their own content rather than as an acknowledged copy. Embedded images were even re-uploaded to the cloner blogs as if they were legitimate content.

    Blocking IPs and servers proved useless: the scraping looked like legitimate visitor activity, and the scraper engine was evidently not running on the same server as the clone sites.

    Well, here's the trick: a CDN. I tested first with a paid CDN's temporary free offer and later with CloudFlare, and the moment my DNS pointed to the CDN, *poof*, no more scraping. The autoblogs stopped leeching my content.

    I simply have no clue WHY that happened. Probably something in the CDN's global settings.

    But, well... it's good to know that annoying automated cloning sites can be blocked that easily :)
     
    sabin, Feb 3, 2014 IP
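
    To illustrate why the IP blocking described in post #1 can fail, here is a minimal sketch of a blocklist check. The networks and the `is_blocked` helper are hypothetical (the addresses come from ranges reserved for documentation), and this is not how any particular server or CDN implements blocking. A scraper that runs on a different machine than the clone site simply never matches the list.

```python
import ipaddress

# Hypothetical blocklist for illustration; addresses are from documentation ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),      # e.g. the clone site's hosting range
    ipaddress.ip_network("198.51.100.17/32"),  # a single offending address
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    # The clone site's own server is caught by the blocklist...
    print(is_blocked("192.0.2.44"))
    # ...but a scraper engine hosted elsewhere is not.
    print(is_blocked("203.0.113.9"))
```

    The check only knows about addresses already on the list, so it does nothing against a scraper whose address was never identified.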
  2. Sean DeSilva

    #2
    I recently stopped using CloudFlare because it slowed down my website, especially JavaScript loading. So keep an eye out for website slowdowns.
     
    Sean DeSilva, Feb 27, 2014 IP
  3. sabin

    #3
    Well, you can also fine-tune how CloudFlare caches your website. If you don't want JavaScript to be affected, you can exclude it. Personally, my biggest use for CloudFlare was staying within the monthly bandwidth limit of a former web host.
     
    sabin, Feb 27, 2014 IP
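
    The bandwidth point in post #3 comes down to simple arithmetic: traffic served from the CDN's cache never reaches the origin host. Here is a minimal sketch, assuming a single overall cache hit ratio; the function name and the figures are hypothetical, and real savings depend on what is actually cacheable.

```python
def origin_bandwidth_gb(total_gb: float, cache_hit_ratio: float) -> float:
    """Estimate the traffic the origin host still has to serve.

    total_gb: total monthly traffic requested by visitors.
    cache_hit_ratio: fraction of that traffic answered from the CDN cache (0 to 1).
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return total_gb * (1.0 - cache_hit_ratio)


if __name__ == "__main__":
    # Hypothetical figures: 100 GB requested, half of it served from cache.
    print(origin_bandwidth_gb(100.0, 0.5))
```

    With these made-up figures, the origin serves half of the total, which is the kind of margin that can keep a site under a host's monthly cap.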
  4. competent123

    #4
    CloudFlare blocks non-browser traffic such as scrapers; it only allows whitelisted bots (Google, MSN, Yahoo, etc.).

    That is why scraping no longer happens on a CloudFlare-protected site.
     
    competent123, Mar 5, 2014 IP
  5. damoncloudflare

    #5
    This statement is really not true. We will most certainly help reduce the number of bad bots hitting your site, though not all of them. You could also look at something like ScrapeShield (a CloudFlare app) to help monitor for content theft, and at blocking offending IPs in the CloudFlare Threat Control panel.
     
    damoncloudflare, Mar 6, 2014 IP
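
    Posts #4 and #5 disagree about how much bot traffic gets filtered. The sketch below shows one simple, generic heuristic (classifying requests by User-Agent string) and why such filtering reduces bad bots without stopping all of them. It is an assumption-laden illustration, not CloudFlare's actual logic: the substring lists and the `classify` function are hypothetical.

```python
# Hypothetical substring lists, chosen for illustration only.
ALLOWED_CRAWLERS = ("googlebot", "bingbot")
SCRIPT_CLIENTS = ("curl", "wget", "python-requests")


def classify(user_agent: str) -> str:
    """Return "allow", "challenge", or "pass" based on the User-Agent string."""
    ua = user_agent.lower()
    if any(name in ua for name in ALLOWED_CRAWLERS):
        return "allow"      # looks like a known search engine crawler
    if not ua.strip() or any(name in ua for name in SCRIPT_CLIENTS):
        return "challenge"  # empty or obviously scripted client
    return "pass"           # looks like an ordinary browser


if __name__ == "__main__":
    print(classify("curl/7.35.0"))
    print(classify("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
    # A scraper that sends a browser-like User-Agent is treated as a browser.
    print(classify("Mozilla/5.0 (Windows NT 6.1; rv:27.0) Gecko/20100101 Firefox/27.0"))
```

    Because the User-Agent header is set by the client, a scraper can copy a browser's string and get through, which fits the "but not all" caveat in post #5.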
  6. damoncloudflare

    #6
    Did you open a support ticket?

    The only thing we do that might affect JavaScript is Rocket Loader, and this is an optional feature that can be turned off. It generally helps speed up JavaScript, but issues sometimes arise with Rocket Loader because of how a site runs its JavaScript.
     
    damoncloudflare, Mar 6, 2014 IP