Extract data from a site protected by Cloudflare


#1

Hello, I set up my extractor in the Import.io desktop app, and everything was working perfectly. But after 5 days the source site turned on protection that checks whether the site is being accessed by a browser. The protection is provided by Cloudflare. Now I am not able to get my data with the Import.io app.
This is strange to me, because I was the only one using the extractor and I made maybe 10 requests per day to that site. At such a low request rate, how did Cloudflare recognise Import.io as a scraper? Is there anything that can help?


#2

Hey there,

Although we rotate IPs, if other users are scraping the same website it is possible that the website is being hit too fast and too regularly.
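
To illustrate what "too fast and too regularly" means in practice: requests sent at a fixed interval form an obvious pattern, so a client can space them out and add some randomness to the gap. Below is a minimal JavaScript sketch. The function name `nextDelayMs` and its parameters are hypothetical, and this only shapes your own request timing; it does not affect traffic from other users on the same IPs, and it will not get past a browser check that is already switched on.

```javascript
// Hypothetical helper: returns the wait time before the next request.
// baseMs      - average gap between requests, in milliseconds
// jitterRatio - fraction of baseMs to vary by (0.5 means +/- 50%)
// random      - source of randomness in [0, 1), injectable for testing
function nextDelayMs(baseMs, jitterRatio, random = Math.random) {
  const spread = baseMs * jitterRatio;
  // Pick a value between (baseMs - spread) and (baseMs + spread).
  return Math.round(baseMs - spread + random() * 2 * spread);
}
```

For example, `setTimeout(sendNextRequest, nextDelayMs(10000, 0.5))` would wait somewhere between 5 and 15 seconds, where `sendNextRequest` stands for whatever function issues your request.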

When you say 10 requests, were you calling 1 URL each time?

Which tool were you using?

Alex


#3

Hello,
I am using the import.io desktop app.
I have two extractors: one scrapes a list of 10 links and some text, and the other gets the data from the pages behind those links.
My mobile app sends a JavaScript Ajax request to the 2nd extractor with a link that I got from the 1st extractor.
Everything was working perfectly, but after two days I got the Cloudflare protection that checks the browser. When the protection is on, it sends a “cf_clearance” cookie to the browser.
I also thought that some other users might be attacking that site, but I am not sure. After two days they turned the protection off. I then tried my import.io extractors and they worked perfectly, but after 10 minutes the protection was on again.
I think that whenever a user of my mobile app requests data from import.io, your app sends a request to my source web page.
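
If each mobile-app request really does trigger a fresh request to the source page, one way to reduce that traffic would be to cache extractor results for a while, so that repeated requests for the same link reuse the stored result. Below is a minimal JavaScript sketch. It assumes a hypothetical `fetchFn` that performs the actual extractor call; the name `createTtlCache` and the cache lifetime are illustrative and not part of import.io.

```javascript
// Hypothetical helper: wraps any fetch function with a time-based cache.
// fetchFn - function(key) that does the real request (may return a Promise)
// ttlMs   - how long a cached result stays valid, in milliseconds
// now     - clock function, injectable for testing
function createTtlCache(fetchFn, ttlMs, now = Date.now) {
  const entries = new Map(); // key -> { value, expiresAt }

  return function cachedFetch(key) {
    const hit = entries.get(key);
    if (hit && hit.expiresAt > now()) {
      return hit.value; // still fresh: no request goes upstream
    }
    // Missing or expired: call the real function and remember the result.
    // Note: if fetchFn returns a Promise that rejects, the rejection is
    // cached too until it expires; real code may want to evict on failure.
    const value = fetchFn(key);
    entries.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  };
}
```

With something like `const getPage = createTtlCache(callExtractor, 10 * 60 * 1000)`, where `callExtractor` is a placeholder for the Ajax call, ten users asking for the same link within ten minutes would cause one upstream request instead of ten.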


#4

Well, I am sure that Cloudflare is turning on the protection because of the Import.io requests. Now that I am not using the import.io extractor, the protection is off.


#5

How often are you sending requests?

If a website is choosing to use Cloudflare to protect its data and it is blocking the import.io IP, then it’s out of our hands really.

It looks like Cloudflare has the ability to block specific IPs: https://www.siteground.co.uk/tutorials/cloud_flare_cdn/trust-block-ips.htm

Equally, if the website is happy for you to scrape its data, it looks like certain IPs can also be whitelisted.

Alex


#6

This is the chronology:
For 3 days I sent about 20 requests, and the Cloudflare protection was OFF.
Then the protection went ON. After 2 days it was OFF again, but after 10 minutes I sent a few requests (about 5-10) and the protection was ON again.
The protection then stayed ON for 2 days, and after that it was OFF. I sent no more requests, and the protection has stayed OFF.
During this whole time I was extracting data with my browser and some JavaScript.