32 URLs, 32 results reported, but only 30 rows of data collected


#1

We are currently evaluating import_io functionality. I just ran a basic extractor on 32 unique URLs (no duplicates) and the result of the run shows "32 URLs: 32 successes Duration: 00:00:36s Total Rows: 32".

However, when I download the results, the file only contains a header row and 30 rows of data, not 32. The log file lists all of my URLs and no errors. Each log entry ends with ,"1","","",""

This is a huge problem as there is no way to know which URLs were skipped or why. We would have to go through the results list and compare it to each of the requested URLs to figure out which ones were skipped.
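
For reference, that manual comparison could be scripted roughly as follows. This is only a sketch under assumptions: it presumes the downloaded results are a CSV with a column holding the source URL of each row, and the column name `url`, the function name `find_missing_urls`, and the example.com URLs are all hypothetical, not taken from the actual export.

```python
import csv
import io


def find_missing_urls(requested_urls, results_csv_text, url_column="url"):
    """Return the requested URLs that have no row in the results CSV.

    `url_column` is an assumed column name; adjust it to match the
    header of the real downloaded file.
    """
    reader = csv.DictReader(io.StringIO(results_csv_text))
    # Collect every URL that actually produced a row of data.
    returned = {row[url_column].strip() for row in reader if row.get(url_column)}
    # Keep the original request order so the gaps are easy to locate.
    return [u for u in requested_urls if u.strip() not in returned]


if __name__ == "__main__":
    # Made-up sample data standing in for the request list and the download.
    requested = [
        "https://example.com/a",
        "https://example.com/b",
        "https://example.com/c",
    ]
    results = (
        "url,title\n"
        "https://example.com/a,Page A\n"
        "https://example.com/c,Page C\n"
    )
    print(find_missing_urls(requested, results))
```

This would at least identify which URLs are missing from the download, though it does not explain why the run summary counted them as successes.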

I’m happy to provide any details necessary to troubleshoot this. If there is no easy fix or explanation for why the stated results and the actual data do not match, it would certainly discount import_io as a viable solution for scraping content. There are several other community threads reporting similar problems from earlier this year with no responses from import_io.

Any help you can provide would be appreciated.

RUN SUMMARY AND RESULTS

P.S. Now that I’ve made two “dummy” topics, your site will finally let me submit my real problem. It looks like adding the text “import.io” to a post counts as a “link”. You might want to fix this.


#2

Hi Paul,

Sorry about that. Can you share the GUID of the Extractor? It appears in the URL (dash.import.io/extractors/xxxxx-xxxxx-xxxxx-xxxx-xxxx) when the extractor is selected on the dashboard.

Thanks,

Alex


#3

Here is the Extractor I was using:
https://dash.import.io/extractors/3a519ce5-521c-4d49-827f-969e3da14802

Thank you!