How to call an extractor via API


#1

I was wondering if it's possible to call an extractor from the API?

I have an extractor on the website with multiple URLs and was wondering whether I could call the entire thing in one go.


#2

Hey!

The Extractors API docs can be found here: http://api.docs.import.io/

Calling the whole configuration is not possible at this early stage, but it is definitely something we intend to add.

As a workaround for now, you could make multiple calls to the API, for example in a for loop in Python, swapping in a different URL on each iteration.
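
A minimal sketch of that loop, written in Node rather than Python to match the sample further down this thread. It assumes the runtime query endpoint quoted in post #3. `buildQueryUrl`, the configuration ID, and the API key are placeholders for illustration, not names from the import.io docs.

```javascript
// Hypothetical helper: builds one request URL for one page URL.
// The endpoint shape is taken from the sample in post #3 of this thread.
function buildQueryUrl(configId, apiKey, pageUrl) {
    const endpoint = new URL(`https://extraction.import.io/query/runtime/${configId}`);
    // searchParams percent-encodes values, so "?" and "&" inside pageUrl
    // are not mistaken for part of the API request's own query string.
    endpoint.searchParams.set("_apikey", apiKey);
    endpoint.searchParams.set("url", pageUrl);
    return endpoint.toString();
}

// Placeholder values for illustration only.
const pageUrls = [
    "http://example.com/list?page=1",
    "http://example.com/list?page=2"
];

// One API request URL per page URL; only the "url" parameter changes.
const queryUrls = [];
for (const pageUrl of pageUrls) {
    queryUrls.push(buildQueryUrl("my-config-id", "MY_API_KEY", pageUrl));
}
console.log(queryUrls);
```

Each entry in `queryUrls` can then be fetched with whatever HTTP client you prefer.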

@alex.gimson


#3

Thanks for getting back to me.

I have implemented something just like what you suggested (in Node).

For anyone else that may run across this, this is my workflow.

  1. Create a single extractor in import.io for each website you want to pull data from, so that the columns line up with the data being extracted. Even if you want to use multiple URLs from that site, you only need to set up one extractor.

  2. Run a loop that calls the extractor API, looping over all the URLs you want to grab data from. If you have URLs from two websites, you need to swap out the extractor configuration ID, which can be found in the extractor's integration tab.

Below is a sample of how I am doing the above in Node (using bluebird).

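// Assumed setup (not shown in the original post): bluebird supplies
// Promise.mapSeries, and request is promisified so that it resolves
// with the full response object.
const Promise = require("bluebird");
const request = Promise.promisify(require("request"));

// Assumption: the API key is read from an environment variable.
const importIOApi = process.env.IMPORT_IO_API_KEY;

const getFeed = function (urls, configId) {
    const runtimeConfigurationId = configId;

    // mapSeries handles the URLs one at a time, in order.
    return Promise.mapSeries(urls, function (singleurl) {
        console.log("Fetching URL", singleurl);
        return request({
            json: true,
            // The page URL travels as a query parameter, so it is percent-encoded.
            url: `https://extraction.import.io/query/runtime/${runtimeConfigurationId}?_apikey=${importIOApi}&url=${encodeURIComponent(singleurl)}`
        })
            .then(function (response) {
                console.log("Status Code", response.statusCode);

                if (response.statusCode !== 200) {
                    throw response.body;
                }
                console.log("Url Fetched", response.body.extractorData.url);
                console.log("Feed Length", response.body.extractorData.data[0].group.length);

                return response.body.extractorData.data[0].group;
            })
            .catch(function (error) {
                // A failed URL is logged and leaves undefined in the result
                // array, so one bad page does not abort the whole loop.
                console.error("I just threw up", error);
            });
    });
};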
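
If the URL list mixes two websites, step 2 means picking the right configuration ID for each URL. One way to sketch that is to group the URLs by hostname first and then call getFeed once per group. The `configByHost` table, the `groupByConfig` helper, and the IDs in it are made up for illustration.

```javascript
// Hypothetical lookup: hostname -> extractor configuration ID (made-up values).
const configByHost = new Map([
    ["shop-a.example", "config-id-for-shop-a"],
    ["shop-b.example", "config-id-for-shop-b"]
]);

// Groups page URLs by the configuration ID registered for their hostname.
// Throws on a hostname with no extractor instead of silently skipping it.
function groupByConfig(urls, lookup) {
    const groups = new Map();
    for (const pageUrl of urls) {
        const host = new URL(pageUrl).hostname;
        const configId = lookup.get(host);
        if (configId === undefined) {
            throw new Error(`No extractor configured for ${host}`);
        }
        if (!groups.has(configId)) {
            groups.set(configId, []);
        }
        groups.get(configId).push(pageUrl);
    }
    return groups;
}

const groups = groupByConfig([
    "http://shop-a.example/items?page=1",
    "http://shop-b.example/catalog",
    "http://shop-a.example/items?page=2"
], configByHost);

for (const [configId, urls] of groups) {
    // Each group would be passed on as getFeed(urls, configId).
    console.log(configId, urls);
}
```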

#4

You can build the Extractor against the page structure shared by the URLs you retrieve.
Once it is created, you can make API calls to this Extractor with each new URL as it comes in. You will get the data back as JSON.
As long as these URLs have the same underlying structure, it shouldn’t be a problem.
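
A rough sketch of unpacking that JSON defensively, assuming the response shape used in the Node sample earlier in this thread (`extractorData.data[0].group`). `rowsFromResponse` is a hypothetical helper and the sample bodies are made up.

```javascript
// Hypothetical helper: returns the extracted rows, or an empty array
// when the expected fields are missing from the response body.
function rowsFromResponse(body) {
    const data = body && body.extractorData && body.extractorData.data;
    if (!Array.isArray(data) || data.length === 0) {
        return [];
    }
    const first = data[0];
    const group = first && first.group;
    return Array.isArray(group) ? group : [];
}

// Made-up response bodies for illustration.
const withRows = {
    extractorData: {
        url: "http://example.com/a",
        data: [{ group: [{ title: "First item" }] }]
    }
};
const withoutRows = {
    extractorData: { url: "http://example.com/b", data: [] }
};

console.log(rowsFromResponse(withRows).length);
console.log(rowsFromResponse(withoutRows).length);
```

Returning an empty array keeps the calling loop simple: a page whose response lacks the expected fields contributes no rows instead of raising an error.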