This is a classic use for “chaining” extractors. First you need to build an extractor that gets all of the URLs from the first page.
The first one is a little tricky because the dealer list spans multiple pages, but that’s easily solved. We’ll start with the first link, which is basically your list of dealers:
That link only shows 30 dealers per page, though. So scroll down to the bottom and click on “120”. Now you’ll see that your URL has changed to:
This is good because it will use fewer queries to get the list you are looking for. Now we need to build a list of URLs that tells Import.io every page it needs to get data from. So let’s click on “Page 2” at the bottom and pay attention to how the URL changes:
This one is pretty easy because you can see that the page number is shown at the very end of the URL as “2”. The subsequent pages will look like this:
http://www.chrono24.com/search/haendler.htm?pageSize=120&showpage=4… and so on.
The next step is to figure out how many pages there are. We can see at the bottom of the page that there are 19 pages, so your extractor needs the URLs for all 19 of them.
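If it helps to see the pattern written out, here is a minimal Python sketch that builds the same 19 page URLs. It only assumes what we saw above: the first page has no `showpage` parameter, and every later page appends `showpage=N`. The function name `build_page_urls` is just something I made up for illustration, not part of Import.io.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.chrono24.com/search/haendler.htm"


def build_page_urls(last_page, page_size=120):
    """Return the dealer-list URLs for pages 1..last_page."""
    urls = []
    for page in range(1, last_page + 1):
        params = {"pageSize": page_size}
        # Page 1 is the bare pageSize URL; later pages add showpage=N.
        if page > 1:
            params["showpage"] = page
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


page_urls = build_page_urls(19)
for url in page_urls[:3]:
    print(url)
```

If the page count at the bottom of the site changes, you would only need to change the `19`.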
So click “New Extractor,” and paste this into the URL field: http://www.chrono24.com/search/haendler.htm?pageSize=120
Let Import.io do some thinking and you’ll see the page come up. All you really need is the URL from the “Underline” column. This will give your second extractor all of the URLs it needs to pull the data you are looking for. We’re not done yet, though. We still need to add the URLs for the remaining pages. Go ahead and click on the down arrow next to “Save and Run” and click “Save.” We don’t want to run this extractor yet.
You will be brought back to your extractor details screen. Import.io has a wonderful tool called the URL generator. We’ll use this to generate all of the links you need. First we need to give Import.io a couple of URLs to work with. Paste this link into the field where it says “Enter or Paste URL here…”: http://www.chrono24.com/search/haendler.htm?pageSize=120&showpage=2
Click “Save URLs.” Now click “Show URL Generator.” Click “Edit” and overwrite the URL that is in the list with: http://www.chrono24.com/search/haendler.htm?pageSize=120&showpage=2
Import.io tries to figure out which parameter in that link controls the page number. It thinks the “120” is the page number, but it’s actually the “2” at the end. This is easy to fix. Click the “x” next to the parameter 1 field. Then simply double-click on the “2” at the end of the URL. You’ll be presented with some controls for creating the list of URLs. You already have your first and second pages in the list, so you can set the range from 3 to 19 with a step of 1. Then click “Add to list.” Now click “Save URLs.”
Now you have an extractor that will build your list of URLs for you. Pretty slick.
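For the curious, what the URL generator does here is roughly "take one URL, pick a parameter, and swap in a range of values." Below is a rough Python sketch of that idea. It is my own approximation, not Import.io's actual implementation, and the helper name `expand_parameter` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def expand_parameter(template_url, name, start, stop, step=1):
    """Copy template_url once per value in start..stop, replacing one query parameter."""
    parts = urlsplit(template_url)
    query = parse_qsl(parts.query)
    urls = []
    for value in range(start, stop + 1, step):
        # Only the chosen parameter changes; pageSize=120 is left alone.
        new_query = [(k, str(value) if k == name else v) for k, v in query]
        urls.append(urlunsplit(parts._replace(query=urlencode(new_query))))
    return urls


template = "http://www.chrono24.com/search/haendler.htm?pageSize=120&showpage=2"
generated = expand_parameter(template, "showpage", 3, 19)
print(len(generated), "URLs generated")
```

Picking the parameter by name is the code equivalent of clicking away the “120” guess and double-clicking the “2” instead.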
Run this extractor. You can also click this link to get the extractor we just built:
Now we need to create the second extractor to go get all the details you’re looking for. Click “New Extractor,” and we will paste a URL from one of the dealers into the popup. Let’s start with this link: http://www.chrono24.com/dealer/juwelierburger/index.htm
Once the webpage comes up, click on the Website tab in the upper right-hand corner of the page. This allows us to see what data we’re extracting. Click “Delete All Columns” to give us a fresh start. This page is interesting because it has a “Show More” link at the bottom. This is annoying, but very easy to bypass. Click on the “Styles On” button in the upper left-hand corner of the screen. This turns off all of the fancy web styling that can mess up our extraction. Now you need to create some columns and select the data you want to put in those columns. When you are finished, save the extractor.
Now we need to connect this extractor to the first extractor you created. This is called “chaining.”
You should be back in the extractor details screen now. Click on the dropdown that says “An explicit list of URLs” and select “URLs from another extractor.” In the “Search for extractor by name” field, type the name of the first extractor you created and select it. Now we need to tell it where it will find the URLs. You can see above that the first extractor we made had the link for the dealer information in the “Underline” column. So click on URL Column and select “Underline.”
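Conceptually, chaining just means "take one column from the first extractor's rows and use each value as an input URL for the second extractor." Here is a small Python sketch of that flow. Everything in it is a stand-in: `chain_extractors` and `fake_dealer_extractor` are hypothetical names, the rows are sample data, and skipping duplicate or empty URLs is my own choice rather than documented Import.io behavior.

```python
def chain_extractors(first_rows, url_column, extract_details):
    """Feed each unique URL from one column of first_rows into extract_details."""
    seen = set()
    results = []
    for row in first_rows:
        url = row.get(url_column)
        # Skip empty cells and URLs we have already processed (an assumption).
        if not url or url in seen:
            continue
        seen.add(url)
        results.append(extract_details(url))
    return results


def fake_dealer_extractor(url):
    """Stand-in for the second extractor: just pulls the dealer slug from the URL."""
    slug = url.rstrip("/").split("/")[-2]
    return {"url": url, "dealer": slug}


# Sample rows shaped like the first extractor's output (the "Underline" column).
first_extractor_rows = [
    {"Underline": "http://www.chrono24.com/dealer/juwelierburger/index.htm"},
    {"Underline": "http://www.chrono24.com/dealer/juwelierburger/index.htm"},
    {"Underline": ""},
]

dealer_rows = chain_extractors(first_extractor_rows, "Underline", fake_dealer_extractor)
print(dealer_rows)
```

In the real thing, Import.io does the fetching and column extraction for you; the only part you configure is which extractor and which column supply the URLs.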
You can see the extractor we just created by clicking on this link: https://dash.import.io/5872abdf-2f4c-4e50-bd75-168fd7eae694
That’s it! You’re ready to go. Go ahead and run this extractor and watch the magic happen. As it runs, you can click on the “eye” icon to preview the data you’re grabbing. Hope that helps!