XPath for Extraction


#1

I want to extract data from www.indiabix.com, for example from this page:
http://www.indiabix.com/general-knowledge/basic-general-knowledge/

The site also has pagination.

I want the following columns:

1) question

2) option 1

3) option 2

4) option 3

5) option 4

6) answer (this is JavaScript-enabled)

7) explanation

Except for the answers, which are hidden, I was able to get the data I
wanted. I downloaded the desktop app and I am also using the click on
element option, but the result is not what I expected: it does not extract
the character that is the answer. Everything else is extracted.

Is there any chance of getting the answers? I have attached a screenshot
showing what is not being extracted in the desktop app either.

I have marked it in red, in the answer portion. Thanks

I tried to use the following XPath to extract the answer.

.//tbody/../../div//p1/b

But the problem is that the field is not being extracted.

It shows up blank in the following screenshot.

I copied the XPath directly into the XPath section. Where am I going wrong?


#2

Hey Rajesh,

Great question.

In this case, the XPath allows you to get data that is hidden behind JavaScript. For this to work in the old Extractor, you need to create your Extractor with JavaScript ON. From your image it looks like JavaScript is OFF, which is the default. The JS button is in the top right-hand corner of the page.

Equally, you can do this in the Web Extractor. You can get started here: http://dash.import.io

Hope that helps!


#3

Hey.

Thanks for the reply. I turned JS ON.

But it still shows VIEW ANSWER and not the answer itself.


#4

Our XPath override feature allows you to take data from the HTML. If you are comfortable with XPaths, you can insert them into the advanced column settings part of the workflow.

However, if you would like more information on XPaths, then please check out the following link:
http://www.w3schools.com/xsl/xpath_intro.asp

We have also written some common examples of XPaths used in import.io here:
http://support.import.io/knowledgebase/articles/341182-advanced-column-extraction-settings

As well as a full example of how to find and construct XPaths to extract certain data here:
http://support.import.io/knowledgebase/articles/697236-extracting-images-through-xpaths
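
To illustrate the point about taking data from the HTML, here is a minimal sketch using Python's standard-library ElementTree. The markup is hypothetical and only assumes that the answer is present in the page source and hidden with CSS; the class names and the sample answer are made up for illustration. If the real page loads the answer through a separate script request, this would not apply.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the answer block is hidden with CSS,
# but it is still part of the HTML source.
SNIPPET = """
<div class="question">
  <a href="#">View Answer</a>
  <div class="answer" style="display: none">
    <p><span>Answer:</span> Option <b>C</b></p>
  </div>
</div>
"""

# ElementTree needs well-formed markup, which this small sample is.
root = ET.fromstring(SNIPPET.strip())

# An XPath query works on the document tree, so CSS visibility is ignored.
hidden = [b.text for b in root.findall(".//div[@class='answer']//b")]
print(hidden)
```

The query returns the text inside the bold tag even though a browser would not display it until the link is clicked.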

I hope this helps!


#5

@Rajesh_Bhammar

If you still haven’t figured this one out, Rajesh, please post the GUID of the Extractor.

It’s in the URL bar when you are on the My Data page.

Alex


#6

Here it is:
https://import.io/data/mine/?id=a71d8502-81bc-4980-ba7b-0df8b1504377


#7

Hi Rajesh,

I’m not sure how you decided on your original XPath. Did you find this on a Stack Overflow question?

It looks familiar but is not the correct one for this page.

.//tbody//div//p/span/../b

In fact, this looks like it also works:

.//span/../b
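
As a rough check of how these paths behave, here is a small sketch with Python's standard-library ElementTree. The XPaths are written with the parent step `..`, and the markup is a hypothetical, simplified version of one question row, so the real page structure may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for one question row; the real page will differ.
PAGE = """
<html><body><table><tbody><tr><td>
  <p>Sample question text</p>
  <div class="answer">
    <p><span>Answer:</span> Option <b>B</b></p>
  </div>
</td></tr></tbody></table></body></html>
"""

def extract(root, xpath):
    """Return the text of every element matched by the XPath."""
    return [el.text for el in root.findall(xpath)]

root = ET.fromstring(PAGE.strip())

# Find a <span>, step up to its parent with "..", then take the <b> sibling.
long_path = extract(root, ".//tbody//div//p/span/../b")
short_path = extract(root, ".//span/../b")

# The path from the first post, for comparison.
original = extract(root, ".//tbody/../../div//p1/b")

print(long_path)
print(short_path)
print(original)
```

On this sample both suggested paths pick out the bold answer letter, while the original path matches nothing, which would show up as a blank field. Whether that is the exact reason for the blank on the real page is an assumption.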

Alex


#9

Yes, XPath allows you to get data that is hidden behind JavaScript. If you are comfortable with XPaths, you can insert them into the advanced column settings part of the workflow.
http://rewardpointsextension.com/
http://megamenuextension.com


#10

I also want more study materials, such as PDFs for TNPSC exams.