Reddit extraction


#1

Total noob seeking help. I’m a tech comm grad student in an IA class. The assignment is to generate 50K n-grams from a community of practice. In other words, I need a corpus of about 1125K words, give or take. He wants us to use Reddit, so I picked r/agile, which meets the requirement. All the examples use Yelp, Yelp, Yelp. Not helpful.
So, kindly, can y’all help? I’ve watched the videos, read the help docs, and searched the internet. How do you extract the comments from a post? I can capture the subject-line question and the “# comments” count, but I can’t figure out how to get the actual question (the full text behind the subject line) or the comments themselves (behind the “# comments” link). I know I’m going to have to chain lots of pages together, too. I feel like this should be easy, so I feel really dumb. :(
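For readers unsure what “generating n-grams” from a corpus involves, here is a minimal sketch of word n-gram counting in Python. It assumes a deliberately naive tokenizer (lowercase letters and apostrophes only); `tokenize` and `ngram_counts` are hypothetical helper names, not part of any assignment or tool mentioned in this thread.

```python
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, keep only letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(texts, n=2):
    """Count word n-grams across a list of documents (n-grams never span documents)."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # Slide a window of width n over the token list.
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


bigrams = ngram_counts(["Scrum is agile", "agile is fun"], n=2)
print(len(bigrams), "distinct bigrams")
```

A document with k tokens yields k − n + 1 n-grams, so the number of distinct n-grams depends heavily on the tokenizer and on n.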


#2

Hi Felicia!

So it sounds like you’re on the right track with chaining! You’ll want to check out https://help.import.io/hc/en-us/articles/360000055551-Extracting-URLs-with-Chained-Extractors, which goes over this. I’ve put together two example extractors below, along with a screenshot showing how to chain the details extractor to the listings extractor. You should be able to copy these extractors to your account by clicking the “Duplicate” button in the upper-right corner.
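The same listings-to-details chaining can be sketched outside import.io. The Python below is a minimal sketch, assuming Reddit’s public `.json` endpoints (appending `.json` to a subreddit or post URL) and their `Listing`/`t3` (post)/`t1` (comment) JSON shape. These are Reddit conventions not covered in this thread; they may change and are subject to Reddit’s API terms and rate limits. `crawl`, `listing_permalinks`, `thread_text`, and the `USER_AGENT` string are hypothetical names. Collapsed comments (the `more` placeholders) are simply skipped here.

```python
import json
import time
import urllib.request

# Hypothetical User-Agent string; Reddit asks clients to identify themselves.
USER_AGENT = "course-corpus-sketch/0.1"


def fetch_json(url):
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def listing_permalinks(listing):
    """Listings step: each post's permalink plus the 'after' cursor for the next page."""
    children = listing["data"]["children"]
    links = [c["data"]["permalink"] for c in children if c["kind"] == "t3"]
    return links, listing["data"].get("after")


def _collect_comments(children, out):
    # Depth-first walk of the comment tree; skips non-comment kinds such as "more".
    for child in children:
        if child["kind"] != "t1":
            continue
        data = child["data"]
        out.append(data.get("body", ""))
        replies = data.get("replies")
        if isinstance(replies, dict):  # empty replies come back as "" rather than a Listing
            _collect_comments(replies["data"]["children"], out)


def thread_text(thread):
    """Details step: the full question text followed by every loaded comment body."""
    post_listing, comment_listing = thread
    texts = [post_listing["data"]["children"][0]["data"].get("selftext", "")]
    _collect_comments(comment_listing["data"]["children"], texts)
    return [t for t in texts if t]


def crawl(subreddit, pages=1, delay=2.0):
    """Chain listing pages to post pages and return a flat list of texts."""
    base = f"https://www.reddit.com/r/{subreddit}/.json"
    after, corpus = None, []
    for _ in range(pages):
        url = base if after is None else f"{base}?after={after}"
        links, after = listing_permalinks(fetch_json(url))
        for link in links:
            corpus.extend(thread_text(fetch_json(f"https://www.reddit.com{link}.json")))
            time.sleep(delay)  # politeness delay between requests (an assumed value)
        if after is None:  # no further listing pages
            break
    return corpus
```

Usage would look like `texts = crawl("agile", pages=5)`. Keeping the parsing functions separate from `fetch_json` means they can be checked against saved JSON without hitting the network.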

Hope that helps you out!