Building a simple Reddit scraper

Posted by Bazant Fundator on Stack Overflow
Published on 2013-06-25T10:19:23Z Indexed on 2013/06/25 10:21 UTC

Let's say that I would like to build a collection of images from reddit for my own amusement. I ran the code below in my development environment, and it never got past the first page of posts (anything beyond that requires the after string from the JSON). Additionally, when I turn on the validation, the whole loop breaks if an item doesn't pass it, not just the current iteration. I would be glad if you helped me understand the mistakes I made.

class Link
    include Mongoid::Document
    include Mongoid::Timestamps

    field :author, type: String
    field :url, type: String

    validates :url, uniqueness: true # no duplicates

end
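On the validation problem: the `fetch` method further down saves with `Link.create!`, and in Mongoid the bang variant raises `Mongoid::Errors::Validations` when a validation fails. An exception that nobody rescues unwinds out of the loop, which would explain why everything stops instead of only the current item being skipped. A minimal sketch of skipping per item, in plain Ruby so it runs without a database; `ValidationFailed` and `save_each` are hypothetical names that are not part of the original code:

```ruby
# Stand-in for Mongoid::Errors::Validations, defined here only so the
# sketch runs without Mongoid or a database (an assumption for illustration).
class ValidationFailed < StandardError; end

# Yields every item to the block. If the block raises error_class,
# only that item is skipped and the loop carries on with the next one.
def save_each(items, error_class: ValidationFailed)
  saved = []
  skipped = []
  items.each do |item|
    begin
      yield item
      saved << item
    rescue error_class
      skipped << item # duplicate or otherwise invalid: move on
    end
  end
  { saved: saved, skipped: skipped }
end

# Example: simulate a uniqueness check on URLs.
seen = []
result = save_each(%w[a b a c]) do |url|
  raise ValidationFailed, 'duplicate url' if seen.include?(url)
  seen << url
end
p result
```

With the real model this would presumably become `save_each(posts, error_class: Mongoid::Errors::Validations) { |post| Link.create!(author: post['author'], url: post['url']) }`. A simpler alternative is the non-bang `Link.create`, which returns an unpersisted document instead of raising when validation fails.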


def fetch (count, after)
    count_s = count.to_s # convert count to string
    link = "http://reddit.com/r/aww/.json?count="+count_s+"&after="+after #so it can be used there
    res = HTTParty.get(link) # GET req. to the reddit server
    json = JSON.parse(res.body) # Parse the response

    if json['kind'] == "Listing" then   # check if the retrieved item is a Listing
        for i in 1...(count) do # for each list item
            datum = json['data']['children'][i]['data'] #i-th element properties
            if datum['domain'].in?(["imgur.com", "i.imgur.com"]) then # fetch only imgur links 
                Link.create!(author: datum['author'], url: datum['url']) # save to db 
            end 
        end
        count += 25
        fetch(count, json['data']['after']) # if it retrieved the right kind of object, move on to the next page
    end 

end

fetch(25," ") # run it
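On the paging problem, a likely cause is the loop `for i in 1...(count)`: it starts at index 1 (skipping the first post), and because `count` grows to 50 on the second call while a page holds 25 items by default, `children[i]` becomes `nil` and the next `['data']` lookup raises. In the Reddit listing API, `count` is the number of items already seen, `limit` controls the page size, and `data.after` is `nil` on the last page. Below is a minimal sketch that iterates over whatever children the page actually contains and stops when `after` runs out. It swaps HTTParty for the standard library's `Net::HTTP` so it has no gem dependencies; `get_listing`, `imgur_links`, `scrape`, the `max_pages` cap, and the `https://www.reddit.com` host are assumptions for illustration, not part of the original code.

```ruby
require 'json'
require 'net/http'
require 'uri'

IMGUR_DOMAINS = ['imgur.com', 'i.imgur.com'].freeze

# Hypothetical helper: one GET request, returns the parsed JSON hash.
# The `after` parameter is only sent when there is a token to send.
def get_listing(after)
  uri = URI('https://www.reddit.com/r/aww/.json')
  params = { limit: 25 }
  params[:after] = after if after
  uri.query = URI.encode_www_form(params)
  JSON.parse(Net::HTTP.get(uri))
end

# Picks the imgur posts out of one parsed listing, however many
# children the page contains.
def imgur_links(listing)
  listing['data']['children']
    .map { |child| child['data'] }
    .select { |post| IMGUR_DOMAINS.include?(post['domain']) }
    .map { |post| { author: post['author'], url: post['url'] } }
end

# Walks the listing page by page until `after` is nil or max_pages
# is reached. A loop (not recursion) keeps the stack flat, and the
# cap avoids hammering the server.
def scrape(max_pages: 4, fetcher: method(:get_listing))
  links = []
  after = nil
  max_pages.times do
    listing = fetcher.call(after)
    break unless listing['kind'] == 'Listing'
    links.concat(imgur_links(listing))
    after = listing['data']['after']
    break if after.nil?
  end
  links
end
```

Calling `scrape(max_pages: 2)` would return an array of `{ author:, url: }` hashes that can then be saved one by one. The `fetcher` argument exists so the paging logic can be exercised with canned pages instead of live requests. Reddit may also expect a descriptive User-Agent header, which this sketch does not set.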

© Stack Overflow or respective owner
