scrubyt - Developer IT

How to export scrubyt extractor?

- by robintw

I've written a scrubyt extractor based on the 'learning' technique: that is, specifying the current text on the page and letting scrubyt work out the XPath expressions itself. I now want to export the extractor so that it keeps working even when the page has changed. The documentation for scrubyt seems to be all over the place now, but from what I can find I should be able to add the line extractor.export(__FILE__) and it should work. It doesn't. I just get an error saying that export was given the wrong number of arguments and that it expects 0. I've tried calling it without any arguments and it still fails. I would ask on the scrubyt forum, but it seems like no-one's been there for ages! Any ideas what to do here?


Scrubyt: Using big5 strings in query_field for fill_textfield

- by kuribo

Does anyone know of a way to get fill_textfield to accept a big5-encoded string in the query_field? I keep getting an "unterminated string meets end of file" error with this:

    require 'rubygems'
    require 'scrubyt'

    search_data = Scrubyt::Extractor.define do
      fetch 'http://www.google.com/ncr'
      fill_textfield 'q', '????'
      submit
    end
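One possible cause, offered as an assumption rather than a confirmed diagnosis: some Big5 characters have 0x5C (the ASCII backslash) as their second byte. A parser that reads the source byte by byte can treat that byte as an escape and swallow the closing quote, which would produce an "unterminated string" error. The sketch below only inspects the bytes of one such character (功, chosen here for illustration; it is not taken from the question) and does not involve scrubyt at all.

```ruby
# Minimal sketch: check whether a character's Big5 encoding ends in a
# backslash byte. The sample character is an illustrative assumption.
char = "\u529F" # 功, written as an escape so the file itself stays ASCII

# Transcode from UTF-8 to Big5 and look at the raw bytes.
big5_bytes = char.encode('Big5').bytes
puts big5_bytes.map { |b| format('0x%02X', b) }.join(' ')

# True when the trail byte collides with the ASCII backslash.
trail_is_backslash = (big5_bytes.last == '\\'.ord)
puts "trail byte is a backslash: #{trail_is_backslash}"
```

If this is what is happening, keeping the script in UTF-8 and transcoding the query string at runtime with String#encode avoids placing raw Big5 bytes inside a string literal.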


Is it possible to set the referer with Scrubyt?

- by Jake

I can't seem to get a page to load with scrubyt, and I think it's because the page I am navigating to checks the referer. Is it possible to set the referer on the fetch action?
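For reference, this is how a Referer header is set on a plain HTTP request with Ruby's standard Net::HTTP library. It is a minimal sketch outside scrubyt, it does not show whether or how scrubyt's fetch exposes the same option, and both URLs are hypothetical placeholders.

```ruby
require 'net/http'
require 'uri'

# Hypothetical URLs, used only for illustration.
uri = URI('http://example.com/protected/page')

# Build a GET request and attach the Referer header the site expects.
request = Net::HTTP::Get.new(uri)
request['Referer'] = 'http://example.com/index'

# Sending the request needs network access, so it is left commented out:
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
# puts response.code
```

Comparing the response with and without the header is a quick way to confirm that the referer check is really what blocks the page.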


Screen scraping software that will traverse pages

- by nilbus

We're creating a mashup site that pulls information from many sources all over the web. Many of these sites don't provide RSS feeds or APIs to access the information they provide. This leaves us with screen scraping as our method for collecting the data. There are many screen scraping tools out there, written in different scripting languages, that require you to write scraping scripts in the language the scraper was written in. Scrapy, scrAPI, and scrubyt are a few written in Ruby and Python. There are other web-based tools I've seen, like Dapper, that create XML or RSS feeds based on a webpage. Dapper has a beautiful web-based interface that requires no scripting skills to use. It would be a great tool if it were able to traverse multiple pages to gather data from hundreds of pages of results. We need something that will scrape information from paginated web sites, much like scrubyt, but with a user interface that a non-programmer could use. We'll script up our own solution if we need to, probably using scrubyt, but if there's a better solution out there, we want to use it. Does anything like this exist?
