Search Results

Search found 45 results on 2 pages for 'hpricot'.

Page 1/2 | 1 2  | Next Page >

  • how to translate this hpricot code to nokogiri ?

    - by wefwgeweg
    Hpricot(html).inner_text.gsub("\r"," ").gsub("\n"," ").split(" ").join(" ") hpricot = Hpricot(html) hpricot.search("script").remove hpricot.search("link").remove hpricot.search("meta").remove hpricot.search("style").remove found it on http://www.savedmyday.com/2008/04/25/how-to-extract-text-from-html-using-rubyhpricot/

    Read the article
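One part of the snippet above does not depend on the parser at all: the whitespace clean-up. A minimal sketch in core Ruby, assuming the text has already been extracted into a plain String (by Hpricot, Nokogiri, or anything else). `squish_whitespace` is a hypothetical helper name, not part of either library.

```ruby
# Hypothetical helper (not an Hpricot or Nokogiri API): collapse every run of
# whitespace, including "\r" and "\n", into a single space.
def squish_whitespace(text)
  # String#split with no argument splits on runs of whitespace and ignores
  # leading/trailing whitespace, so the two gsub calls are not needed.
  text.split.join(" ")
end

sample = "Hello,\r\n   world!\n\tBye.  "
puts squish_whitespace(sample)  # prints: Hello, world! Bye.
```

The result matches the original `gsub("\r"," ").gsub("\n"," ").split(" ").join(" ")` chain, since `split(" ")` already treats carriage returns and newlines as whitespace.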

  • hpricot segfault?

    - by AP257
    Any idea why hpricot might segfault on this page? trial_url = 'http://www.controlled-trials.com/ISRCTN56071145/' doc = Hpricot(open(trial_url)) produces: /Users/ap257/.gem/ruby/1.8/gems/hpricot-0.8.2/lib/hpricot/parse.rb:33: [BUG] Segmentation fault ruby 1.8.7 (2009-06-08 patchlevel 173) [universal-darwin10.0] Abort trap Please could anyone advise on how I could get around this, or whether it's a bug in hpricot that I should report somewhere? Thanks!

    Read the article

  • Bundler doesn't want to install hpricot on Windows XP with Ruby 1.8.7

    - by Nick Gorbikoff
    Hello I develop on a Windows machine but deploy to Debian. Trying to use hpricot with Rails 3 app. I can get the gem to install using : gem install hpricot --platform=mswin32 But when I do this in the bundle file - it keeps throwing an error (I think it's trying to install the wrong version of hpricot (not windows specific) group :production do gem "hpricot", "0.8.3" end group :development, :test do gem "hpricot", "0.8.3", :platforms => [:mswin, :mingw] end This is from another question here on stackoverflow - but it's not working for me. Any ideas? P.S.: Windows XP sp3 with Ruby 1.8.7 with Rails 3.0.3 with bundler 1.0.7 EDIT Forgot to paste my error: bundle install Fetching source index for http://rubygems.org/ which: no sudo in (.;C:\Program Files\ImageMagick-6.6.5-Q16;C:\ruby\Ruby187\bin;C:\Program Files\ActiveState Komodo Edit 6\;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Program Files\e\cmd;C:\Program Files\MySQL\MySQL Server 5.1\bin;C:\WINDOWS\system32\WindowsPowerShell\v1.0;c:\tools;C:\gnuwin32\bin;C:\tools\wkhtmltopdf;C:\Python31;C:\Program Files\TortoiseHg\;C:\Program Files\TortoiseGit\bin; c:\program files\videolan\vlc;C:\Program Files\SMPlayer\mplayer;C:\Program Files\Git\cmd;C:\Program Files\QuickTime\QTSystem\;C:\Program Files\Calibre2\;c:\ruby\jruby-1.5.5\bin;C:\Program Files\Common Files\Shoes\0.r1514\..) 
Using rake (0.8.7) Using abstract (1.0.0) Using activesupport (3.0.3) Using builder (2.1.2) Using i18n (0.4.2) Using activemodel (3.0.3) Using erubis (2.6.6) Using rack (1.2.1) Using rack-mount (0.6.13) Using rack-test (0.5.6) Using tzinfo (0.3.23) Using actionpack (3.0.3) Using mime-types (1.16) Using polyglot (0.3.1) Using treetop (1.4.9) Using mail (2.2.10) Using actionmailer (3.0.3) Using arel (2.0.4) Using activerecord (3.0.3) Using activeresource (3.0.3) Using bcrypt-ruby (2.1.4) Using bundler (1.0.7) Using cancan (1.5.0) Using haml (3.0.24) Using compass (0.10.6) Using warden (1.0.3) Using devise (1.1.5) Installing hpricot (0.8.3) Temporarily enhancing PATH to include DevKit... with native extensions C:/ruby/Ruby187/lib/ruby/site_ruby/1.8/rubygems/installer.rb:483:in `build_extensions': ERROR: Failed to build gem native extension. (Gem::Installer::ExtensionBuildError) C:/ruby/Ruby187/bin/ruby.exe extconf.rb checking for stdio.h... no *** extconf.rb failed *** Could not create Makefile due to some reason, probably lack of necessary libraries and/or headers. Check the mkmf.log file for more details. You may need configuration options. Provided configuration options: --with-opt-dir --without-opt-dir --with-opt-include --without-opt-include=${opt-dir}/include --with-opt-lib --without-opt-lib=${opt-dir}/lib --with-make-prog --without-make-prog --srcdir=. --curdir --ruby=C:/ruby/Ruby187/bin/ruby Gem files will remain installed in C:/ruby/Ruby187/lib/ruby/gems/1.8/gems/hpricot-0.8.3 for inspection. 
Results logged to C:/ruby/Ruby187/lib/ruby/gems/1.8/gems/hpricot-0.8.3/ext/fast_xs/gem_make.out from C:/ruby/Ruby187/lib/ruby/site_ruby/1.8/rubygems/installer.rb:446:in `each' from C:/ruby/Ruby187/lib/ruby/site_ruby/1.8/rubygems/installer.rb:446:in `build_extensions' from C:/ruby/Ruby187/lib/ruby/site_ruby/1.8/rubygems/installer.rb:198:in `install' from C:/ruby/Ruby187/lib/ruby/gems/1.8/gems/bundler-1.0.7/lib/bundler/source.rb:95:in `install' from C:/ruby/Ruby187/lib/ruby/gems/1.8/gems/bundler-1.0.7/lib/bundler/installer.rb:55:in `run' from C:/ruby/Ruby187/lib/ruby/gems/1.8/gems/bundler-1.0.7/lib/bundler/spec_set.rb:12:in `each' from C:/ruby/Ruby187/lib/ruby/gems/1.8/gems/bundler-1.0.7/lib/bundler/spec_set.rb:12:in `each' from C:/ruby/Ruby187/lib/ruby/gems/1.8/gems/bundler-1.0.7/lib/bundler/installer.rb:44:in `run' from C:/ruby/Ruby187/lib/ruby/gems/1.8/gems/bundler-1.0.7/lib/bundler/installer.rb:8:in `install' from C:/ruby/Ruby187/lib/ruby/gems/1.8/gems/bundler-1.0.7/lib/bundler/cli.rb:225:in `install' from C:/ruby/Ruby187/lib/ruby/gems/1.8/gems/bundler-1.0.7/lib/bundler/vendor/thor/task.rb:22:in `send' from C:/ruby/Ruby187/lib/ruby/gems/1.8/gems/bundler-1.0.7/lib/bundler/vendor/thor/task.rb:22:in `run' from C:/ruby/Ruby187/lib/ruby/gems/1.8/gems/bundler-1.0.7/lib/bundler/vendor/thor/invocation.rb:118:in `invoke_task' from C:/ruby/Ruby187/lib/ruby/gems/1.8/gems/bundler-1.0.7/lib/bundler/vendor/thor.rb:246:in `dispatch' from C:/ruby/Ruby187/lib/ruby/gems/1.8/gems/bundler-1.0.7/lib/bundler/vendor/thor/base.rb:389:in `start' from C:/ruby/Ruby187/lib/ruby/gems/1.8/gems/bundler-1.0.7/bin/bundle:13 from C:/ruby/Ruby187/bin/bundle:19:in `load' from C:/ruby/Ruby187/bin/bundle:19

    Read the article

  • Hpricot: Stop auto fixing HTML

    - by Imran
    Consider the following example (sample data): doc = Hpricot("<a><table><tr><td>LOREM IPSUM</td></tr></table></a>") It converts this to <a></a><table><tr><td>LOREM IPSUM</td></tr></table> What it actually does is pull the table out of the <a> tag. I think Hpricot is trying to repair the HTML. How can I stop Hpricot from doing this?

    Read the article

  • Ruby - Writing Hpricot data to a file

    - by John
    Hey everyone, I am currently doing some XML parsing and I've chosen to use Hpricot because of its ease of use and syntax; however, I am running into some problems. I need to write a piece of XML data that I have found out to another file. However, when I do this the format is not preserved. For example, suppose the content should look like this: <dict> <key>item1</key><value>12345</value> <key>item2</key><value>67890</value> <key>item3</key><value>23456</value> </dict> and assume that there are many entries like this in the document. I am iterating through the 'dict' items by using hpricot_element = Hpricot(xml_document_body) f = File.new('some_new_file.xml') (hpricot_element/:dict).each { |dict| f.write( dict.to_original_html ) } After using the above code, I would expect the output to look exactly like the XML shown above. However, to my surprise, the output of the file looks more like this: <dict>\n", " <key>item1</key><value>12345</value>\n", " <key>item2</key><value>67890</value>\n", " <key>item3</key><value>23456</value>\n", " </dict> I've tried splitting at the "\n" characters and writing to the file one line at a time, but that didn't seem to work either, as it did not recognize the "\n" characters. Any help is greatly appreciated. It might be a very simple solution, but I am having trouble finding it. Thanks!

    Read the article
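Output of the form `"<dict>\n", " <key>...` looks like the inspected form of an Array of strings rather than a single String. A minimal sketch under that assumption (it is a guess about what is being written, not something the question confirms); `write_fragment` is a hypothetical helper. It also opens the file in write mode, since `File.new(path)` without a mode opens it read-only.

```ruby
require "tmpdir"

# Assumption: the fragment to write may be an Array of line strings, which
# would explain output that looks like "<dict>\n", " <key>...".
def write_fragment(path, fragment)
  # "w" opens the file for writing; File.new(path) alone opens it read-only.
  File.open(path, "w") do |f|
    # Array(...).join accepts either a String or an Array of strings and
    # always produces one String, so real newlines reach the file.
    f.write(Array(fragment).join)
  end
end

lines = ["<dict>\n", "  <key>item1</key><value>12345</value>\n", "</dict>\n"]

Dir.mktmpdir do |dir|
  path = File.join(dir, "some_new_file.xml")
  write_fragment(path, lines)
  puts File.read(path)
end
```

Using the block form of `File.open` also closes the file when the block ends, so the data is flushed to disk.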

  • hpricot using java?

    - by Pablo Fernandez
    I've just noticed that most of the hpricot code is written in Java... I heard that JRuby performs a lot better than native Ruby when processing regular expressions. Are the Java classes perhaps only activated if JRuby or Java is installed, with the Ruby code used if they are not found? It's puzzling indeed. Thanks

    Read the article

  • How to get an element using inner text (Watir, Nokogiri, Hpricot)

    - by Hpriguy
    I have been experimenting with Watir, Nokogiri and Hpricot. All of these use a top-down approach, which is my problem, i.e. they use the element type to search for an element. I want to find the element using its text without knowing the element type. E.g. <element1> <element2> Text2 </element2> <element3> Text3 </element3> text4 </element1> What I want is to get element2 and element1 etc. by searching for Text2 and Text3. Please note that I do not know whether the elements are divs or tr/tds or links etc. I just know the text. The algorithm should be something like: iterate through all the elements, match the inner text, and on a match return the element and its parent element. Let me know if this is possible in any way?

    Read the article
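The algorithm described above (visit every element, match its text, return the element and its parent) can be sketched with REXML from Ruby's standard distribution. REXML is swapped in here purely for illustration; it is none of the three libraries the question names, and it expects well-formed XML. `elements_with_text` is a hypothetical helper name.

```ruby
require "rexml/document"

# Return every element that has a direct text child containing +needle+,
# without knowing the element names in advance.
def elements_with_text(xml, needle)
  doc = REXML::Document.new(xml)
  matches = []
  # each_recursive visits every element below the document, root included.
  doc.each_recursive do |el|
    matches << el if el.texts.any? { |t| t.value.include?(needle) }
  end
  matches
end

xml = "<element1><element2> Text2 </element2><element3> Text3 </element3> text4 </element1>"
elements_with_text(xml, "Text2").each do |el|
  puts "#{el.name} (parent: #{el.parent.name})"
end
```

Matching only direct text children keeps ancestors out of the result; the parent is then reachable through `parent` on each match.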

  • nokogiri vs hpricot?

    - by roshan
    Which one would you choose? My important attributes are (not in order): support and future enhancements; community and general knowledge base (on the Internet); comprehensiveness (i.e. proven to parse a wide range of *.*ml pages); performance; memory footprint (runtime, not the code base).

    Read the article

  • How can I get Hpricot to play nice with HTML5?

    - by Adam Singer
    I am using Hpricot to parse a theme file. I have noticed, however, that if I feed a valid HTML5 document into Hpricot(), it auto-closes HTML5 tags (like <section>), and messes with the DOCTYPE. Are there any extensions to Hpricot, or perhaps a flag I need to set, that will allow HTML5 documents to be parsed correctly?

    Read the article

  • hpricot throws exception when trying to parse url which has noscript tag

    - by anusuya
    I use the hpricot gem in Ruby on Rails to parse a webpage and extract the meta-tag contents. But if the website has a <noscript> tag just after the <head> tag, it throws an exception: Exception: undefined method `[]' for nil:NilClass I even tried updating the gem to the latest version, but the result is still the same. This is the sample code I use: require 'rubygems' require 'hpricot' require 'open-uri' begin index_page = Hpricot(open("http://sample.com")) puts index_page.at("/html/head/meta[@name='verification']")['content'].gsub(/\s/, "") rescue Exception => e puts "Exception: #{e}" end I was thinking of removing the noscript tag before giving the webpage to hpricot. Or is there any other way to do it?

    Read the article
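The message "undefined method `[]' for nil:NilClass" means the lookup returned nil and the code then indexed into it. A minimal sketch of guarding against that, using a plain Hash as a stand-in for the parsed page; the Hash, the constants, and `verification_content` are assumptions for illustration, not the hpricot API.

```ruby
# Stand-in for a parsed page: selector => attribute hash. A real parser's
# lookup would go here; the Hash is only an assumption for illustration.
PAGE_WITH_META    = { "meta[@name='verification']" => { "content" => " abc 123 " } }
PAGE_WITHOUT_META = {}

def verification_content(page, selector = "meta[@name='verification']")
  node = page[selector]      # nil when nothing matches, like a failed lookup
  return nil if node.nil?    # guard before indexing, avoiding NoMethodError
  node["content"].to_s.gsub(/\s/, "")
end

p verification_content(PAGE_WITH_META)     # prints "abc123"
p verification_content(PAGE_WITHOUT_META)  # prints nil
```

Checking for nil explicitly also removes the need for the broad `rescue Exception`, which would otherwise hide unrelated failures.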

  • Ruby hpricot does not like dash in symbol, is there a workaround?

    - by eakkas
    I am trying to parse an xml file with hpricot. The xml element that I am trying to get has a dash though and hence the issue that I am facing xml <xliff xmlns="urn:oasis:names:tc:xliff:document:1.1" version="1.1"> <trans-unit> <source>"%0" can not be found. Please try again.</source> <target>"%0" can not be found. Please try again.</target> </trans-unit> </xliff> rb def read_in_xliff(xlf_file_name) stream = open(xlf_file_name) {|f| Hpricot(f)} (stream/:xliff/:'trans-unit').each do |transunit| .......... This does not work because of the dash. If I rename the tag to transunit and edit the symbol reference accordingly everything seems to be fine. I thought using the symbol between quotes should work but hpricot does not seem to like this. Can anyone think of a workaround? Thanks in advance

    Read the article

  • Weird Haml 3 error with ruby 1.9.1 and rails 3

    - by Micke
    I'm getting this wierd error on my windows 7 computer when i am using the html2haml command with Haml 3 and Rails on ruby 1.9: -- control frame ---------- c:0017 p:-9593720 s:0052 b:0052 l:000051 d:000051 TOP c:0016 p:---- s:0050 b:0050 l:000049 d:000049 CFUNC :require c:0015 p:0026 s:0046 b:0046 l:000045 d:000045 TOP C:/Ruby/lib/ruby/gems/1.9.1/gems/hpricot-0.8.2-x86-mswin32/lib/hpricot.rb:20 c:0014 p:---- s:0044 b:0044 l:000043 d:000043 FINISH c:0013 p:---- s:0042 b:0042 l:000041 d:000041 CFUNC :require c:0012 p:0095 s:0038 b:0038 l:000037 d:000037 TOP C:/Ruby/lib/ruby/gems/1.9.1/gems/haml-3.0.0/lib/haml/html.rb:101 c:0011 p:---- s:0036 b:0036 l:000035 d:000035 FINISH c:0010 p:---- s:0034 b:0034 l:000033 d:000033 CFUNC :require c:0009 p:0022 s:0030 b:0030 l:000029 d:000029 METHOD C:/Ruby/lib/ruby/gems/1.9.1/gems/haml-3.0.0/lib/haml/exec.rb:559 c:0008 p:0050 s:0023 b:0023 l:000022 d:000022 METHOD C:/Ruby/lib/ruby/gems/1.9.1/gems/haml-3.0.0/lib/haml/exec.rb:41 c:0007 p:0013 s:0020 b:0020 l:000019 d:000019 METHOD C:/Ruby/lib/ruby/gems/1.9.1/gems/haml-3.0.0/lib/haml/exec.rb:22 c:0006 p:0078 s:0016 b:0016 l:000015 d:000015 TOP C:/Ruby/lib/ruby/gems/1.9.1/gems/haml-3.0.0/bin/html2haml:7 c:0005 p:---- s:0013 b:0013 l:000012 d:000012 FINISH c:0004 p:---- s:0011 b:0011 l:000010 d:000010 CFUNC :load c:0003 p:0127 s:0007 b:0007 l:000e54 d:0020c0 EVAL C:/Ruby/bin/html2haml:19 c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH c:0001 p:0000 s:0002 b:0002 l:000e54 d:000e54 TOP --------------------------- -- Ruby level backtrace information----------------------------------------- C:/Ruby/lib/ruby/gems/1.9.1/gems/hpricot-0.8.2-x86-mswin32/lib/hpricot.rb:20:in `require' C:/Ruby/lib/ruby/gems/1.9.1/gems/hpricot-0.8.2-x86-mswin32/lib/hpricot.rb:20:in `<top (required)>' C:/Ruby/lib/ruby/gems/1.9.1/gems/haml-3.0.0/lib/haml/html.rb:101:in `require' C:/Ruby/lib/ruby/gems/1.9.1/gems/haml-3.0.0/lib/haml/html.rb:101:in `<top (required)>' 
C:/Ruby/lib/ruby/gems/1.9.1/gems/haml-3.0.0/lib/haml/exec.rb:559:in `require' C:/Ruby/lib/ruby/gems/1.9.1/gems/haml-3.0.0/lib/haml/exec.rb:559:in `process_result' C:/Ruby/lib/ruby/gems/1.9.1/gems/haml-3.0.0/lib/haml/exec.rb:41:in `parse' C:/Ruby/lib/ruby/gems/1.9.1/gems/haml-3.0.0/lib/haml/exec.rb:22:in `parse!' C:/Ruby/lib/ruby/gems/1.9.1/gems/haml-3.0.0/bin/html2haml:7:in `<top (required)>' C:/Ruby/bin/html2haml:19:in `load' C:/Ruby/bin/html2haml:19:in `<main>' [NOTE] You may encounter a bug of Ruby interpreter. Bug reports are welcome. For details: http://www.ruby-lang.org/bugreport.html This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. And then ruby crashes. I have reinstalled all the gems but nothing will help. Please help me

    Read the article

  • Missing functions in ruby 1.8

    - by Adrian
    I have a ruby gem that I developed with ruby 1.9, and it works. With ruby 1.8, though, it says this when I try to run it: dyld: lazy symbol binding failed: Symbol not found: _RBIGNUM_SIGN Referenced from: /Users/Adrian/Desktop/num_to_bytes/ext/num_to_bytes/num_to_bytes.bundle Expected in: flat namespace dyld: Symbol not found: _RBIGNUM_SIGN Referenced from: /Users/Adrian/Desktop/num_to_bytes/ext/num_to_bytes/num_to_bytes.bundle Expected in: flat namespace Trace/BPT trap If I comment out the line that uses RBIGNUM_SIGN, it complains about other functions like rb_big_modulo. Some things work, like NUM2LONG. Here are some things I have tried: In http://github.com/ruby/ruby/blob/ruby_1_8_7/ruby.h, RBIGNUM_SIGN is defined. But in all versions of ruby I have tried, it is not there. I guessed that maybe it was defined in a different .h file. Knowing that Hpricot works with 1.8, I looked at http://github.com/hpricot/hpricot/blob/master/ext/hpricot_scan/hpricot_scan.h. It doesn't include any other files that #define it. Putting things like extern VALUE rb_big_modulo(VALUE x); at the beginning of my extension doesn't help. Using a brand new Ubuntu installation, I installed ruby with apt-get, tried to install the gem, and it didn't work either. Putting have_library 'ruby', 'rb_big_modulo' in my extconf.rb didn't work. As you can probably see, I am getting desperate (after weeks of trying things!). So, how can I get this to work? Here is the gem: http://rubygems.org/gems/num_to_bytes Here is the source: http://gist.github.com/404584

    Read the article

  • Library to parse ERB files

    - by Douglas Sellers
    I am attempting to parse, not evaluate, rails ERB files in a Hpricot/Nokogiri type manner. The files I am attempting to parse contain HTML fragments intermixed with dynamic content generated using ERB (standard rails view files). I am looking for a library that will not only parse the surrounding content, much the way that Hpricot or Nokogiri will, but will also treat the ERB symbols, <%, <%= etc, as though they were html/xml tags. Ideally I would get back a DOM like structure where the <%, <%= etc symbols would be included as their own node types. I know that it is possible to hack something together using regular expressions but I was looking for something a bit more reliable as I am developing a tool that I need to run on a very large view code base where both the html content and the erb content are important. For example, content such as: blah blah blah <div>My Great Text <%= my_dynamic_expression %></div> Would return a tree structure like: root - text_node (blah blah blah) - element (div) - text_node (My Great Text ) - erb_node (<%=)

    Read the article

  • vestal_versions : problem with column named changes

    - by arkannia
    Hi, I have been working with vestal_versions for 2 months. Everything was fine until this afternoon. I didn't do anything special (or I don't remember doing so...) and the code works fine on other computers... The problem is that I'm not able to save my model anymore. Rails gives me this error: ActiveRecord::DangerousAttributeError: changes is defined by ActiveRecord The changes field is by default an ActiveRecord method. With the console, the message is the same: ActiveRecord::DangerousAttributeError: changes is defined by ActiveRecord Here are my local gem files: abstract (1.0.0) actionmailer (3.0.0.beta3) actionpack (3.0.0.beta3) activemodel (3.0.0.beta3) activerecord (3.0.0.beta3) activeresource (3.0.0.beta3) activesupport (3.0.0.beta3) arel (0.3.3) builder (2.1.2) bundler (0.9.25, 0.9.24) crack (0.1.7) erubis (2.6.5) god (0.9.0) haml (3.0.1, 2.2.23) i18n (0.3.7) mail (2.2.0) memcache-client (1.8.3) memcached (0.17.7) mime-types (1.16) polyglot (0.3.1) rack (1.1.0) rack-mount (0.6.3) rack-test (0.5.3) rails (3.0.0.beta3) railties (3.0.0.beta3) rake (0.8.7) savon (0.7.8, 0.7.6) text-format (1.0.0) text-hyphen (1.0.0) thor (0.13.6, 0.13.4) treetop (1.4.5) tzinfo (0.3.20) And here is my Gemfile: source 'http://gemcutter.org' gem "rails", "3.0.0.beta3" gem "will_paginate", "3.0.pre" #gem 'nokogiri' #gem 'curb' #gem 'handsoap' gem 'savon' gem 'mysql' gem 'haml', '2.2.23' #gem 'haml', '3.0.1' gem 'hpricot' gem 'i18n', '> 0.3.5' gem 'i18n_routing' gem 'i18n_auto_scoping' gem 'handler301', :git => 'http://github.com/kwi/handler301.git' gem 'seo_meta_builder' gem 'vestal_versions' #gem 'paperclip', :git => 'git://github.com/thoughtbot/paperclip.git', :branch => 'rails3' ## Bundle edge rails: gem "rails", :git => "git://github.com/rails/rails.git" ## Bundle the gems you use: # gem "bj" # gem "hpricot", "0.6" # gem "sqlite3-ruby", :require => "sqlite3" # gem "aws-s3", :require => "aws/s3" ## Bundle gems used only in certain environments: # gem "rspec", :group => :test # group :test do # 
gem "webrat" # end If you have any suggestions to solve this issue, I'll be glad to hear them! Thanks

    Read the article

  • regular expression for emails NOT ending with replace script

    - by corroded
    I'm currently modifying my regex for this: http://stackoverflow.com/questions/2782031/extracting-email-addresses-in-an-html-block-in-ruby-rails Basically, I'm making another obfuscator that uses ROT13 by parsing a block of text for all links that contain a mailto reference (using hpricot). One use case this doesn't catch is when the user just typed in an email address (without turning it into a link via tinymce). So here's the basic flow of my method: 1. Parse a block of text for all tags with href="mailto:..." 2. Replace each tag with a javascript function that changes this into ROT13 (using this script: http://unixmonkey.net/?p=20) 3. Once all links are obfuscated, pass the resulting block of text into another function that parses for all emails (this one has an email regex that reverses the email address and then adds a span to that email, to reverse it back). Step 3 is supposed to clean the block of text of remaining emails that AREN'T in href tags (meaning they weren't parsed by hpricot). The problem with this is that the emails that were converted to ROT13 are still found by my regex. What I want to catch are just emails that WEREN'T CONVERTED to ROT13. How do I do this? Well, all emails that WERE CONVERTED have a trailing "'.replace" in them, meaning I need to get all emails WITHOUT that string. So far I have this regex: /\b([A-Z0-9._%+-]+@[A-Z0-9.-]+.[A-Z]{2,4}('.replace))\b/i but this gets all the emails with the trailing '.replace. I want to get the opposite and I'm currently stumped with this. Any help from regex gurus out there? MORE INFO: Here's the regex + the block of text I'm parsing: http://www.rubular.com/r/NqXIHrNqjI As you can see, the first two 'email addresses' are already obfuscated using ROT13. I need a regex that gets the emails [email protected] and [email protected]

    Read the article
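Excluding matches that are followed by a given string is what a negative lookahead does. A minimal sketch in Ruby, assuming obfuscated addresses are always followed directly by `'.replace` as the question describes; the constant name and the sample text are made up for illustration.

```ruby
# Match an email address only when it is NOT followed by "'.replace".
# The lookahead first skips any remaining address characters, so a shorter
# prefix of an obfuscated address cannot slip through.
PLAIN_EMAIL = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b(?![\w.-]*'\.replace)/i

# Made-up sample: one plain address and one ROT13-obfuscated address.
text = "Write to plain@example.com or document.write('cynva@rknzcyr.pbz'.replace(/[a-z]/g, rot13))"
p text.scan(PLAIN_EMAIL)  # prints ["plain@example.com"]
```

The dot before the top-level domain is escaped here (`\.`); in the regex from the question it is unescaped and therefore matches any character.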

  • gem install permission problem

    - by qichunren
    qichunren@zhaobak:~ gem install hpricot ERROR: While executing gem ... (Gem::FilePermissionError) You don't have write permissions into the /opt/ruby-enterprise-1.8.7/lib/ruby/gems/1.8 directory. The current login user is qichunren, and the qichunren user has write permission for the .gem dir. I would like to know why gem does not install files into my home .gem dir first. Why does my gem command first want to install files into /opt/ruby-enterprise-1.8.7/lib/ruby/gems/1.8?

    Read the article
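RubyGems distinguishes between its default installation directory and a per-user directory; the latter is what `gem install --user-install` targets. A small sketch that prints both for the current Ruby, so the directory named in the error can be compared with the one under the home directory. The printed paths depend entirely on the local installation.

```ruby
require "rubygems"

# Gem.dir is where gems are installed by default; Gem.user_dir is the
# per-user location used by `gem install --user-install`.
puts "default gem dir:  #{Gem.dir}"
puts "per-user gem dir: #{Gem.user_dir}"

# A false value here is consistent with a Gem::FilePermissionError on install.
puts "default dir writable? #{File.writable?(Gem.dir)}"
```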

  • Using regular expressions to remove relative path slashes

    - by Adam Carlile
    Hey guys, I am trying to remove all the relative image path slashes from a chunk of HTML that contains several other elements. For example <img src="../../../../images/upload/1/test.jpg" /> would need to become <img src="http://s3.amazonaws.com/website/images/upload/1/test.jpg" /> I was thinking of writing this as a Rails helper and just passing the entire block into the method, maybe using Nokogiri or Hpricot to parse the HTML instead, but I don't really know. Any help would be great. Cheers, Adam

    Read the article
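The regular-expression route mentioned in the title can be sketched in core Ruby as follows. It assumes every relative `src` starts with one or more `../` segments and should resolve against a single base URL; `absolutize_image_paths` and `BASE_URL` are hypothetical names, and the base URL is the one from the example above.

```ruby
# Assumption: every relative src should resolve against this base URL.
BASE_URL = "http://s3.amazonaws.com/website/"

def absolutize_image_paths(html, base = BASE_URL)
  # Replace one or more leading "../" segments inside src="..." with the base.
  html.gsub(%r{(src=")(?:\.\./)+}) { "#{$1}#{base}" }
end

puts absolutize_image_paths('<img src="../../../../images/upload/1/test.jpg" />')
# prints: <img src="http://s3.amazonaws.com/website/images/upload/1/test.jpg" />
```

A regex like this only sees the `src="../` pattern; an HTML parser would be the safer choice if the markup uses single quotes or unusual attribute spacing.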

  • Fast ruby http library for large XML downloads

    - by Vlad Zloteanu
    I am consuming various XML-over-HTTP web services returning large XML files ( 2MB). What would be the fastest Ruby HTTP library to reduce the 'downloading' time? Required features: both GET and POST requests; gzip/deflate downloads (Accept-Encoding: deflate, gzip) - very important. I am choosing between: open-uri, Net::HTTP, curb, but you can also come up with other suggestions. P.S. To parse the response, I am using a pull parser from Nokogiri, so I don't need an integrated solution like rest-client or hpricot.

    Read the article
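On the gzip/deflate requirement, current versions of Ruby's standard Net::HTTP request compressed responses by default when no Accept-Encoding header is set explicitly. The sketch below only builds a request and inspects that header; the URL is a placeholder, `fetch` is a hypothetical helper that is defined but not called, and nothing here says which library is fastest.

```ruby
require "net/http"
require "uri"

# Placeholder endpoint; replace with the real service URL.
uri = URI("http://example.com/large.xml")

# When no Accept-Encoding header is supplied, Net::HTTP adds one asking for
# gzip/deflate and decompresses the response body transparently.
request = Net::HTTP::Get.new(uri)
puts request["Accept-Encoding"]

# Sending the request (defined only, not called here, as it needs network access).
def fetch(uri, request)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request).body
  end
end
```

Setting Accept-Encoding by hand switches the automatic decompression off, so the body would then need to be inflated manually.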
