Search Results

Search found 6 results on 1 page for 'jericho'.


  • Need help working with the Jericho HTML Parser

    - by rookie
    Hi all, I've simply used the following program (http://jericho.htmlparser.net/samples/console/src/ExtractText.java). My goal is to extract the main body text so that I can summarize it and present the summarized text as output to the user. My problem is that I'm not sure how to modify the above program to get only the required text from the webpage, without the links or any other information. I'd really appreciate any help I could get. Thanks in advance.
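
The goal described above (body text only, with link text left out) can be illustrated with a minimal sketch. It deliberately does not use Jericho's API; it is a naive regex pass over the markup using only the Java standard library, and the class name `LinkFreeText` and method `extract` are hypothetical names chosen for this illustration. It assumes that "without the links" means dropping the text inside `<a>` elements, and it does not decode character entities.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jericho: strips link text and tags with regexes.
final class LinkFreeText {
    // Whole <script>, <style> and <a> elements, including their content.
    private static final Pattern DROP =
            Pattern.compile("(?is)<(script|style|a)\\b[^>]*>.*?</\\1\\s*>");
    // Any remaining tag.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]+>");

    static String extract(String html) {
        // 1. Remove elements whose content should not appear in the output.
        String s = DROP.matcher(html).replaceAll(" ");
        // 2. Remove the remaining tags but keep the text between them.
        s = TAG.matcher(s).replaceAll(" ");
        // 3. Collapse runs of whitespace left behind by the removals.
        return s.replaceAll("\\s+", " ").trim();
    }
}
```

Regexes cannot handle every real-world page (nested links, comments that contain tags, and so on), so a real parser is the safer choice for production use; the sketch only shows the filtering idea.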


  • Retrieving well formed HTML using Jericho HTML parser in Java

    - by Raj
    Hello, I've looked at jTidy for converting a snippet of malformed/real-world HTML into well-formed HTML/XHTML. However, there's a bug in the latest version, due to which I'm not able to use it. I'm looking at Jericho, since it has a lot of positive reviews around the net. However, it's not immediately obvious to me how one would go about implementing a method like: public String getValidHTML(String messedUpHTML). For instance, if it was passed <div>bar, it would return <div>bar</div>. Any pointers would be helpful. Thanks in advance!
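
To make the requested behaviour concrete, here is a minimal sketch of a `getValidHTML` method that closes unclosed tags using a stack. It is not Jericho's or jTidy's API and uses only the Java standard library; the class name `TagBalancer` and the short list of void elements are assumptions made for this illustration. It only balances tags, so it does not fix attributes, entities, or invalid nesting rules.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: balances open/close tags, nothing more.
final class TagBalancer {
    // Elements that never take a closing tag (partial list, assumed for the sketch).
    private static final Set<String> VOID =
            Set.of("br", "hr", "img", "input", "meta", "link");
    // group 1: "/" for a closing tag, group 2: tag name, group 3: "/" if self-closed.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>");

    static String getValidHTML(String messedUpHTML) {
        Deque<String> open = new ArrayDeque<>();   // currently open elements, innermost first
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(messedUpHTML);
        int pos = 0;
        while (m.find()) {
            out.append(messedUpHTML, pos, m.start()); // copy text before the tag
            pos = m.end();
            String name = m.group(2).toLowerCase();
            boolean closing = !m.group(1).isEmpty();
            boolean selfClosing = !m.group(3).isEmpty() || VOID.contains(name);
            if (!closing) {
                out.append(m.group());
                if (!selfClosing) open.push(name);
            } else if (open.contains(name)) {
                // Close any elements still open inside the one being closed.
                while (!open.isEmpty()) {
                    String top = open.pop();
                    out.append("</").append(top).append('>');
                    if (top.equals(name)) break;
                }
            }
            // A closing tag with no matching open tag is dropped.
        }
        out.append(messedUpHTML.substring(pos));
        // Close whatever is still open at the end of the input.
        while (!open.isEmpty()) out.append("</").append(open.pop()).append('>');
        return out.toString();
    }
}
```

With this sketch, `TagBalancer.getValidHTML("<div>bar")` returns `<div>bar</div>`, matching the example in the question.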


  • Problem migrating membershipProvider functionality from mvc1 to mvc2

    - by Jericho
    I am migrating a web app from mvc1 to mvc2. When it came to migrating my MembershipProvider authentication, I kept getting errors that the MembershipProvider and MembershipCreateStatus types cannot be found. I do have the reference to System.Web, which to my understanding includes the Security reference, but when I examine the object, those types do not appear. I am just getting familiar with mvc2; if anyone has any input on this, it would be extremely appreciated.


  • Text extraction with Java HTML parsers

    - by zenmonkey
    I want to use an HTML parser that does the following in a nice, elegant way: extract text (this is most important); extract links and meta keywords; reconstruct the original doc (optional, but a nice feature to have). From my investigation so far, Jericho seems to fit. Are there any other open source libraries you guys would recommend?


  • Java library for HTML analysis

    - by Raj
    Hi, (I've seen similar questions, but I think none of them cater to my specific needs, hence...) I would like to know if there is a Java library for analysis of real-world (read: incomplete, ill-formed) HTML. By analysis, I mean things like: figuring out the most prominent color in an HTML chunk; changing that color to some other color (hence, it has to support modification of the HTML as well); pruning out unwanted tags; fixing up the HTML to result in a well-formed HTML snippet. Parts of the last two are done by libraries such as Jericho and jTidy. 'Plugins' on top of these would be great. Thanks in advance!

