Search Results

Search found 14 results on 1 pages for 'htmltidy'.

Page 1/1 | 1 

  • Proper usage of JTidy to purify HTML

    - by Raj
    Hello, I am trying to use JTidy (jtidy-r938.jar) to sanitize an input HTML string, but I seem to have problems getting the default settings right. Often strings such as "hello world" end up as "helloworld" after tidying. I wanted to show what I'm doing here, and any pointers would be really appreciated: Assume that rawHtml is the String containing the input (real world) HTML. This is what I'm doing: InputStream is = new ByteArrayInputStream(rawHtml.getBytes("UTF-8")); Tidy tidy = new Tidy(); tidy.setQuiet(true); tidy.setShowWarnings(false); tidy.setXHTML(true); ByteArrayOutputStream baos = new ByteArrayOutputStream(); tidy.parseDOM(is, baos); String pure = baos.toString(); First off, does anything look fundamentally wrong with the above code? I seem to be getting weird results with this. Thanks in advance!

    Read the article

  • Managed (.net) library with html-tidy like functionality?

    - by Eamon Nerbonne
    Does anybody know of an html cleaner for .NET that can parse html and (for instance) convert it to a more machine friendly format such as xhtml? I've tried the HTML Agility Pack, but that fails to correctly parse even fairly simple examples. To give an example of html that should be parsed correctly: <html><body> <ul><li>TestElem1 <li>TestElem2 <li>TestElem3 List: <ul><li>Nested1 <li>Nested2</li> <li>Nested3 </ul> <li>TestElem4 </ul> <p>paragraph 1 <p>paragraph 2 <p>paragraph 3 </body></html> li tags don't need to be closed (see spec), and neither do P tags. In other words, the above sample should be parsed as: <html><body> <ul><li>TestElem1</li> <li>TestElem2</li> <li>TestElem3 List: <ul><li>Nested1</li> <li>Nested2</li> <li>Nested3</li> </ul></li> <li>TestElem4</li> </ul> <p>paragraph 1</p> <p>paragraph 2</p> <p>paragraph 3</p> </body></html> Since the aim is to use the library on various machines, it's a big disadvantage to need to fall back to native code (such as a wrapper around html tidy) which would require extra deployment hassle and sacrifice platform independance. Any suggestions? To recap, I'm looking for: An html cleaner ala HTML tidy Must be able to deal with real world html, not just xhtml, at the very least correctly reading valid html 4 Must be able to convert to a more easily processable xml format Should be a purely managed app.

    Read the article

  • How can we write the html tidy coding to insert the closing tag ?

    - by Harikrishna
    How can we write html tidy coding only for inserting closing tag in the html file where closing tags are missing ? I am parsing html tabular information using Html Agilitiy Pack. But where the ending tags are missing extracting information with html agility pack are not performed well. And if we write the ending tags manually and then we can extract the information perfectly with html agility pack.So I want to insert the closing tags where they are missing so html agility pack extracts the information perfectly.

    Read the article

  • Which is the best HTML tidy pack? Is there any option in HTML agility pack to make HTML webpage tidy

    - by Harikrishna
    I am using html agility pack to parse html tabular information. Now there is some html content with missing ending tags and from such page because of missing ending tags html agility pack does not parse information properly.So I want to insert ending tags where there are missing ending tags so html agility pack parse information properly. So to insert the missing ending tags what should I do ?Should I do write my own code for that or use html tidy pack to do that ? If html tidy pack then which is the best html tidy pack,and how to use it any example if possible ? And if my own code than what it can be like ? Is there any option in html agility pack which can make us able to first make the html page tidy and then parse the webpage.

    Read the article

  • Html tidy pack replaces `&lt` instead of `< `and `&gt` instead of `>` for html tag.

    - by Harikrishna
    I am tidying the html page by html tidy pack because there are some pages with missing ending tags So I can add the missing closing tags by html agility pack. But some times when I use the html tidy pack then that web page has changes like : If original table is <table><tr><td>first row</td></table> After using html tidy pack the table is : <table> is now &lt table &gt. But I want like <table>. In this particular example it works perfectly but in some html page it replace &lt instead of <and &gt instead of >.

    Read the article

  • Notepad++ Reindent XML breaks lines.

    - by C. Ross
    I'm attempting to use Notepad++ TextFX HTMLTidy - Tidy: Reindent XML. This works well with the following exception. Lines longer than 70 character are wrapped, breaking the validity of my xml. This: <RevieweeDepartmentName>BB AAAAAAAAAA AAAAAAAAAA AAAA</RevieweeDepartmentName> Becomes this: <RevieweeDepartmentName>BB AAAAAAAAAA AAAAAAAAAA AAAA</RevieweeDepartmentName> How can I get it to stop this behavior?

    Read the article

  • HTML Tidy for NetBeans IDE 7.4

    - by Geertjan
    The NetBeans HTML5 editor is pretty amazing, working on an extensive screencast on that right now, to be published soon. One thing missing is HTML Tidy integration, until now: As you can see, in this particular file, HTML Tidy finds 6 times more problems (OK, some of them maybe false negatives) than the standard NetBeans HTML hint infrastructure does. You can also run the scanner across the whole project or all projects. Only HTML files will be scanned by HTML Tidy (via JTidy) and you can click on items in the window above to jump to the line. Future enhancements will include error annotations and hint integration, some of which has already been addressed in this blog over the years. Download it from here: http://plugins.netbeans.org/plugin/51066/?show=true Sources are here. Contributions more than welcome: https://java.net/projects/nb-api-samples/sources/api-samples/show/versions/7.4/misc/HTMLTidy

    Read the article

  • Command line tool to query HTML elements (linux)

    - by ipsec
    I am looking for a (linux) command line tool to parse HTML files and extract some elements, ideally with some XPath-like syntax. I have the following requirements: It must be able to parse arbitrary HTML files (which may contain errors) in a robust manner It must be able to extract text of elements and attributes What I have tried so far: xmlstarlet: would be perfect, but mostly reports errors in files (e.g. entity not defined), even xml fo or htmltidy does not help. xmllint: the best I have found so far, but is not able to extract attribute texts. Something like //a/@href reports <a href="foo">, what I need is just foo. string(//a/@href) works, but queries only the first entry. data is not supported. hxextract: works, but cannot extract attributes. XQilla: would support XPath 2.0 and thus data. It also support xqilla:parse-html, but I have had no luck making this work. Can you recommend me another tool?

    Read the article

  • Why is XHTML1.1 dated *before* XHTML1.0 ? What is the preferred XHTML today?

    - by Cheeso
    I'm not clear on the status of XHTML - v1.0 and v1.1. Can someone explain which is preferred at this point, and why? The specs from w3c say that XHTML 1.1 predates* XHTML 1.0, which is exactly counter-intuitive, to me. http://www.w3.org/TR/xhtml11/ - W3C Recommendation 31 May 2001 http://www.w3.org/TR/xhtml1/ - W3C Recommentation, updated 1 August 2002 Also, I noted earlier today that the latest version of htmltidy emits XHTML 1.0, when I request xhtml. Hmmm....Even though the XHTML 1.1 spec is 9 years old, it's still not supported by mainstream tools. That suggests that XHTML 1.1 is either completely unnecessary or spurious. Which one should I use if I am authoring pages today? What if I am building tools - should I bother to support both? Or do I need only one? Thanks.

    Read the article

  • HTML Tidy in NetBeans IDE (Part 2)

    - by Geertjan
    This is what I was aiming for in the previous blog entry: What you can see above (especially if you click to enlarge it) is that I have HTML Tidy integrated into the NetBeans analyzer functionality, which is pluggable from 7.2 onwards. Well, if you set an implementation dependency on "Static Analysis Core", since it's not an official API yet. Also, the scopes of the analyzer functionality are not pluggable. That means you can 'only' set the analyzer's scope to one or more projects, one or more packages, or one or more files. Not one or more folders, which means you can't have a bunch off HTML files in a folder that you access via the Favorites window and then run the analyzer on that folder (or on multiple folders). Thus, to try out my new code, I had to put some HTML files into a package inside a Java application. Then I chose that package as the scope of the analyzer. Then I ran all the analyzers (i.e., standard NetBeans Java hints, FindBugs, as well as my HTML Tidy extension) on that package. The screenshot above is the result. Here's all the code for the above, which is a port of the Action code from the previous blog entry into a new Analyzer implementation: import java.io.IOException; import java.io.PrintWriter; import java.io.StringWriter; import java.util.ArrayList; import java.util.Collections; import java.util.List; import javax.swing.JComponent; import javax.swing.text.Document; import org.netbeans.api.fileinfo.NonRecursiveFolder; import org.netbeans.modules.analysis.spi.Analyzer; import org.netbeans.modules.analysis.spi.Analyzer.AnalyzerFactory; import org.netbeans.modules.analysis.spi.Analyzer.Context; import org.netbeans.modules.analysis.spi.Analyzer.CustomizerProvider; import org.netbeans.modules.analysis.spi.Analyzer.WarningDescription; import org.netbeans.spi.editor.hints.ErrorDescription; import org.netbeans.spi.editor.hints.ErrorDescriptionFactory; import org.netbeans.spi.editor.hints.Severity; import org.openide.cookies.EditorCookie; import org.openide.filesystems.FileObject; import org.openide.loaders.DataObject; import org.openide.util.Exceptions; import org.openide.util.lookup.ServiceProvider; import org.w3c.tidy.Tidy; public class TidyAnalyzer implements Analyzer {     private final Context ctx;     private TidyAnalyzer(Context cntxt) {         this.ctx = cntxt;     }     @Override     public Iterable<? extends ErrorDescription> analyze() {         List<ErrorDescription> result = new ArrayList<ErrorDescription>();         for (NonRecursiveFolder sr : ctx.getScope().getFolders()) {             FileObject folder = sr.getFolder();             for (FileObject fo : folder.getChildren()) {                 for (ErrorDescription ed : doRunHTMLTidy(fo)) {                     if (fo.getMIMEType().equals("text/html")) {                         result.add(ed);                     }                 }             }         }         return result;     }     private List<ErrorDescription> doRunHTMLTidy(FileObject sr) {         final List<ErrorDescription> result = new ArrayList<ErrorDescription>();         Tidy tidy = new Tidy();         StringWriter stringWriter = new StringWriter();         PrintWriter errorWriter = new PrintWriter(stringWriter);         tidy.setErrout(errorWriter);         try {             Document doc = DataObject.find(sr).getLookup().lookup(EditorCookie.class).openDocument();             tidy.parse(sr.getInputStream(), System.out);             String[] split = stringWriter.toString().split("\n");             for (String string : split) {                 //Bit of ugly string parsing coming up:                 if (string.startsWith("line")) {                     final int end = string.indexOf(" c");                     int lineNumber = Integer.parseInt(string.substring(0, end).replace("line ", ""));                     string = string.substring(string.indexOf(": ")).replace(":", "");                     result.add(ErrorDescriptionFactory.createErrorDescription(                             Severity.WARNING,                             string,                             doc,                             lineNumber));                 }             }         } catch (IOException ex) {             Exceptions.printStackTrace(ex);         }         return result;     }     @Override     public boolean cancel() {         return true;     }     @ServiceProvider(service = AnalyzerFactory.class)     public static final class MyAnalyzerFactory extends AnalyzerFactory {         public MyAnalyzerFactory() {             super("htmltidy", "HTML Tidy", "org/jtidy/format_misc.gif");         }         public Iterable<? extends WarningDescription> getWarnings() {             return Collections.EMPTY_LIST;         }         @Override         public <D, C extends JComponent> CustomizerProvider<D, C> getCustomizerProvider() {             return null;         }         @Override         public Analyzer createAnalyzer(Context cntxt) {             return new TidyAnalyzer(cntxt);         }     } } The above only works on packages, not on projects and not on individual files.

    Read the article

1