Search Results

Search found 14 results on 1 page for 'jtidy'.

  • Proper usage of JTidy to purify HTML

    - by Raj
    Hello, I am trying to use JTidy (jtidy-r938.jar) to sanitize an input HTML string, but I seem to have problems getting the default settings right. Often strings such as "hello world" end up as "helloworld" after tidying. I wanted to show what I'm doing here, and any pointers would be really appreciated. Assume that rawHtml is the String containing the input (real world) HTML. This is what I'm doing:

        InputStream is = new ByteArrayInputStream(rawHtml.getBytes("UTF-8"));
        Tidy tidy = new Tidy();
        tidy.setQuiet(true);
        tidy.setShowWarnings(false);
        tidy.setXHTML(true);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        tidy.parseDOM(is, baos);
        String pure = baos.toString();

    First off, does anything look fundamentally wrong with the above code? I seem to be getting weird results with this. Thanks in advance!

  • How can I get the error/warning messages out of the parsed HTML using JTidy?

    - by chetu
    I am able to parse the HTML, but I want to extract the warning messages from the parsed HTML and show them to the user. Here is my code:

        Tidy tidy = new Tidy();
        StringBuffer StringBuffer1 = new StringBuffer("<b>Hello<u><b>I am tsting another one.....<i>another.....");
        InputStream in = new ByteArrayInputStream(StringBuffer1.toString().getBytes("UTF-8"));
        Writer stringWriter = new StringWriter();
        tidy.setPrintBodyOnly(true);
        tidy.setQuiet(true);
        tidy.setShowWarnings(true);
        tidy.setTidyMark(false);
        tidy.setXHTML(true);
        tidy.setXmlTags(false);
        Node parsedNode = tidy.parse(in, stringWriter);
        System.out.print(stringWriter.toString());

  • HTML Tidy for NetBeans IDE 7.4

    - by Geertjan
    The NetBeans HTML5 editor is pretty amazing; I'm working on an extensive screencast on that right now, to be published soon. One thing missing is HTML Tidy integration, until now: As you can see, in this particular file, HTML Tidy finds 6 times more problems (OK, some of them may be false positives) than the standard NetBeans HTML hint infrastructure does. You can also run the scanner across the whole project or all projects. Only HTML files will be scanned by HTML Tidy (via JTidy) and you can click on items in the window above to jump to the line. Future enhancements will include error annotations and hint integration, some of which has already been addressed in this blog over the years. Download it from here: http://plugins.netbeans.org/plugin/51066/?show=true Sources are here. Contributions more than welcome: https://java.net/projects/nb-api-samples/sources/api-samples/show/versions/7.4/misc/HTMLTidy

  • Java library for CSS cleanup

    - by ndn
    For a rich text editor that has to handle pasted HTML code from MS Office applications, I'm looking for a Java library that cleans up the content of all "style" attributes in HTML elements, so that only some CSS attributes are left:

        - background-color
        - border
        - color
        - font-family
        - font-weight
        - font-style
        - list-style-type
        - text-align
        - text-decoration
        - vertical-align

    For creating a well-formed HTML document, I can use JTidy. For HTML element transformations (removing unwanted elements), I can use http://htmlparser.sourceforge.net/. Is there anything comparable for CSS attributes?
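
    To make the requirement concrete, here is a minimal sketch of the filtering step for a single "style" attribute value, using only the Java standard library. It is not a library recommendation and not a real CSS parser: the class name StyleAttributeFilter and the method filter are hypothetical, and the code assumes declarations are separated by plain semicolons (a value containing ";" inside quotes or url(...) would be split incorrectly).

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical helper: keeps only whitelisted CSS properties of one style attribute value.
final class StyleAttributeFilter {

    // The properties listed in the question.
    private static final Set<String> ALLOWED = new LinkedHashSet<>(Arrays.asList(
            "background-color", "border", "color", "font-family", "font-weight",
            "font-style", "list-style-type", "text-align", "text-decoration",
            "vertical-align"));

    static String filter(String style) {
        StringJoiner kept = new StringJoiner("; ");
        // Assumption: declarations are separated by plain ';' characters.
        for (String declaration : style.split(";")) {
            int colon = declaration.indexOf(':');
            if (colon < 0) {
                continue; // not a "property: value" pair
            }
            String property = declaration.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = declaration.substring(colon + 1).trim();
            if (ALLOWED.contains(property) && !value.isEmpty()) {
                kept.add(property + ": " + value);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(filter("mso-margin-top-alt: auto; COLOR: red; font-weight:bold;"));
    }
}
```

    The style values themselves would still have to be extracted from the elements by whatever HTML parser is in use; this sketch only covers the whitelist step.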

  • HTML code cleaner

    - by Blanca
    Hi! Is there any library or method that takes a String with HTML code as input and returns another String without the HTML code, just the information? I am looking at libraries such as JTidy or HtmlParser, but I don't know how to use them! Is there something easier? Thank you!
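
    As a rough illustration of the "something easier" being asked for, the sketch below strips tags with a regular expression and decodes a handful of common entities, using only the Java standard library. It is deliberately naive and only an assumption about what is needed: the class name NaiveTagStripper is hypothetical, and the approach does not handle comments, script or style content, or a ">" inside attribute values, which is where a real parser such as the ones mentioned in the question would be needed.

```java
// Hypothetical helper: naive regex-based tag stripping, not a real HTML parser.
final class NaiveTagStripper {

    static String stripTags(String html) {
        // Replace anything that looks like a tag with a space so that
        // "<p>a</p><p>b</p>" becomes "a b" rather than "ab".
        // Side effect: inline tags inside a word also introduce a space.
        String text = html.replaceAll("<[^>]*>", " ");
        // Decode a few common entities; "&amp;" goes last to avoid double decoding.
        text = text.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&nbsp;", " ")
                   .replace("&amp;", "&");
        // Collapse runs of whitespace left behind by the removed tags.
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>Hello <b>world</b> &amp; friends</p>"));
    }
}
```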

  • HTML Tidy in NetBeans IDE

    - by Geertjan
    First step in integrating HTML Tidy (via its JTidy implementation) into NetBeans IDE:

    The reason why I started doing this is because I want to integrate this into the pluggable analyzer functionality of NetBeans IDE that I recently blogged about, i.e., where the FindBugs functionality is found. So a logical first step is to get it working in an Action class, after which I can port it into the analyzer infrastructure:

        import java.awt.event.ActionEvent;
        import java.awt.event.ActionListener;
        import java.io.IOException;
        import java.io.PrintWriter;
        import java.io.StringWriter;
        import org.openide.awt.ActionID;
        import org.openide.awt.ActionReference;
        import org.openide.awt.ActionReferences;
        import org.openide.awt.ActionRegistration;
        import org.openide.cookies.EditorCookie;
        import org.openide.cookies.LineCookie;
        import org.openide.loaders.DataObject;
        import org.openide.text.Line;
        import org.openide.text.Line.ShowOpenType;
        import org.openide.util.Exceptions;
        import org.openide.util.NbBundle.Messages;
        import org.openide.windows.IOProvider;
        import org.openide.windows.InputOutput;
        import org.openide.windows.OutputEvent;
        import org.openide.windows.OutputListener;
        import org.openide.windows.OutputWriter;
        import org.w3c.tidy.Tidy;

        @ActionID(
            category = "Tools",
            id = "org.jtidy.TidyAction")
        @ActionRegistration(
            displayName = "#CTL_TidyAction")
        @ActionReferences({
            @ActionReference(path = "Loaders/text/html/Actions", position = 150),
            @ActionReference(path = "Editors/text/html/Popup", position = 750)
        })
        @Messages("CTL_TidyAction=Run HTML Tidy")
        public final class TidyAction implements ActionListener {

            private final DataObject context;
            private final OutputWriter writer;
            private EditorCookie ec = null;

            public TidyAction(DataObject context) {
                this.context = context;
                ec = context.getLookup().lookup(org.openide.cookies.EditorCookie.class);
                InputOutput io = IOProvider.getDefault().getIO("HTML Tidy", false);
                io.select();
                writer = io.getOut();
            }

            @Override
            public void actionPerformed(ActionEvent ev) {
                Tidy tidy = new Tidy();
                try {
                    writer.reset();
                    StringWriter stringWriter = new StringWriter();
                    PrintWriter errorWriter = new PrintWriter(stringWriter);
                    tidy.setErrout(errorWriter);
                    tidy.parse(context.getPrimaryFile().getInputStream(), System.out);
                    String[] split = stringWriter.toString().split("\n");
                    for (final String string : split) {
                        final int end = string.indexOf(" c");
                        if (string.startsWith("line")) {
                            writer.println(string, new OutputListener() {
                                @Override
                                public void outputLineAction(OutputEvent oe) {
                                    LineCookie lc = context.getLookup().lookup(LineCookie.class);
                                    int lineNumber = Integer.parseInt(string.substring(0, end).replace("line ", ""));
                                    Line line = lc.getLineSet().getOriginal(lineNumber - 1);
                                    line.show(ShowOpenType.OPEN, Line.ShowVisibilityType.FOCUS);
                                }
                                @Override
                                public void outputLineSelected(OutputEvent oe) {}
                                @Override
                                public void outputLineCleared(OutputEvent oe) {}
                            });
                        }
                    }
                } catch (IOException ex) {
                    Exceptions.printStackTrace(ex);
                }
            }
        }

    The string parsing above is ugly but gets the job done for now. A problem integrating this into the pluggable analyzer functionality is the limitation of its scope. The analyzer lets you select one or more projects, or individual files, but not a folder. So it doesn't work on folders in the Favorites window, for example, which is where I'd like to apply HTML Tidy, across multiple folders via the analyzer functionality. That's a bit of a bummer that I'm hoping to get around somehow.
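
    The string parsing in the action above could be tightened with a regular expression. The sketch below is one possible way to do that with the standard library alone, not part of the original post. It assumes each diagnostic line has the shape "line N column M - Level: message", which is what the startsWith("line") and indexOf(" c") checks above suggest; the class and field names (TidyMessageParser, TidyMessage) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses captured Tidy error output into structured messages.
final class TidyMessageParser {

    static final class TidyMessage {
        final int line;
        final int column;
        final String level;
        final String text;

        TidyMessage(int line, int column, String level, String text) {
            this.line = line;
            this.column = column;
            this.level = level;
            this.text = text;
        }
    }

    // Assumed line format: "line 3 column 7 - Warning: some message".
    private static final Pattern MESSAGE =
            Pattern.compile("^line (\\d+) column (\\d+) - (\\w+): (.*)$");

    static List<TidyMessage> parse(String errorOutput) {
        List<TidyMessage> messages = new ArrayList<>();
        for (String raw : errorOutput.split("\\r?\\n")) {
            Matcher m = MESSAGE.matcher(raw.trim());
            if (m.matches()) {
                messages.add(new TidyMessage(
                        Integer.parseInt(m.group(1)),
                        Integer.parseInt(m.group(2)),
                        m.group(3),
                        m.group(4)));
            }
            // Lines that do not match (summaries, blank lines) are skipped.
        }
        return messages;
    }

    public static void main(String[] args) {
        for (TidyMessage msg : parse("line 3 column 7 - Warning: missing </b> before <u>\n")) {
            System.out.println(msg.line + ":" + msg.column + " [" + msg.level + "] " + msg.text);
        }
    }
}
```

    In the action, the text collected in stringWriter would be passed to parse, and the line field used in place of the substring arithmetic.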

  • Retrieving well formed HTML using Jericho HTML parser in Java

    - by Raj
    Hello, I've looked at jTidy for converting a snippet of malformed/real-world HTML into well-formed HTML/XHTML. However, there's a bug in the latest version due to which I'm not able to use it. I'm looking at Jericho since it has a lot of positive reviews around the net. However, it's not immediately obvious to me how one would go about implementing a method like:

        public String getValidHTML(String messedUpHTML)

    For instance, if it was passed <div>bar, it would return <div>bar</div>. Any pointers would be helpful. Thanks in advance!

  • Java library for HTML analysis

    - by Raj
    Hi, (I've seen similar questions, but I think none of them cater to my specific needs, hence...) I would like to know if there is a Java library for analysis of real-world (read: incomplete, ill-formed) HTML. By analysis, I mean things like:

        - figuring out the most prominent color in an HTML chunk
        - changing that color to some other color (hence, has to support modification of the HTML as well)
        - pruning out unwanted tags
        - fixing up the HTML to result in a well-formed HTML snippet

    Parts of the last two are done by libraries such as Jericho and jTidy. 'Plugins' on top of these would be great. Thanks in advance!

  • Ideal Java library for cleaning html, and escaping malformed fragments

    - by Tyler
    I've got some HTML files that need to be parsed and cleaned, and they occasionally have content with special characters like <, >, ", etc. which have not been properly escaped. I have tried running the files through jTidy, but the best I can get it to do is just omit the content it sees as malformed HTML. Is there a different library that will just escape the malformed fragments instead of omitting them? If not, any recommendations on what library would be easiest to modify?

    Clarification:

        Sample input:   <p> blah blah <M+1> blah </p>
        Desired output: <p> blah blah &lt;M+1&gt; blah </p>
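
    One way to picture the desired behaviour is as a pre-processing pass that runs before any tidying: anything that looks like a tag with a known element name is kept, and every other "<" or ">" is escaped. The sketch below illustrates that idea with the standard library only. It is an assumption about the approach, not a feature of jTidy: the class name StrayMarkupEscaper is hypothetical, the list of known tags is a deliberately short placeholder, and comments, doctype declarations, and a ">" inside attribute values are not handled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing helper: escapes '<' and '>' that are not part of a known tag.
final class StrayMarkupEscaper {

    // Placeholder whitelist; a real one would list every element the documents may use.
    private static final Set<String> KNOWN_TAGS = new HashSet<>(Arrays.asList(
            "html", "head", "body", "p", "div", "span", "a", "b", "i", "u",
            "br", "ul", "ol", "li", "table", "tr", "td", "th"));

    // An opening or closing tag: a name, optionally followed by whitespace and attributes.
    private static final Pattern TAG =
            Pattern.compile("</?([A-Za-z][A-Za-z0-9]*)(\\s[^<>]*)?/?>");

    static String escapeStrayMarkup(String html) {
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            // Text between tag candidates: escape any stray angle brackets.
            out.append(escape(html.substring(last, m.start())));
            String name = m.group(1).toLowerCase(Locale.ROOT);
            // Known tags are copied verbatim; unknown ones are treated as text.
            out.append(KNOWN_TAGS.contains(name) ? m.group() : escape(m.group()));
            last = m.end();
        }
        out.append(escape(html.substring(last)));
        return out.toString();
    }

    // '&' is left alone on purpose so that existing entities are not double-escaped.
    private static String escape(String text) {
        return text.replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(escapeStrayMarkup("<p> blah blah <M+1> blah </p>"));
    }
}
```

    With the sample input from the question, this produces the desired output shown above.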

  • UTF-8 conversion doesn't always work

    - by Marco Piccinni
    I searched other Stack sites before posting here and didn't find anything similar. I have to scrape different UTF-8 webpages which contain text like "Oggi è una bellissima giornata". The problem is with the character "è". I extract this text with jtidy and an XPath query expression, and I convert it with:

        byte[] content = filteredEncodedString.getBytes("utf-8");
        String result = new String(content, "utf-8");

    where filteredEncodedString contains the text "Oggi è una bellissima giornata". This procedure works on most of the webpages analyzed so far, but in some cases it doesn't extract a UTF-8 string. The page encoding is always the same, and the text is similar. Any ideas about the problem? Thanks, Marco
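
    A general point about the two lines of conversion code, independent of jtidy: encoding a Java String to UTF-8 bytes and decoding those same bytes as UTF-8 gives back an equal String, so that step cannot repair text that was already decoded with the wrong charset earlier in the pipeline. The sketch below demonstrates this with the standard library; the class name Utf8RoundTrip is hypothetical, and it does not claim to identify what goes wrong on the specific pages in the question.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows that a UTF-8 encode/decode round trip is an identity.
final class Utf8RoundTrip {

    // Same operation as in the question: String -> UTF-8 bytes -> String.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    // Simulates reading UTF-8 bytes with the wrong charset (ISO-8859-1).
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "Oggi \u00E8 una bellissima giornata"; // \u00E8 is "è"
        System.out.println(roundTrip(original).equals(original)); // the round trip changes nothing

        // Once bytes have been decoded with the wrong charset, the round trip
        // faithfully preserves the damaged text as well.
        String damaged = misdecode(original);
        System.out.println(roundTrip(damaged).equals(damaged));
    }
}
```

    If the output is garbled on some pages only, the charset used when the page bytes are first turned into characters is therefore the more likely place to look.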

  • Spring 3.0: Unable to locate Spring NamespaceHandler for XML schema namespace

    - by Nick Hristov
    My setup is fairly simple: I have a web front-end, and the back-end is spring-wired. I am using AOP to add a layer of security on my rpc services. It's all good, except for the fact that the web app aborts on launch:

        [java] SEVERE: Context initialization failed
        [java] org.springframework.beans.factory.parsing.BeanDefinitionParsingException: Configuration problem: Unable to locate Spring NamespaceHandler for XML schema namespace [http://www.springframework.org/schema/aop]
        [java] Offending resource: ServletContext resource [/WEB-INF/gwthandler-servlet.xml]

    Here is the snippet from my xml config file:

        <beans xmlns="http://www.springframework.org/schema/beans"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:aop="http://www.springframework.org/schema/aop"
               xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop.xsd">
            <aop:config>
                <aop:aspect id="security" ref="securityAspect" >
                    <aop:pointcut id="securedServices" expression="@annotation(com.fb.boog.common.aspects.Secured)"/>
                    <aop:before method="checkSecurity" pointcut-ref="securedServices"/>
                </aop:aspect>
            </aop:config>

    I read on the internet that class loading may be at the core of the problem. Doubtful, since here is my WEB-INF/lib directory:

        ./WEB-INF/lib
        ./WEB-INF/lib/aopalliance-alpha1.jar
        ./WEB-INF/lib/aspectj-1.6.6.jar
        ./WEB-INF/lib/commons-collections.jar
        ./WEB-INF/lib/commons-logging.jar
        ./WEB-INF/lib/ehcache-core-1.7.0.jar
        ./WEB-INF/lib/ejb3-persistence.jar
        ./WEB-INF/lib/hibernate
        ./WEB-INF/lib/hibernate/antlr.jar
        ./WEB-INF/lib/hibernate/asm.jar
        ./WEB-INF/lib/hibernate/bsh-2.0b1.jar
        ./WEB-INF/lib/hibernate/cglib.jar
        ./WEB-INF/lib/hibernate/dom4j.jar
        ./WEB-INF/lib/hibernate/freemarker.jar
        ./WEB-INF/lib/hibernate/hibernate-annotations.jar
        ./WEB-INF/lib/hibernate/hibernate-shards.jar
        ./WEB-INF/lib/hibernate/hibernate-tools.jar
        ./WEB-INF/lib/hibernate/hibernate.jar
        ./WEB-INF/lib/hibernate/jtidy-r8-20060801.jar
        ./WEB-INF/lib/jabsorb
        ./WEB-INF/lib/jabsorb/jabsorb-1.3.1.jar
        ./WEB-INF/lib/jta.jar
        ./WEB-INF/lib/jyaml-1.3.jar
        ./WEB-INF/lib/postgresql-8.4-701.jdbc4.jar
        ./WEB-INF/lib/sjsxp.jar
        ./WEB-INF/lib/spring
        ./WEB-INF/lib/spring/org.springframework.aop-3.0.0.RELEASE.jar
        ./WEB-INF/lib/spring/org.springframework.asm-3.0.0.RELEASE.jar
        ./WEB-INF/lib/spring/org.springframework.aspects-3.0.0.RELEASE.jar
        ./WEB-INF/lib/spring/org.springframework.beans-3.0.0.RELEASE.jar
        ./WEB-INF/lib/spring/org.springframework.context-3.0.0.RELEASE.jar
        ./WEB-INF/lib/spring/org.springframework.context.support-3.0.0.RELEASE.jar
        ./WEB-INF/lib/spring/org.springframework.core-3.0.0.RELEASE.jar
        ./WEB-INF/lib/spring/org.springframework.expression-3.0.0.RELEASE.jar
        ./WEB-INF/lib/spring/org.springframework.instrument-3.0.0.RELEASE.jar
        ./WEB-INF/lib/spring/org.springframework.instrument.tomcat-3.0.0.RELEASE.jar
        ./WEB-INF/lib/spring/org.springframework.jdbc-3.0.0.RELEASE.jar
        ./WEB-INF/lib/spring/org.springframework.jms-3.0.0.RELEASE.jar
        ./WEB-INF/lib/spring/org.springframework.orm-3.0.0.RELEASE.jar
        ./WEB-INF/lib/spring/org.springframework.oxm-3.0.0.RELEASE.jar
        ./WEB-INF/lib/spring/org.springframework.test-3.0.0.RELEASE.jar
        ./WEB-INF/lib/spring/org.springframework.transaction-3.0.0.RELEASE.jar
        ./WEB-INF/lib/spring/org.springframework.web-3.0.0.RELEASE.jar
        ./WEB-INF/lib/spring/org.springframework.web.portlet-3.0.0.RELEASE.jar
        ./WEB-INF/lib/spring/org.springframework.web.servlet-3.0.0.RELEASE.jar
        ./WEB-INF/lib/spring/org.springframework.web.struts-3.0.0.RELEASE.jar
        ./WEB-INF/lib/testng-5.11-jdk15.jar
        ./WEB-INF/web.xml

  • Why do I get the error "Only antlib URIs can be located from the URI alone,not the URI" when trying to run hibernate tools in my build.xml

    - by Casbah
    I'm trying to run hibernate tools in an ant build to generate ddl from my JPA annotations. Ant dies on the taskdef tag. I've tried with ant 1.7, 1.6.5, and 1.6 to no avail. I've tried both in eclipse and outside. I've tried including all the hbn jars in the hibernate-tools path and not. Note that I based my build file on this post: http://stackoverflow.com/questions/281890/hibernate-jpa-to-ddl-command-line-tools I'm running eclipse 3.4 with WTP 3.0.1 and MyEclipse 7.1 on Ubuntu 8.

    Build.xml:

        <project name="generateddl" default="generate-ddl">
            <path id="hibernate-tools">
                <pathelement location="../libraries/hibernate-tools/hibernate-tools.jar" />
                <pathelement location="../libraries/hibernate-tools/bsh-2.0b1.jar" />
                <pathelement location="../libraries/hibernate-tools/freemarker.jar" />
                <pathelement location="../libraries/jtds/jtds-1.2.2.jar" />
                <pathelement location="../libraries/hibernate-tools/jtidy-r8-20060801.jar" />
            </path>
            <taskdef classname="org.hibernate.tool.ant.HibernateToolTask" classpathref="hibernate-tools"/>
            <target name="generate-ddl" description="Export schema to DDL file">
                <!-- compile model classes before running hibernatetool -->
                <!-- task definition; project.class.path contains all necessary libs
                <taskdef name="hibernatetool" classname="org.hibernate.tool.ant.HibernateToolTask" classpathref="project.class.path" />
                -->
                <hibernatetool destdir="sql">
                    <!-- check that directory exists -->
                    <jpaconfiguration persistenceunit="default" />
                    <classpath>
                        <dirset dir="WebRoot/WEB-INF/classes">
                            <include name="**/*"/>
                        </dirset>
                    </classpath>
                    <hbm2ddl outputfilename="schemaexport.sql" format="true" export="false" drop="true" />
                </hibernatetool>
            </target>

    Error message (ant -v):

        Apache Ant version 1.7.0 compiled on December 13 2006
        Buildfile: /home/joe/workspace/bento/ant-generate-ddl.xml
        parsing buildfile /home/joe/workspace/bento/ant-generate-ddl.xml with URI = file:/home/joe/workspace/bento/ant-generate-ddl.xml
        Project base dir set to: /home/joe/workspace/bento
        [antlib:org.apache.tools.ant] Could not load definitions from resource org/apache/tools/ant/antlib.xml. It could not be found.

        BUILD FAILED
        /home/joe/workspace/bento/ant-generate-ddl.xml:12: Only antlib URIs can be located from the URI alone,not the URI
            at org.apache.tools.ant.taskdefs.Definer.execute(Definer.java:216)
            at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:616)
            at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:105)
            at org.apache.tools.ant.Task.perform(Task.java:348)
            at org.apache.tools.ant.Target.execute(Target.java:357)
            at org.apache.tools.ant.helper.ProjectHelper2.parse(ProjectHelper2.java:140)
            at org.eclipse.ant.internal.ui.antsupport.InternalAntRunner.parseBuildFile(InternalAntRunner.java:191)
            at org.eclipse.ant.internal.ui.antsupport.InternalAntRunner.run(InternalAntRunner.java:400)
            at org.eclipse.ant.internal.ui.antsupport.InternalAntRunner.main(InternalAntRunner.java:137)

        Total time: 195 milliseconds

  • HTML Tidy in NetBeans IDE (Part 2)

    - by Geertjan
    This is what I was aiming for in the previous blog entry:

    What you can see above (especially if you click to enlarge it) is that I have HTML Tidy integrated into the NetBeans analyzer functionality, which is pluggable from 7.2 onwards. Well, if you set an implementation dependency on "Static Analysis Core", since it's not an official API yet. Also, the scopes of the analyzer functionality are not pluggable. That means you can 'only' set the analyzer's scope to one or more projects, one or more packages, or one or more files. Not one or more folders, which means you can't have a bunch of HTML files in a folder that you access via the Favorites window and then run the analyzer on that folder (or on multiple folders).

    Thus, to try out my new code, I had to put some HTML files into a package inside a Java application. Then I chose that package as the scope of the analyzer. Then I ran all the analyzers (i.e., standard NetBeans Java hints, FindBugs, as well as my HTML Tidy extension) on that package. The screenshot above is the result.

    Here's all the code for the above, which is a port of the Action code from the previous blog entry into a new Analyzer implementation:

        import java.io.IOException;
        import java.io.PrintWriter;
        import java.io.StringWriter;
        import java.util.ArrayList;
        import java.util.Collections;
        import java.util.List;
        import javax.swing.JComponent;
        import javax.swing.text.Document;
        import org.netbeans.api.fileinfo.NonRecursiveFolder;
        import org.netbeans.modules.analysis.spi.Analyzer;
        import org.netbeans.modules.analysis.spi.Analyzer.AnalyzerFactory;
        import org.netbeans.modules.analysis.spi.Analyzer.Context;
        import org.netbeans.modules.analysis.spi.Analyzer.CustomizerProvider;
        import org.netbeans.modules.analysis.spi.Analyzer.WarningDescription;
        import org.netbeans.spi.editor.hints.ErrorDescription;
        import org.netbeans.spi.editor.hints.ErrorDescriptionFactory;
        import org.netbeans.spi.editor.hints.Severity;
        import org.openide.cookies.EditorCookie;
        import org.openide.filesystems.FileObject;
        import org.openide.loaders.DataObject;
        import org.openide.util.Exceptions;
        import org.openide.util.lookup.ServiceProvider;
        import org.w3c.tidy.Tidy;

        public class TidyAnalyzer implements Analyzer {

            private final Context ctx;

            private TidyAnalyzer(Context cntxt) {
                this.ctx = cntxt;
            }

            @Override
            public Iterable<? extends ErrorDescription> analyze() {
                List<ErrorDescription> result = new ArrayList<ErrorDescription>();
                for (NonRecursiveFolder sr : ctx.getScope().getFolders()) {
                    FileObject folder = sr.getFolder();
                    for (FileObject fo : folder.getChildren()) {
                        // Check the MIME type first so that HTML Tidy only runs on HTML files.
                        if (fo.getMIMEType().equals("text/html")) {
                            result.addAll(doRunHTMLTidy(fo));
                        }
                    }
                }
                return result;
            }

            private List<ErrorDescription> doRunHTMLTidy(FileObject sr) {
                final List<ErrorDescription> result = new ArrayList<ErrorDescription>();
                Tidy tidy = new Tidy();
                StringWriter stringWriter = new StringWriter();
                PrintWriter errorWriter = new PrintWriter(stringWriter);
                tidy.setErrout(errorWriter);
                try {
                    Document doc = DataObject.find(sr).getLookup().lookup(EditorCookie.class).openDocument();
                    tidy.parse(sr.getInputStream(), System.out);
                    String[] split = stringWriter.toString().split("\n");
                    for (String string : split) {
                        //Bit of ugly string parsing coming up:
                        if (string.startsWith("line")) {
                            final int end = string.indexOf(" c");
                            int lineNumber = Integer.parseInt(string.substring(0, end).replace("line ", ""));
                            string = string.substring(string.indexOf(": ")).replace(":", "");
                            result.add(ErrorDescriptionFactory.createErrorDescription(
                                    Severity.WARNING,
                                    string,
                                    doc,
                                    lineNumber));
                        }
                    }
                } catch (IOException ex) {
                    Exceptions.printStackTrace(ex);
                }
                return result;
            }

            @Override
            public boolean cancel() {
                return true;
            }

            @ServiceProvider(service = AnalyzerFactory.class)
            public static final class MyAnalyzerFactory extends AnalyzerFactory {

                public MyAnalyzerFactory() {
                    super("htmltidy", "HTML Tidy", "org/jtidy/format_misc.gif");
                }

                public Iterable<? extends WarningDescription> getWarnings() {
                    return Collections.EMPTY_LIST;
                }

                @Override
                public <D, C extends JComponent> CustomizerProvider<D, C> getCustomizerProvider() {
                    return null;
                }

                @Override
                public Analyzer createAnalyzer(Context cntxt) {
                    return new TidyAnalyzer(cntxt);
                }
            }
        }

    The above only works on packages, not on projects and not on individual files.
