Search Results

Search found 2 results on 1 pages for 'ikmac'.

Page 1/1 | 1 

  • Does JAXP natively parse HTML?

    - by ikmac
    So, I whip up a quick test case in Java 7 to grab a couple of elements from random URIs, and see if the built-in parsing stuff will do what I need. Here's the basic setup (with exception handling etc omitted): DocumentBuilderFactory dbfac = DocumentBuilderFactory.newInstance(); DocumentBuilder dbuild = dbfac.newDocumentBuilder(); Document doc = dbuild.parse("uri-goes-here"); With no error handler installed, the parse method throws exceptions on fatal parse errors. When getting the standard Apache 2.2 directory index page from a local server: a SAXParseException with the message White spaces are required between publicId and systemId. The doctype looks ok to me, whitespace and all. When getting a page off a Drupal 7 generated site, it never finishes. The parse method seems to hang. No exceptions thrown, never returns. When getting http://www.oracle.com, a SAXParseException with the message The element type "meta" must be terminated by the matching end-tag "</meta>". So it would appear that the default setup I've used here doesn't handle HTML, only strictly written XML. My question is: can JAXP be used out-of-the-box from openJDK 7 to parse HTML from the wild (without insane gesticulations), or am I better off looking for an HTML 5 parser? PS this is for something I may not open-source, so licensing is also an issue :(

    Read the article

  • What is the meaning of # in R5RS Scheme number literals

    - by ikmac
    There is a partial answer on Stack Overflow, but I'm asking something a teeny bit more specific than the answers there. So... Does the formal semantics (Section 7.2) specify the meaning of such a numeric literal? Does it specify the meaning of numeric operations on the value resulting from interpreting the literal? If yes, what are the meanings (in English -- denotational semantics is all greek characters to me :))?

    Read the article

1