Trying to parse links in an HTML directory listing using Java regex

Posted by DiskCrasher on Stack Overflow
Published on 2010-03-30T02:44:23Z

OK, I know everyone is going to tell me not to use regex for parsing HTML, but I'm programming on Android and don't have ready access to an HTML parser (that I'm aware of). Besides, this is server-generated HTML, which should be more consistent than user-generated HTML.

The regex looks like this:

Pattern patternMP3 = Pattern.compile(
        "<A HREF=\"[^\"]+.+\\.mp3</A>",
        Pattern.CASE_INSENSITIVE |
        Pattern.UNICODE_CASE);
Matcher matcherMP3 = patternMP3.matcher(HTML);
while (matcherMP3.find()) { ... }

The input HTML is all on one line, and that seems to be what causes the problem: when each link is on its own line, this pattern works. Any suggestions?
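
A likely explanation: by default `.` does not match line terminators, so with one link per line the greedy `.+` cannot leave its line. On a single line it can run from the first `<A HREF="` all the way to the last `.mp3</A>`, giving one huge match. Below is a minimal sketch of one possible fix that replaces `.+` with negated character classes so each match stays inside a single anchor. The class name `Mp3LinkExtractor`, the method `extractMp3Hrefs`, and the sample listing are hypothetical, and the sketch assumes the HREF value itself ends in `.mp3` (the original pattern tested the link text instead).

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Mp3LinkExtractor {
    // [^"]+ cannot cross the closing quote of the HREF value, and [^<]*
    // cannot cross into the next tag, so a match never spans two anchors.
    // Group 1 is the HREF value; group 2 is the link text.
    private static final Pattern MP3_LINK = Pattern.compile(
            "<A\\s+HREF=\"([^\"]+\\.mp3)\"[^>]*>([^<]*)</A>",
            Pattern.CASE_INSENSITIVE);

    // Returns the HREF value of every anchor whose target ends in .mp3.
    static List<String> extractMp3Hrefs(String html) {
        List<String> hrefs = new ArrayList<String>();
        Matcher matcher = MP3_LINK.matcher(html);
        while (matcher.find()) {
            hrefs.add(matcher.group(1));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Hypothetical single-line directory listing (assumed format).
        String html = "<A HREF=\"a.mp3\">a.mp3</A>"
                + "<A HREF=\"notes.txt\">notes.txt</A>"
                + "<A HREF=\"b.mp3\">b.mp3</A>";
        System.out.println(extractMp3Hrefs(html));
    }
}

Making the quantifier reluctant (`.+?`) is another option, but a reluctant match that starts at a non-MP3 anchor can still extend into a later one, which is why the sketch uses negated classes instead.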

© Stack Overflow or respective owner