Attempting to extract a pattern within a string

Posted by Brian on Stack Overflow
Published on 2010-06-05T16:52:55Z Indexed on 2010/06/05 17:22 UTC
Filed under: java, regex, string-manipulation, pattern

I'm attempting to extract a given pattern from a text file, but the results are not quite what I want.

Here's my code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseText1 {

    public static void main(String[] args) {

        String content = "<p>Yada yada yada <code> foo ddd</code>yada yada ...\n"
            + "more here <2004-08-24> bar<Bob Joe> etc etc\n"
            + "more here again <2004-09-24> bar<Bob Joe> <Fred Kej> etc etc\n"
            + "more here again <2004-08-24> bar<Bob Joe><Fred Kej> etc etc\n"
            + "and still more <2004-08-21><2004-08-21> baz <John Doe> and now <code>the end</code> </p>\n";

        Pattern p = Pattern.compile(
                "<[1234567890]{4}-[1234567890]{2}-[1234567890]{2}>.*?<[^%0-9/]*>",
                Pattern.MULTILINE);

        Matcher m = p.matcher(content);

        // print all the matches that we find
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}

The output I'm getting is:

<2004-08-24> bar<Bob Joe>
<2004-09-24> bar<Bob Joe> <Fred Kej>
<2004-08-24> bar<Bob Joe><Fred Kej>
<2004-08-21><2004-08-21> baz <John Doe> and now <code>

The output I want is:

<2004-08-24> bar<Bob Joe>
<2004-09-24> bar<Bob Joe>
<2004-08-24> bar<Bob Joe>
<2004-08-21> baz <John Doe>

In short, only the sequence of "date", "text (or blank)", and "name" must be extracted. Everything else should be skipped. For example, the tag "Fred Kej" has no "date" tag immediately before it, so it should be flagged as invalid.
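One possible way to get that behavior, offered as a sketch rather than a definitive answer: forbid angle brackets both in the text between the date and the name and inside the name tag itself, so a match can never run across a second tag. The names `DateNameExtractor`, `extract`, and `DATE_THEN_NAME` are made up for illustration, and the rule that a name tag contains no digits, `/`, or `%` is carried over from the pattern in the code above as an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the original post.
class DateNameExtractor {

    // A date tag, then text with no angle brackets (possibly empty),
    // then one tag containing no digits, '/', or '%' (assumed to be a name).
    static final Pattern DATE_THEN_NAME = Pattern.compile(
            "<\\d{4}-\\d{2}-\\d{2}>[^<>]*<[^<>0-9/%]+>");

    // Returns every date/text/name sequence found in the input, in order.
    static List<String> extract(String content) {
        List<String> matches = new ArrayList<>();
        Matcher m = DATE_THEN_NAME.matcher(content);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String content = "<p>Yada yada yada <code> foo ddd</code>yada yada ...\n"
            + "more here <2004-08-24> bar<Bob Joe> etc etc\n"
            + "more here again <2004-09-24> bar<Bob Joe> <Fred Kej> etc etc\n"
            + "more here again <2004-08-24> bar<Bob Joe><Fred Kej> etc etc\n"
            + "and still more <2004-08-21><2004-08-21> baz <John Doe> and now <code>the end</code> </p>\n";

        for (String match : extract(content)) {
            System.out.println(match);
        }
    }
}
```

With the sample string this prints four lines, one per date/name pair; the second keeps the 2004-09-24 date from the input. The doubled `<2004-08-21><2004-08-21>` yields a single match starting at the second date, because the first date is followed directly by a tag that contains digits. Note that `[^<>]*` also matches newlines, so a date and its name may sit on different lines; add `\n` to the excluded characters if that is not wanted.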

Also, as a side question: is there a way to store or track the text snippets that were skipped or rejected, in addition to the valid ones?
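For the side question, one approach that might work is to remember where the previous match ended and treat everything between `Matcher.end()` of one match and `Matcher.start()` of the next as rejected text. The sketch below assumes a pattern that forbids angle brackets between the date and the name (my assumption, not something established in the post), and the names `MatchAndRejectTracker`, `split`, and `addIfNotBlank` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the original post.
class MatchAndRejectTracker {

    // Assumed pattern: date tag, bracket-free text, then a digit-free tag.
    static final Pattern DATE_THEN_NAME = Pattern.compile(
            "<\\d{4}-\\d{2}-\\d{2}>[^<>]*<[^<>0-9/%]+>");

    // Adds each match to 'accepted' and each non-blank gap between
    // matches (plus the leading and trailing text) to 'rejected'.
    static void split(String content, List<String> accepted, List<String> rejected) {
        Matcher m = DATE_THEN_NAME.matcher(content);
        int lastEnd = 0;
        while (m.find()) {
            addIfNotBlank(rejected, content.substring(lastEnd, m.start()));
            accepted.add(m.group());
            lastEnd = m.end();
        }
        // Whatever follows the final match was skipped as well.
        addIfNotBlank(rejected, content.substring(lastEnd));
    }

    private static void addIfNotBlank(List<String> out, String gap) {
        String trimmed = gap.trim();
        if (!trimmed.isEmpty()) {
            out.add(trimmed);
        }
    }

    public static void main(String[] args) {
        List<String> accepted = new ArrayList<>();
        List<String> rejected = new ArrayList<>();
        split("more here <2004-08-24> bar<Bob Joe><Fred Kej> etc etc", accepted, rejected);
        System.out.println("accepted: " + accepted);
        System.out.println("rejected: " + rejected);
    }
}
```

The rejected list holds whole gaps rather than individual tags, so `<Fred Kej> etc etc` comes back as one snippet; a second pass over each gap would be needed to pick out the stray tags on their own.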

Thanks, Brian
