Regular expression either/or not matching everything

Posted by dwatransit on Stack Overflow
Published on 2010-04-27T18:09:41Z Indexed on 2010/04/27 18:13 UTC

I'm trying to parse an HTTP GET request to determine whether the URL contains any of a number of file types. If it does, I want to capture the entire request. There is something I don't understand about ORing.

The following regular expression only captures part of it, and only if .flv is the first in the list of ORed values.

(I've obscured the URLs with spaces because Stack Overflow limits hyperlinks.)

regex: GET .*?(\.flv)|(\.mp4)|(\.avi).*?
test text: GET http: // foo.server.com/download/0/37/3000016511/.flv?mt=video/xy
match output: GET http: // foo.server.com/download/0/37/3000016511/.flv

I don't understand why the .*? at the end of the regex isn't allowing it to capture the entire text. If I get rid of the ORing of file types, then it works.

Here is the test code in case my explanation doesn't make sense:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexTest {  // placeholder name; the original class declaration was not included

        public static void main(String[] args) {
            String sourcestring = "GET http: // foo.server.com/download/0/37/3000016511/.flv?mt=video/xy";

            // This works:
            Pattern re = Pattern.compile("GET .*?\\.flv.*");
            // output:
            // [0][0] = GET http: // foo.server.com/download/0/37/3000016511/.flv?mt=video/xy

            // The match from the following ends with ".flv" instead of covering the entire URL.
            // It also only works if .flv is the first of the 3 ORed options.
            //Pattern re = Pattern.compile("GET .*?(\\.flv)|(\\.mp4)|(\\.avi).*?");
            // output:
            // [0][0] = GET http: // foo.server.com/download/0/37/3000016511/.flv
            // [0][1] = .flv
            // [0][2] = null
            // [0][3] = null

            Matcher m = re.matcher(sourcestring);
            int mIdx = 0;
            while (m.find()) {
                // Group 0 is the whole match; groups 1..groupCount() are the capture groups.
                for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
                    System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
                }
                mIdx++;
            }
        }
    }
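
One possible reading of the behaviour described above, offered as a sketch and not as a definitive answer: in java.util.regex the | operator has the lowest precedence, so the ORed pattern splits into three separate alternatives, "GET .*?(\.flv)", "(\.mp4)" and "(\.avi).*?", and a trailing reluctant .*? is free to match nothing when used with find(). The comparison below places the extensions in a single non-capturing group and uses a greedy .* at the end. The class name AlternationSketch, the helper firstMatch, and the .mp4 test URL are made up for illustration and are not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationSketch {

    // Hypothetical helper: returns the text of the first match, or null if there is none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // URLs are obscured with spaces, as in the question.
        String flv = "GET http: // foo.server.com/download/0/37/3000016511/.flv?mt=video/xy";
        String mp4 = "GET http: // foo.server.com/download/0/37/3000016511/.mp4?mt=video/xy";

        // Pattern from the question: '|' splits it into three top-level alternatives.
        String ungrouped = "GET .*?(\\.flv)|(\\.mp4)|(\\.avi).*?";

        // Assumed alternative: alternation confined to one group, greedy tail.
        String grouped = "GET .*?\\.(?:flv|mp4|avi).*";

        System.out.println(firstMatch(ungrouped, flv)); // stops right after ".flv"
        System.out.println(firstMatch(ungrouped, mp4)); // only ".mp4"
        System.out.println(firstMatch(grouped, flv));   // the whole request line
        System.out.println(firstMatch(grouped, mp4));   // the whole request line
    }
}
```

If the matched extension is needed afterwards, the non-capturing (?:...) can be swapped for a capturing group and read with m.group(1).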

© Stack Overflow or respective owner
