Java: Match tokens between two strings and return the number of matched tokens

Posted by Cryssie on Stack Overflow
Published on 2012-09-04T09:36:37Z


I need some help finding the number of matched tokens between two strings. I have a list of strings stored in an ArrayList (example given below):

Line 0 : WRB VBD NN VB IN CC RB VBP NNP
Line 1 : WDT NNS VBD DT NN NNP NNP
Line 2 : WRB MD PRP VB DT NN IN NNS POS JJ NNS
Line 3 : WDT NN VBZ DT NN IN DT JJ NN IN DT NNP
Line 4 : WP VBZ DT JJ NN IN NN

As you can see, each string consists of a sequence of tokens separated by spaces. There are three things I need to do:

  1. Compare the first token (WRB) in Line 0 against the tokens in Line 1 to see whether it matches any of them, then move on to the next token in Line 0, and so on. When a match is found, mark the matched token in Line 1 so that it will not be matched again.
  2. Return the number of matched tokens between Line 0 and Line 1.
  3. Return the distance between the matched tokens. Example: token NN is found at position 3 in Line 0 and at position 5 in Line 1, so distance = |3 - 5| = 2.
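
To make the three steps concrete, here is a rough sketch of one way they could be read. It is not a definitive solution: the names `TokenMatcher`, `Match`, and `match` are hypothetical, and it assumes that positions are counted from 1 (as in the NN example), that tokens are separated by whitespace, and that each token in the first line is paired with the first not-yet-matched equal token in the second line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class TokenMatcher {

    /** One matched pair; positions are 1-based, as in the NN example. */
    static final class Match {
        final String token;
        final int posA; // position in the first line
        final int posB; // position in the second line

        Match(String token, int posA, int posB) {
            this.token = token;
            this.posA = posA;
            this.posB = posB;
        }

        /** Step 3: absolute difference between the two positions. */
        int distance() {
            return Math.abs(posA - posB);
        }
    }

    /** Splits on whitespace; a blank line yields no tokens. */
    private static String[] tokens(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Step 1: pair each token of lineA with the first unused equal token of lineB. */
    static List<Match> match(String lineA, String lineB) {
        String[] a = tokens(lineA);
        String[] b = tokens(lineB);
        boolean[] used = new boolean[b.length]; // marks tokens of lineB already matched
        List<Match> matches = new ArrayList<>();

        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < b.length; j++) {
                if (!used[j] && a[i].equals(b[j])) {
                    used[j] = true; // do not match this token again
                    matches.add(new Match(a[i], i + 1, j + 1));
                    break; // move on to the next token of lineA
                }
            }
        }
        return matches; // Step 2: matches.size() is the number of matched tokens
    }
}
```

Under these assumptions, calling `TokenMatcher.match` on Line 0 and Line 1 gives three matches (VBD, NN, NNP) with distances 1, 2, and 3. Marking matched positions in a `boolean[]` leaves both token arrays untouched, so the original positions stay available for the distance calculation.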

I've tried splitting each string and storing the result in a String[], but a String[] has a fixed size and doesn't allow removing or adding elements. I also tried Pattern and Matcher, but with disastrous results. I tried a few other methods as well, but there are some problems with my nested for loops (I will post part of my code if it helps).
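
For reference, the fixed-size limitation of `String[]` can be sidestepped by copying the split result into an `ArrayList`. The snippet below is only a minimal sketch: `TokenLists` and `toMutableList` are hypothetical names, and it assumes tokens are separated by whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only.
final class TokenLists {

    /** Splits a line on whitespace and copies the tokens into a resizable list. */
    static List<String> toMutableList(String line) {
        // Arrays.asList alone returns a fixed-size view of the array;
        // wrapping it in a new ArrayList makes add/remove possible.
        return new ArrayList<>(Arrays.asList(line.trim().split("\\s+")));
    }
}
```

With such a list, `tokens.remove("NN")` deletes the first occurrence of NN. Note that removing an element shifts the positions of all later tokens, so if the distance between positions is needed, it may be simpler to leave the list intact and keep track of matched indexes separately.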

Any advice or pointers on how to solve this problem would be very much appreciated. Thank you very much.

© Stack Overflow or respective owner
