Implementing SRX Segmentation Rules in JavaScript

Posted by Sourabh on Stack Overflow See other posts from Stack Overflow or by Sourabh
Published on 2010-05-03T15:02:18Z Indexed on 2010/05/03 15:08 UTC

Hello,

I want to implement the SRX segmentation rules in JavaScript to extract sentences from text.

In order to do this correctly I will have to follow the SRX rules.

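See, e.g., http://www.lisa.org/fileadmin/standards/srx20.html#refTR29

There are two types of rules, each defined by regular expressions:

  1. Break rules: if the pattern is found, the sentence should break, e.g. at ". "
  2. No-break rules: if the pattern is found, the sentence should not break, e.g. after an abbreviation such as U.K. or Mr.

Each rule in turn has two parts:

  1. beforebreak: a pattern for the text just before the candidate break position
  2. afterbreak: a pattern for the text just after the candidate break position

For example, if the rule is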

<rule break="no">

    <beforebreak>\s*[0-9]+\.</beforebreak>
    <afterbreak>\s</afterbreak>

</rule>

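This says that if the text before a position matches "\s*[0-9]+\." and the text after it matches "\s" (i.e. the combined pattern "\s*[0-9]+\.\s" is found), the segment should not break there.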
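To make the before/after reading concrete, here is a minimal sketch of checking one rule at one position. The helper name `ruleMatchesAt` and the object shape `{ breaks, beforeBreak, afterBreak }` are my own assumptions, not part of SRX. It also assumes the rule's patterns are valid JavaScript regular expressions; as far as I know SRX patterns follow ICU syntax, which overlaps with JavaScript for simple cases like this one but is not identical.

```javascript
// Hypothetical helper: does `rule` apply at the boundary `pos` in `text`?
// beforeBreak must match the text ending exactly at pos,
// afterBreak must match the text starting exactly at pos.
function ruleMatchesAt(rule, text, pos) {
  const before = new RegExp("(?:" + rule.beforeBreak + ")$");
  const after = new RegExp("^(?:" + rule.afterBreak + ")");
  return before.test(text.slice(0, pos)) && after.test(text.slice(pos));
}

// The rule from the question, written as a plain object (assumed shape).
const numberedItemRule = {
  breaks: false,
  beforeBreak: "\\s*[0-9]+\\.",
  afterBreak: "\\s",
};

const exampleText = "Item 2. next";
// Position 7 is right after "2." and right before the space.
console.log(ruleMatchesAt(numberedItemRule, exampleText, 7)); // true
// Position 4 is right after "Item", so beforeBreak does not match.
console.log(ruleMatchesAt(numberedItemRule, exampleText, 4)); // false
```

Slicing the text at every position is slow on long inputs, but it keeps the two-sided check easy to follow.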

How do I implement this in JavaScript? Maybe the split function alone is not enough?
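A single `split` call cannot express exceptions that override a break, so one possible approach is to walk the text position by position and let the first matching rule decide. The sketch below assumes that rules are tried in order and that the first rule whose two patterns both match wins, which is how I read the SRX cascade. The abbreviation rule, the generic break rule, and the names `rules`, `compile` and `segment` are illustrative assumptions, not taken from the specification.

```javascript
// Rules in priority order (assumed): exceptions first, generic break last.
const rules = [
  // The rule from the question: no break after a number followed by "."
  { breaks: false, beforeBreak: "\\s*[0-9]+\\.", afterBreak: "\\s" },
  // Hypothetical exception for a few abbreviations
  { breaks: false, beforeBreak: "\\b(?:Mr|Mrs|Dr)\\.", afterBreak: "\\s" },
  // Hypothetical generic break rule: sentence punctuation, then whitespace
  { breaks: true, beforeBreak: "[.?!]+", afterBreak: "\\s" },
];

// Anchor the patterns so they match right before / right after a position.
function compile(rule) {
  return {
    breaks: rule.breaks,
    before: new RegExp("(?:" + rule.beforeBreak + ")$"),
    after: new RegExp("^(?:" + rule.afterBreak + ")"),
  };
}

// Split `text` into segments; whitespace stays with the following segment.
function segment(text, ruleList) {
  const compiled = ruleList.map(compile);
  const segments = [];
  let start = 0;
  for (let pos = 1; pos < text.length; pos++) {
    const head = text.slice(0, pos);
    const tail = text.slice(pos);
    // The first rule matching on both sides decides this position.
    const hit = compiled.find((r) => r.before.test(head) && r.after.test(tail));
    if (hit && hit.breaks) {
      segments.push(text.slice(start, pos));
      start = pos;
    }
  }
  if (start < text.length) segments.push(text.slice(start));
  return segments;
}

const sample = "Mr. Smith lives here. See section 2. for details. Bye!";
console.log(segment(sample, rules).map((s) => s.trim()));
// [ 'Mr. Smith lives here.', 'See section 2. for details.', 'Bye!' ]
```

Re-slicing the text at every position makes this quadratic in the input length, which is fine for short texts. For longer input, a faster variant would search only for the break rules' matches and then test the exception rules at those positions.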
