New Regular Expression Features in Java 8

Posted by Jan Goyvaerts on Regular-Expressions.info
Published on Fri, 16 May 2014 06:47:19 +0000

Java 8 brings a few changes to Java’s regular expression syntax to make it more consistent with Perl 5.14 and later in matching horizontal and vertical whitespace.

\h is new in Java 8. It is a shorthand character class that matches any horizontal whitespace character as defined in the Unicode standard, such as the space, the tab, and the no-break space.
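As a rough illustration of \h, the sketch below checks a few single characters against it and collapses runs of horizontal whitespace into one space. The class and method names are made up for this example, and it assumes Java 8 or later.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class HorizontalWhitespaceDemo {
    // The shorthand class for horizontal whitespace (Java 8 or later).
    private static final Pattern SINGLE_HORIZONTAL = Pattern.compile("\\h");
    private static final Pattern HORIZONTAL_RUN = Pattern.compile("\\h+");

    // True if the whole string is exactly one horizontal whitespace character.
    static boolean isHorizontalWhitespace(String s) {
        return SINGLE_HORIZONTAL.matcher(s).matches();
    }

    // Replaces each run of horizontal whitespace with a single space.
    // Line breaks are left alone because they are vertical whitespace.
    static String collapseHorizontalWhitespace(String s) {
        return HORIZONTAL_RUN.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(isHorizontalWhitespace(" "));       // space
        System.out.println(isHorizontalWhitespace("\t"));      // tab
        System.out.println(isHorizontalWhitespace("\u00A0"));  // no-break space
        System.out.println(isHorizontalWhitespace("\n"));      // line feed
        System.out.println(collapseHorizontalWhitespace("a \t\u00A0 b\nc"));
    }
}
```

The line feed in the last call survives the replacement, which is the practical difference between \h and the broader \s.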

In Java 4 to 7, \v is a character escape that matches only the vertical tab character. In Java 8, \v is a shorthand character class that matches any vertical whitespace, including the vertical tab. When upgrading to Java 8, check that any regexes that use \v still do what you want. To match just the vertical tab in any version of Java, use \x0B or \cK.
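A minimal sketch of the version-independent escapes next to \v follows. The class and method names are hypothetical. The results for \v reflect the Java 8 meaning described above, so on Java 7 or earlier the line feed check would be expected to come out differently.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class VerticalTabDemo {
    // The vertical tab character, U+000B.
    static final String VERTICAL_TAB = String.valueOf((char) 0x0B);

    // Hex escape: matches only the vertical tab in any Java version.
    static boolean matchesHexEscape(String s) {
        return Pattern.matches("\\x0B", s);
    }

    // Control-character escape: Ctrl+K is also the vertical tab.
    static boolean matchesControlEscape(String s) {
        return Pattern.matches("\\cK", s);
    }

    // In Java 8 and later this is the vertical whitespace shorthand class.
    static boolean matchesShorthandV(String s) {
        return Pattern.matches("\\v", s);
    }

    public static void main(String[] args) {
        System.out.println(matchesHexEscape(VERTICAL_TAB));      // vertical tab via hex escape
        System.out.println(matchesControlEscape(VERTICAL_TAB));  // vertical tab via control escape
        System.out.println(matchesHexEscape("\n"));              // line feed is not a vertical tab
        System.out.println(matchesShorthandV(VERTICAL_TAB));     // vertical tab is vertical whitespace
        System.out.println(matchesShorthandV("\n"));             // depends on the Java version
    }
}
```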

\R is also new in Java 8. It matches any line break as defined by the Unicode standard. Windows-style CRLF pairs are always matched as a whole, so in Java 8 \R matches \r\n while \R\R fails to match \r\n. \R is equivalent to (?>\r\n|[\n\cK\f\r\u0085\u2028\u2029]), where the atomic group prevents it from matching only the CR in a CRLF pair. Oracle’s documentation for the Pattern class omits the atomic group when explaining \R, so the equivalent it gives is not exact. You cannot use \R inside a character class.
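The sketch below shows \R used to split text that mixes line break styles, and what happens when \R is placed inside a character class. The class and method names are invented for this example, and it assumes Java 8 or later. It leaves out the \R\R case because the statement above is specific to Java 8 and has not been checked here against other releases.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; the names are illustrative only.
class LineBreakDemo {
    // Matches any single line break (Java 8 or later).
    private static final Pattern LINE_BREAK = Pattern.compile("\\R");

    // Splits text on line breaks of any style.
    static String[] splitLines(String text) {
        return LINE_BREAK.split(text);
    }

    // True if the whole string is exactly one line break.
    static boolean isSingleLineBreak(String s) {
        return LINE_BREAK.matcher(s).matches();
    }

    // Tries to compile the escape inside a character class.
    static boolean compilesInsideCharacterClass() {
        try {
            Pattern.compile("[\\R]");
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // CRLF, LF, and the Unicode line separator in one string.
        String text = "one\r\ntwo\nthree\u2028four";
        for (String line : splitLines(text)) {
            System.out.println(line);
        }
        System.out.println(isSingleLineBreak("\r\n"));        // CRLF pair as one break
        System.out.println(compilesInsideCharacterClass());
    }
}
```

Because the CRLF pair is consumed as one line break, the split yields no empty string between "one" and "two".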

RegexBuddy and RegexMagic have been updated to support Java 8. Java 4, 5, 6, and 7 are still supported. When you upgrade to Java 8 you can compare or convert your regular expressions between Java 8 and the Java version you were using previously.

© Regular-Expressions.info or respective owner