Sanitize a string with non-alphanum repetition

Posted by Toto on Stack Overflow
Published on 2010-03-28T10:07:32Z

I need to sanitize article titles when (creative) users try to "attract attention" by repeating non-alphanumeric characters.

Examples:

  • Buy my product !!!!!!!!!!!!!!!!!!!!!!!!
  • Buy my product !? !? !? !? !? !?
  • Buy my product !!!!!!!!!.......!!!!!!!!
  • Buy my product <-----------

An acceptable solution would be to reduce each repetition of non-alphanumeric characters to 2 occurrences.

So I would get:

  • Buy my product !!
  • Buy my product !? !?
  • Buy my product !!..!!
  • Buy my product <--

This attempt did not work well:

preg_replace('/(\W{2,})(?=\1+)/', '', $title)

Any idea how to do it in PHP with regex?

Other, better solutions are also welcome (I cannot strip all the non-alphanumeric characters, as they can be meaningful).

Edit: the objective is only to avoid the most common issues. Other creative cases will be sanitized manually or with another regex.
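
One possible approach to the rule described above (reduce any repeated non-alphanumeric unit to two occurrences) is to capture the shortest repeating unit lazily and match it again with a backreference. This is only a sketch, not a definitive answer: the function name `collapse_repetition` is made up for illustration, and the pattern assumes that `\W` (anything other than ASCII letters, digits, and underscore) is a close enough definition of "non-alphanum" for the titles in question.

```php
<?php
// Hypothetical helper; the name is an assumption, not from the original post.
function collapse_repetition(string $title): string
{
    // (\W+?)  shortest possible run of non-word characters (the repeated unit)
    // \1{2,}  the same unit repeated at least two more times
    // A match (three or more copies of the unit) is replaced by two copies.
    $result = preg_replace('/(\W+?)\1{2,}/', '$1$1', $title);

    // preg_replace() returns null on error; fall back to the unchanged input.
    return $result === null ? $title : $result;
}

$titles = [
    'Buy my product !!!!!!!!!!!!!!!!!!!!!!!!', // becomes "Buy my product !!"
    'Buy my product !? !? !? !? !? !?',        // becomes "Buy my product !? !?"
    'Buy my product !!!!!!!!!.......!!!!!!!!', // becomes "Buy my product !!..!!"
    'Buy my product <-----------',             // becomes "Buy my product <--"
];

foreach ($titles as $title) {
    echo collapse_repetition($title), "\n";
}
```

Because the unit is matched lazily, a multi-character unit such as " !?" is found as a whole, while a plain run such as "!!!!" is treated as repetitions of the single character "!". One side effect to be aware of: runs of three or more spaces are also reduced to two, since whitespace counts as a non-word character.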

© Stack Overflow or respective owner
