(PHP) Regex to remove comments but ignore occurrences within strings

Posted by David on Stack Overflow
Published on 2010-03-19T08:29:32Z Indexed on 2010/03/19 8:31 UTC

Hi there,

I am writing a comment stripper and trying to accommodate all needs here. The code below removes pretty much all comments, but it actually goes too far. I spent a lot of time trying, testing and researching the regex patterns, but I don't claim that each one is the best for its job.

My problem is that I also have situations where 'PHP comments' (which aren't really comments) appear in standard code, or even inside PHP strings, and I don't want those removed.

Example:

<?php $Var = "Blah blah //this must not comment"; // this must comment. ?>

What ends up happening is that it strips comments out religiously, which is fine, but it leaves certain problems:

<?php  $Var = "Blah blah  ?>

Also, a // comment that sits on the same line as the closing ?> will cause problems, as the comment pattern removes the rest of the line, including the ending ?>

See the problem? So this is what I need...

  • Comment characters within '' or "" need to be ignored
  • PHP comments on the same line that use double slashes should perhaps remove only the comment itself, or else remove the entire PHP code block.
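
One way these two requirements could be approached, as a rough sketch rather than a finished solution: match quoted strings in the same pattern as the comments, and let a callback put the strings back untouched while dropping everything else. The function name strip_comments_regex is made up for illustration. The sketch assumes only '...' and "..." strings (no heredocs), and it does not tell PHP apart from surrounding HTML, so things like URLs in markup would still be hit.

```php
<?php
// Hypothetical helper (name invented for this sketch).
// Assumes only '...' and "..." strings; heredoc/nowdoc are not handled.
function strip_comments_regex(string $code): string
{
    // Quoted strings, with backslash escapes allowed inside.
    $string = '"(?:\\\\.|[^"\\\\])*"|\'(?:\\\\.|[^\'\\\\])*\'';
    // Block comments.
    $block = '/\*.*?\*/';
    // Line comments: stop at a newline or just before a PHP closing tag.
    $line = '(?://|#)(?:(?!\?>)[^\r\n])*';

    $pattern = '~(' . $string . ')|' . $block . '|' . $line . '~s';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Group 1 is only present when a string matched: keep it as-is.
            // Otherwise the match was a comment: drop it.
            return $m[1] ?? '';
        },
        $code
    );
}

$in = '<?php $Var = "Blah blah //this must not comment"; // this must comment. ?>';
echo strip_comments_regex($in), "\n";
```

Because the string alternative is tried first, a // inside quotes is consumed as part of the string and never gets the chance to start a comment match. The line-comment part stops before the closing tag instead of eating it.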

Here are the patterns I use at the moment; feel free to tell me if there are improvements I can make to them. :)

$CompressedData = $OriginalData;
$CompressedData = preg_replace('!/\*.*?\*/!s', '', $CompressedData);  // removes /* comments */
$CompressedData = preg_replace('!//.*?\n!', '', $CompressedData); // removes //comments
$CompressedData = preg_replace('!#.*?\n!', '', $CompressedData); // removes # comments
$CompressedData = preg_replace('/<!--(.*?)-->/', '', $CompressedData); // removes HTML comments
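
For the PHP part, a different route from regex might be PHP's own tokenizer: token_get_all() already knows what is a string and what is a comment, so the T_COMMENT and T_DOC_COMMENT tokens can simply be skipped. This is only a sketch under some assumptions: the function name strip_php_comments is made up, it needs the tokenizer extension, and it leaves HTML comments alone, so the last pattern above would still be needed for those.

```php
<?php
// Hypothetical helper (name invented for this sketch); requires ext-tokenizer.
function strip_php_comments(string $code): string
{
    $out = '';
    foreach (token_get_all($code) as $token) {
        if (!is_array($token)) {
            // Single-character tokens such as ; or = come back as plain strings.
            $out .= $token;
            continue;
        }
        [$id, $text] = $token;
        if ($id === T_COMMENT || $id === T_DOC_COMMENT) {
            // Older PHP versions include the trailing newline in a line comment;
            // keep it so the following code stays on its own line.
            if (substr($text, -1) === "\n") {
                $out .= "\n";
            }
            continue;
        }
        $out .= $text;
    }
    return $out;
}

$in = '<?php $Var = "Blah blah //this must not comment"; // this must comment. ?>';
echo strip_php_comments($in), "\n";
```

Text outside the PHP tags comes back as inline HTML tokens and passes through unchanged, which is why a // in a URL or a # in a CSS colour would survive this version.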

Any help that you can give me would be greatly appreciated! :)

© Stack Overflow or respective owner
