Square Brackets in Python Regular Expressions (re.sub)

Posted by user1479984 on Stack Overflow
Published on 2012-06-25T15:08:05Z

I'm migrating wiki pages from the FlexWiki engine to the FOSwiki engine using Python regular expressions to handle the differences between the two engines' markup languages.

The FlexWiki markup and the FOSwiki markup, for reference.

Most of the conversion works very well, except when I try to convert the renamed links. Both wikis support renamed links in their markup.

For example, FlexWiki uses:

"Link To Wikipedia":[http://www.wikipedia.org/]

FOSwiki uses:

[[http://www.wikipedia.org/][Link To Wikipedia]]

Both produce a rendered link that displays the text "Link To Wikipedia" and points to http://www.wikipedia.org/.

I'm using the regular expression

renameLink = re.compile ("\"(?P<linkName>[^\"]+)\":\[(?P<linkTarget>[^\[\]]+)\]")

to parse the link elements out of the FlexWiki markup. Given input such as

"Link Name":[LinkTarget]

it reliably produces the groups

<linkName> = Link Name
<linkTarget> = LinkTarget
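
The group extraction above can be reproduced on its own with a short sketch like the following. The raw-string spelling of the pattern, the variable names, and the sample line are assumptions made for illustration; they are not taken from the conversion script itself.

```python
import re

# Same pattern as in the question, spelled as a raw string (an assumption
# made here for readability; the regex it compiles to is unchanged).
rename_link = re.compile(r'"(?P<linkName>[^"]+)":\[(?P<linkTarget>[^\[\]]+)\]')

sample = '"Link Name":[LinkTarget]'  # hypothetical input line

match = rename_link.search(sample)
groups = match.groupdict()  # maps each named group to the text it captured

print(groups["linkName"])
print(groups["linkTarget"])
```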

My issue occurs when I try to use re.sub to insert the parsed content into the FOSwiki markup.

My experience with regular expressions isn't anything to write home about, but I'm under the impression that, given the groups

<linkName> = Link Name
<linkTarget> = LinkTarget

a line like

line = renameLink.sub ( "[[\g<linkTarget>][\g<linkName>]]" , line )

should produce

[[LinkTarget][Link Name]]

However, in the output to the text files I'm getting

[[LinkTarget [[Link Name]]

which breaks the renamed links.
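
To check whether the pattern and the replacement string are themselves responsible, the substitution can be run on a single string, outside the rest of the conversion script. This is a minimal sketch under assumptions: the pattern and replacement are written as raw strings, and the input line is the FlexWiki example from earlier in this post rather than real page content.

```python
import re

# Pattern from the question, written as a raw string (assumption).
rename_link = re.compile(r'"(?P<linkName>[^"]+)":\[(?P<linkTarget>[^\[\]]+)\]')

# Replacement template from the question; the square brackets are literal
# characters here, and \g<name> inserts the text of the named group.
replacement = r'[[\g<linkTarget>][\g<linkName>]]'

line = '"Link To Wikipedia":[http://www.wikipedia.org/]'
converted = rename_link.sub(replacement, line)

print(converted)  # [[http://www.wikipedia.org/][Link To Wikipedia]]
```

If an isolated run like this gives the expected FOSwiki form while the full script does not, it may be worth checking whether some other substitution in the script touches the line afterwards. That is only a hypothesis to test, not a confirmed diagnosis.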

After a little bit of fiddling I managed a workaround, where

line = renameLink.sub ( "[[\g<linkTarget>][ [\g<linkName>]]" , line )

produces

[[LinkTarget][ [[Link Name]]

which, when displayed in FOSwiki, looks like

<[[Link Name> <--- Which WORKS, but isn't very pretty.

I've also tried

line = renameLink.sub ( "[[\g<linkTarget>]" + "[\g<linkName>]]" , line )

which is producing

[[linkTarget [[linkName]]

There are probably thousands of instances of these renamed links in the pages I'm trying to convert, so fixing it by hand isn't any good. For the record I've run the script under Python 2.5.4 and Python 2.7.3, and gotten the same results.

Am I missing something really obvious with the syntax? Or is there an easy workaround?
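
One possible workaround to try is passing a function to re.sub instead of a replacement template, so the output string is built directly from the match object and no template escapes are involved. This is a sketch only: the helper name to_foswiki and the sample line are hypothetical, and it has not been run against the actual wiki pages.

```python
import re

# Pattern from the question, written as a raw string (assumption).
rename_link = re.compile(r'"(?P<linkName>[^"]+)":\[(?P<linkTarget>[^\[\]]+)\]')


def to_foswiki(match):
    """Build the FOSwiki renamed-link markup from one FlexWiki match.

    Hypothetical helper: re.sub calls it once per match and inserts
    whatever string it returns.
    """
    return "[[%s][%s]]" % (match.group("linkTarget"), match.group("linkName"))


line = 'See "Link Name":[LinkTarget] for details.'  # made-up sample line
converted = rename_link.sub(to_foswiki, line)

print(converted)  # See [[LinkTarget][Link Name]] for details.
```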

© Stack Overflow or respective owner