Python: Convert format string to regular expression

Posted by miracle2k on Stack Overflow
Published on 2010-04-16T17:07:14Z

The users of my app can configure the layout of certain files via a format string.

For example, the config value the user specifies might be:

layout = '%(group)s/foo-%(locale)s/file.txt'

I now need to find all such files that already exist. This seems easy enough using the glob module:

import glob
glob_pattern = layout % {'group': '*', 'locale': '*'}
glob.glob(glob_pattern)

However, now comes the hard part: given the list of glob results, I need to extract the filename parts that matched a given placeholder, for example all the distinct "locale" values.

I thought I would generate a regular expression from the format string that I could then match against the list of glob results (or possibly skip glob and do all the matching myself).

But I can't find a nice way to build a regex that has the proper capture groups and also escapes the rest of the input.

For example, this might give me a regex that matches the locales:

regex = layout % {'group': '.*', 'locale': '(.*)'}

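But to make sure the literal parts of the layout (such as the "." in "file.txt") are matched literally, I need to pass the result through re.escape(), which then also escapes the regex syntax I have just inserted. Calling re.escape() first ruins the format string instead, because it escapes the parentheses of every %(name)s placeholder.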
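Both orderings can be checked directly. The snippet below is a small illustration, assuming Python 3.7 or later (where re.escape() escapes only regex-special characters); it reuses the layout from above with a made-up path.

```python
import re

layout = '%(group)s/foo-%(locale)s/file.txt'

# Substitute first, escape second: the inserted regex syntax is escaped too,
# so '(.*)' now only matches the literal characters "(.*)".
escaped_after = re.escape(layout % {'group': '.*', 'locale': '(.*)'})
print(escaped_after)
print(re.fullmatch(escaped_after, 'main/foo-en/file.txt'))  # None

# Escape first, substitute second: the '(' and ')' of each %(name)s placeholder
# are escaped, so the % operator no longer recognizes the placeholders.
escaped_first = re.escape(layout)
try:
    escaped_first % {'group': '.*', 'locale': '(.*)'}
except ValueError as exc:
    print('format string broken:', exc)
```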

I know there's fnmatch.translate(), which would even give me a regex, but not one with a capture group for each placeholder.

Is there a good way to do this, without a hack like replacing the placeholders with a regex-safe unique value etc.?
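One way to avoid placeholder substitution altogether is to split the layout at its placeholders with re.split(), escape only the literal pieces, and emit a named group for each placeholder. This is a sketch, not a library API: layout_to_regex is a hypothetical helper name, and it assumes that only %(name)s placeholders occur (no %% escapes, no other conversions such as %(n)d), that names are valid Python identifiers, and that a placeholder never spans a "/" (mirroring glob's * on POSIX paths).

```python
import re

# Only %(name)s placeholders are recognized; %% escapes and other
# conversions (e.g. %(n)d) are not handled by this sketch.
PLACEHOLDER = re.compile(r'%\((\w+)\)s')


def layout_to_regex(layout):
    """Hypothetical helper: compile a %(name)s layout into a regex with one
    named group per placeholder and every literal piece passed through re.escape()."""
    # With one capturing group, re.split alternates: literal, name, literal, ...
    parts = PLACEHOLDER.split(layout)
    pieces = []
    seen = set()
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pieces.append(re.escape(part))          # literal text
        elif part in seen:
            pieces.append('(?P=%s)' % part)         # repeat: must equal first match
        else:
            seen.add(part)
            pieces.append('(?P<%s>[^/]+)' % part)   # assumed: one path component
    return re.compile(''.join(pieces))


layout = '%(group)s/foo-%(locale)s/file.txt'
regex = layout_to_regex(layout)

# Example paths standing in for glob.glob() results.
paths = ['main/foo-en/file.txt', 'main/foo-de/file.txt', 'other/bar/file.txt']
locales = sorted({m.group('locale') for m in map(regex.fullmatch, paths) if m})
print(locales)  # ['de', 'en']
```

Repeated placeholders become backreferences, so a layout like '%(locale)s/%(locale)s.txt' also compiles; Python's re module would otherwise reject the redefined group name.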

Is there possibly some way (a third party library perhaps?) that allows dissecting a format string in a more flexible way, for example splitting the string at the placeholder locations?
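For str.format-style layouts, the standard library already provides such a splitter: string.Formatter().parse() yields (literal_text, field_name, format_spec, conversion) tuples. The sketch below assumes the layout could be switched to '{group}/foo-{locale}/file.txt' syntax, which the question does not state; format_to_regex is a hypothetical name, format specs and conversions are ignored, and every field is assumed to be a named identifier (not '{}' or '{0}').

```python
import re
import string


def format_to_regex(layout):
    """Hypothetical helper: compile a str.format-style layout into a regex.
    Literal text is escaped, each named field becomes a named group, and
    format specs/conversions are ignored."""
    pieces = []
    seen = set()
    for literal, field, _spec, _conv in string.Formatter().parse(layout):
        pieces.append(re.escape(literal))   # '{{' and '}}' are already unescaped here
        if field is None:                   # chunk with no replacement field
            continue
        if field in seen:
            pieces.append('(?P=%s)' % field)
        else:
            seen.add(field)
            pieces.append('(?P<%s>[^/]+)' % field)  # assumed: one path component
    return re.compile(''.join(pieces))


regex = format_to_regex('{group}/foo-{locale}/file.txt')
match = regex.fullmatch('main/foo-en/file.txt')
print(match.groupdict())  # {'group': 'main', 'locale': 'en'}
```

A side benefit of this route is that {{ and }} escapes are resolved by parse() before the literal text reaches re.escape().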

© Stack Overflow or respective owner
