Excel Regex, or export to Python? ; "Vlookup" in Python?

Posted by victorhooi on Stack Overflow See other posts from Stack Overflow or by victorhooi
Published on 2010-05-05T01:28:05Z Indexed on 2010/05/05 1:38 UTC
Read the original article Hit count: 491

Filed under:
|
|
|

heya,

We have an Excel file with a worksheet containing people records.

1. Phone Number Sanitation

One of the fields is a phone number field, which contains phone numbers in the format e.g.:

+XX(Y)ZZZZ-ZZZZ

(where X, Y and Z are integers).

There are also some records which have less digits, e.g.:

+XX(Y)ZZZ-ZZZZ

And others with really screwed up formats:

+XX(Y)ZZZZ-ZZZZ / ZZZZ

or:

ZZZZZZZZ

We need to sanitise these all into the format:

0YZZZZZZZZ (or OYZZZZZZ with those with less digits).

2. Fill in Supervisor Details

Each person also has a supervisor, given as an numeric ID. We need to do a lookup to get the name and email address of that supervisor, and add it to the line.

This lookup will be firstly on the same worksheet (i.e. searching itself), and it can then fallback to another workbook with more people.

3. Approach?

For the first issue, I was thinking of using regex in Excel/VBA somehow, to do the parsing. My Excel-fu isn't the best, but I suppose I can learn...lol. Any particular points on this one?

However, would I be better off exporting the XLS to a CSV (e.g. using xlrd), then using Python to fix up the phone numbers?

For the second approach, I was thinking of just using vlookups in Excel, to pull in the data, and somehow, having it fall through, first on searching itself, then on the external workbook, then just putting in error text. Not sure how to do that last part.

However, if I do happen to choose to export to CSV and do it in Python, what's an efficient way of doing the vlookup? (Should I convert to a dict, or just iterate? Or is there a better, or more idiomatic way?)

Cheers, Victor

© Stack Overflow or respective owner

Related posts about excel

Related posts about excel-vba