victorhooi - Developer IT

Log rotation daemons (e.g. logadm) versus Custom bash scripts?

- by victorhooi

Hi, We have a number of applications that generate fairly large (500Mb a day) logfiles that we need to archive/compress on a daily basis. Currently, the log rotation/moving/compressions is done either via custom bash scripts and scheduled via Cron, or in the application's code itself. What (if any) are the advantages of using a system daemon like logadm? (These are Solaris boxes). Cheers, Victor

Read the article

Excel Regex, or export to Python? ; "Vlookup" in Python?

- by victorhooi

heya, We have an Excel file with a worksheet containing people records. 1. Phone Number Sanitation One of the fields is a phone number field, which contains phone numbers in the format e.g.: +XX(Y)ZZZZ-ZZZZ (where X, Y and Z are integers). There are also some records which have less digits, e.g.: +XX(Y)ZZZ-ZZZZ And others with really screwed up formats: +XX(Y)ZZZZ-ZZZZ / ZZZZ or: ZZZZZZZZ We need to sanitise these all into the format: 0YZZZZZZZZ (or OYZZZZZZ with those with less digits). 2. Fill in Supervisor Details Each person also has a supervisor, given as an numeric ID. We need to do a lookup to get the name and email address of that supervisor, and add it to the line. This lookup will be firstly on the same worksheet (i.e. searching itself), and it can then fallback to another workbook with more people. 3. Approach? For the first issue, I was thinking of using regex in Excel/VBA somehow, to do the parsing. My Excel-fu isn't the best, but I suppose I can learn...lol. Any particular points on this one? However, would I be better off exporting the XLS to a CSV (e.g. using xlrd), then using Python to fix up the phone numbers? For the second approach, I was thinking of just using vlookups in Excel, to pull in the data, and somehow, having it fall through, first on searching itself, then on the external workbook, then just putting in error text. Not sure how to do that last part. However, if I do happen to choose to export to CSV and do it in Python, what's an efficient way of doing the vlookup? (Should I convert to a dict, or just iterate? Or is there a better, or more idiomatic way?) Cheers, Victor

Read the article

Extending Django-tagging, adding extra field to each tag?

- by victorhooi

heya, We're coding together a Django app to handle reviews of newspaper articles. Each newspaper article model will an arbitrary number of tags associated with it. Also, each tag will have an optional ranking (0 to 10). I was thinking of using django-tagging to do this (http://code.google.com/p/django-tagging/), but I'm not sure of the best way to add the ranking to the system. Should I extend django-tagging somehow? (Not sure if this is possible, without changing django-tagging's actual code?). Or is there a better way of achieving this? Cheers, Victor

Read the article

Excel CSV into Nested Dictionary; List Comprehensions

- by victorhooi

heya, I have a Excel CSV files with employee records in them. Something like this: mail,first_name,surname,employee_id,manager_id,telephone_number [email protected],john,smith,503422,503423,+65(2)3423-2433 [email protected],george,brown,503097,503098,+65(2)3423-9782 .... I'm using DictReader to put this into a nested dictionary: import csv gd_extract = csv.DictReader(open('filename 20100331 original.csv'), dialect='excel') employees = dict([(row['employee_id'], row) for row in gp_extract]) Is the above the proper way to do it - it does work, but is it the Right Way? Something more efficient? Also, the funny thing is, in IDLE, if I try to print out "employees" at the shell, it seems to cause IDLE to crash (there's approximately 1051 rows). 2. Remove employee_id from inner dict The second issue issue, I'm putting it into a dictionary indexed by employee_id, with the value as a nested dictionary of all the values - however, employee_id is also a key:value inside the nested dictionary, which is a bit redundant? Is there any way to exclude it from the inner dictionary? 3. Manipulate data in comprehension Thirdly, we need do some manipulations to the imported data - for example, all the phone numbers are in the wrong format, so we need to do some regex there. Also, we need to convert manager_id to an actual manager's name, and their email address. Most managers are in the same file, while others are in an external_contractors CSV, which is similar but not quite the same format - I can import that to a separate dict though. Are these two items things that can be done within the single list comprehension, or should I use a for loop? Or does multiple comprehensions work? (sample code would be really awesome here). Or is there a smarter way in Python do it? Cheers, Victor

Read the article

Python DictReader - Skipping rows with missing columns?

- by victorhooi

heya, I have a Excel .CSV file I'm attempting to read in with DictReader. All seems to be well, except it seems to omit rows, specifically those with missing columns. Our input looks like: mail,givenName,sn,lorem,ipsum,dolor,telephoneNumber [email protected],ian,bay,3424,8403,2535,+65(2)34523534545 [email protected],mike,gibson,3424,8403,2535,+65(2)34523534545 [email protected],ross,martin,,,,+65(2)34523534545 [email protected],david,connor,,,,+65(2)34523534545 [email protected],chris,call,3424,8403,2535,+65(2)34523534545 So some of the rows have missing lorem/ipsum/dolor columns, and it's just a string of commas for those. We're reading it in with: def read_gd_dump(input_file="blah 20100423.csv"): gd_extract = csv.DictReader(open('blah 20100423.csv'), restval='missing', dialect='excel') return dict([(row['something'], row) for row in gd_extract]) And I checked that "something" (the key for our dict) isn't one of the missing columns, I had originally suspected it might be that. It's one of the columns after that. However, DictReader seems to completely skip over the rows. I tried setting restval to something, didn't seem to make any difference. I can't seem to find anything in Python's CSV docs (http://docs.python.org/library/csv.html) that would explain this behaviour, but I may have misread something. Any ideas? Thanks, Victor

Read the article

Python - Converting CSV to Objects - Code Design

- by victorhooi

Hi, I have a small script we're using to read in a CSV file containing employees, and perform some basic manipulations on that data. We read in the data (import_gd_dump), and create an Employees object, containing a list of Employee objects (maybe I should think of a better naming convention...lol). We then call clean_all_phone_numbers() on Employees, which calls clean_phone_number() on each Employee, as well as lookup_all_supervisors(), on Employees. import csv import re import sys #class CSVLoader: # """Virtual class to assist with loading in CSV files.""" # def import_gd_dump(self, input_file='Gp Directory 20100331 original.csv'): # gd_extract = csv.DictReader(open(input_file), dialect='excel') # employees = [] # for row in gd_extract: # curr_employee = Employee(row) # employees.append(curr_employee) # return employees # #self.employees = {row['dbdirid']:row for row in gd_extract} # Previously, this was inside a (virtual) class called "CSVLoader". # However, according to here (http://tomayko.com/writings/the-static-method-thing) - the idiomatic way of doing this in Python is not with a class-fucntion but with a module-level function def import_gd_dump(input_file='Gp Directory 20100331 original.csv'): """Return a list ('employee') of dict objects, taken from a Group Directory CSV file.""" gd_extract = csv.DictReader(open(input_file), dialect='excel') employees = [] for row in gd_extract: employees.append(row) return employees def write_gd_formatted(employees_dict, output_file="gd_formatted.csv"): """Read in an Employees() object, and write out each Employee() inside this to a CSV file""" gd_output_fieldnames = ('hrid', 'mail', 'givenName', 'sn', 'dbcostcenter', 'dbdirid', 'hrreportsto', 'PHFull', 'PHFull_message', 'SupervisorEmail', 'SupervisorFirstName', 'SupervisorSurname') try: gd_formatted = csv.DictWriter(open(output_file, 'w', newline=''), fieldnames=gd_output_fieldnames, extrasaction='ignore', dialect='excel') except IOError: print('Unable to open file, IO error (Is it locked?)') sys.exit(1) headers = {n:n for n in gd_output_fieldnames} gd_formatted.writerow(headers) for employee in employees_dict.employee_list: # We're using the employee object's inbuilt __dict__ attribute - hmm, is this good practice? gd_formatted.writerow(employee.__dict__) class Employee: """An Employee in the system, with employee attributes (name, email, cost-centre etc.)""" def __init__(self, employee_attributes): """We use the Employee constructor to convert a dictionary into instance attributes.""" for k, v in employee_attributes.items(): setattr(self, k, v) def clean_phone_number(self): """Perform some rudimentary checks and corrections, to make sure numbers are in the right format. Numbers should be in the form 0XYYYYYYYY, where X is the area code, and Y is the local number.""" if self.telephoneNumber is None or self.telephoneNumber == '': return '', 'Missing phone number.' else: standard_format = re.compile(r'^\+(?P<intl_prefix>\d{2})\((?P<area_code>\d)\)(?P<local_first_half>\d{4})-(?P<local_second_half>\d{4})') extra_zero = re.compile(r'^\+(?P<intl_prefix>\d{2})\(0(?P<area_code>\d)\)(?P<local_first_half>\d{4})-(?P<local_second_half>\d{4})') missing_hyphen = re.compile(r'^\+(?P<intl_prefix>\d{2})\(0(?P<area_code>\d)\)(?P<local_first_half>\d{4})(?P<local_second_half>\d{4})') if standard_format.search(self.telephoneNumber): result = standard_format.search(self.telephoneNumber) return '0' + result.group('area_code') + result.group('local_first_half') + result.group('local_second_half'), '' elif extra_zero.search(self.telephoneNumber): result = extra_zero.search(self.telephoneNumber) return '0' + result.group('area_code') + result.group('local_first_half') + result.group('local_second_half'), 'Extra zero in area code - ask user to remediate. ' elif missing_hyphen.search(self.telephoneNumber): result = missing_hyphen.search(self.telephoneNumber) return '0' + result.group('area_code') + result.group('local_first_half') + result.group('local_second_half'), 'Missing hyphen in local component - ask user to remediate. ' else: return '', "Number didn't match recognised format. Original text is: " + self.telephoneNumber class Employees: def __init__(self, import_list): self.employee_list = [] for employee in import_list: self.employee_list.append(Employee(employee)) def clean_all_phone_numbers(self): for employee in self.employee_list: #Should we just set this directly in Employee.clean_phone_number() instead? employee.PHFull, employee.PHFull_message = employee.clean_phone_number() # Hmm, the search is O(n^2) - there's probably a better way of doing this search? def lookup_all_supervisors(self): for employee in self.employee_list: if employee.hrreportsto is not None and employee.hrreportsto != '': for supervisor in self.employee_list: if supervisor.hrid == employee.hrreportsto: (employee.SupervisorEmail, employee.SupervisorFirstName, employee.SupervisorSurname) = supervisor.mail, supervisor.givenName, supervisor.sn break else: (employee.SupervisorEmail, employee.SupervisorFirstName, employee.SupervisorSurname) = ('Supervisor not found.', 'Supervisor not found.', 'Supervisor not found.') else: (employee.SupervisorEmail, employee.SupervisorFirstName, employee.SupervisorSurname) = ('Supervisor not set.', 'Supervisor not set.', 'Supervisor not set.') #Is thre a more pythonic way of doing this? def print_employees(self): for employee in self.employee_list: print(employee.__dict__) if __name__ == '__main__': db_employees = Employees(import_gd_dump()) db_employees.clean_all_phone_numbers() db_employees.lookup_all_supervisors() #db_employees.print_employees() write_gd_formatted(db_employees) Firstly, my preamble question is, can you see anything inherently wrong with the above, from either a class design or Python point-of-view? Is the logic/design sound? Anyhow, to the specifics: The Employees object has a method, clean_all_phone_numbers(), which calls clean_phone_number() on each Employee object inside it. Is this bad design? If so, why? Also, is the way I'm calling lookup_all_supervisors() bad? Originally, I wrapped the clean_phone_number() and lookup_supervisor() method in a single function, with a single for-loop inside it. clean_phone_number is O(n), I believe, lookup_supervisor is O(n^2) - is it ok splitting it into two loops like this? In clean_all_phone_numbers(), I'm looping on the Employee objects, and settings their values using return/assignment - should I be setting this inside clean_phone_number() itself? There's also a few things that I'm sorted of hacked out, not sure if they're bad practice - e.g. print_employee() and gd_formatted() both use __dict__, and the constructor for Employee uses setattr() to convert a dictionary into instance attributes. I'd value any thoughts at all. If you think the questions are too broad, let me know and I can repost as several split up (I just didn't want to pollute the boards with multiple similar questions, and the three questions are more or less fairly tightly related). Cheers, Victor

Read the article

Python - Open default mail client using mailto, with multiple recipients

- by victorhooi

Hi, I'm attempting to write a Python function to send an email to a list of users, using the default installed mail client. I want to open the email client, and give the user the opportunity to edit the list of users or the email body. I did some searching, and according to here: http://www.sightspecific.com/~mosh/WWW_FAQ/multrec.html It's apparently against the RFC spec to put multiple comma-delimited recipients in a mailto link. However, that's the way everybody else seems to be doing it. What exactly is the modern stance on this? Anyhow, I found the following two sites: http://2ality.blogspot.com/2009/02/generate-emails-with-mailto-urls-and.html http://www.megasolutions.net/python/invoke-users-standard-mail-client-64348.aspx which seem to suggest solutions using urllib.parse (url.parse.quote for me), and webbrowser.open. I tried the sample code from the first link (2ality.blogspot.com), and that worked fine, and opened my default mail client. However, when I try to use the code in my own module, it seems to open up my default browser, for some weird reason. No funny text in the address bar, it just opens up the browser. The email_incorrect_phone_numbers() function is in the Employees class, which contains a dictionary (employee_dict) of Employee objects, which themselves have a number of employee attributes (sn, givenName, mail etc.). Full code is actually here (http://stackoverflow.com/questions/2963975/python-converting-csv-to-objects-code-design) from urllib.parse import quote import webbrowser .... def email_incorrect_phone_numbers(self): email_list = [] for employee in self.employee_dict.values(): if not PhoneNumberFormats.standard_format.search(employee.telephoneNumber): print(employee.telephoneNumber, employee.sn, employee.givenName, employee.mail) email_list.append(employee.mail) recipients = ', '.join(email_list) webbrowser.open("mailto:%s?subject=%s&body=%s" % (recipients, quote("testing"), quote('testing')) ) Any suggestions? Cheers, Victor

Read the article

Python 3.0 IDE - Komodo and Eclipse both flaky?

- by victorhooi

heya, I'm trying to find a decent IDE that supports Python 3.x, and offers code completion/in-built Pydocs viewer, Mercurial integration, and SSH/SFTP support. Anyhow, I'm trying Pydev, and I open up a .py file, it's in the Pydev perspective and the Run As doesn't offer any options. It does when you start a Pydev project, but I don't want to start a project just to edit one single Python script, lol, I want to just open a .py file and have It Just Work... Plan 2, I try Komodo 6 Alpha 2. I actually quite like Komodo, and it's nice and snappy, offers in-built Mercurial support, as well as in-built SSH support (although it lacks SSH HTTP Proxy support, which is slightly annoying). However, for some reason, this refuses to pick up Python 3. In Edit-Preferences-Languages, there's two option, one for Python and Python3, but the Python3 one refuses to work, with either the official Python.org binaries, or ActiveState's own ActivePython 3. Of course, I can set the "Python" interpreter to the 3.1 binary, but that's an ugly hack and breaks Python 2.x support. So, does anybody who uses an IDE for Python have any suggestions on either of these accounts, or can you recommend an alternate IDE for Python 3.0 development? Cheers, Victor

Read the article

Python - Checking for membership inside nested dict

- by victorhooi

heya, This is a followup questions to this one: http://stackoverflow.com/questions/2901422/python-dictreader-skipping-rows-with-missing-columns Turns out I was being silly, and using the wrong ID field. I'm using Python 3.x here. I have a dict of employees, indexed by a string, "directory_id". Each value is a nested dict with employee attributes (phone number, surname etc.). One of these values is a secondary ID, say "internal_id", and another is their manager, call it "manager_internal_id". The "internal_id" field is non-mandatory, and not every employee has one. (I've simplified the fields a little, both to make it easier to read, and also for privacy/compliance reasons). The issue here is that we index (key) each employee by their directory_id, but when we lookup their manager, we need to find managers by their "internal_id". Before, when employee.keys() was a list of internal_ids, I was using a membership check on this. Now, the last part of my if statement won't work, since the internal_ids is part of the dict values, instead of the key itself. def lookup_supervisor(manager_internal_id, employees): if manager_internal_idis not None and manager_internal_id!= "" and manager_internal_id in employees.keys(): return (employees[manager_internal_id]['mail'], employees[manager_internal_id]['givenName'], employees[manager_internal_id]['sn']) else: return ('Supervisor Not Found', 'Supervisor Not Found', 'Supervisor Not Found') So the first question is, how do I check whether the manager_internal_id is present in the dict's values. I've tried substituting employee.keys() with employee.values(), that didn't work. Also, I'm hoping for something a little more efficient, not sure if there's a way to get a subset of the values, specifically, all the entries for employees[directory_id]['internal_id']. Hopefully there's some Pythonic way of doing this, without using a massive heap of nested for/if loops. My second question is, how do I then cleanly return the required employee attributes (mail, givenname, surname etc.). My for loop is iterating over each employee, and calling lookup_supervisor. I'm feeling a bit stupid/stumped here. def tidy_data(employees): for directory_id, data in employees.items(): # We really shouldnt' be passing employees back and forth like this - hmm, classes? data['SupervisorEmail'], data['SupervisorFirstName'], data['SupervisorSurname'] = lookup_supervisor(data['manager_internal_id'], employees) Thanks in advance =), Victor

Developer IT