namedtuple - Developer IT

What are "named tuples" in Python?

- by Denilson Sá

Reading the changes in Python 3.1, I found something... unexpected: the sys.version_info tuple is now a named tuple. I had never heard of named tuples before, and I thought elements could be indexed either by numbers (like in tuples and lists) or by keys (like in dicts). I never expected they could be indexed both ways. Thus, my questions are:

- What are named tuples?
- How do I use them?
- Why/when should I use named tuples instead of normal tuples?
- Why/when should I use normal tuples instead of named tuples?
- Is there any kind of "named list" (a mutable version of the named tuple)?
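As a minimal sketch of what the question describes, a named tuple built with collections.namedtuple can be read both by position and by field name. The Version type and its values below are made up for illustration and are not the real sys.version_info.

```python
from collections import namedtuple

# Hypothetical record type, used only for illustration.
Version = namedtuple("Version", ["major", "minor", "micro"])

v = Version(major=3, minor=1, micro=0)

by_index = v[0]          # positional access, like a plain tuple
by_name = v.major        # attribute access by field name
major, minor, micro = v  # unpacking works as with any tuple

# Named tuples are immutable; _replace returns a modified copy.
v2 = v._replace(minor=2)
```

Because the fields cannot be reassigned in place, this covers the "named tuple" half of the question but not the "named list" half.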

Read the article

Python hashable dicts

- by TokenMacGuy

As an exercise, and mostly for my own amusement, I'm implementing a backtracking packrat parser. The inspiration for this is that I'd like to have a better idea about how hygienic macros would work in an Algol-like language (as opposed to the syntax-free Lisp dialects you normally find them in). Because of this, different passes through the input might see different grammars, so cached parse results are invalid, unless I also store the current version of the grammar along with the cached parse results. (EDIT: a consequence of this use of key-value collections is that they should be immutable, but I don't intend to expose the interface to allow them to be changed, so either mutable or immutable collections are fine.)

The problem is that Python dicts cannot appear as keys to other dicts. Even using a tuple (as I'd be doing anyway) doesn't help.

    >>> cache = {}
    >>> rule = {"foo":"bar"}
    >>> cache[(rule, "baz")] = "quux"
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unhashable type: 'dict'
    >>>

I guess it has to be tuples all the way down. Now the Python standard library provides approximately what I'd need: collections.namedtuple has a very different syntax, but can be used as a key. Continuing from the session above:

    >>> from collections import namedtuple
    >>> Rule = namedtuple("Rule",rule.keys())
    >>> cache[(Rule(**rule), "baz")] = "quux"
    >>> cache
    {(Rule(foo='bar'), 'baz'): 'quux'}

OK. But I have to make a class for each possible combination of keys in the rule I would want to use, which isn't so bad, because each parse rule knows exactly what parameters it uses, so that class can be defined at the same time as the function that parses the rule. But combining the rules together is much more dynamic. In particular, I'd like a simple way to have rules override other rules, but collections.namedtuple has no analogue to dict.update().

Edit: An additional problem with namedtuples is that they are strictly positional. Two tuples that look like they should be different can in fact be the same:

    >>> you = namedtuple("foo",["bar","baz"])
    >>> me = namedtuple("foo",["bar","quux"])
    >>> you(bar=1,baz=2) == me(bar=1,quux=2)
    True
    >>> bob = namedtuple("foo",["baz","bar"])
    >>> you(bar=1,baz=2) == bob(bar=1,baz=2)
    False

tl;dr: How do I get dicts that can be used as keys to other dicts?

Having hacked a bit on the answers, here's the more complete solution I'm using. Note that this does a bit of extra work to make the resulting dicts vaguely immutable for practical purposes. Of course it's still quite easy to hack around it by calling dict.__setitem__(instance, key, value), but we're all adults here.

    class hashdict(dict):
        """
        hashable dict implementation, suitable for use as a key into
        other dicts.

            >>> h1 = hashdict({"apples": 1, "bananas":2})
            >>> h2 = hashdict({"bananas": 3, "mangoes": 5})
            >>> h1+h2
            hashdict(apples=1, bananas=3, mangoes=5)
            >>> d1 = {}
            >>> d1[h1] = "salad"
            >>> d1[h1]
            'salad'
            >>> d1[h2]
            Traceback (most recent call last):
            ...
            KeyError: hashdict(bananas=3, mangoes=5)

        based on answers from
           http://stackoverflow.com/questions/1151658/python-hashable-dicts

        """
        def __key(self):
            return tuple(sorted(self.items()))
        def __repr__(self):
            return "{0}({1})".format(self.__class__.__name__,
                ", ".join("{0}={1}".format(
                        str(i[0]),repr(i[1])) for i in self.__key()))

        def __hash__(self):
            return hash(self.__key())
        def __setitem__(self, key, value):
            raise TypeError("{0} does not support item assignment"
                             .format(self.__class__.__name__))
        def __delitem__(self, key):
            raise TypeError("{0} does not support item assignment"
                             .format(self.__class__.__name__))
        def clear(self):
            raise TypeError("{0} does not support item assignment"
                             .format(self.__class__.__name__))
        def pop(self, *args, **kwargs):
            raise TypeError("{0} does not support item assignment"
                             .format(self.__class__.__name__))
        def popitem(self, *args, **kwargs):
            raise TypeError("{0} does not support item assignment"
                             .format(self.__class__.__name__))
        def setdefault(self, *args, **kwargs):
            raise TypeError("{0} does not support item assignment"
                             .format(self.__class__.__name__))
        def update(self, *args, **kwargs):
            raise TypeError("{0} does not support item assignment"
                             .format(self.__class__.__name__))
        def __add__(self, right):
            result = hashdict(self)
            dict.update(result, right)
            return result

    if __name__ == "__main__":
        import doctest
        doctest.testmod()
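One common alternative, sketched here under the assumption that every key and value in the rule dict is itself hashable, is to freeze the dict's items into a frozenset before using it as a cache key. The helper name freeze is hypothetical and is not part of the question. Unlike a named tuple, the resulting key does not depend on field order.

```python
def freeze(mapping):
    """Return a hashable, order-independent snapshot of a flat dict.

    Assumes every key and value in the mapping is hashable.
    """
    return frozenset(mapping.items())


cache = {}
rule = {"foo": "bar", "spam": "eggs"}
cache[(freeze(rule), "baz")] = "quux"

# Same contents, different insertion order: the frozen key is equal.
same_rule = {"spam": "eggs", "foo": "bar"}
hit = cache[(freeze(same_rule), "baz")]

# An override in the spirit of dict.update(): build a new dict, then freeze it.
overridden = freeze({**rule, "foo": "other"})
```

The trade-off is that the frozen key is opaque: to read values back by name, convert it with dict(key) first.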

Read the article

What would be a frozen dict?

- by dugres

A frozen set is a frozenset. A frozen list could be a tuple. What would be a frozen dict? An immutable, hashable dict. I guess it could be something like collections.namedtuple, but namedtuple is more like a frozen-keys dict (a half-frozen dict). No?
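One partial approximation, shown here only as a sketch, is types.MappingProxyType (Python 3.3+), which wraps a dict in a read-only view. It is not a full answer to the question: the view blocks writes made through it, but it still reflects changes made to the underlying dict. The settings data is made up for the example.

```python
from types import MappingProxyType

settings = {"host": "localhost", "port": 8080}  # made-up example data
read_only = MappingProxyType(settings)

# Writing through the proxy is rejected with a TypeError.
try:
    read_only["port"] = 9090
    write_blocked = False
except TypeError:
    write_blocked = True

# The proxy is only a view: changes to the underlying dict show through.
settings["port"] = 9090
seen_through_view = read_only["port"]
```

For a value that is both immutable and usable as a dict key, freezing the items (for example with frozenset(d.items())) is a separate step.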

Read the article

Comparing dicts and updating a list of results

- by lmnt

Hello, I have a list of dicts and I want to compare each dict in that list with a dict in a resulting list, add it to the result list if it's not there, and if it's there, update a counter associated with that dict.

At first I wanted to use the solution described at http://stackoverflow.com/questions/1692388/python-list-of-dict-if-exists-increment-a-dict-value-if-not-append-a-new-dict but I got an error saying that one dict cannot be used as a key to another dict.

So the data structure I opted for is a list where each entry is a dict and an int:

    r = [[{'src': '', 'dst': '', 'cmd': ''}, 0]]

The original dataset (that should be compared to the resulting dataset) is a list of dicts:

    d1 = {'src': '192.168.0.1', 'dst': '192.168.0.2', 'cmd': 'cmd1'}
    d2 = {'src': '192.168.0.1', 'dst': '192.168.0.2', 'cmd': 'cmd2'}
    d3 = {'src': '192.168.0.2', 'dst': '192.168.0.1', 'cmd': 'cmd1'}
    d4 = {'src': '192.168.0.1', 'dst': '192.168.0.2', 'cmd': 'cmd1'}
    o = [d1, d2, d3, d4]

The result should be:

    r = [[{'src': '192.168.0.1', 'dst': '192.168.0.2', 'cmd': 'cmd1'}, 2],
         [{'src': '192.168.0.1', 'dst': '192.168.0.2', 'cmd': 'cmd2'}, 1],
         [{'src': '192.168.0.2', 'dst': '192.168.0.1', 'cmd': 'cmd1'}, 1]]

What is the best way to accomplish this? I have a few code examples, but none is really good and most do not work correctly. Thanks for any input on this!

UPDATE: The final code after Tamás's comments is:

    from collections import namedtuple, defaultdict

    DataClass = namedtuple("DataClass", "src dst cmd")

    d1 = DataClass(src='192.168.0.1', dst='192.168.0.2', cmd='cmd1')
    d2 = DataClass(src='192.168.0.1', dst='192.168.0.2', cmd='cmd2')
    d3 = DataClass(src='192.168.0.2', dst='192.168.0.1', cmd='cmd1')
    d4 = DataClass(src='192.168.0.1', dst='192.168.0.2', cmd='cmd1')

    ds = d1, d2, d3, d4

    r = defaultdict(int)
    for d in ds:
        r[d] += 1

    print "list to compare"
    for d in ds:
        print d

    print "result after merge"
    for k, v in r.iteritems():
        print("%s: %s" % (k, v))
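The same tally can be sketched in Python 3 with collections.Counter, a dict subclass for counting hashable items. The record type name Flow is an assumption made for this example; the data is taken from the question.

```python
from collections import Counter, namedtuple

# Hypothetical record type; field names follow the dict keys in the question.
Flow = namedtuple("Flow", "src dst cmd")

records = [
    {'src': '192.168.0.1', 'dst': '192.168.0.2', 'cmd': 'cmd1'},
    {'src': '192.168.0.1', 'dst': '192.168.0.2', 'cmd': 'cmd2'},
    {'src': '192.168.0.2', 'dst': '192.168.0.1', 'cmd': 'cmd1'},
    {'src': '192.168.0.1', 'dst': '192.168.0.2', 'cmd': 'cmd1'},
]

# Convert each dict to a hashable named tuple, then count duplicates.
counts = Counter(Flow(**record) for record in records)

# Rebuild the [dict, count] pairs the question asks for.
result = [[dict(flow._asdict()), n] for flow, n in counts.items()]
```

On Python 3.7 and later, Counter keeps first-seen order, so result lists the entries in the order they first appeared in records.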

Read the article

"import numpy" tries to load my own package

- by Sebastian

I have a Python (2.7) project containing my own packages util and operator (and so forth). I read about relative imports, but perhaps I didn't understand them. I have the following directory structure:

    top-dir/
        util/__init__.py  (empty)
        util/ua.py
        util/ub.py
        operator/__init__.py
        ...
        test/test1.py

The test1.py file contains

    #!/usr/bin/env python2
    from __future__ import absolute_import # removing this line doesn't change anything. It's default functionality in python2.7 I guess
    import numpy as np

It's fine when I execute test1.py inside the test/ folder. But when I move to top-dir/, the import of numpy wants to include my own util package:

    Traceback (most recent call last):
      File "tests/laplace_2d_square.py", line 4, in <module>
        import numpy as np
      File "/usr/lib/python2.7/site-packages/numpy/__init__.py", line 137, in <module>
        import add_newdocs
      File "/usr/lib/python2.7/site-packages/numpy/add_newdocs.py", line 9, in <module>
        from numpy.lib import add_newdoc
      File "/usr/lib/python2.7/site-packages/numpy/lib/__init__.py", line 4, in <module>
        from type_check import *
      File "/usr/lib/python2.7/site-packages/numpy/lib/type_check.py", line 8, in <module>
        import numpy.core.numeric as _nx
      File "/usr/lib/python2.7/site-packages/numpy/core/__init__.py", line 45, in <module>
        from numpy.testing import Tester
      File "/usr/lib/python2.7/site-packages/numpy/testing/__init__.py", line 8, in <module>
        from unittest import TestCase
      File "/usr/lib/python2.7/unittest/__init__.py", line 58, in <module>
        from .result import TestResult
      File "/usr/lib/python2.7/unittest/result.py", line 9, in <module>
        from . import util
      File "/usr/lib/python2.7/unittest/util.py", line 2, in <module>
        from collections import namedtuple, OrderedDict
      File "/usr/lib/python2.7/collections.py", line 9, in <module>
        from operator import itemgetter as _itemgetter, eq as _eq
    ImportError: cannot import name itemgetter

The troublesome line is either

    from . import util

or perhaps

    from operator import itemgetter as _itemgetter, eq as _eq

What can I do?

Read the article

How would I make this faster? Parsing Word/sorting by heading [on hold]

- by Doof12

Currently it takes about 3 minutes to run through a single 53-page Word document. Hopefully you all have some advice about speeding up the process. Code:

    import win32com.client as win32
    from glob import glob
    import io
    import re
    from collections import namedtuple
    from collections import defaultdict
    import pprint

    raw_files = glob('*.docx')
    word = win32.gencache.EnsureDispatch('Word.Application')
    word.Visible = False
    oFile = io.open("rawsort.txt", "w+", encoding = "utf-8")#text dump
    doccat= list()
    for f in raw_files:
        word.Documents.Open(f)
        doc = word.ActiveDocument #whichever document is active at the time
        doc.ConvertNumbersToText()
        print doc.Paragraphs.Count
        style = None  # no heading seen yet in this document
        for x in xrange(1, doc.Paragraphs.Count+1):#for loop to print through paragraphs
            oText = doc.Paragraphs(x)
            if not oText.Range.Tables.Count >0 :
                results = re.match('(?P<number>(([1-3]*[A-D]*[0-9]*)(.[1-3]*[0-9])+))', oText.Range.Text)
                stylematch = re.match('Heading \d', oText.Style.NameLocal)
                if results!= None and oText.Style != None and stylematch != None:
                    doccat.append((oText.Style.NameLocal, oText.Range.Text[:len(results.group('number'))],oText.Range.Text[len(results.group('number')):]))
                    style = oText.Style.NameLocal
                else:
                    if oText.Range.Font.Bold == True :
                        # list.append takes a single argument, so pass one tuple
                        doccat.append((style, oText))
    oFile.write(unicode(doccat))
    oFile.close()

The loop over paragraphs obviously takes the most time. Is there some way of identifying and appending the matching paragraphs without going through every paragraph?

Read the article

What is the fastest (to access) struct-like object in Python?

- by DNS

I'm optimizing some code whose main bottleneck is running through and accessing a very large list of struct-like objects. Currently I'm using namedtuples, for readability. But some quick benchmarking using 'timeit' shows that this is really the wrong way to go where performance is a factor:

Named tuple with a, b, c:

    >>> timeit("z = a.c", "from __main__ import a")
    0.38655471766332994

Class using __slots__, with a, b, c:

    >>> timeit("z = b.c", "from __main__ import b")
    0.14527461047146062

Dictionary with keys a, b, c:

    >>> timeit("z = c['c']", "from __main__ import c")
    0.11588272541098377

Tuple with three values, using a constant key:

    >>> timeit("z = d[2]", "from __main__ import d")
    0.11106188992948773

List with three values, using a constant key:

    >>> timeit("z = e[2]", "from __main__ import e")
    0.086038238242508669

Tuple with three values, using a local key:

    >>> timeit("z = d[key]", "from __main__ import d, key")
    0.11187358437882722

List with three values, using a local key:

    >>> timeit("z = e[key]", "from __main__ import e, key")
    0.088604143037173344

First of all, is there anything about these little timeit tests that would render them invalid? I ran each several times, to make sure no random system event had thrown them off, and the results were almost identical.

It would appear that dictionaries offer the best balance between performance and readability, with classes coming in second. This is unfortunate, since, for my purposes, I also need the object to be sequence-like; hence my choice of namedtuple.

Lists are substantially faster, but constant keys are unmaintainable; I'd have to create a bunch of index constants, i.e. KEY_1 = 1, KEY_2 = 2, etc., which is also not ideal.

Am I stuck with these choices, or is there an alternative that I've missed?
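A self-contained way to rerun this kind of comparison on a current Python 3 interpreter is sketched below. The container definitions are assumptions, since the question does not show how a, b, c, d, and e were built, and the names Point and Slotted are made up. Absolute timings vary by machine and Python version, so no expected numbers are given.

```python
from collections import namedtuple
from timeit import timeit

# Hypothetical stand-ins for the objects benchmarked in the question.
Point = namedtuple("Point", "a b c")


class Slotted:
    __slots__ = ("a", "b", "c")

    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c


# Each entry: label -> (statement to time, object bound to the name "obj").
candidates = {
    "namedtuple attribute": ("z = obj.c", Point(1, 2, 3)),
    "__slots__ attribute": ("z = obj.c", Slotted(1, 2, 3)),
    "dict key": ("z = obj['c']", {"a": 1, "b": 2, "c": 3}),
    "tuple index": ("z = obj[2]", (1, 2, 3)),
    "list index": ("z = obj[2]", [1, 2, 3]),
}

# timeit's globals parameter (Python 3.5+) supplies "obj" to each statement.
timings = {
    label: timeit(stmt, globals={"obj": obj}, number=200_000)
    for label, (stmt, obj) in candidates.items()
}

# Print fastest first.
for label, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{label:22s} {seconds:.4f}s")
```

Timing every variant in one script with the same iteration count keeps the comparison consistent, but the ranking seen on one interpreter version should not be assumed to hold on another.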
