Replicating SQL's 'Join' in Python
Posted by Daniel Mathews on Stack Overflow
Published on 2010-06-06T05:58:24Z
Indexed on 2010/06/06 6:02 UTC
I'm in the process of switching from R to Python (mainly for reasons of general flexibility). With NumPy, matplotlib and IPython, I am able to cover all my use cases save for merging 'datasets'. I would like to simulate SQL's JOIN clause (inner, outer, full) purely in Python. R handles this with the 'merge' function.
I've tried join_by from numpy.lib.recfunctions, but it has critical issues with duplicates along the 'key':
join_by(key, r1, r2, jointype='inner', r1postfix='1', r2postfix='2', defaults=None, usemask=True, asrecarray=False)
Join arrays r1 and r2 on key key.
The key should be either a string or a sequence of string corresponding to the fields used to join the array.
An exception is raised if the key field cannot be found in the two input arrays.
Neither r1 nor r2 should have any duplicates along key: the presence of duplicates will make the output quite unreliable. Note that duplicates are not looked for by the algorithm.
source: http://presbrey.mit.edu:1234/numpy.lib.recfunctions.html
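To make the desired behaviour concrete, here is a minimal pure-Python sketch of a join that tolerates duplicate keys, the way SQL does (one output row per matching pair). It is only an illustration, not something taken from NumPy: the function name join_records, the 'how' values ("inner", "left", "full") and the sample rows are all hypothetical, and it assumes that each dataset is a list of dicts, that every row carries the key, and that non-key column names do not clash between the two sides.

```python
from collections import defaultdict


def join_records(left, right, key, how="inner"):
    """Join two lists of dicts on `key`, SQL-style, allowing duplicate keys.

    how: "inner", "left" (left outer) or "full" (full outer).
    Columns missing on one side are filled with None.
    (Hypothetical helper for illustration; not part of NumPy.)
    """
    if how not in ("inner", "left", "full"):
        raise ValueError("unsupported join type: %r" % (how,))

    # Column names of each side, in first-seen order, used to pad unmatched rows.
    left_cols = list(dict.fromkeys(c for row in left for c in row))
    right_cols = list(dict.fromkeys(c for row in right for c in row))

    # Index the right-hand rows by key value; duplicates pile up in a list.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    result = []
    matched_keys = set()
    for lrow in left:
        matches = index.get(lrow[key], [])
        if matches:
            matched_keys.add(lrow[key])
            # One output row per matching pair, so duplicates multiply.
            for rrow in matches:
                merged = dict(lrow)
                merged.update(rrow)
                result.append(merged)
        elif how in ("left", "full"):
            # Unmatched left row: pad the right-only columns with None.
            merged = dict.fromkeys(right_cols)
            merged.update(lrow)
            result.append(merged)

    if how == "full":
        # Append right-hand rows whose key never appeared on the left.
        for rrow in right:
            if rrow[key] not in matched_keys:
                merged = dict.fromkeys(left_cols)
                merged.update(rrow)
                result.append(merged)
    return result


# Made-up sample data: id 1 is duplicated on the right-hand side.
people = [
    {"id": 1, "name": "ann"},
    {"id": 2, "name": "bob"},
    {"id": 3, "name": "cat"},
]
orders = [
    {"id": 1, "item": "pen"},
    {"id": 1, "item": "ink"},
    {"id": 4, "item": "cup"},
]

inner = join_records(people, orders, "id")               # 2 rows
left_outer = join_records(people, orders, "id", "left")  # 4 rows
full_outer = join_records(people, orders, "id", "full")  # 5 rows
```

Indexing one side in a dict of lists keeps the join roughly linear in the input sizes plus the output size, but this sketch does nothing about column-name clashes (the r1postfix/r2postfix arguments of join_by exist for that) and works on plain dicts rather than NumPy record arrays.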
Any pointers or help will be most appreciated!