Replicating SQL's 'Join' in Python

Posted by Daniel Mathews on Stack Overflow
Published on 2010-06-06T05:58:24Z

I'm in the process of trying to switch from R to Python (mainly because of issues around general flexibility). With NumPy, matplotlib and IPython, I am able to cover all my use cases except merging 'datasets'. I would like to simulate SQL's JOIN clause (inner, outer, full) purely in Python. R handles this with the 'merge' function.
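To make the goal concrete, here is a minimal pure-Python sketch of the join semantics in question, including duplicate keys on either side. The helper name `join_rows`, the list-of-dicts row format, and the `how` values are assumptions chosen for illustration; they are not part of NumPy or R, and this is not a vectorized solution for record arrays.

```python
from collections import defaultdict


def join_rows(left, right, key, how="inner"):
    """Join two lists of dicts on `key` (hypothetical helper, illustration only).

    Duplicate keys produce every left/right pairing, as in SQL.
    `how` is one of "inner", "left", "right", "outer" (full outer).
    """
    if how not in {"inner", "left", "right", "outer"}:
        raise ValueError(f"unknown join type: {how!r}")

    # Column names per side, used to pad unmatched rows with None.
    left_cols = list(dict.fromkeys(c for row in left for c in row))
    right_cols = list(dict.fromkeys(c for row in right for c in row))

    # Hash index: key value -> positions of the matching rows in `right`.
    index = defaultdict(list)
    for pos, row in enumerate(right):
        index[row[key]].append(pos)

    matched_right = set()
    out = []
    for lrow in left:
        positions = index.get(lrow[key], [])
        for pos in positions:
            matched_right.add(pos)
            # On a non-key column name clash, the right-hand value wins.
            out.append({**lrow, **right[pos]})
        if not positions and how in ("left", "outer"):
            out.append({**dict.fromkeys(right_cols), **lrow})

    if how in ("right", "outer"):
        for pos, rrow in enumerate(right):
            if pos not in matched_right:
                out.append({**dict.fromkeys(left_cols), **rrow})
    return out


left = [{"id": 1, "x": "a"}, {"id": 1, "x": "b"}, {"id": 2, "x": "c"}]
right = [{"id": 1, "y": 10}, {"id": 1, "y": 11}, {"id": 3, "y": 12}]

for how in ("inner", "left", "right", "outer"):
    print(how, len(join_rows(left, right, "id", how=how)))
```

With the sample data, key 1 appears twice on each side, so the inner join contains all four pairings rather than dropping or corrupting rows.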

I've tried join_by from numpy.lib.recfunctions, but it has critical issues with duplicates along the 'key', as its documentation states:

join_by(key, r1, r2, jointype='inner', r1postfix='1', r2postfix='2', defaults=None, usemask=True, asrecarray=False)

Join arrays r1 and r2 on key key.

The key should be either a string or a sequence of string corresponding to the fields used to join the array. An exception is raised if the key field cannot be found in the two input arrays. Neither r1 nor r2 should have any duplicates along key: the presence of duplicates will make the output quite unreliable. Note that duplicates are not looked for by the algorithm.

source: http://presbrey.mit.edu:1234/numpy.lib.recfunctions.html
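Since the documentation says duplicates are not looked for, one option is to check for them before calling join_by. The following is a rough sketch; `has_duplicate_keys` is a hypothetical helper, not a NumPy function, and it assumes the inputs are structured arrays whose key fields hold hashable scalar values.

```python
import numpy as np


def has_duplicate_keys(arr, key):
    """Return True if any value (or combination of values) in the key
    field(s) of structured array `arr` occurs more than once.

    `key` is a field name or a sequence of field names, mirroring the
    `key` argument of join_by. Hypothetical helper, illustration only.
    """
    names = [key] if isinstance(key, str) else list(key)
    # Build one hashable tuple per row from the key fields.
    rows = list(zip(*(arr[name].tolist() for name in names)))
    return len(set(rows)) < len(rows)


r1 = np.array(
    [(1, 2.0), (1, 3.0), (2, 4.0)],
    dtype=[("id", int), ("x", float)],
)

if has_duplicate_keys(r1, "id"):
    print("r1 has duplicate keys; join_by output would be unreliable")
```

The check goes through Python tuples rather than array operations, which keeps it simple at the cost of speed on large arrays.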


Any pointers or help will be most appreciated!

© Stack Overflow or respective owner
