Search Results

Search found 2 results on 1 page for 'pencilnero'.


  • Starting out NLP - Python + large data set

    - by pencilNero
    Hi, I've been wanting to learn Python and do some NLP, so I have finally gotten round to starting. I downloaded the English Wikipedia mirror for a nice chunky dataset to start on, and have been playing around a bit, at this stage just getting some of it into a SQLite db (I haven't worked with databases in the past, unfortunately). But I'm guessing SQLite is not the way to go for a full-blown NLP project (or experiment :) so what sort of things should I look at? HBase (and Hadoop) seem interesting; I guess I could run them in Java, prototype in Python and maybe migrate the really slow bits to Java. Alternatively I could just run MySQL, but the dataset is 12 GB and I wonder if that will be a problem. I also looked at Lucene, but I'm not sure how (other than breaking the wiki articles into chunks) I'd get that to work. What comes to mind for a really flexible NLP platform? (I don't really know at this stage WHAT I want to do; I just want to learn large-scale language analysis, to be honest.) Many thanks.

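    For the "getting some of it into a SQLite db" step described above, a minimal sketch of batched inserts might look like the following. The table name `articles`, its columns, the `load_articles` helper, and the generated sample data are all assumptions for illustration; the post does not say how the dump was parsed or what schema was used.

```python
import sqlite3

def load_articles(con, articles, batch_size=1000):
    """Insert (title, body) pairs in batches; returns the number of rows loaded.

    The `articles` table and its columns are hypothetical, not from the post.
    """
    con.execute(
        "create table if not exists articles "
        "(id integer primary key, title text, body text)"
    )
    batch = []
    total = 0
    for title, body in articles:
        batch.append((title, body))
        if len(batch) >= batch_size:
            # One executemany + commit per batch keeps memory use bounded
            con.executemany(
                "insert into articles (title, body) values (?, ?)", batch
            )
            con.commit()
            total += len(batch)
            batch = []
    if batch:
        # Flush the final partial batch
        con.executemany(
            "insert into articles (title, body) values (?, ?)", batch
        )
        con.commit()
        total += len(batch)
    return total

# Usage with generated sample data standing in for parsed wiki articles
con = sqlite3.connect(":memory:")
sample = ((f"Title {i}", f"Body text {i}") for i in range(2500))
loaded = load_articles(con, sample)
count = con.execute("select count(*) from articles").fetchone()[0]
```

    Passing a generator rather than a list means the whole dataset never has to sit in memory at once, which matters for a dump of this size.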

  • sqlite & python - only pulls the first result

    - by pencilNero
    This is pretty strange (admittedly, this is my first attempt with Python / SQLite). I can get all of the rows if I do a fetchall(), but other than that, no matter what I try, the db always ends up returning only the first row; the second iteration stops because a null is returned. I'm wondering if there is something wrong with how I am coding this up in Python? The db seems OK.

        con = sqlite3.connect('backup.db')
        con.row_factory = sqlite3.Row
        cur = con.cursor()
        cur.execute('select * from tb1;')

        for row in cur:
            try:
                # row = dataCur.fetchone()
                # if row == None: break
                print type(row)
                print ' Starting on: %i' % row[0]
                cleaner = Cleaner(scripts=True, remove_tags=['img'], embedded=True)
                try:
                    cleaned = cleaner.clean_html(row[2])  # data stored in second col
                    cur.execute('update tb1 set data = ? where id = ?;', (cleaned, row[0]))
                except AttributeError:
                    print 'Attribute error'
                print ' Ended on: %i' % row[0]
            except IOError:
                print 'IOexception'

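    A likely cause of the behaviour described above is that the loop calls `cur.execute(...)` for the UPDATE on the same cursor it is iterating over, which discards that cursor's pending SELECT results. The sketch below shows one way around it, assuming a table `tb1` with `id` and `data` columns: fetch the rows first and run the updates on a separate cursor. The `clean` function is a hypothetical stand-in for the post's `Cleaner.clean_html`, and the in-memory database is for illustration only.

```python
import sqlite3

def clean(text):
    # Hypothetical stand-in for the HTML cleaner used in the post
    return text.replace("<img>", "")

def clean_all(con):
    """Clean every row of tb1; returns the number of rows processed."""
    con.row_factory = sqlite3.Row
    read_cur = con.cursor()
    write_cur = con.cursor()  # separate cursor so the SELECT is not reset

    read_cur.execute("select id, data from tb1 order by id")
    # Materialise the result set before modifying the table
    rows = read_cur.fetchall()

    for row in rows:
        write_cur.execute(
            "update tb1 set data = ? where id = ?",
            (clean(row["data"]), row["id"]),
        )
    con.commit()
    return len(rows)

# Usage against a small in-memory table (assumed schema)
con = sqlite3.connect(":memory:")
con.execute("create table tb1 (id integer primary key, data text)")
con.executemany(
    "insert into tb1 (data) values (?)",
    [("<img>a",), ("<img>b",), ("c",)],
)
processed = clean_all(con)
result = [r["data"] for r in con.execute("select data from tb1 order by id")]
```

    For a table too large to fetch in one go, the same idea works with `fetchmany()` in a loop, as long as the updates go through a different cursor than the one being read.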
