Why is only the suffix of work_index hashed?

Posted by Jaroslav Záruba on Stack Overflow
Published on 2010-06-07T19:42:23Z Indexed on 2010/06/07 22:42 UTC

Filed under: google-app-engine | google-datastore

I'm reading through the PDF that Brett Slatkin has published for Google I/O 2010:
"Data pipelines with Google App Engine": http://tinyurl.com/3523mej

In the video (the Fan-in part) Brett says that the work_index has to be a hash, so that 'you distribute the load across the BigTable': http://www.youtube.com/watch?v=zSDC_TU7rtc#t=48m44
...and this is how work_index is created:

work_index = '%s-%d' % (sum_name, knuth_hash(index))

...which I guess creates something like 'mySum-54657651321987'
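
To make that concrete, here is a minimal sketch of how such a key might be built. The body of knuth_hash is an assumption on my part (a Knuth-style multiplicative hash reduced to 32 bits), and make_work_index is a hypothetical helper name; neither is taken verbatim from the slides.

```python
def knuth_hash(number):
    """Knuth-style multiplicative hash reduced to 32 bits (assumed definition)."""
    return (number * 2654435761) % 2**32


def make_work_index(sum_name, index):
    """Hypothetical helper: build a key the way the quoted line does."""
    # The readable prefix is kept as-is; only the numeric suffix is hashed,
    # so consecutive indexes end up with very different suffixes.
    return '%s-%d' % (sum_name, knuth_hash(index))


if __name__ == '__main__':
    for i in range(3):
        print(make_work_index('mySum', i))
```

With this definition the suffix is at most ten decimal digits long, and the sum name stays visible in the key.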

I do understand the basic idea, but why is only one half of work_index hashed? Is it important to hash only the suffix and leave the prefix as-is? Would it be wrong to do

md5('%s-%d' % (sum_name, index))

so that the whole key would be a hash like '6gw8....hq6'?

I'm a Java guy, so I would use md5 to hash, which means I get an id like 'mySum' + 32 characters. (Obviously I want my ids/keys to be as short as possible here.) If I could hash the whole string, my id would be just 32 chars.
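
The whole-string variant I have in mind could be sketched in Python like this. hashed_work_index is a hypothetical name of mine, and using md5 from the standard hashlib module is my substitution for knuth_hash, not something from the talk.

```python
import hashlib


def hashed_work_index(sum_name, index):
    """Hypothetical helper: hash the entire 'name-index' string with md5."""
    raw = '%s-%d' % (sum_name, index)
    # hexdigest() of md5 is always 32 hexadecimal characters,
    # no matter how long sum_name is.
    return hashlib.md5(raw.encode('utf-8')).hexdigest()


if __name__ == '__main__':
    key = hashed_work_index('mySum', 42)
    print(key, len(key))
```

The trade-off is that the resulting key no longer shows which sum it belongs to, whereas the prefixed form keeps sum_name readable.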

Or would you suggest using something else to do the hashing?

© Stack Overflow or respective owner
