Paginating requests to an API


I'm consuming (via urllib/urllib2) an API that returns XML results. The API always returns the total_hit_count for my query, but only lets me retrieve results in batches of, say, 100 or 1000. It requires me to specify a start_pos and end_pos as offsets in order to walk through the results.

Say the urllib request looks like "http://someservice?query='test'&start_pos=X&end_pos=Y".

If I send an initial 'taster' query with minimal data transfer, such as http://someservice?query='test'&start_pos=1&end_pos=1, to get back a result of, say, total_hits = 1234, I'd like to work out the cleanest approach to requesting those 1234 results in batches of, again say, 100 or 1000 or...
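
For concreteness, here's roughly what my taster request looks like. I'm assuming the count comes back in a <total_hit_count> element; the exact element name obviously depends on the service's schema:

import urllib2
from BeautifulSoup import BeautifulStoneSoup  # BSoup's XML-mode parser

base_url = "http://someservice?query='test'"

# 'Taster' query: ask for a single result just to learn the total hit count.
taster_xml = urllib2.urlopen(base_url + "&start_pos=1&end_pos=1").read()
soup = BeautifulStoneSoup(taster_xml)

# Assumes the count lives in a <total_hit_count> element; adjust to the real schema.
total_hits = int(soup.find('total_hit_count').string)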

This is what I came up with so far, and it seems to work, but I'd like to know if you would have done things differently or if I could improve upon this:

hits_per_page = 1000  # or 100 or 200 or whatever, adjustable
total_hits = 1234     # retrieved with BSoup from the 'taster' query
base_url = "http://someservice?query='test'"
# total_hits + 1 so the last hit isn't dropped when it falls just past a batch boundary
startdoc_positions = range(1, total_hits + 1, hits_per_page)
enddoc_positions = [start + hits_per_page - 1 for start in startdoc_positions]
for start, end in zip(startdoc_positions, enddoc_positions):
    if end > total_hits:
        end = total_hits
    print "url to request is:\n ",
    print "%s&start_pos=%s&end_pos=%s" % (base_url, start, end)
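
Eventually the loop would actually fetch each batch rather than just print the URLs; a rough sketch of that next step (no error handling or retries yet):

import urllib2

pages = []
for start, end in zip(startdoc_positions, enddoc_positions):
    end = min(end, total_hits)
    url = "%s&start_pos=%s&end_pos=%s" % (base_url, start, end)
    # Each response is one XML page of up to hits_per_page results,
    # ready to be parsed with BSoup.
    pages.append(urllib2.urlopen(url).read())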

P.S. I'm a long-time consumer of Stack Overflow, especially the Python questions, but this is my first posted question. You guys are just brilliant.
