Google App Engine - SiteMap Creation for a social network

Posted by spidee on Stack Overflow
Published on 2010-05-12T13:28:56Z Indexed on 2010/05/13 12:24 UTC

Hi all.

I am creating a social tool and want to allow search engines to pick up "public" user profiles, as Twitter and Facebook do.

I have seen all the protocol info at http://www.sitemaps.org, and I understand it and how to build such a file, along with an index if I exceed the 50K URL limit.
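For reference, a minimal sketch of the index part in plain Python, assuming the per-file limit of 50,000 URLs from sitemaps.org. The function names and the child sitemap URL in the usage note are made up for illustration; nothing here is App Engine specific.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50000  # per-file URL limit from the sitemaps.org protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into lists of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls, lastmod):
    """Return sitemap index XML (as a str) listing each child sitemap file."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(root, encoding="unicode")
```

Calling `build_sitemap_index(["http://www.socialsite.com/sitemaps/sitemap_xml_1"], "2010-05-12")` would give one `<sitemap>` entry per child file, and `chunk_urls` decides how many child files are needed.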

Where I am struggling is the concept of how I make this run.

The sitemap for my general site pages is simple: I can use a tool or a script to create the file, host it, submit it, and I'm done.

What I then need is a script that will create the sitemaps of user profiles. I assume this would be something like:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
       <url>
          <loc>http://www.socialsite.com/profile/spidee</loc>
          <lastmod>2010-05-12</lastmod>
          <changefreq>???</changefreq>
          <priority>???</priority>
       </url>
       <url>
          <loc>http://www.socialsite.com/profile/webbsterisback</loc>
          <lastmod>2010-05-12</lastmod>
          <changefreq>???</changefreq>
          <priority>???</priority>
       </url>
    </urlset>

I've added some ??? as I don't know how I should set these values for my profiles, given the following:

When a new profile is created it must be added to a sitemap. If the profile is changed, or if "certain" properties are changed, I don't know whether I should update the entry in the map or do something else. (Updating would be a nightmare!)

Some users may change their profile. In terms of relevance to the search engine, the only way a Google or Yahoo search will find a user's profile (for my requirement) would be, for example, by [user name] and [location]. So once the entry for the profile has been added to the map file, the only reasons to have the search bot re-index the profile would be if the user changed their user name (which they can't) or their location, and/or changed their settings so that their profile is "hidden" from search engines.
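To make that rule concrete, here is a small sketch in plain Python of the check I have in mind. The field names `location` and `hidden_from_search` are placeholders I made up, and profiles are plain dicts here rather than datastore entities.

```python
# Hypothetical names for the only profile fields that matter to search engines.
SEARCH_RELEVANT_FIELDS = ("location", "hidden_from_search")


def needs_sitemap_update(old_profile, new_profile):
    """Return True if an edit changed any search-relevant field."""
    return any(
        old_profile.get(field) != new_profile.get(field)
        for field in SEARCH_RELEVANT_FIELDS
    )
```

An edit that only touches, say, a bio field would return False, so the profile's sitemap entry and its lastmod could be left alone.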

I assume my map creation will need to be dynamic. From what I have said above, I would imagine that creating a new profile, and possibly editing certain properties, could mark it as needing to be added to or updated in the sitemap.

Assuming I will have millions of profiles being added and edited, how can I manage this in a sensible manner?

I know I need a script that can append URLs as each profile is created. I know the script will probably be a TASK running at a set frequency. Perhaps the profiles have a property like "indexed", and the TASK sets it to "true" when the profiles are added to the map. I don't see the best way to store the map. Do I store it in the datastore, i.e.:

model=sitemaps

properties

key_name=sitemap_xml_1 (and for my map sitemap_index_xml)

mapxml=blobstore (the raw XML map or ROR map)

full=boolean (set true when url count is 50K) # might need this as a shard will tell us

To make this work, my thoughts are:

Memcache the current sitemap structure as "sitemap_xml" and keep a sharded counter of the URL count. When my task executes:

1. Build the XML structure for, say, the first 100 URLs marked "indexed==false" (how many could you run at a time?).
2. Test whether the current memcached sitemap is full (shardcounter+100>50K).
3a. If the map is nearly full, create a new map entry in the model, "sitemap_xml_2", update the map index file (also stored in my model, as "sitemap_index"), and start a new shard counter or reset it.
3b. If the map is not full, grab it from memcache.
4. Append the 100-URL XML structure.
5. Save / memcache the map.
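The steps above can be sketched roughly as follows. This is plain Python with a dict standing in for the datastore and memcache, and a simple list length standing in for the sharded counter, so it only shows the control flow; `run_sitemap_task`, `url_entry` and the layout of `store` are all names I invented for the sketch.

```python
from xml.sax.saxutils import escape

MAX_URLS = 50000   # per-file limit from sitemaps.org
BATCH_SIZE = 100   # batch size from the steps above; tune as needed


def url_entry(profile):
    """Render one <url> element for a profile dict, escaping XML entities."""
    return "<url><loc>%s</loc><lastmod>%s</lastmod></url>" % (
        escape(profile["url"]), escape(profile["lastmod"]))


def run_sitemap_task(profiles, store, max_urls=MAX_URLS, batch_size=BATCH_SIZE):
    """Append one batch of unindexed profiles to the current sitemap.

    `store` is a dict standing in for the datastore/memcache: "maps" maps a
    sitemap name to its list of <url> strings, and "current" names the
    sitemap still being filled. Returns the name written to, or None.
    """
    batch = [p for p in profiles if not p["indexed"]][:batch_size]  # step 1
    if not batch:
        return None
    current = store["current"]
    if len(store["maps"][current]) + len(batch) > max_urls:         # step 2
        number = int(current.rsplit("_", 1)[1]) + 1                 # step 3a
        current = "sitemap_xml_%d" % number
        store["maps"][current] = []
        store["current"] = current
    store["maps"][current].extend(url_entry(p) for p in batch)      # steps 3b, 4, 5
    for p in batch:
        p["indexed"] = True  # mark as added so the next run skips them
    return current
```

In this sketch the index file would simply list the keys of `store["maps"]`. On App Engine the read-modify-write of the current map would presumably need to be transactional, which the sketch ignores.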

I can now add a handler using a URL map/route like /sitemaps/*

Get my * as the map name and serve the maps from the blobstore/memcache on the fly.
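As a rough illustration of that handler, here is a bare WSGI sketch rather than actual App Engine webapp code. The `SITEMAPS` dict is a stand-in for the blobstore/memcache lookup, and its content is a placeholder.

```python
# Stand-in for the blobstore/memcache lookup; the stored XML is a placeholder.
SITEMAPS = {
    "sitemap_xml_1": '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>',
}
PREFIX = "/sitemaps/"


def sitemap_app(environ, start_response):
    """Serve /sitemaps/<name> from the stored maps, or 404 if unknown."""
    path = environ.get("PATH_INFO", "")
    # Everything after the prefix is treated as the map name (the "*" part).
    name = path[len(PREFIX):] if path.startswith(PREFIX) else None
    body = SITEMAPS.get(name)
    if body is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"sitemap not found"]
    start_response("200 OK", [("Content-Type", "application/xml")])
    return [body.encode("utf-8")]
```

A request for /sitemaps/sitemap_xml_1 would return the stored XML with an XML content type; any other name falls through to the 404 branch.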

Now my question is: does this work? Is this the right way, or a good way to start? Will this handle the situation of making sure the search bots update when a user changes their profile, possibly by setting the change frequency correctly? Do I need a more advanced system :( ? Or have I reinvented the wheel?

I hope this is all clear and makes some form of sense :-)

© Stack Overflow or respective owner