How should I deal with user agent parsing in logs?

Posted by Mr. Jefferson on Pro Webmasters
Published on 2012-02-17T19:41:05Z Indexed on 2012/03/23 17:41 UTC

My web app project includes logging so we can see where visitors come from (referrer URL), which user agents are popular, which pages are most popular, and so on. The log is stored in SQL Server, and when I query the user agents I use a large (almost 100 lines) and growing CASE statement to classify them by string matching (e.g. if the user agent contains the string "Firefox/9", then it's Firefox 9). Is there a better way to do this, so I don't have to keep adding to that CASE statement for every new browser release?
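
To make the problem concrete, here is a rough sketch of the kind of alternative I have in mind: one rule per browser family, with the version captured by a pattern instead of being hard-coded, so that "Firefox/10" would not need a new branch. It is written in Python purely for illustration (the real log lives in SQL Server), and the `RULES` table and `classify` function are hypothetical names, not anything from my current code. The patterns cover only a handful of browsers and are assumptions about typical user agent strings.

```python
import re

# Hypothetical rule table: (family name, regex capturing the major version).
# Order matters: Chrome's user agent also contains "Safari/", so Chrome
# has to be checked before Safari.
RULES = [
    ("Firefox", re.compile(r"Firefox/(\d+)")),
    ("Chrome", re.compile(r"Chrome/(\d+)")),
    ("Internet Explorer", re.compile(r"MSIE (\d+)")),
    ("Safari", re.compile(r"Version/(\d+)\S* Safari/")),
]


def classify(user_agent):
    """Return (family, major_version), or ("Unknown", None) if no rule matches."""
    for family, pattern in RULES:
        match = pattern.search(user_agent)
        if match:
            return family, int(match.group(1))
    return "Unknown", None


if __name__ == "__main__":
    sample = "Mozilla/5.0 (Windows NT 6.1; rv:9.0.1) Gecko/20100101 Firefox/9.0.1"
    print(classify(sample))
```

With this shape, a new browser release needs no change at all, and a new browser family needs one extra row rather than one row per version.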

Also, how should I deal with less common, weird/unknown user agents? I've seen the following in the logs and been unable to find good information online about what they are:

  • WordPress/3.3.1; http://www.facecolony.org
  • Mozilla/4.0 ( http://www.hairirons.org redips; <a href=http://hairirons.org/>chi hair iron</a>)

I'd guess they're bots/crawlers, but the sites they point to don't appear to mention web crawlers (and are sometimes not even available). I've also seen other user agents that aren't familiar to me, but I know those are bots because they include "bot", "spider", or something similar.
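
The keyword check described above could be sketched roughly as follows, again in Python just for illustration. The `bucket` function and both patterns are hypothetical, and the list of browser tokens is an assumption rather than a complete inventory. Anything that matches neither pattern lands in an "unclassified" bucket for manual review, which is where the two odd user agents listed above would end up.

```python
import re

# Substrings that self-identifying crawlers commonly include (assumed list).
BOT_HINT = re.compile(r"bot|spider|crawl", re.IGNORECASE)

# Tokens that mainstream browser engines usually send (assumed, not exhaustive).
BROWSER_HINT = re.compile(r"Gecko/|AppleWebKit/|MSIE |Opera/")


def bucket(user_agent):
    """Sort a user agent into "bot", "browser", or "unclassified"."""
    # Check the bot keywords first: many crawlers also start with "Mozilla/".
    if BOT_HINT.search(user_agent):
        return "bot"
    if BROWSER_HINT.search(user_agent):
        return "browser"
    return "unclassified"


if __name__ == "__main__":
    for ua in (
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "WordPress/3.3.1; http://www.facecolony.org",
    ):
        print(bucket(ua), "<-", ua)
```

This only separates the obvious cases; it says nothing about what the unclassified agents actually are.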

© Pro Webmasters or respective owner
