Basic site analytics doesn't tally with Google data

Posted by Jenkz on Stack Overflow
Published on 2010-03-23T13:53:37Z Indexed on 2010/03/24 2:53 UTC

After being stumped by an earlier question: SO google-analytics-domain-data-without-filtering

I've been experimenting with a very basic analytics system of my own.

MySQL table:

hit_id, subsite_id, timestamp, ip, url

The subsite_id lets me drill down to a folder (as explained in the previous question).

I can now get the following metrics:

  • Page Views - Grouped by subsite_id and date
  • Unique Page Views - Grouped by subsite_id, date, url, IP (not necessarily how Google does it!)
  • The usual "most visited page", "likely time to visit" etc.
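
As a rough illustration of the first two metrics, here is a minimal sketch. It assumes the table is called hits (the post doesn't name it) and uses Python's built-in sqlite3 as a stand-in for MySQL; the sample rows and IPs are made up.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL table; the column names follow the
# post, but the table name "hits" is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits ("
    " hit_id INTEGER PRIMARY KEY,"
    " subsite_id INTEGER, timestamp TEXT, ip TEXT, url TEXT)"
)

# Made-up sample rows using documentation-range IPs.
rows = [
    (1, "2010-03-22 09:00:00", "192.0.2.1", "/a"),
    (1, "2010-03-22 09:05:00", "192.0.2.1", "/a"),  # repeat view: same IP, same URL
    (1, "2010-03-22 10:00:00", "192.0.2.2", "/a"),
    (1, "2010-03-22 10:30:00", "192.0.2.1", "/b"),
    (2, "2010-03-22 11:00:00", "192.0.2.3", "/c"),
]
conn.executemany(
    "INSERT INTO hits (subsite_id, timestamp, ip, url) VALUES (?, ?, ?, ?)", rows
)

# Page views: every hit counts, grouped by subsite and calendar day.
page_views = conn.execute(
    "SELECT subsite_id, DATE(timestamp) AS day, COUNT(*)"
    " FROM hits GROUP BY subsite_id, day ORDER BY subsite_id, day"
).fetchall()

# Unique page views: one count per distinct (subsite, day, url, ip) combination.
unique_page_views = conn.execute(
    "SELECT subsite_id, day, COUNT(*) FROM ("
    "  SELECT DISTINCT subsite_id, DATE(timestamp) AS day, url, ip FROM hits"
    ") AS d GROUP BY subsite_id, day ORDER BY subsite_id, day"
).fetchall()

print(page_views)
print(unique_page_views)
```

The same two queries should carry over to MySQL largely unchanged, since DATE(), SELECT DISTINCT and GROUP BY behave the same way for this purpose.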

I've now compared my data to that in Google Analytics and found that Google has lower values for each metric, i.e. my own setup is counting more hits than Google.

So I've started discounting IPs from various web crawlers: Google, Yahoo & Dotbot so far.
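
A minimal sketch of that kind of IP-based discounting, assuming the crawler addresses are kept as a list of network ranges. The ranges below are documentation-only placeholders, not real Google, Yahoo or Dotbot addresses, and the names is_crawler and discount_crawlers are made up for illustration.

```python
import ipaddress

# Hypothetical crawler ranges. These are reserved documentation blocks used as
# placeholders; real crawler ranges would have to come from a maintained list.
CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_crawler(ip: str) -> bool:
    """Return True if the address falls inside any listed crawler range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CRAWLER_NETWORKS)


def discount_crawlers(hits):
    """Keep only the hits whose IP is not in a listed crawler range."""
    return [hit for hit in hits if not is_crawler(hit["ip"])]


# Made-up hits shaped like rows of the table (only the relevant columns).
sample_hits = [
    {"ip": "192.0.2.10", "url": "/a"},    # inside a placeholder crawler range
    {"ip": "203.0.113.5", "url": "/b"},   # not listed, so it is kept
    {"ip": "198.51.100.7", "url": "/c"},  # inside a placeholder crawler range
]
human_hits = discount_crawlers(sample_hits)
print(human_hits)
```

Matching on ranges rather than single addresses means one entry can cover a whole block of crawler IPs, but the list still has to be kept up to date by hand.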

Short Questions:

  1. Is it worth my collating a list of all major crawlers to discount? Is any such list likely to change regularly?
  2. Are there any other obvious filters that Google will be applying to GA data?
  3. What other data would you collect that might be of use further down the line?
  4. What variables does Google use to work out entrance search keywords to a site?

The data is only going to be used internally for our own "subsite ranking system", but I would like to show my users some basic data (page views, most popular pages etc.) for their reference.

© Stack Overflow or respective owner
