Database warehouse design: fact tables and dimension tables

Posted by morpheous on Stack Overflow See other posts from Stack Overflow or by morpheous
Published on 2010-05-29T07:24:55Z Indexed on 2010/05/29 13:32 UTC
Read the original article Hit count: 447

I am building a poor man's data warehouse using a RDBMS. I have identified the key 'attributes' to be recorded as:

  • sex (true/false)
  • demographic classification (A, B, C etc)
  • place of birth
  • date of birth
  • weight (recorded daily): The fact that is being recorded

My requirements are to be able to run 'OLAP' queries that allow me to:

  • 'slice and dice'
  • 'drill up/down' the data and
  • generally, be able to view the data from different perspectives

After reading up on this topic area, the general consensus seems to be that this is best implemented using dimension tables rather than normalized tables.

Assuming that this assertion is true (i.e. the solution is best implemented using fact and dimension tables), I would like to seek some help in the design of these tables.

'Natural' (or obvious) dimensions are:

  • Date dimension
  • Geographical location

Which have hierarchical attributes. However, I am struggling with how to model the following fields:

  • sex (true/false)
  • demographic classification (A, B, C etc)

The reason I am struggling with these fields is that:

  1. They have no obvious hierarchical attributes which will aid aggregation (AFAIA) - which suggest they should be in a fact table
  2. They are mostly static or very rarely change - which suggests they should be in a dimension table.

Maybe the heuristic I am using above is too crude?

I will give some examples on the type of analysis I would like to carryout on the data warehouse - hopefully that will clarify things further.

I would like to aggregate and analyze the data by sex and demographic classification - e.g. answer questions like:

  • How does male and female weights compare across different demographic classifications?
  • Which demographic classification (male AND female), show the most increase in weight this quarter.

etc.

Can anyone clarify whether sex and demographic classification are part of the fact table, or whether they are (as I suspect) dimension tables.?

Also assuming they are dimension tables, could someone elaborate on the table structures (i.e. the fields)?

The 'obvious' schema:

CREATE TABLE sex_type (is_male int);
CREATE TABLE demographic_category (id int, name varchar(4));

may not be the correct one.

© Stack Overflow or respective owner

Related posts about sql

Related posts about database-design