Data Auditor by Example

Posted by Jinjin.Wang on Oracle Blogs See other posts from Oracle Blogs or by Jinjin.Wang
Published on Fri, 19 Mar 2010 00:06:57 -0800 Indexed on 2010/03/19 8:21 UTC
Read the original article Hit count: 314

Filed under:

OWB has a node Data Auditors under Oracle Module in Projects Navigator. What is data auditor and how to use it? I will give an introduction to data auditor and show its usage by examples.

Data auditor is an important tool in ensuring that data quality levels meet business requirements. Data auditor validates data against a set of data rules to determine which records comply and which do not. It gathers statistical metrics on how well the data in a system complies with a rule by auditing and marking how many errors are occurring against the audited table. Data auditors are typically scheduled for regular execution as part of a process flow, to monitor the quality of the data in an operational environment such as a data warehouse or ERP system, either immediately after updates like data loads, or at regular intervals.

How to use data auditor to monitor data quality? Only objects with data rules can be monitored, so the first step is to define data rules according to business requirements and apply them to the objects you want to monitor. The objects can be tables, views, materialized views, and external tables. Secondly create a data auditor containing the objects. You can configure the data auditor and set physical deployment parameters for it as optional, which will be used while running the data auditor. Then deploy and run the data auditor either manually or as part of the process flow. After execution, the data auditor sets several output values, and records that are identified as not complying with the defined data rules contained in the data auditor are written to error tables.

Here is an example.

We have two tables DEPARTMENTS and EMPLOYEES (see pic-1 and pic-2. Click here for DDL and data) imported into OWB. We want to gather statistical metrics on how well data in these two tables satisfies the following requirements:

a. Values of the EMPLOYEES.EMPLOYEE_ID attribute are three-digit numbers.

b. Valid values for EMPLOYEES.JOB_ID are IT_PROG, SA_REP, SH_CLERK, PU_CLERK, and ST_CLERK.

c. EMPLOYEES.EMPLOYEE_ID is related to DEPARTMENTS.MANAGER_ID.

clip_image002

Pic-1 EMPLOYEES

clip_image003

Pic-2 DEPARTMENTS

1. To determine legal data within EMPLOYEES or legal relationships between data in different columns of the two tables, firstly we define data rules based on the three requirements and apply them to tables.

a. The first requirement is about patterns that an attribute is allowed to conform to. We create a Domain Pattern List data rule EMPLOYEE_PATTERN_RULE here. The pattern is defined in the Oracle Database regular expression syntax as ^([0-9]{3})$

clip_image004

Apply data rule EMPLOYEE_PATTERN_RULE to table EMPLOYEES.

clip_image006

© Oracle Blogs or respective owner