Oracle Data Mining a Star Schema: Telco Churn Case Study

Posted by charlie.berger on Oracle Blogs See other posts from Oracle Blogs or by charlie.berger
Published on Thu, 09 Dec 2010 18:53:06 +0000 Indexed on 2010/12/09 22:16 UTC
Read the original article Hit count: 967

Filed under:

churn

|

Data Mining

|

Oracle

|

star schema

|

telco

|

transactional data

There is a complete and detailed Telco Churn case study "How to" Blog Series just posted by Ari Mozes, ODM Dev. Manager. In it, Ari provides detailed guidance in how to leverage various strengths of Oracle Data Mining including the ability to:

mine Star Schemas and join tables and views together to obtain a complete 360 degree view of a customer
combine transactional data e.g. call record detail (CDR) data, etc.
define complex data transformation, model build and model deploy analytical methodologies inside the Database

His blog is posted in a multi-part series. Below are some opening excerpts for the first 3 blog entries. This is an excellent resource for any novice to skilled data miner who wants to gain competitive advantage by mining their data inside the Oracle Database.

Many thanks Ari!

Mining a Star Schema: Telco Churn Case Study (1 of 3)

One of the strengths of Oracle Data Mining is the ability to mine star schemas with minimal effort. Star schemas are commonly used in relational databases, and they often contain rich data with interesting patterns. While dimension tables may contain interesting demographics, fact tables will often contain user behavior, such as phone usage or purchase patterns. Both of these aspects - demographics and usage patterns - can provide insight into behavior.

Churn is a critical problem in the telecommunications industry, and companies go to great lengths to reduce the churn of their customer base. One case study¹ describes a telecommunications scenario involving understanding, and identification of, churn, where the underlying data is present in a star schema. That case study is a good example for demonstrating just how natural it is for Oracle Data Mining to analyze a star schema, so it will be used as the basis for this series of posts.

.....

Mining a Star Schema: Telco Churn Case Study (2 of 3)

This post will follow the transformation steps as described in the case study, but will use Oracle SQL as the means for preparing data. Please see the previous post for background material, including links to the case study and to scripts that can be used to replicate the stages in these posts.

1) Handling missing values for call data records
The CDR_T table records the number of phone minutes used by a customer per month and per call type (tariff). For example, the table may contain one record corresponding to the number of peak (call type) minutes in January for a specific customer, and another record associated with international calls in March for the same customer. This table is likely to be fairly dense (most type-month combinations for a given customer will be present) due to the coarse level of aggregation, but there may be some missing values. Missing entries may occur for a number of reasons: the customer made no calls of a particular type in a particular month, the customer switched providers during the timeframe, or perhaps there is a data entry problem. In the first situation, the correct interpretation of a missing entry would be to assume that the number of minutes for the type-month combination is zero. In the other situations, it is not appropriate to assume zero, but rather derive some representative value to replace the missing entries. The referenced case study takes the latter approach. The data is segmented by customer and call type, and within a given customer-call type combination, an average number of minutes is computed and used as a replacement value.
In SQL, we need to generate additional rows for the missing entries and populate those rows with appropriate values. To generate the missing rows, Oracle's partition outer join feature is a perfect fit.

select cust_id, cdre.tariff, cdre.month, mins
from cdr_t cdr partition by (cust_id) right outer join
(select distinct tariff, month from cdr_t) cdre
on (cdr.month = cdre.month and cdr.tariff = cdre.tariff);

.......

Mining a Star Schema: Telco Churn Case Study (3 of 3)

Now that the "difficult" work is complete - preparing the data - we can move to building a predictive model to help identify and understand churn.

The case study suggests that separate models be built for different customer segments (high, medium, low, and very low value customer groups). To reduce the data to a single segment, a filter can be applied:

create or replace view churn_data_high as
select * from churn_prep where value_band = 'HIGH';

It is simple to take a quick look at the predictive aspects of the data on a univariate basis. While this does not capture the more complex multi-variate effects as would occur with the full-blown data mining algorithms, it can give a quick feel as to the predictive aspects of the data as well as validate the data preparation steps. Oracle Data Mining includes a predictive analytics package which enables quick analysis.

begin
dbms_predictive_analytics.explain(
'churn_data_high','churn_m6','expl_churn_tab');

end;

/
select * from expl_churn_tab where rank <= 5 order by rank;

ATTRIBUTE_NAME       ATTRIBUTE_SUBNAME EXPLANATORY_VALUE RANK
-------------------- ----------------- ----------------- ----------
LOS_BAND                                      .069167052          1
MINS_PER_TARIFF_MON PEAK-5                   .034881648          2
REV_PER_MON        REV-5                    .034527798          3
DROPPED_CALLS                   .028110322          4
MINS_PER_TARIFF_MON PEAK-4           .024698149    5

From the above results, it is clear that some predictors do contain information to help identify churn (explanatory value > 0). The strongest uni-variate predictor of churn appears to be the customer's (binned) length of service. The second strongest churn indicator appears to be the number of peak minutes used in the most recent month. The subname column contains the interior piece of the DM_NESTED_NUMERICALS column described in the previous post. By using the object relational approach, many related predictors are included within a single top-level column.

.....

NOTE: These are just EXCERPTS. Click here to start reading the Oracle Data Mining a Star Schema: Telco Churn Case Study from the beginning.

Developer IT