Basics of Join Factorization

Posted by Hong Su on Oracle Blogs See other posts from Oracle Blogs or by Hong Su
Published on Fri, 25 Feb 2011 13:49:37 -0800 Indexed on 2011/02/25 23:29 UTC
Read the original article Hit count: 388

We continue our series on optimizer transformations with a post that describes the Join Factorization transformation. The Join Factorization transformation was introduced in Oracle 11g Release 2 and applies to UNION ALL queries. Union all queries are commonly used in database applications, especially in data integration applications. In many scenarios the branches in a UNION All query share a common processing, i.e, refer to the same tables. In the current Oracle execution strategy, each branch of a UNION ALL query is evaluated independently, which leads to repetitive processing, including data access and join. The join factorization transformation offers an opportunity to share the common computations across the UNION ALL branches. Currently, join factorization only factorizes common references to base tables only, i.e, not views.

Consider a simple example of query Q1.

Q1:
    select t1.c1, t2.c2
    from t1, t2, t3
    where
t1.c1 = t2.c1 and t1.c1 > 1 and t2.c2 = 2 and t2.c2 = t3.c2 
  union all
    select t1.c1, t2.c2
    from t1, t2, t4
    where t1.c1 = t2.c1 and t1.c1 > 1 and t2.c3 = t4.c3;

Table t1 appears in both the branches. As does the filter predicates on t1 (t1.c1 > 1) and the join predicates involving t1 (t1.c1 = t2.c1). Nevertheless, without any transformation, the scan (and the filtering) on t1 has to be done twice, once per branch. Such a query may benefit from join factorization which can transform Q1 into Q2 as follows:

Q2:
    select t1.c1, VW_JF_1.item_2
    from t1, (select t2.c1 item_1, t2.c2 item_2
                   from t2, t3
                   where t2.c2 = t3.c2 and t2.c2 = 2                 
                 union all
                   select t2.c1 item_1, t2.c2 item_2
                   from t2, t4 
                   where t2.c3 = t4.c3) VW_JF_1
    where t1.c1 = VW_JF_1.item_1 and t1.c1 > 1;


In Q2, t1 is "factorized" and thus the table scan and the filtering on t1 is done only once (it's shared). If t1 is large, then avoiding one extra scan of t1 can lead to a huge performance improvement.

Another benefit of join factorization is that it can open up more join orders. Let's look at query Q3.

Q3:
    select *
    from t5,
(select t1.c1, t2.c2
                  from t1, t2, t3
                  where t1.c1 = t2.c1 and t1.c1 > 1 and t2.c2 = 2 and t2.c2 = t3.c2 
                union all
                  select t1.c1, t2.c2
                  from t1, t2, t4
                  where t1.c1 = t2.c1 and t1.c1 > 1 and t2.c3 = t4.c3) V;

   where t5.c1 = V.c1

In Q3, view V is same as Q1. Before join factorization, t1, t2 and t3 must be joined first before they can be joined with t5. But if join factorization factorizes t1 from view V, t1 can then be joined with t5. This opens up new join orders. That being said, join factorization imposes certain join orders. For example, in Q2, t2 and t3 appear in the first branch of the UNION ALL query in view VW_JF_1. T2 must be joined with t3 before it can be joined with t1 which is outside of the VW_JF_1 view. The imposed join order may not necessarily be the best join order. For this reason, join factorization is performed under cost-based transformation framework; this means that we cost the plans with and without join factorization and choose the cheapest plan.

Note that if the branches in UNION ALL have DISTINCT clauses, join factorization is not valid. For example, Q4 is NOT semantically equivalent to Q5. 

 Q4:
     select distinct t1.* 
     from t1, t2
     where t1.c1 = t2.c1
  union all
     select distinct t1.*
     from t1, t2
     where t1.c1 = t2.c1

Q5:
    select distinct t1.* 
    from t1, (select t2.c1 item_1 
                  from t2
                union all 
                  select t2.c1 item_1
                  from t2) VW_JF_1 
    where t1.c1 = VW_JF_1.item_1

Q4 might return more rows than Q5. Q5's results are guaranteed to be duplicate free because of the DISTINCT key word at the top level while Q4's results might contain duplicates.  

The examples given so far involve inner joins only. Join factorization is also supported in outer join, anti join and semi join. But only the right tables of outer join, anti join and semi joins can be factorized. It is not semantically correct to factorize the left table of outer join, anti join or semi join. For example, Q6 is NOT semantically equivalent to Q7.

Q6:
    select t1.c1, t2.c2
    from t1, t2
    where t1.c1 = t2.c1(+) and t2.c2 (+) = 2 
 union all
    select t1.c1, t2.c2
    from t1, t2 
    where t1.c1 = t2.c1(+) and t2.c2 (+) = 3

Q7: 
    select t1.c1, VW_JF_1.item_2
    from t1, (select t2.c1 item_1, t2.c2 item_2

                  from t2
                  where t2.c2 = 2
                union all
                  select t2.c1 item_1, t2.c2 item_2
                  from t2                                                                                  

                  where t2.c2 = 3) VW_JF_1     
  where t1.c1 = VW_JF_1.item_1(+)                                                                 

However, the right side of an outer join can be factorized. For example, join factorization can transform Q8 to Q9 by factorizing t2, which is the right table of an outer join.

Q8:
    select t1.c2, t2.c2
    from t1, t2
 
    where t1.c1 = t2.c1 (+) and t1.c1 = 1
 union all
    select t1.c2, t2.c2
    from t1, t2
    where t1.c1 = t2.c1(+) and t1.c1 = 2

Q9:
   select VW_JF_1.item_2, t2.c2
   from t2,
  
           (select t1.c1 item_1, t1.c2 item_2
            from t1
            where t1.c1 = 1
           union all
            select t1.c1 item_1, t1.c2 item_2
            from t1
            where t1.c1 = 2) VW_JF_1
   where VW_JF_1.item_1 = t2.c1(+)

All of the examples in this blog show factorizing a single table from two branches. This is just for ease of illustration. Join factorization can factorize multiple tables and from more than two UNION ALL branches. 

Summary
Join factorization is a cost-based transformation. It can factorize common computations from branches in a UNION ALL query which can lead to huge performance improvement. 

© Oracle Blogs or respective owner

Related posts about 11g

Related posts about database