SQL: Speed Improvement - Cluttered union query
- by vol7ron
SELECT * FROM (
    SELECT       a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
    FROM         current_tbl a
    INNER JOIN   import_tbl  b 
                 ON ( a.user_id = b.user_id )
    UNION
    SELECT       a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
    FROM         current_tbl a
    INNER JOIN   import_tbl  b 
                 ON (   lower(a.f_name)=lower(b.f_name) 
                    AND lower(a.l_name)=lower(b.l_name) ) 
) foo
--
UNION
--
SELECT a.user_id , a.f_name , a.l_name , '' , '' , '' 
FROM   current_tbl a
WHERE  a.user_id NOT IN (
   select user_id from(
      SELECT       a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
      FROM         current_tbl a
      INNER JOIN   import_tbl  b 
                   ON ( a.user_id = b.user_id )
      UNION
      SELECT       a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
      FROM         current_tbl a
      INNER JOIN   import_tbl  b 
                   ON (   lower(a.f_name)=lower(b.f_name) 
                      AND lower(a.l_name)=lower(b.l_name) ) 
   ) bar
)
ORDER BY user_id
Example of table population:
current_tbl:
-------------------------------
user_id  |  f_name  |  l_name
---------+----------+----------
  A1     |  Adam    |  Acorn
  A2     |  Beth    |  Berry
  A3     |  Calv    |  Chard
         |          |
import_tbl:
-------------------------------
user_id  |  f_name  |  l_name
---------+----------+----------
  A1     |  Adam    |  Acorn
  A2     |  Beth    |  Butcher  <- last_name different
         |          |
Expected Output:
-----------------------------------------------------------------------
user_id1  |  f_name1  |  l_name1  |  user_id2  |  f_name2  |  l_name2
----------+-----------+-----------+------------+-----------+-----------
   A1     |  Adam     |  Acorn    |     A1     |  Adam     |  Acorn       
   A2     |  Beth     |  Berry    |     A2     |  Beth     |  Butcher
   A3     |  Calv     |  Chard    |            |           |           
Doing this method gets rid of conditions where the row would be:
   A2     |  Beth     |  Berry    |     A2     |  Beth     |  Butcher
But it keeps the A3 row
I hope this makes sense and I haven't overly simplified it.  This is a continuation question from my other question.  The succession of these improvements has dropped the query down from ~32000ms to where it's at now ~1200ms - quite an improvement.
I supect I can optimize by using UNION ALL in the subquery and of course the usual index optimizations, but I'm looking for the best SQL optimization.  FYI this particular case is for PostgreSQL.