BEWARE OF THE PREDICATE IN THE OUTER JOIN

0


Today we will have a look what happens when we place a filter condition into an outer join. We will also have a look at how this compares to placing the filter condition into the WHERE clause. Finally we’ll have a look where this could be useful.

Let’s first create some sample data.

1create table customer (
2cust_id number,
3cust_desc varchar2(50)
4);
5 
6ALTER TABLE customer
7add CONSTRAINT pk_customer PRIMARY KEY (cust_id);
8 
9create table customer_loc (
10customer_loc_id NUMBER,
11customer_id NUMBER,
12customer_loc_desc VARCHAR2(50)
13);
14 
15ALTER TABLE customer_loc
16add CONSTRAINT pk_customer_loc PRIMARY KEY (customer_loc_id);
17 
18insert into customer values (1,'Gold');
19insert into customer values (2,'Gold');
20insert into customer values (3,'Silver');
21 
22insert into customer_loc values (1,1,'Dublin');
23insert into customer_loc values (2,2,'Paris');
24insert into customer_loc values (3,4,'Berlin');
25 
26commit;
27 
28analyze table customer compute statistics;
29analyze table customer_loc compute statistics;

Let’s run our first outer join query where we put the filter condition into the WHERE clause. All is as expected. The query just returns customer_id 1.

Now let’s run the same query, but we put the filter condition into the join. This may be different from what you may have expected. What happens is that the query returns all of the rows in the customer table and those rows of table customer_loc where the join condition is met, i.e. those rows where customer_id = 1.

Let’s verify what we have just seen with another query. This time we will put customer_id = 4 into the Join condition. There are no rows in table customer_loc that match this. As expected the query returns all rows for table customer but now rows for table customer_loc

What could this be useful for? One use case would be some limited form of data densification whereby you need to return all of the tables in one table but only a subset of rows in another table. Typically this can only be done using some sort of subselect. An example:

The query below is run in the SH schema and returns all of the customers, but only those sales transactions that are larger than 1000 in sales_amount.

1select a.cust_id,amount_sold
2from customers a
3left outer join sales b on a.cust_id = b.cust_id and amount_sold > 1000;