Sample tables
We will create these two tables in the MonetDB database:
We can create these two tables using these statements. The first statement will create the table. The second statement will populate that table with data. These are the same tables we have used in the previous blog post, so maybe you already have them in your database.
1) CREATE TABLE Employees ( EmployeeID INT, Employee VARCHAR(10 ) ); 2) INSERT INTO Employees ( EmployeeID, Employee ) VALUES ( 1, 'Hugo' ), ( 2, 'Gabriel' ), ( 3, 'Aurora' ), ( 4, 'Adam' ); |
1) CREATE TABLE Sales ( SalesID INT, DateCol DATE, EmployeeID INT, Product CHAR(1), Qty INTEGER, Price INTEGER ); 2) INSERT INTO Sales ( SalesID, DateCol, EmployeeID, Product, Qty, Price ) VALUES ( 1, '2024-03-05', 2, 'X', 10, 4) , ( 2, '2024-03-05', 3, 'X', 20, 5) , ( 3, '2024-03-05', 2, 'Y', 30, 6) , ( 4, '2024-03-06', 3, 'Y', 40, 7) , ( 5, '2024-03-06', 2, 'Z', 50, 8) , ( 6, '2024-03-06', 3, 'Z', 60, 9) |
Crossjoin
A cross join is when we combine every row from one table with every row from another table. If one table has X rows and the other Y rows, then the result table will have X * Y rows.SELECT * FROM Employees CROSS JOIN Sales; In our example, the Employees table has 4 rows, the Sales table has 6 rows, so the resulting table has 24 rows. In the picture we can see that each row of the Employee table is combined with each row from the Sales table. "EmployeeID" column will appear twice if use SELECT with a star. |
Statement with a specific column "EmployeeID" will become ambiguous.SELECT EmployeeID FROM Employees CROSS JOIN Sales; We'll have to write the full column name to avoid that ambiguity. SELECT Employees.EmployeeID FROM Employees CROSS JOIN Sales; |
Diagram
Diagram shows two tables that we want to join. There are three basic ways how we can join tables, INNER join, LEFT join, and FULL OUTER join.
INNER JOIN means only matching rows will remain after join. In LEFT JOIN, we will not exclude any row from the left table and such rows will be matched with null values in the right table. A FULL OUTER JOIN means that no one will be left out. All orphan rows will be padded using nulls.
INNER JOIN
Let's filter the CROSSJOIN table so that only the rows where the Emploiees.EmploieeID column is equal to the Sales.EmploieeID column will remain.
SELECT * FROM Employees CROSS JOIN Sales This is the definition of an INNER JOIN. An INNER JOIN is a CROSS JOIN where only columns that meet certain conditions are saved in the result. |
For INNER JOIN we have this specialized syntax. It makes it clearer what we are trying to achieve.SELECT * FROM Employees INNER JOIN Sales |
The point of INNER JOIN is to only look for successful employees. Gabriel and Aurora each made three sales and we want to see those results. Hugo and Adam did not make any sales so they will be missing from the score. We combine each row from the Employees table with only the matching rows from the Sales table. |
LEFT JOIN
SELECT * FROM Employees LEFT JOIN Sales LEFT OUTER JOIN is when we want to expose employees without sales. LEFT OUTER JOIN will show us all rows from the Employees table. If any rows from the Employees table do not have matching rows in the Sales table, then they will be accompanied by null values. Nulls will help us easily see which rows from the Employees table do not have corresponding rows in the Sales table. |
FULL OUTER JOIN
I will add another Sale to the Sales table. We'll pretend that the former employee made that sale.INSERT INTO SALES ( SalesID, DateCol, EmployeeID, Product, Qty, Price ) VALUES ( 0, '2024-03-04', 0, 'Q', 15, 3 );
That will help us explain the FULL OUTER JOIN. |
A FULL OUTER JOIN will explain all our sales and show employees without sales. Now we can see the complete picture. We can see the source of our sales and we can see who is not making sales.SELECT * FROM Employees FULL OUTER JOIN Sales In a FULL OUTER JOIN, matching rows will be cross-joined, but non-matching rows, from the both tables, will be completed with nulls. |
Matching condition and aliases
The matching conditions can be more complex than in our examples. It can consist of several columns. | SELECT Tab1 INNER JOIN Tab2 ON Tab1.Col1 = Tab2.Col1 AND (Tab1.Col2 = 3 OR Tab1.Col2 > Tab2.Col3) |
We can notice that an overly complex SELECT statement can lead to hard-to-read code. This is a case where we can use aliases wisely. In practice, we usually use short aliases, consisting of only one letter, such as "a" and "b".
SELECT Tab1 a | The left table would be given the alias a and the right table would be given the alias b. This makes things much more readable. |
It is also possible to create aliases for the columns.SELECT * FROM Employees a( EmpNumber, Name ) INNER JOIN Sales b ON a.EmpNumber = b.EmployeeID; |
Join with more than two tables
We will add one more table to our database in order to explain JOIN with more than two tables.
1) CREATE TABLE Calendar ( DateCol DATE, MonthCol INTEGER, YearCol INTEGER ); 2) INSERT INTO Calendar ( DateCol, MonthCol, YearCol ) |
The tables are connected in a chain manner. We can join as many tables as we want.
SELECT * FROM Employees FULL OUTER JOIN Sales ON Employees.EmployeeID = Sales.EmployeeID FULL OUTER JOIN Calendar ON Sales.DateCol = Calendar.DateCol; |
Special syntaxes
These special syntaxes are not often used. These syntaxes are based on the fact that tables are usually joined by columns with the same name and data type.
NATURAL JOIN
A NATURAL JOIN is an INNER JOIN between two tables. The condition is based on equality between columns that have the same name and the same data type. Let's say we have an INNER JOIN between the Emploees and Sales tables.
INNER JOIN SELECT * FROM Employees INNER JOIN Sales | An easier way to achieve this is to use NATURAL JOIN. A NATURAL JOIN will give us the same result based on the fact that the EmployeeID column exists in both tables and is of the same data type. There are no other columns with such quality.SELECT * FROM Employees NATURAL JOIN Sales; |
USING Clause
The USING clause also assumes that we have some columns in both tables that have the same name and are of the same type. With the USING clause, we can specify any type of join (INNER, LEFT, LEFT OUTER). Let's make a left join between the tables Employees and Sales.
LEFT JOIN SELECT * FROM Employees LEFT JOIN Sales | An easier way to achieve this is to use the USING clause. In the USING clause, we only need to specify the columns that exist in both tables and have the same name and data type. The JOIN condition will be based on the equality of those columns.SELECT * FROM Employees LEFT JOIN Sales |