0170 Sampling, Analyze, Prepare in MonetDB

Let's create test table:

1) CREATE TABLE samplingTable ( Letter CHAR, Number INTEGER );

2) INSERT INTO samplingTable ( Letter, Number )
  VALUES ( 'A', 1 ), ( 'B', 2 ), ( 'C', 3 ), ( 'D', 4 ), ( 'E', 5 )

, ( 'F', 6 ), ( 'G', 7 ), ( 'H', 8 ), ( 'I', 9 ), ( 'J', 10 );

Two Ways of Sampling

Sampling of records is done with SAMPLE clause. This clause accepts continuous range [0,1) as an argument. Selected number is percent of records that will become part of the sample. It is also possible to select any integer bigger than zero. That number will become number of units in the sample.

SELECT * FROM samplingTable SAMPLE 0.2;  
 –20%       
SELECT * FROM samplingTable SAMPLE 3;  
–3 units
SELECT * FROM samplingTable SAMPLE 1;       
–1 unit
It is possible to sample from a subquery.
SELECT * FROM
( SELECT * FROM samplingTable SAMPLE 0.1 ) subqueryAlias;

It is possible to use sample in an INSERT statement.
INSERT INTO samplingTable
SELECT * FROM samplingTable SAMPLE 2;

Samples have uniform distribution.

Table Statistics

Statistics System Table

Statistics about tables is placed in the table sys.statistics. In order to read from this system table, we have to log in as a member of monetdb group.
First, we quit current session (1). Then we have to log in as a monetdb group member (2). Password is "monetdb" (3). Then, we are allowed to read from statistics table (4).

These are the columns in that table:

column_idschematablecolumntypewidthcountuniquenilsminvalmaxvalsortedrevsorted
INTEGERVARCHARVARCHARVARCHARVARCHARINTEGERBIGINTBOOLEANBOOLEANCLOBCLOBBOOLEANBOOLEAN
Internal
ID of column.
Schema name.Table name.Column name.Data
type of column.
Number of bytes.Number
or rows, approximate.
Are values unique in column.True, if there are nulls in column.Minimum in the column or null.Maximum in the column or null.Column sorted ascending.Column sorted descending.

All the strings from this table are small case. When we refer to them, we have to write them exactly like in the table.

SELECT * FROM sys.statistics WHERE COLUMN = 'Letter';This will fail because we have to use lower letters for 'letter'.

Table Producing Functions

Table sys.statistics will give us statistics about all of the user tables in our schema. If we want to read statistics of system tables too, then we should use this table producing function.

SELECT * FROM
sys.statistics();

For reading statistics for one schema, table, or column, we can filter sys.statistics or sys.statistics() tables. Instead of filtering, we should use these table producing functions with arguments which define what data we want to read. This table producing functions are fastest way to get specific statistics.

SELECT * FROM sys.statistics('voc');
Statistics for a schema.
SELECT * FROM
sys.statistics( 'voc','samplingtable');
Arguments are case sensitive. Statistics for a table.
SELECT FROM
sys.statistics('voc','samplingtable','letter');
Statistics for a column.

Refreshing Statistics

We can refresh statistics for the whole schema, for the one table, or for the one column. The refresh will take a long time if we have a lot of data

ANALYZE voc;  –whole schemaANALYZE voc.samplingtable;     –one tableANALYZE voc.samplingtable ( letter );    –one column

Statistics is used by MonetDB, to optimize queries. If we made a lot of deletes, inserts or updates, statistics will become obsolete so we should refresh it. This will make our queries performant again.

Prepare Statement

If we want to use some statement many times, we can prepare it as well. That means that executing plan for that statement will be cached during the session. Each sequential call of that statement will use cached version, but with different parameters.

As a monetdb user, our default schema is sys. Before going through example, we will first change our current schema to voc.

Now, instead of voc.samplingTable, we can just type samplingTable.

For example, let's say that we want to insert three rows in our table samplingTable. For that we can use three separate statements.

INSERT INTO samplingTable ( Letter, Number ) VALUES ( 'O', 100 );
INSERT INTO samplingTable ( Letter, Number ) VALUES ( 'P', 101 );
INSERT INTO samplingTable ( Letter, Number ) VALUES ( 'Q', 102 );

For each of these statements, MonetDB will create separate execution plan. To avoid doing the same work three times, we will prepare this statement.

Instead of parameters, we will use question marks. They are placeholders for real values.
PREPARE INSERT INTO samplingTable ( Letter, Number ) VALUES ( ?, ? );

Zero, from the image on the right, is an ID of our prepared statement. We can use that ID to call our prepared statement.

We will now execute our three statements from above, by using this prepared statement. Zero is ID of our prepared statement.

EXECUTE 0( '0', 100);
EXECUTE 0( 'P', 101);
EXECUTE 0( 'Q', 102);
This will insert three new rows in samplingTable.
Data about our prepared statements is in the system table sys.prepared_statements.

We can release this prepared statement from our session cache.

DEALLOCATE PREPARE 0 --This will remove just prepared statement with ID 0.
DEALLOCATE PREPARE ALL --This will remove all prepared statements.

Prepared statements will be deleted automatically, if any error happens during the user session.

0160 Subqueries in MonetDB

First, we will create sample tables.

1) CREATE TABLE outerQuery ( Letter CHAR, Number TINYINT );
2) INSERT INTO outerQuery ( Letter, Number )
  VALUES ( 'A', 1 ), ( 'B', 2 );
1) CREATE TABLE innerQuery ( Letter VARCHAR(1), Number SMALLINT );
2) INSERT INTO innerQuery ( Letter, Number )
  VALUES ( 'B', 2 ), ( 'C', 3 );

The Definition of Subquery

Subquery is SELECT statement nested within some other SQL statement. Subquery can be nested within an UPDATE, DELETE, or other SELECT statement.

We can use subquery in three ways.
1) As a source table. Here we use a subquery in the FROM clause.
2) As an expression that will become a new column. Here we use a subquery in the SELECT clause.
3) As an operand in Boolean expressions. This is when we use subquery in WHERE or HAVING clauses.

The SELECT Statement

SELECT statement is special. Only in the SELECT statement we can subquery as a source table, or an expression column.

Subquery as a Source Table

We can read our data from subquery. In this case subquery replaces the source table.
 
SELECT * FROM innerQuey;   
SELECT * FROM ( SELECT * FROM innerQuery ) subqueryAlias

 
The distinguishing element of each subquery is parentheses. They are always mandatory.
If we are using subquery as the source table, then we must also use the alias
SELECT *
FROM outerQuery
INNER JOIN ( SELECT * FROM innerQuery ) subqueryAlias
    ON outerQuery.Letter = SubqueryAlias.Letter;
It is possible to use a
subquery within a
JOIN clause.
SELECT * FROM
( SELECT * FROM outerQuery
UNION
SELECT * FROM innerQuery ) subqueryAlias;
We can also create a union between two
subqueries and then use that union in the
FROM clause. This means that we can
combine several subqueries into one, using set operators.

Subquery as a Column Expression

If a subquery returns a scalar, we can use that value directly as a column value.

SELECT 3, ( SELECT 5 + 5 );SELECT *, ( SELECT 5 + 2 ) columnAlias
FROM outerQuery;

Subquery as an Operand of a Boolean Expression

In SQL expressions we can work with scalars and with tables. These are the two data structures that exist in SQL.

SELECT '7';This will return the scalar.SELECT * FROM Table1;This will return the table.

Once we get the result of a SELECT statement, we can use that result as an operand in Boolean expression. We need operators for expressions. Let's see what kind of operators we can use.

Comparison Operators

If the SELECT query returns a scalar, we can use that value with the comparison operators "=, <>, >, >=".

SELECT ( 2 = ( SELECT Number FROM innerQuery WHERE Letter = 'B' ) ) AS Result;
This is the same as the statement:
SELECT ( 2 = 2 ) AS Result;
The inner query will return the number 2, which we will use in our expression.

For all operators we can negate the result with the NOT prefix.

SELECT ( NOT 2 = ( SELECT Number FROM innerQuery WHERE Letter = 'B' ) ) AS Result;  
This prefix will transform a "true" result into a "false" result.

IN Operator

The IN operator will check if a value belongs to a set. We can negate IN with NOT IN.

SELECT 2 IN ( SELECT Number FROM innerQuery ) AS Result;
The statement above is similar to the statements below. They all would get the same result.
SELECT 2 IN ( 2, 2 ) AS Result;
SELECT 2 IN ( VALUES (2),(2) ) AS Result;  
The statement "VALUES (2), (2)" will return a column. The statement "VALUES ( 2, 2 )" will return a row.

When we use the IN operator, we always have to use only one column. In some databases this statement below will work but this will not work in MonetDB. We can only use one column with the IN operator.
SELECT ( 2, 2 ) IN ( VALUES ( 2, 2 ) );

ALL and ANY

Comparison operators can be made more powerful by combining them with the ALL or ANY operators.

The "Number" column from the innerQuery table has the values 2 and 3.
In the example (1), we check whether ( 1 < 2 ) AND ( 1 < 3 ). The result is TRUE.
SELECT 1 < ALL ( SELECT Number FROM innerQuery ) AS Result;

In example (2), we check whether the number 3 is smaller than all the numbers from the innerQuery table, ( 3 < 2 ) AND ( 3 < 3 ). It is not, this expression returns false.
SELECT 3 < ALL ( SELECT Number FROM innerQuery ) AS Result;

ANY is less demanding than ALL. With ANY, we ask "is there some number in the innerQuery table that will make our comparison TRUE?".

SELECT 2 < ANY ( SELECT Number FROM innerQuery ) AS Result;
ANY is using OR logic, ( 2 < 2 ) OR ( 2 < 3 ).

SELECT 2 < ALL ( SELECT Number FROM innerQuery ) AS Result;
ALL is using AND logic ( 2 < 2 ) AND ( 2 < 3 ).

EXISTS, or NOT EXIST

Sometimes we just want to test whether a query is empty or not. We use the EXISTS operator for that.

Our table does not have the letter 'Z', so the subquery will be empty. That's why the EXISTS operator returns FALSE.
SELECT EXISTS ( SELECT * FROM innerQuery WHERE Letter = 'Z' ) AS Result;


Conclusion

1) We can only compare compatible data types. We cannot compare apples and pears (numbers and strings).
2) The result of all expressions involving a subquery is Boolean.
3) When using subquery operators, the subquery can be on both the left and right sides. Some databases do not allow subqueries to be on the left. MonetDB is not like that.


This subquery is on the left side of operator.
It is custom to place subquery on the right side, and I will follow that convention.

Subquery in WHERE or HAVING clauses

Now that we know how to use subquery in a Boolean expression, we can use it in WHERE or HAVING clauses. We can use HAVING in SELECT statement, but WHERE can be used in SELECT, UPDATE or DELETE statements. Let's look at some examples:

SELECT Letter, SUM( Number ) AS Total FROM
( SELECT * FROM outerQuery
UNION ALL
SELECT * FROM innerQuery ) subqueryAlias GROUP BY Letter;

We will create a union of our two tables to group them together.    
This SELECT statement will be used as an operand. It will return a result of 2.5.

SELECT AVG( Number ) FROM innerQuery;
 
 
SELECT Letter, SUM( Number ) AS Total FROM
( SELECT * FROM outerQuery
UNION ALL
SELECT * FROM innerQuery ) subqueryAlias
GROUP BY Letter
HAVING SUM( Number ) > ( SELECT AVG( Number )                          FROM innerQuery );
We will combine the two
statements above into
one. We will filter only
those groups in which the
SUM(number) values are greater than 2.5.

These are two examples with WHERE.

UPDATE outerQuery
SET Number = 5
WHERE Number IN
( SELECT Number
FROM innerQuery
);
The subquery will return the numbers 2 and 3.
Therefore, the number 2 from the outer query
will be updated to the number 5.

We can return that number using the following statement.

UPDATE outerQuery
SET Number = 2
WHERE Number > ALL 
( SELECT Number
FROM innerQuery
);
The number 5 is now greater than all
numbers from innerQuery. So we're
going to move it back to number 2.

Now it's easy to understand why I named my tables outerQuery and innerQuery. We used outerQuery for the main query, and innerQuery table for the subquery.

Correlated Subquery

A correlated subquery is a special type of subquery. It is extremely powerful, but its performance is dismal. In real life, where we have large tables, using correlated subqueries will kill performance. We can still use it successfully on small tables.

SELECT oq.Letter, oq.Number,
( SELECT iq.Number
FROM innerQuery
iq
WHERE
iq.Number = oq.Number ) AS siblingNumber
FROM outerQuery oq 
In a correlated subquery, we execute the subquery once for each
row of the outer query. The subquery uses the value from the outer
query as its parameter. This can be better understood in the table below.

For each row, we execute the inner query once. For the WHERE parameter of the inner query, we take the value from the outer query. This is how a correlated subquery works.

outerQuery.LetterouterQuery.NumbersiblingNumber
A1SELECT Number FROM innerQuery WHERE Number = 1=> Returns null
B2SELECT Number FROM innerQuery WHERE Number = 2=> Returns 2

Correlated subquery can be used in SELECT, HAVING, WHERE clauses. A subquery is a correlated subquery if it uses a value from an outer query as a parameter.

Order By

In MonetDB, subqueries can use the ORDER BY clause.

SELECT * FROM
( SELECT * FROM innerQuery
ORDER BY Number
) Alias;
In some other databases this is only
allowed if the subquery uses LIMIT
or OFFSET specifiers.

0150 Having, Insert, Update, Delete and Built-in Variables

Where VS Having

Where and Having are two clauses that we will best explain using examples. This time we will create only one table.

1) CREATE TABLE WhereOrHaving ( Letter CHAR DEFAULT 'D', Number TINYINT );
2) INSERT INTO WhereOrHaving VALUES ( 'A', 1 ), ( 'A', 2 ),
                                   ( 'B', 3 ), ( 'B', 4 ),
                                   ( 'C', 5 ), ( 'C', 6 );

Where

Where is simple. It is a filter that will exclude all the rows which don't satisfy the condition.

SELECT * FROM WhereOrHaving WHERE Number > 3;

Having

Having is more complex. First, we will group our table by the first column.
 
SELECT Letter, SUM( Number ) As sumOfGroup
FROM WhereOrHaving
GROUP BY Letter
;
Then we will remove all the rows where sumOfGroup is smaller than 3. This time we do not use WHERE clause. We are not filtering detail rows, this time we are filtering groups. For that we use HAVING clause.

SELECT Letter, SUM( Number ) As sumOfGroup
FROM WhereOrHaving
GROUP BY Letter
HAVING SUM( Number ) > 3

Let's combine these two clauses to better understand them.

1) This is our starting table. We will transform it step by step.2) Next, we will filter only those rows where Number is bigger than 3.
3) We will now group the table from the previous step by the "Letter" column.4) And finally, we will filter only groups that are bigger than 6.

So, first we have to use WHERE. WHERE is always used before grouping, because it works on the detail rows. After this comes grouping. When we get our groups then is the time to pick only some of them by using HAVING condition.

INSERT

INSERT statement is used to enter data into table. We will insert some more rows to table WhereOrHaving. Basic way of using INSERT is to make a list of values. These values will be placed in the table as a new row. Our table WhereOrHaving has two columns of CHAR and TINYINT types, so we should be careful to insert data of that data types.

INSERT INTO WhereOrHaving VALUES ( 'E', 36 );
It is also possible to enter several rows at once.
INSERT INTO WhereOrHaving VALUES ( 'X', 36 ), ( 'Y', 46 ), ( 'Z', 56 );

IF we don't provide values for all the columns, insert will fail.
INSERT INTO WhereOrHaving VALUES ( 'M' );
We have to provide values for all the columns, even if some of values are nulls.
INSERT INTO WhereOrHaving VALUES ( 'M', null );

Sometimes we want to be specific into which columns we want to enter values. Now we can provide values in any order we want.

INSERT INTO WhereOrHaving ( Number, Letter ) VALUES ( 13, 'P');
We will first provide value for Number and then for Letter.

If we use syntax where some of the columns are specified, but other are not, then all other columns will be filled with nulls.

INSERT INTO WhereOrHaving ( Letter ) VALUES ( 'G' );

If we enter value for Number, and not for Letter, then the DEFAULT value will be used for the Letter.

INSERT INTO WhereOrHaving ( Number ) VALUES ( 77 );
This image will revise all of our results.

INSERT With SELECT

We will take small sample from the table above.

SELECT * FROM WhereOrHaving WHERE Letter = 'D';

We can now add this sample into any other table. Presumption is that destination table should have the same structure as our sample. For simplicity we will not add this sample to same other table, but we will add it back to our WhereOrHaving table. Then, our table will have two ( 'D', 77 ) rows.

INSERT INTO WhereOrHaving SELECT * FROM WhereOrHaving WHERE Letter = 'D';

UPDATE

We saw that we now have two rows in WhereOrHaving table with nulls. We have "M, null" and "G, null".

UPDATE WhereOrHaving
SET Number = 101
WHERE Letter = 'G';
We have updated "G, null" to "G, 101".

We can update values "M, null" to "Q, null".

UPDATE WhereOrHaving
SET Letter = 'Q'
WHERE Number IS null;
To find rows where column value is null, we use operator IS, and not =.

DEFAULT value for Letter columns is "D". We can set any field in Letter column to this default value.

UPDATE WhereOrHaving
SET Letter = DEFAULT
WHERE Number = 101;
Now we have default letter "D" beside number 101.

DELETE

We use DELETE for deletion of rows. We can return our table, WhereOrHaving, to its original state, by deleting all the added rows.
 
 
DELETE FROM WhereOrHaving
WHERE NOT Letter in ( 'A', 'B', 'C' );

Built-In Variables

MonetDB has a collection of built-in variables. Those variables can provide us with valuable information about our current session and current time.

CURRENT_SCHEMACURRENT_DATECURRENT_TIMESTAMPLOCALTIME
CURRENT_USER       CURRENT_TIMECURRENT_TIMEZONELOCALTIMESTAMP

SELECT CURRENT_SCHEMA AS CurrentSchema, CURRENT_USER AS CurrentUser, CURRENT_DATE AS CurrentDate
     , CURRENT_TIME AS CurrentTime, CURRENT_TIMESTAMP AS CurrentTimestamp, CURRENT_TIMEZONE AS CurrentTimezone
     , LOCALTIME AS LocTime, LOCALTIMESTAMP AS LocTimeStamp;

0140 MonetDB – SET operators

SET OPERATORS

Let's say we have a purple and a green data set. The data set is the result of a query. Using the SET operators, we can treat the rows in those two data sets as mathematical sets.

With joins, we concatenate tables horizontally. If we want to fuse tables vertically then we would use a union. For union to work we need to meet three conditions:
– Two data sets must have the same number of rows.
– Order of columns in one data set must be the same as order of the columns in another data set.
– Columns at the same position must have similar data type.
 
There are two kinds of unions, we have "UNION" and we have "UNION ALL". "UNION ALL" would just glue two data sets together. If the first data set has X rows, and the second one Y rows, final data set would have X + Y rows.
 
"UNION" will return the same thing, but without duplicates.
 
"INTERSECT" means that we are looking for the rows that belong both to the first and the second data set. "INTERSECT" will return only distinct rows.

"EXCEPT" returns only rows from the first dataset, that do not have equivalents in the second data set. Again, only distinct rows will be returned.

Sample tables

We will create two sample tables. Notice that columns in these tables are using compatible data types ( CHAR vs VARCHAR(1), TINYINT vs SMALLINT ).

1) CREATE TABLE aboveTable ( Letter CHAR, Number TINYINT );
2) INSERT INTO aboveTable ( Letter, Number )
  VALUES ( 'A', 1 ), ( 'A', 1 ),
         ( 'B', 2 ), ( 'B', 2 );
1) CREATE TABLE belowTable ( Letter VARCHAR(1), Number SMALLINT );
2) INSERT INTO belowTable ( Letter, Number )
  VALUES ( 'B', 2 ), ( 'B', 2 ),
         ( 'C', 3 ), ( 'C', 3 );

UNION ALL

Our tables are meeting conditions for a UNION ALL.

SELECT * FROM aboveTable UNION ALL SELECT * FROM  belowTable;

UNION

UNION will remove duplicate rows.

SELECT * FROM aboveTable UNION SELECT * FROM belowTable;

INTERSECT

Intersect will give us overlap between two tables.

SELECT * FROM aboveTable INTERSECT SELECT * FROM belowTable;

EXCEPT

Let's see rows that exist in the aboveTable, but not in the belowTable.
 
SELECT * FROM aboveTable EXCEPT SELECT * FROM belowTable;

INTERSECT and EXCEPT with duplicates

If we can use ALL with UNION, is it possible to use it with INTERSECT and EXCEPT? It is possible. This way, all of the duplicates will remain.

SELECT * FROM aboveTable INTERSECT ALL SELECT * FROM  belowTable;SELECT * FROM aboveTable EXCEPT ALL SELECT * FROM  belowTable;

ORDER BY and SET operators

When we use ORDER BY, it must be applied on the last query.
 
SELECT * FROM aboveTable
UNION
SELECT * FROM belowTable ORDER BY Number DESC;

Column names and SET operations

The above SELECT is determining names of columns. Names of columns in above and below SELECT don't have to be the same.

SELECT Letter as L, Number as N
FROM aboveTable
UNION
SELECT * FROM belowTable ORDER BY N DESC;

Connecting More Than 2 Tables with Set Operators

How EXCEPT works

SELECT * FROM aboveTable
UNION ALL
SELECT * FROM belowTable
EXCEPT
SELECT * FROM belowTable;
Chained operations are conducted
in sequence manner. First we make
an UNION, and then we apply EXCEPT.
 
Let's see below steps that are leading to our results.
1) In the first step, UNION ALL will unite our two tables. Next we have to do EXCEPT vs belowTable.2) In the second step, all duplicates from both tables will be removed. Then we would do EXCEPT operation.
 
3) The only row that exist in the first table, and not in the second table is a row "A-1". That is our result.

How EXCEPT ALL works

SELECT * FROM aboveTable
UNION ALL
SELECT * FROM belowTable
EXCEPT ALL
SELECT * FROM belowTable;
When we use EXCEPT ALL, we would
get 4 rows as a result. Let's see below
steps that have leaded to this outcome.
1) In the first step, UNION ALL will unite our two tables. Next, we have to do EXCEPT ALL vs belowTable.2) This time there is no DISTINCT.

Rows from the two tables will destroy each other like matter and antimatter. What remains in the left table will be our final result.

EXCEPT ALL with DISTINCT

Statement below will return table on the right image. Now we have 6 rows in the result data set.

SELECT * FROM aboveTable
UNION ALL
SELECT * FROM belowTable
EXCEPT ALL
SELECT DISTINCT * FROM belowTable;
This time only two rows from the UNION ALL tables will be excluded so we will be left with 6 remaining rows.

UNION AND INTERSECT

Here is one example with INTERSECT. Again, first two data sets will be connected with UNION ALL. Their result will be than intersected with the last data set.

SELECT * FROM aboveTable
UNION ALL
SELECT * FROM belowTable
INTERSECT ALL
SELECT DISTINCT * FROM belowTable;
1) We will start two intermediate data sets. First one is the result of UNION and it will have 8 rows. The other one is created with DISTINCT from the belowTable, and it has 2 rows.
2) Intersection between these two data sets is this:

Two Conclusions

1) SET operators don't have priority among them. They are applied from the top to the bottom. This is different than in some other databases where INTERSECTION has the top priority. This would be the result if we enforce priority of an INTERSECT operator.

2) SET operators without ALL specifiers will first remove duplicates from their operands. SET operators with ALL specifiers will leave their operands untouched. This is true not only for EXCEPT (like in previous examples), but also for UNION and INTERSECT. The message here is that removal of the duplicates is not conducted on the result, but on the operands, and after that SET operators will apply their logic.

Corresponding

THIS IS NOT WORKING IN MONETDB, although documentation claims that it works.

When working with SET operators we need to be careful to order our columns correctly. Their position has to match between tables. This makes things harder. We will face errors many times.
CORRESPONDING clause can fix this problem. CORRESPONDING clause will correctly pair columns so that UNION is successful. Unfortunately, statement on the right side will work in some other databases, but not in the MonetDB, so I will just stop explaining it.

You can read more about this clause on this blog:
https://blog.jooq.org/a-rarely-seen-but-useful-sql-feature-corresponding/
SELECT Number, Letter FROM aboveTable
UNION ALL CORRESPONDING
SELECT Letter, Number FROM belowTable;

013. MonetDB – SELECT from multiple tables

Sample tables

We will create these two tables in the MonetDB database:

We can create these two tables using these statements. The first statement will create the table. The second statement will populate that table with data. These are the same tables we have used in the previous blog post, so maybe you already have them in your database.

1) CREATE TABLE Employees ( EmployeeID INT, Employee VARCHAR(10 ) );
2) INSERT INTO Employees ( EmployeeID, Employee ) VALUES ( 1, 'Hugo' ), ( 2, 'Gabriel' ), ( 3, 'Aurora' ), ( 4, 'Adam' );
1) CREATE TABLE Sales ( SalesID INT, DateCol DATE,  EmployeeID INT, Product CHAR(1), Qty INTEGER, Price INTEGER );
2) INSERT INTO Sales ( SalesID, DateCol, EmployeeID, Product, Qty, Price ) VALUES
     ( 1, '2024-03-05', 2, 'X', 10, 4)
   , ( 2, '2024-03-05', 3, 'X', 20, 5)
   , ( 3, '2024-03-05', 2, 'Y', 30, 6)
   , ( 4, '2024-03-06', 3, 'Y', 40, 7)
   , ( 5, '2024-03-06', 2, 'Z', 50, 8)
   , ( 6, '2024-03-06', 3, 'Z', 60, 9)

Crossjoin

A cross join is when we combine every row from one table with every row from another table. If one table has X rows and the other Y rows, then the result table will have X * Y rows.

SELECT * FROM Employees CROSS JOIN Sales;
 
In our example, the Employees table has 4 rows, the Sales table has 6 rows, so the resulting table has 24 rows. In the picture we can see that each row of the Employee table is combined with each row from the Sales table.

"EmployeeID" column will appear twice if use SELECT with a star.
Statement with a specific column "EmployeeID" will become ambiguous.
SELECT EmployeeID FROM Employees CROSS JOIN Sales;

We'll have to write the full column name to avoid that ambiguity.
SELECT Employees.EmployeeID FROM Employees CROSS JOIN Sales;

Diagram

Diagram shows two tables that we want to join. There are three basic ways how we can join tables, INNER join, LEFT join, and FULL OUTER join.

INNER JOIN means only matching rows will remain after join. In LEFT JOIN, we will not exclude any row from the left table and such rows will be matched with null values in the right table. A FULL OUTER JOIN means that no one will be left out. All orphan rows will be padded using nulls.

INNER JOIN

Let's filter the CROSSJOIN table so that only the rows where the Emploiees.EmploieeID column is equal to the Sales.EmploieeID column will remain.

SELECT * FROM Employees CROSS JOIN Sales
WHERE Employees.EmployeeID = Sales.EmployeeID;

 
This is the definition of an INNER JOIN. An INNER JOIN is a CROSS JOIN where only columns that meet certain conditions are saved in the result.
For INNER JOIN we have this specialized syntax. It makes it clearer what we are trying to achieve.
 
SELECT * FROM Employees INNER JOIN Sales
ON Employees.EmployeeID = Sales.EmployeeID;
The point of INNER JOIN is to only look for successful employees. Gabriel and Aurora each made three sales and we want to see those results. Hugo and Adam did not make any sales so they will be missing from the score.
 
We combine each row from the Employees table with only the matching rows from the Sales table.

LEFT JOIN

SELECT * FROM Employees LEFT JOIN Sales
ON Employees.EmployeeID = Sales.EmployeeID;

 
LEFT OUTER JOIN is when we want to expose employees without sales. LEFT OUTER JOIN will show us all rows from the Employees table. If any rows from the Employees table do not have matching rows in the Sales table, then they will be accompanied by null values.
 
Nulls will help us easily see which rows from the Employees table do not have corresponding rows in the Sales table.

FULL OUTER JOIN

I will add another Sale to the Sales table. We'll pretend that the former employee made that sale.
INSERT INTO SALES ( SalesID, DateCol, EmployeeID, Product, Qty, Price ) VALUES ( 0, '2024-03-04', 0, 'Q', 15, 3 );

That will help us explain the FULL OUTER JOIN.
A FULL OUTER JOIN will explain all our sales and show employees without sales. Now we can see the complete picture. We can see the source of our sales and we can see who is not making sales.

SELECT * FROM Employees FULL OUTER JOIN Sales
ON Employees.EmployeeID = Sales.EmployeeID;

 
In a FULL OUTER JOIN, matching rows will be cross-joined, but non-matching rows, from the both tables, will be completed with nulls.

Matching condition and aliases

The matching conditions can be more complex than in our examples. It can consist of several columns.SELECT Tab1
INNER JOIN Tab2
ON Tab1.Col1 = Tab2.Col1
   AND (Tab1.Col2 = 3
       OR Tab1.Col2 > Tab2.Col3)

We can notice that an overly complex SELECT statement can lead to hard-to-read code. This is a case where we can use aliases wisely. In practice, we usually use short aliases, consisting of only one letter, such as "a" and "b".

SELECT Tab1 a
INNER JOIN Tab2 b ON a.Col1 = b.Col1
AND (a.Col2 = 3
     OR a.Col2 > b.Col3)
The left table would be given the alias a and the right table would be given the alias b.
This makes things much more readable.
It is also possible to create aliases for the columns.

SELECT *
FROM Employees a( EmpNumber, Name )
INNER JOIN Sales b
ON a.EmpNumber = b.EmployeeID;

Join with more than two tables

We will add one more table to our database in order to explain JOIN with more than two tables.

1) CREATE TABLE Calendar ( DateCol DATE, MonthCol INTEGER, YearCol INTEGER );
2) INSERT INTO Calendar ( DateCol, MonthCol, YearCol )
   VALUES ( '2024-03-05', 3, 2024 ), ( '2024-03-06' , 3, 2024 );

The tables are connected in a chain manner. We can join as many tables as we want.

SELECT * FROM Employees
FULL OUTER JOIN Sales
   ON Employees.EmployeeID = Sales.EmployeeID

FULL OUTER JOIN Calendar
   ON Sales.DateCol = Calendar.DateCol;

Special syntaxes

These special syntaxes are not often used. These syntaxes are based on the fact that tables are usually joined by columns with the same name and data type.

NATURAL JOIN

A NATURAL JOIN is an INNER JOIN between two tables. The condition is based on equality between columns that have the same name and the same data type. Let's say we have an INNER JOIN between the Emploees and Sales tables.

 
INNER JOIN
 
SELECT * FROM Employees INNER JOIN Sales
ON Employees.EmployeeID = Sales.EmployeeID;
An easier way to achieve this is to use NATURAL JOIN. A NATURAL JOIN will give us the same result based on the fact that the EmployeeID column exists in both tables and is of the same data type. There are no other columns with such quality.
 
SELECT * FROM Employees NATURAL JOIN Sales;

USING Clause

The USING clause also assumes that we have some columns in both tables that have the same name and are of the same type. With the USING clause, we can specify any type of join (INNER, LEFT, LEFT OUTER). Let's make a left join between the tables Employees and Sales.

 

 
LEFT JOIN
 
SELECT * FROM Employees LEFT JOIN Sales
ON Employees.EmployeeID = Sales.EmployeeID;
An easier way to achieve this is to use the USING clause. In the USING clause, we only need to specify the columns that exist in both tables and have the same name and data type. The JOIN condition will be based on the equality of those columns.
 
SELECT * FROM Employees LEFT JOIN Sales
USING ( EmployeeID );