0200 MonetDB: Window Functions Theory

Imagine people scattered through some area, looking through binoculars. None of them will see the whole scenery, they will only see part of it. It is like each of them is looking through the different window.

Now, imagine that in each row, of a database table, there is a data scientist looking through binoculars. Each data scientist can only see some of the rows from that table. Each data scientist has its own window.

What would scientist do to represent nature of data he is looking at? He would aggregate them. If each of our scientist decide to calculate average of data he is looking at, we would get a table like this one:

In the real database table, with millions of rows, these average values would not be representatives of anything. Our windows are too random. If we can create a rule by which windows are created, then we would have a scientific view of our data. Let's say that each data scientist can only see its own row, and previous two rows. Then we would have a rule. Check out animated image below (left image).

By using this rule for a window creation, we can calculate "moving averages", which are often used in statistics.

We can also define these rules in SQL if we use Window Functions. Window Functions are special, because they can define windows and then apply some aggregation to data in those windows. We can apply aggregations like SUM, AVG, MAX, but we can also use some special aggregation functions.

Window functions are also called Analytic functions, because they give us abilities that are beyond traditional SQL statements. With them we can do things which were previously hard to achieve in SQL, and they are really useful for a deep analysis of our data. Windows are like overlapping samples from our tables. They can reveal us how the nature of our data is changing through time and dimensions.

Terminology

In SQL, window, as explained above, is actually called "frame". The term "window" means something else. We will now discern difference between window, partition and a frame.

Frame is group of records that will be aggregated. Frame is presented with the moving red rectangle in the animation bellow.

Tables in animation below, show how many points each country won on some sport competition.

Sport results presented on the animation above will not be held in a database like three tables, but they will be placed together into one big table.  

That big table is our Window (assuming we are using no filters on that table).
Smaller tables are called Partitions.
Red moving rectangles are Frames.  

Window functions can process partitions separately, the same as they were separate tables.  

How to Define Window?

SELECT employee_id, salary
FROM employees
WHERE department_id = 101  
AND salary > (SELECT AVG(salary)                 FROM employees);
Subquery is not under direct influence of the outer query. In the example, we have a filter
department_id = 101 on the outer query. But subquery is not under the influence of that filter.
Subquery will calculate average salary for all of the employees. Meaning is, that we are looking for employees from department 101, that have bigger salary then the global average.

This is not true for window functions. Window is under influence of the query context. Everything that is used inside FROM, WHERE, GROUP BY and HAVING clauses will define our window. Window functions can only do their magic after the final dataset is defined and unchangeable. That also means that Window functions can only be used in SELECT and ORDER BY clauses.

How to Define Partitions?

Partitions can be defined by the values of one column. All rows that have the same value will be the same partition. On our image, all the rows with letter "A" will create Partition 1. It is also possible to use combination of the values from two or more columns to define partitions. Each unique combination of values will define a partition. On the image below, combination of values A and Q will define Partition 2. We can use expression to calculate values for our column(s). In our example, all the rows, where MOD function returns 1 will belong to Partition 1. Rows that return 0 will belong to Partition 2.

How to Define Frames?

Frames are moving and so, they are always calculated relative to the current row. Two other reference points are the first and the last row in our partition. Position of the frame is always relative to those reference points.

For definition of a frame, we have to define its start row, and its end row. End row has to be after Star row. Below we can see all the ways how to define start and end row.

                    Can only be START ROW
– UNBOUNDED PRECEDING – the first row in the partition                      
                    Can be both START or END ROW
– N PRECEDING – row that is N rows before current row.
– CURRENT ROW – our major reference point.
– M FOLLOWING – row that is M rows after current row.

                     Can only be END ROW
– UNBOUNDED FOLLOWING – the last row in the partition
An example:

Notice that for all of this to make sense, records have to be sorted.

What is a Group?

For window functions, start and end of a frame doesn't have to be a row. Start and end can also be defined with ranges and groups. Ranges and Groups are not individual records, they are sets of records.

Groups are defined similar to Partitions. All rows with the same value will be one group.

On the image to the left, current row is the row 5, but the current group is the Group 3. We are no more looking at 9 records, we are looking at 5 groups.

Our frame will start with one of the groups and will end with one of the groups after.
In this example, our frame will start with the first group in Partition, and will end on the group that is just after the current group.

What is a Range?

Relative positions are important for rows and groups, but not for ranges. With ranges, we are dealing with values in our column.

Let's say that some student took a school test. She scored 85 points on a test and she got a grade "A", because if number of points is between 76-100, then the grade is "A". It is similar with Ranges in window functions. Each frame is defined with a range of values. If a field value belongs to that range, then that record belongs to a frame defined by that range.

So, how we define a range? Really simple. If our current row has a value of X, we will add or subtract some number to that X, and we will get an extreme value of our range.

current value – N         
(syntax: N preceding)
                                          
current value + M        
(syntax: M following)
If, adjacent rows of the current row, have values that are close enough to the current value, then those rows will together make a frame.

Our frame is between [20,30], so all the rows beside the first and the last one, belong to this frame.  

Demonstration

This is a simple example of a window function. This example shows how to calculate cumulative of the qty column. We are not using PARTITION BY clause, so the whole table is one big partition.

0190 Common Table Expressions (CTE) in MonetDB

Table with billion rows and 100 columns, can be represented with one short name. We can use the same logic and give a name to our SELECT statements. If we want to use those SELECT statements in a subquery, join or union, we can refer to them by their names. This is how we can reduce any complex statement to its building blocks. This will break complex logic to simple steps, where each step is easy to understand and manage.

CTEs are a way to name our SELECT statement, and then to use those names in the final SELECT, INSERT, DELETE , UPDATE or MERGE statement.

Syntax of Common Table Expression

Bellow we can see statement with two CTEs. We can have them as many as we want ( Name1, Name2, Name3, Name4 … ). Each CTE will give a custom name to some SELECT statement. Thanks to this, final statement (which can be SELECT, INSERT, DELETE, UPDATE, MERGE), will be short and simple.

WITH
Name1 AS ( SELECT * FROM Table1 ),            --first CTE
Name2 AS ( SELECT * FROM Table2 )             --second CTE  
SELECT *                                      --final statement
FROM
Name1   
 
      JOIN Name2 ON someCondition                  
 
      JOIN Name2 ON someOtherCondition;          

Not only CTE can break our logic into manageable elements, but it can also reduce repetition. We can write "SELECT * FROM Table2" once and then use it twice in the final statement. CTE will only improve readability of our statement and it will make it more concise. It will not improve performance of a statement.

Sample Tables

We will create two sample tables to use them in our CTEs.

CREATE TABLE Cte1 ( Letter CHAR, Number INTEGER );
INSERT INTO Cte1 ( Letter, Number ) VALUES ( 'A', 1 ), ( 'B', 2 );

CREATE TABLE Cte2 ( Letter CHAR, Number INTEGER );
INSERT INTO Cte2 ( Letter, Number ) VALUES ( 'P', 1 ), ( 'Q', 2 );   
  

Example with SELECT

WITH
Name1 AS ( SELECT * FROM Cte1 ),
Name2 AS ( SELECT * FROM Cte2 )
SELECT * FROM Name1
UNION
SELECT * FROM
Name2;

Example with INSERT

WITH
Name1 AS ( SELECT * FROM Cte1 ),
Name2 AS ( SELECT * FROM Cte2 )
INSERT INTO Cte2
SELECT * FROM
Name1
EXCEPT
SELECT * FROM
Name2;
Because two tables don't have equal rows,
EXCEPT will return all the rows from the Cte1 table.

We will use CTE to insert those rows into Cte2.

Example with DELETE

WITH
Name1 AS ( SELECT Letter FROM Cte1 )
DELETE FROM Cte2
WHERE Letter   
      IN ( SELECT * FROM
Name1 );                                    
When we use DELETE, we always have to delete from the table, "DELETE FROM Cte2". It is not possible to create a CTE, and then to delete from that CTE, like "DELETE FROM Name1", expecting that server will delete from the underlining table. That means that we only can use CTE in the WHERE clause.

After deleting A1 and B2 from Cte2, our Cte2 table is back to its original state.

Example with UPDATE

WITH
Name1 AS ( SELECT Number FROM Cte1 )
UPDATE Cte2
SET Letter = 'X'
WHERE Number   
   IN ( SELECT * FROM
Name1 );
We will update all the letters in the table Cte2 where numbers are common for the both tables.
Because both tables have numbers 1 and 2, that means that all the letters in the table Cte2 will be updated to "X".

Again, CTE can only be used in the WHERE clause.

Example with MERGE

WITH
Name2 AS ( SELECT * FROM Cte2 )

MERGE INTO Cte1
USING
Name2
ON
Cte1.Letter = Name2.Letter
WHEN NOT MATCHED THEN INSERT ( Letter, Number )                                             VALUES ( Name2.Letter, Name2.Number );  
In MERGE statement, we can use CTE in the USING clause. Values from that CTE can be used to change values in the database table.

Our tables, Cte1 and Cte2 don't have common letters, so we don't have matches. That means that all the rows from our Cte2 table will be inserted into Cte1.

Aliases

We can use CTE without providing aliases to its columns.

WITH
Name1 AS ( SELECT * FROM Cte1 )
SELECT * FROM Name1;
If we want to, we can provide aliases in the SELECT statement. It is also possible to provide aliases after the name of the CTE. If both are provided, then the outside aliases will prevail ( Column1 and Column2, and not colLetter and colNumber ).

WITH
Name1 ( Column1, Column2 ) AS
( SELECT Letter AS colLetter, Number AS colNumber FROM Cte1 )
SELECT * FROM Name1;

Nested CTEs

WITH
Name1 AS ( SELECT * FROM Cte1 ),
Name2 AS ( SELECT * FROM Name1 )
SELECT * FROM Name2;
It is possible to reference one CTE from another CTE (Name2 is calling Name1 ).
WITH
Name1 AS ( SELECT * FROM Name2 ),
Name2 AS ( SELECT * FROM Cte2 )
SELECT * FROM Name2;
With nesting, we must be sure that referenced CTE is already defined. If we reference some
of the latter CTEs, then those CTEs will not be recognize ( Name1 cannot call Name2,
because it is not already defined ).

ORDER BY

In MonetDB, it is possible to use ORDER BY inside of CTEs definitions. That sorting will propagate to the final result.

WITH
Name1 AS ( SELECT * FROM Cte1 ORDER BY Letter DESC ),
Name2 AS ( SELECT * FROM Cte2 ORDER BY Number DESC )
SELECT * FROM Name1
UNION ALL
SELECT * FROM
Name2;

Recursive CTEs

Recursive CTEs are not supported in MonetDB database. This is done deliberately because of performance concerns.

0180 SQL Merge in MonetDB

Introduction

Merge is used when we want to use records from one table to decide what records to modify in another table. Let's assume that we want to keep two tables synchronized. In that case, every change on the first table should be reflected on the second table.

For full reflection, all of the updates, deletes or inserts that are done on the A table should also be done on the B table.

For partial reflection, we can do just the part of the full reflection. For example, new rows from the A table will also be added to the B table. Records updated or deleted in the table A, will not be updated or deleted in the table B.

MERGE statement cannot synchronize updates, deletes and inserts at the same time. Merge can synchronize ( INSERT and UPDATE ), or ( INSERT and DELETE ). Image bellow give us an explanation for INSERT and UPDATE. MERGE will create left outer join between tables. It will match rows based on our condition. Merge will then add and update records in the table B, so that table B is the same as the table A.

For INSERT and DELETE, we do the similar thing. New rows will be added to the table B, but matched rows will be deleted from the table B.

Sample Tables

We will create two tables. We will try to propagate all the changes in the table A to the table B. For start, we will enter only one row in the table A.

CREATE TABLE A ( Letter CHAR, Number INTEGER );
INSERT INTO A ( Letter, Number ) VALUES ( 'A', 1 );

CREATE TABLE B ( Letter CHAR, Number INTEGER );

Initial Synchronization

All the rows from the table A that do not exist in table B will be inserted into table B.

MERGE INTO B
     USING A
        ON B.Letter = A.Letter
      WHEN NOT MATCHED THEN INSERT ( Letter, Number ) VALUES ( A.Letter, A.Number );
MERGE INTO targetTable
USING
sourceTable
ON
matchingCondition
WHEN NOT MATCHED
THEN INSERT RECORDS

Now, table B, has one row, the same as table A.

Merge with INSERT

First, we will insert another row into the table A.
INSERT INTO A ( Letter, Number ) VALUES ( 'B', 2 );

Then we will repeat the same MERGE statement as above. This new row will then appear in the table B.

MERGE INTO B
     USING A
        ON B.Letter = A.Letter
      WHEN NOT MATCHED THEN INSERT ( Letter, Number )
                           VALUES ( A.Letter, A.Number );

Merge with UPDATE

First, we will update the table A. We will change A1 to A4.

UPDATE A SET Number = 4 WHERE Letter = 'A';

Then we'll push that change to the table B. Notice that we don't use 'WHEN NOT MATCHED'. Now we use 'WHEN MATCHED'.

MERGE INTO B     
     USING A      
  
          ON B.Letter = A.Letter and A.Letter = 'A'
      WHEN MATCHED THEN UPDATE SET Number = A.Number;


Merge with DELETE

We will update one record in the table A.

UPDATE A SET Number = 8 WHERE Letter = 'B';

Now, we will delete all the rows from the table B, that do exist in the table A. Row A4 exists in both tables, so that row will be deleted.

MERGE INTO B
     USING A
        ON B.Letter = A.Letter and B.Number = A.Number
      WHEN MATCHED THEN DELETE;

Merge with INSERT and UPDATE

The real reason for existence of MERGE statement is because we can do two things in one statement. This time we will do INSERT and UPDATE at the same time.

MERGE INTO B     
     USING A        
        ON B.Letter = A.Letter      
      WHEN NOT MATCHED THEN INSERT ( Letter, Number )           
                 
                             VALUES ( A.Letter, A.Number )      
      WHEN MATCHED THEN UPDATE SET Letter = A.Letter,
                                   Number = A.Number;

Merge with INSERT and DELETE

Currently, tables A and B are equal. We will add a row to table A to show how to use merge with INSERT and DELETE.

INSERT INTO A ( Letter, Number ) VALUES ( 'E', 5 );
MERGE INTO B
     USING A        
        ON B.Letter = A.Letter
      WHEN NOT MATCHED THEN INSERT ( Letter, Number )
                            VALUES ( A.Letter, A.Number )
      WHEN MATCHED THEN DELETE;  

Things That Don't Work in Merge

INSERT with Subquery

Subqueries are not supported in INSERT clause, inside of Merge statement.
MERGE INTO B
          USING A
                ON B.Letter = A.Letter
            WHEN NOT MATCHED THEN INSERT SELECT * FROM A;

Several MATCHED, or NOT MATCHED sentences

We can only use one MATCHED clause and/or one NOT MATCHED clause.
MERGE INTO B
     USING A
        ON B.Letter = A.Letter
      WHEN MATCHED THEN DELETE
      WHEN MATCHED THEN UPDATE SET Letter = A.Letter,
                                   Number = A.Number;

Multiple matches

I will add row "E6" in the table "A" with a statement "INSERT INTO A VALUES ( 'E', 6 )". Now this table has rows E5 and E6. If we now apply MERGE statement, both rows E5 and E6 will try to update row E5 in the table B. This is not allowed and will fail.

MERGE INTO B USING A ON B.Letter = A.Letter
WHEN MATCHED THEN UPDATE SET Letter = A.Letter, Number = A.Number;

0170 Sampling, Analyze, Prepare in MonetDB

Let's create test table:

1) CREATE TABLE samplingTable ( Letter CHAR, Number INTEGER );

2) INSERT INTO samplingTable ( Letter, Number )
  VALUES ( 'A', 1 ), ( 'B', 2 ), ( 'C', 3 ), ( 'D', 4 ), ( 'E', 5 )

, ( 'F', 6 ), ( 'G', 7 ), ( 'H', 8 ), ( 'I', 9 ), ( 'J', 10 );

Two Ways of Sampling

Sampling of records is done with SAMPLE clause. This clause accepts continuous range [0,1) as an argument. Selected number is percent of records that will become part of the sample. It is also possible to select any integer bigger than zero. That number will become number of units in the sample.

SELECT * FROM samplingTable SAMPLE 0.2;  
 –20%       
SELECT * FROM samplingTable SAMPLE 3;  
–3 units
SELECT * FROM samplingTable SAMPLE 1;       
–1 unit
It is possible to sample from a subquery.
SELECT * FROM
( SELECT * FROM samplingTable SAMPLE 0.1 ) subqueryAlias;

It is possible to use sample in an INSERT statement.
INSERT INTO samplingTable
SELECT * FROM samplingTable SAMPLE 2;

Samples have uniform distribution.

Table Statistics

Statistics System Table

Statistics about tables is placed in the table sys.statistics. In order to read from this system table, we have to log in as a member of monetdb group.
First, we quit current session (1). Then we have to log in as a monetdb group member (2). Password is "monetdb" (3). Then, we are allowed to read from statistics table (4).

These are the columns in that table:

column_idschematablecolumntypewidthcountuniquenilsminvalmaxvalsortedrevsorted
INTEGERVARCHARVARCHARVARCHARVARCHARINTEGERBIGINTBOOLEANBOOLEANCLOBCLOBBOOLEANBOOLEAN
Internal
ID of column.
Schema name.Table name.Column name.Data
type of column.
Number of bytes.Number
or rows, approximate.
Are values unique in column.True, if there are nulls in column.Minimum in the column or null.Maximum in the column or null.Column sorted ascending.Column sorted descending.

All the strings from this table are small case. When we refer to them, we have to write them exactly like in the table.

SELECT * FROM sys.statistics WHERE COLUMN = 'Letter';This will fail because we have to use lower letters for 'letter'.

Table Producing Functions

Table sys.statistics will give us statistics about all of the user tables in our schema. If we want to read statistics of system tables too, then we should use this table producing function.

SELECT * FROM
sys.statistics();

For reading statistics for one schema, table, or column, we can filter sys.statistics or sys.statistics() tables. Instead of filtering, we should use these table producing functions with arguments which define what data we want to read. This table producing functions are fastest way to get specific statistics.

SELECT * FROM sys.statistics('voc');
Statistics for a schema.
SELECT * FROM
sys.statistics( 'voc','samplingtable');
Arguments are case sensitive. Statistics for a table.
SELECT FROM
sys.statistics('voc','samplingtable','letter');
Statistics for a column.

Refreshing Statistics

We can refresh statistics for the whole schema, for the one table, or for the one column. The refresh will take a long time if we have a lot of data

ANALYZE voc;  –whole schemaANALYZE voc.samplingtable;     –one tableANALYZE voc.samplingtable ( letter );    –one column

Statistics is used by MonetDB, to optimize queries. If we made a lot of deletes, inserts or updates, statistics will become obsolete so we should refresh it. This will make our queries performant again.

Prepare Statement

If we want to use some statement many times, we can prepare it as well. That means that executing plan for that statement will be cached during the session. Each sequential call of that statement will use cached version, but with different parameters.

As a monetdb user, our default schema is sys. Before going through example, we will first change our current schema to voc.

Now, instead of voc.samplingTable, we can just type samplingTable.

For example, let's say that we want to insert three rows in our table samplingTable. For that we can use three separate statements.

INSERT INTO samplingTable ( Letter, Number ) VALUES ( 'O', 100 );
INSERT INTO samplingTable ( Letter, Number ) VALUES ( 'P', 101 );
INSERT INTO samplingTable ( Letter, Number ) VALUES ( 'Q', 102 );

For each of these statements, MonetDB will create separate execution plan. To avoid doing the same work three times, we will prepare this statement.

Instead of parameters, we will use question marks. They are placeholders for real values.
PREPARE INSERT INTO samplingTable ( Letter, Number ) VALUES ( ?, ? );

Zero, from the image on the right, is an ID of our prepared statement. We can use that ID to call our prepared statement.

We will now execute our three statements from above, by using this prepared statement. Zero is ID of our prepared statement.

EXECUTE 0( '0', 100);
EXECUTE 0( 'P', 101);
EXECUTE 0( 'Q', 102);
This will insert three new rows in samplingTable.
Data about our prepared statements is in the system table sys.prepared_statements.

We can release this prepared statement from our session cache.

DEALLOCATE PREPARE 0 --This will remove just prepared statement with ID 0.
DEALLOCATE PREPARE ALL --This will remove all prepared statements.

Prepared statements will be deleted automatically, if any error happens during the user session.

0160 Subqueries in MonetDB

First, we will create sample tables.

1) CREATE TABLE outerQuery ( Letter CHAR, Number TINYINT );
2) INSERT INTO outerQuery ( Letter, Number )
  VALUES ( 'A', 1 ), ( 'B', 2 );
1) CREATE TABLE innerQuery ( Letter VARCHAR(1), Number SMALLINT );
2) INSERT INTO innerQuery ( Letter, Number )
  VALUES ( 'B', 2 ), ( 'C', 3 );

The Definition of Subquery

Subquery is SELECT statement nested within some other SQL statement. Subquery can be nested within an UPDATE, DELETE, or other SELECT statement.

We can use subquery in three ways.
1) As a source table. Here we use a subquery in the FROM clause.
2) As an expression that will become a new column. Here we use a subquery in the SELECT clause.
3) As an operand in Boolean expressions. This is when we use subquery in WHERE or HAVING clauses.

The SELECT Statement

SELECT statement is special. Only in the SELECT statement we can subquery as a source table, or an expression column.

Subquery as a Source Table

We can read our data from subquery. In this case subquery replaces the source table.
 
SELECT * FROM innerQuey;   
SELECT * FROM ( SELECT * FROM innerQuery ) subqueryAlias

 
The distinguishing element of each subquery is parentheses. They are always mandatory.
If we are using subquery as the source table, then we must also use the alias
SELECT *
FROM outerQuery
INNER JOIN ( SELECT * FROM innerQuery ) subqueryAlias
    ON outerQuery.Letter = SubqueryAlias.Letter;
It is possible to use a
subquery within a
JOIN clause.
SELECT * FROM
( SELECT * FROM outerQuery
UNION
SELECT * FROM innerQuery ) subqueryAlias;
We can also create a union between two
subqueries and then use that union in the
FROM clause. This means that we can
combine several subqueries into one, using set operators.

Subquery as a Column Expression

If a subquery returns a scalar, we can use that value directly as a column value.

SELECT 3, ( SELECT 5 + 5 );SELECT *, ( SELECT 5 + 2 ) columnAlias
FROM outerQuery;

Subquery as an Operand of a Boolean Expression

In SQL expressions we can work with scalars and with tables. These are the two data structures that exist in SQL.

SELECT '7';This will return the scalar.SELECT * FROM Table1;This will return the table.

Once we get the result of a SELECT statement, we can use that result as an operand in Boolean expression. We need operators for expressions. Let's see what kind of operators we can use.

Comparison Operators

If the SELECT query returns a scalar, we can use that value with the comparison operators "=, <>, >, >=".

SELECT ( 2 = ( SELECT Number FROM innerQuery WHERE Letter = 'B' ) ) AS Result;
This is the same as the statement:
SELECT ( 2 = 2 ) AS Result;
The inner query will return the number 2, which we will use in our expression.

For all operators we can negate the result with the NOT prefix.

SELECT ( NOT 2 = ( SELECT Number FROM innerQuery WHERE Letter = 'B' ) ) AS Result;  
This prefix will transform a "true" result into a "false" result.

IN Operator

The IN operator will check if a value belongs to a set. We can negate IN with NOT IN.

SELECT 2 IN ( SELECT Number FROM innerQuery ) AS Result;
The statement above is similar to the statements below. They all would get the same result.
SELECT 2 IN ( 2, 2 ) AS Result;
SELECT 2 IN ( VALUES (2),(2) ) AS Result;  
The statement "VALUES (2), (2)" will return a column. The statement "VALUES ( 2, 2 )" will return a row.

When we use the IN operator, we always have to use only one column. In some databases this statement below will work but this will not work in MonetDB. We can only use one column with the IN operator.
SELECT ( 2, 2 ) IN ( VALUES ( 2, 2 ) );

ALL and ANY

Comparison operators can be made more powerful by combining them with the ALL or ANY operators.

The "Number" column from the innerQuery table has the values 2 and 3.
In the example (1), we check whether ( 1 < 2 ) AND ( 1 < 3 ). The result is TRUE.
SELECT 1 < ALL ( SELECT Number FROM innerQuery ) AS Result;

In example (2), we check whether the number 3 is smaller than all the numbers from the innerQuery table, ( 3 < 2 ) AND ( 3 < 3 ). It is not, this expression returns false.
SELECT 3 < ALL ( SELECT Number FROM innerQuery ) AS Result;

ANY is less demanding than ALL. With ANY, we ask "is there some number in the innerQuery table that will make our comparison TRUE?".

SELECT 2 < ANY ( SELECT Number FROM innerQuery ) AS Result;
ANY is using OR logic, ( 2 < 2 ) OR ( 2 < 3 ).

SELECT 2 < ALL ( SELECT Number FROM innerQuery ) AS Result;
ALL is using AND logic ( 2 < 2 ) AND ( 2 < 3 ).

EXISTS, or NOT EXIST

Sometimes we just want to test whether a query is empty or not. We use the EXISTS operator for that.

Our table does not have the letter 'Z', so the subquery will be empty. That's why the EXISTS operator returns FALSE.
SELECT EXISTS ( SELECT * FROM innerQuery WHERE Letter = 'Z' ) AS Result;


Conclusion

1) We can only compare compatible data types. We cannot compare apples and pears (numbers and strings).
2) The result of all expressions involving a subquery is Boolean.
3) When using subquery operators, the subquery can be on both the left and right sides. Some databases do not allow subqueries to be on the left. MonetDB is not like that.


This subquery is on the left side of operator.
It is custom to place subquery on the right side, and I will follow that convention.

Subquery in WHERE or HAVING clauses

Now that we know how to use subquery in a Boolean expression, we can use it in WHERE or HAVING clauses. We can use HAVING in SELECT statement, but WHERE can be used in SELECT, UPDATE or DELETE statements. Let's look at some examples:

SELECT Letter, SUM( Number ) AS Total FROM
( SELECT * FROM outerQuery
UNION ALL
SELECT * FROM innerQuery ) subqueryAlias GROUP BY Letter;

We will create a union of our two tables to group them together.    
This SELECT statement will be used as an operand. It will return a result of 2.5.

SELECT AVG( Number ) FROM innerQuery;
 
 
SELECT Letter, SUM( Number ) AS Total FROM
( SELECT * FROM outerQuery
UNION ALL
SELECT * FROM innerQuery ) subqueryAlias
GROUP BY Letter
HAVING SUM( Number ) > ( SELECT AVG( Number )                          FROM innerQuery );
We will combine the two
statements above into
one. We will filter only
those groups in which the
SUM(number) values are greater than 2.5.

These are two examples with WHERE.

UPDATE outerQuery
SET Number = 5
WHERE Number IN
( SELECT Number
FROM innerQuery
);
The subquery will return the numbers 2 and 3.
Therefore, the number 2 from the outer query
will be updated to the number 5.

We can return that number using the following statement.

UPDATE outerQuery
SET Number = 2
WHERE Number > ALL 
( SELECT Number
FROM innerQuery
);
The number 5 is now greater than all
numbers from innerQuery. So we're
going to move it back to number 2.

Now it's easy to understand why I named my tables outerQuery and innerQuery. We used outerQuery for the main query, and innerQuery table for the subquery.

Correlated Subquery

A correlated subquery is a special type of subquery. It is extremely powerful, but its performance is dismal. In real life, where we have large tables, using correlated subqueries will kill performance. We can still use it successfully on small tables.

SELECT oq.Letter, oq.Number,
( SELECT iq.Number
FROM innerQuery
iq
WHERE
iq.Number = oq.Number ) AS siblingNumber
FROM outerQuery oq 
In a correlated subquery, we execute the subquery once for each
row of the outer query. The subquery uses the value from the outer
query as its parameter. This can be better understood in the table below.

For each row, we execute the inner query once. For the WHERE parameter of the inner query, we take the value from the outer query. This is how a correlated subquery works.

outerQuery.LetterouterQuery.NumbersiblingNumber
A1SELECT Number FROM innerQuery WHERE Number = 1=> Returns null
B2SELECT Number FROM innerQuery WHERE Number = 2=> Returns 2

Correlated subquery can be used in SELECT, HAVING, WHERE clauses. A subquery is a correlated subquery if it uses a value from an outer query as a parameter.

Order By

In MonetDB, subqueries can use the ORDER BY clause.

SELECT * FROM
( SELECT * FROM innerQuery
ORDER BY Number
) Alias;
In some other databases this is only
allowed if the subquery uses LIMIT
or OFFSET specifiers.