## 0170 Sampling, Analyze, Prepare in MonetDB

Let's create test table:

1) `CREATE TABLE samplingTable ( Letter CHAR, Number INTEGER );`

2) `INSERT INTO samplingTable ( Letter, Number )  VALUES ( 'A', 1 ), ( 'B', 2 ), ( 'C', 3 ), ( 'D', 4 ), ( 'E', 5 )`
` , ( 'F', 6 ), ( 'G', 7 ), ( 'H', 8 ), ( 'I', 9 ), ( 'J', 10 );`

## Two Ways of Sampling

Sampling of records is done with SAMPLE clause. This clause accepts continuous range [0,1) as an argument. Selected number is percent of records that will become part of the sample. It is also possible to select any integer bigger than zero. That number will become number of units in the sample.

Samples have uniform distribution.

## Table Statistics

### Statistics System Table

Statistics about tables is placed in the table sys.statistics. In order to read from this system table, we have to log in as a member of monetdb group.
First, we quit current session (1). Then we have to log in as a monetdb group member (2). Password is "monetdb" (3). Then, we are allowed to read from statistics table (4).

These are the columns in that table:

All the strings from this table are small case. When we refer to them, we have to write them exactly like in the table.

### Table Producing Functions

Table sys.statistics will give us statistics about all of the user tables in our schema. If we want to read statistics of system tables too, then we should use this table producing function.

For reading statistics for one schema, table, or column, we can filter sys.statistics or sys.statistics() tables. Instead of filtering, we should use these table producing functions with arguments which define what data we want to read. This table producing functions are fastest way to get specific statistics.

### Refreshing Statistics

We can refresh statistics for the whole schema, for the one table, or for the one column. The refresh will take a long time if we have a lot of data

Statistics is used by MonetDB, to optimize queries. If we made a lot of deletes, inserts or updates, statistics will become obsolete so we should refresh it. This will make our queries performant again.

## Prepare Statement

If we want to use some statement many times, we can prepare it as well. That means that executing plan for that statement will be cached during the session. Each sequential call of that statement will use cached version, but with different parameters.

As a monetdb user, our default schema is sys. Before going through example, we will first change our current schema to voc.

For example, let's say that we want to insert three rows in our table samplingTable. For that we can use three separate statements.

`INSERT INTO samplingTable ( Letter, Number ) VALUES ( 'O', 100 );INSERT INTO samplingTable ( Letter, Number ) VALUES ( 'P', 101 );INSERT INTO samplingTable ( Letter, Number ) VALUES ( 'Q', 102 );`

For each of these statements, MonetDB will create separate execution plan. To avoid doing the same work three times, we will prepare this statement.

We will now execute our three statements from above, by using this prepared statement. Zero is ID of our prepared statement.

We can release this prepared statement from our session cache.

Prepared statements will be deleted automatically, if any error happens during the user session.

## 0160 Subqueries in MonetDB

First, we will create sample tables.

## The Definition of Subquery

Subquery is SELECT statement nested within some other SQL statement. Subquery can be nested within an UPDATE, DELETE, or other SELECT statement.

We can use subquery in three ways.
1) As a source table. Here we use a subquery in the FROM clause.
2) As an expression that will become a new column. Here we use a subquery in the SELECT clause.
3) As an operand in Boolean expressions. This is when we use subquery in WHERE or HAVING clauses.

## The SELECT Statement

SELECT statement is special. Only in the SELECT statement we can subquery as a source table, or an expression column.

### Subquery as a Column Expression

If a subquery returns a scalar, we can use that value directly as a column value.

## Subquery as an Operand of a Boolean Expression

In SQL expressions we can work with scalars and with tables. These are the two data structures that exist in SQL.

Once we get the result of a SELECT statement, we can use that result as an operand in Boolean expression. We need operators for expressions. Let's see what kind of operators we can use.

### Comparison Operators

If the SELECT query returns a scalar, we can use that value with the comparison operators "=, <>, >, >=".

For all operators we can negate the result with the NOT prefix.

### IN Operator

The IN operator will check if a value belongs to a set. We can negate IN with NOT IN.

### ALL and ANY

Comparison operators can be made more powerful by combining them with the ALL or ANY operators.

ANY is less demanding than ALL. With ANY, we ask "is there some number in the innerQuery table that will make our comparison TRUE?".

### EXISTS, or NOT EXIST

Sometimes we just want to test whether a query is empty or not. We use the EXISTS operator for that.

### Conclusion

1) We can only compare compatible data types. We cannot compare apples and pears (numbers and strings).
2) The result of all expressions involving a subquery is Boolean.
3) When using subquery operators, the subquery can be on both the left and right sides. Some databases do not allow subqueries to be on the left. MonetDB is not like that.

## Subquery in WHERE or HAVING clauses

Now that we know how to use subquery in a Boolean expression, we can use it in WHERE or HAVING clauses. We can use HAVING in SELECT statement, but WHERE can be used in SELECT, UPDATE or DELETE statements. Let's look at some examples:

These are two examples with WHERE.

We can return that number using the following statement.

Now it's easy to understand why I named my tables outerQuery and innerQuery. We used outerQuery for the main query, and innerQuery table for the subquery.

## Correlated Subquery

A correlated subquery is a special type of subquery. It is extremely powerful, but its performance is dismal. In real life, where we have large tables, using correlated subqueries will kill performance. We can still use it successfully on small tables.

For each row, we execute the inner query once. For the WHERE parameter of the inner query, we take the value from the outer query. This is how a correlated subquery works.

Correlated subquery can be used in SELECT, HAVING, WHERE clauses. A subquery is a correlated subquery if it uses a value from an outer query as a parameter.

## Order By

In MonetDB, subqueries can use the ORDER BY clause.

## Where VS Having

Where and Having are two clauses that we will best explain using examples. This time we will create only one table.

## Having

Let's combine these two clauses to better understand them.

So, first we have to use WHERE. WHERE is always used before grouping, because it works on the detail rows. After this comes grouping. When we get our groups then is the time to pick only some of them by using HAVING condition.

## INSERT

INSERT statement is used to enter data into table. We will insert some more rows to table WhereOrHaving. Basic way of using INSERT is to make a list of values. These values will be placed in the table as a new row. Our table WhereOrHaving has two columns of CHAR and TINYINT types, so we should be careful to insert data of that data types.

Sometimes we want to be specific into which columns we want to enter values. Now we can provide values in any order we want.

If we use syntax where some of the columns are specified, but other are not, then all other columns will be filled with nulls.

If we enter value for Number, and not for Letter, then the DEFAULT value will be used for the Letter.

## INSERT With SELECT

We will take small sample from the table above.

We can now add this sample into any other table. Presumption is that destination table should have the same structure as our sample. For simplicity we will not add this sample to same other table, but we will add it back to our WhereOrHaving table. Then, our table will have two ( 'D', 77 ) rows.

## UPDATE

We saw that we now have two rows in WhereOrHaving table with nulls. We have "M, null" and "G, null".

We can update values "M, null" to "Q, null".

DEFAULT value for Letter columns is "D". We can set any field in Letter column to this default value.

## Built-In Variables

MonetDB has a collection of built-in variables. Those variables can provide us with valuable information about our current session and current time.

SELECT CURRENT_SCHEMA AS CurrentSchema, CURRENT_USER AS CurrentUser, CURRENT_DATE AS CurrentDate
, CURRENT_TIME AS CurrentTime, CURRENT_TIMESTAMP AS CurrentTimestamp, CURRENT_TIMEZONE AS CurrentTimezone
, LOCALTIME AS LocTime, LOCALTIMESTAMP AS LocTimeStamp;

## SET OPERATORS

Let's say we have a purple and a green data set. The data set is the result of a query. Using the SET operators, we can treat the rows in those two data sets as mathematical sets.

## Sample tables

We will create two sample tables. Notice that columns in these tables are using compatible data types ( CHAR vs VARCHAR(1), TINYINT vs SMALLINT ).

## INTERSECT and EXCEPT with duplicates

If we can use ALL with UNION, is it possible to use it with INTERSECT and EXCEPT? It is possible. This way, all of the duplicates will remain.

## Connecting More Than 2 Tables with Set Operators

### Two Conclusions

1) SET operators don't have priority among them. They are applied from the top to the bottom. This is different than in some other databases where INTERSECTION has the top priority. This would be the result if we enforce priority of an INTERSECT operator.

2) SET operators without ALL specifiers will first remove duplicates from their operands. SET operators with ALL specifiers will leave their operands untouched. This is true not only for EXCEPT (like in previous examples), but also for UNION and INTERSECT. The message here is that removal of the duplicates is not conducted on the result, but on the operands, and after that SET operators will apply their logic.

## Corresponding

THIS IS NOT WORKING IN MONETDB, although documentation claims that it works.

## Sample tables

We will create these two tables in the MonetDB database:

We can create these two tables using these statements. The first statement will create the table. The second statement will populate that table with data. These are the same tables we have used in the previous blog post, so maybe you already have them in your database.

## Diagram

Diagram shows two tables that we want to join. There are three basic ways how we can join tables, INNER join, LEFT join, and FULL OUTER join.

INNER JOIN means only matching rows will remain after join. In LEFT JOIN, we will not exclude any row from the left table and such rows will be matched with null values in the right table. A FULL OUTER JOIN means that no one will be left out. All orphan rows will be padded using nulls.

## INNER JOIN

Let's filter the CROSSJOIN table so that only the rows where the Emploiees.EmploieeID column is equal to the Sales.EmploieeID column will remain.

## FULL OUTER JOIN

I will add another Sale to the Sales table. We'll pretend that the former employee made that sale.
`INSERT INTO SALES ( SalesID, DateCol, EmployeeID, Product, Qty, Price ) VALUES ( 0, '2024-03-04', 0, 'Q', 15, 3 );`

## Matching condition and aliases

We can notice that an overly complex SELECT statement can lead to hard-to-read code. This is a case where we can use aliases wisely. In practice, we usually use short aliases, consisting of only one letter, such as "a" and "b".

## Join with more than two tables

We will add one more table to our database in order to explain JOIN with more than two tables.

The tables are connected in a chain manner. We can join as many tables as we want.

## Special syntaxes

These special syntaxes are not often used. These syntaxes are based on the fact that tables are usually joined by columns with the same name and data type.

### NATURAL JOIN

A NATURAL JOIN is an INNER JOIN between two tables. The condition is based on equality between columns that have the same name and the same data type. Let's say we have an INNER JOIN between the Emploees and Sales tables.

### USING Clause

The USING clause also assumes that we have some columns in both tables that have the same name and are of the same type. With the USING clause, we can specify any type of join (INNER, LEFT, LEFT OUTER). Let's make a left join between the tables Employees and Sales.