bizkapish

0170 Sampling, Analyze, Prepare in MonetDB

MonetDB / By Bizkapish / May 7, 2024 May 8, 2024

Let's create test table:

1) CREATE TABLE samplingTable ( Letter CHAR, Number INTEGER );

2) INSERT INTO samplingTable ( Letter, Number ) VALUES ( 'A', 1 ), ( 'B', 2 ), ( 'C', 3 ), ( 'D', 4 ), ( 'E', 5 )
, ( 'F', 6 ), ( 'G', 7 ), ( 'H', 8 ), ( 'I', 9 ), ( 'J', 10 );

Two Ways of Sampling

Sampling of records is done with SAMPLE clause. This clause accepts continuous range [0,1) as an argument. Selected number is percent of records that will become part of the sample. It is also possible to select any integer bigger than zero. That number will become number of units in the sample.

Samples have uniform distribution.

Table Statistics

Statistics System Table

Statistics about tables is placed in the table sys.statistics. In order to read from this system table, we have to log in as a member of monetdb group.
First, we quit current session (1). Then we have to log in as a monetdb group member (2). Password is "monetdb" (3). Then, we are allowed to read from statistics table (4).

These are the columns in that table:

column_id

schema

table

column

type

width

count

unique

nils

minval

maxval

sorted

revsorted

INTEGER

VARCHAR

INTEGER

BIGINT

BOOLEAN

CLOB

BOOLEAN

Internal
ID of column.

Schema name.

Table name.

Column name.

Data
type of column.

Number of bytes.

Number
or rows, approximate.

Are values unique in column.

True, if there are nulls in column.

Minimum in the column or null.

Maximum in the column or null.

Column sorted ascending.

Column sorted descending.

All the strings from this table are small case. When we refer to them, we have to write them exactly like in the table.

SELECT * FROM sys.statistics WHERE COLUMN = 'Letter'; This will fail because we have to use lower letters for 'letter'.

Table Producing Functions

Table sys.statistics will give us statistics about all of the user tables in our schema. If we want to read statistics of system tables too, then we should use this table producing function.

For reading statistics for one schema, table, or column, we can filter sys.statistics or sys.statistics() tables. Instead of filtering, we should use these table producing functions with arguments which define what data we want to read. This table producing functions are fastest way to get specific statistics.

SELECT * FROM sys.statistics('voc');
Statistics for a schema. SELECT * FROM
sys.statistics( 'voc','samplingtable');
Arguments are case sensitive. Statistics for a table. SELECT FROM
sys.statistics('voc','samplingtable','letter');
Statistics for a column.

Refreshing Statistics

We can refresh statistics for the whole schema, for the one table, or for the one column. The refresh will take a long time if we have a lot of data

ANALYZE voc; –whole schema ANALYZE voc.samplingtable; –one table ANALYZE voc.samplingtable ( letter ); –one column

Statistics is used by MonetDB, to optimize queries. If we made a lot of deletes, inserts or updates, statistics will become obsolete so we should refresh it. This will make our queries performant again.

Prepare Statement

If we want to use some statement many times, we can prepare it as well. That means that executing plan for that statement will be cached during the session. Each sequential call of that statement will use cached version, but with different parameters.

As a monetdb user, our default schema is sys. Before going through example, we will first change our current schema to voc.

For example, let's say that we want to insert three rows in our table samplingTable. For that we can use three separate statements.

INSERT INTO samplingTable ( Letter, Number ) VALUES ( 'O', 100 ); INSERT INTO samplingTable ( Letter, Number ) VALUES ( 'P', 101 ); INSERT INTO samplingTable ( Letter, Number ) VALUES ( 'Q', 102 );

For each of these statements, MonetDB will create separate execution plan. To avoid doing the same work three times, we will prepare this statement.

We will now execute our three statements from above, by using this prepared statement. Zero is ID of our prepared statement.

We can release this prepared statement from our session cache.

DEALLOCATE PREPARE 0 --This will remove just prepared statement with ID 0. DEALLOCATE PREPARE ALL --This will remove all prepared statements.

Prepared statements will be deleted automatically, if any error happens during the user session.

0160 Subqueries in MonetDB

MonetDB / By Bizkapish / April 24, 2024 April 28, 2024

First, we will create sample tables.

1) CREATE TABLE outerQuery ( Letter CHAR, Number TINYINT );
2) INSERT INTO outerQuery ( Letter, Number )
  VALUES ( 'A', 1 ), ( 'B', 2 );

1) CREATE TABLE innerQuery ( Letter VARCHAR(1), Number SMALLINT );
2) INSERT INTO innerQuery ( Letter, Number )
  VALUES ( 'B', 2 ), ( 'C', 3 );

The Definition of Subquery

Subquery is SELECT statement nested within some other SQL statement. Subquery can be nested within an UPDATE, DELETE, or other SELECT statement.

We can use subquery in three ways.
1) As a source table. Here we use a subquery in the FROM clause.
2) As an expression that will become a new column. Here we use a subquery in the SELECT clause.
3) As an operand in Boolean expressions. This is when we use subquery in WHERE or HAVING clauses.

The SELECT Statement

SELECT statement is special. Only in the SELECT statement we can subquery as a source table, or an expression column.

Subquery as a Source Table

Subquery as a Column Expression

If a subquery returns a scalar, we can use that value directly as a column value.

Subquery as an Operand of a Boolean Expression

In SQL expressions we can work with scalars and with tables. These are the two data structures that exist in SQL.

SELECT '7'; This will return the scalar. SELECT * FROM Table1; This will return the table.

Once we get the result of a SELECT statement, we can use that result as an operand in Boolean expression. We need operators for expressions. Let's see what kind of operators we can use.

Comparison Operators

If the SELECT query returns a scalar, we can use that value with the comparison operators "=, <>, >, >=".

For all operators we can negate the result with the NOT prefix.

IN Operator

The IN operator will check if a value belongs to a set. We can negate IN with NOT IN.

ALL and ANY

Comparison operators can be made more powerful by combining them with the ALL or ANY operators.

ANY is less demanding than ALL. With ANY, we ask "is there some number in the innerQuery table that will make our comparison TRUE?".

EXISTS, or NOT EXIST

Sometimes we just want to test whether a query is empty or not. We use the EXISTS operator for that.

Conclusion

1) We can only compare compatible data types. We cannot compare apples and pears (numbers and strings).
2) The result of all expressions involving a subquery is Boolean.
3) When using subquery operators, the subquery can be on both the left and right sides. Some databases do not allow subqueries to be on the left. MonetDB is not like that.

Subquery in WHERE or HAVING clauses

Now that we know how to use subquery in a Boolean expression, we can use it in WHERE or HAVING clauses. We can use HAVING in SELECT statement, but WHERE can be used in SELECT, UPDATE or DELETE statements. Let's look at some examples:

These are two examples with WHERE.

We can return that number using the following statement.

Now it's easy to understand why I named my tables outerQuery and innerQuery. We used outerQuery for the main query, and innerQuery table for the subquery.

Correlated Subquery

A correlated subquery is a special type of subquery. It is extremely powerful, but its performance is dismal. In real life, where we have large tables, using correlated subqueries will kill performance. We can still use it successfully on small tables.

SELECT oq.Letter, oq.Number,
( SELECT iq.Number
FROM innerQuery iq
WHERE iq.Number = oq.Number) AS siblingNumber FROM outerQuery oq In a correlated subquery, we execute the subquery once for each
row of the outer query. The subquery uses the value from the outer
query as its parameter. This can be better understood in the table below.

For each row, we execute the inner query once. For the WHERE parameter of the inner query, we take the value from the outer query. This is how a correlated subquery works.

Correlated subquery can be used in SELECT, HAVING, WHERE clauses. A subquery is a correlated subquery if it uses a value from an outer query as a parameter.

Order By

In MonetDB, subqueries can use the ORDER BY clause.

0150 Having, Insert, Update, Delete and Built-in Variables

MonetDB / By Bizkapish / April 10, 2024 April 13, 2024

Where VS Having

Where and Having are two clauses that we will best explain using examples. This time we will create only one table.

Where

Having

Let's combine these two clauses to better understand them.

So, first we have to use WHERE. WHERE is always used before grouping, because it works on the detail rows. After this comes grouping. When we get our groups then is the time to pick only some of them by using HAVING condition.

INSERT

INSERT statement is used to enter data into table. We will insert some more rows to table WhereOrHaving. Basic way of using INSERT is to make a list of values. These values will be placed in the table as a new row. Our table WhereOrHaving has two columns of CHAR and TINYINT types, so we should be careful to insert data of that data types.

INSERT INTO WhereOrHaving VALUES ( 'E', 36 ); It is also possible to enter several rows at once.
INSERT INTO WhereOrHaving VALUES ( 'X', 36 ), ( 'Y', 46 ), ( 'Z', 56 );

IF we don't provide values for all the columns, insert will fail. `INSERT INTO WhereOrHaving VALUES ( 'M' );`
	We have to provide values for all the columns, even if some of values are nulls. `INSERT INTO WhereOrHaving VALUES ( 'M', null );`

Sometimes we want to be specific into which columns we want to enter values. Now we can provide values in any order we want.

INSERT INTO WhereOrHaving ( Number, Letter ) VALUES ( 13, 'P');
We will first provide value for Number and then for Letter.

If we use syntax where some of the columns are specified, but other are not, then all other columns will be filled with nulls.

INSERT INTO WhereOrHaving ( Letter ) VALUES ( 'G' );

If we enter value for Number, and not for Letter, then the DEFAULT value will be used for the Letter.

INSERT INTO WhereOrHaving ( Number ) VALUES ( 77 );

INSERT With SELECT

We will take small sample from the table above.

We can now add this sample into any other table. Presumption is that destination table should have the same structure as our sample. For simplicity we will not add this sample to same other table, but we will add it back to our WhereOrHaving table. Then, our table will have two ( 'D', 77 ) rows.

UPDATE

We saw that we now have two rows in WhereOrHaving table with nulls. We have "M, null" and "G, null".

UPDATE WhereOrHaving SET Number = 101 WHERE Letter = 'G'; We have updated "G, null" to "G, 101".

We can update values "M, null" to "Q, null".

UPDATE WhereOrHaving
SET Letter = 'Q'
WHERE Number IS null;

To find rows where column value is null, we use operator IS, and not =.

DEFAULT value for Letter columns is "D". We can set any field in Letter column to this default value.

UPDATE WhereOrHaving SET Letter = DEFAULT WHERE Number = 101; Now we have default letter "D" beside number 101.

DELETE

Built-In Variables

MonetDB has a collection of built-in variables. Those variables can provide us with valuable information about our current session and current time.

CURRENT_SCHEMA	CURRENT_DATE	CURRENT_TIMESTAMP	LOCALTIME
CURRENT_USER	CURRENT_TIME	CURRENT_TIMEZONE	LOCALTIMESTAMP

SELECT CURRENT_SCHEMA AS CurrentSchema, CURRENT_USER AS CurrentUser, CURRENT_DATE AS CurrentDate
, CURRENT_TIME AS CurrentTime, CURRENT_TIMESTAMP AS CurrentTimestamp, CURRENT_TIMEZONE AS CurrentTimezone
, LOCALTIME AS LocTime, LOCALTIMESTAMP AS LocTimeStamp;

0140 MonetDB – SET operators

MonetDB / By Bizkapish / April 5, 2024 April 24, 2024

SET OPERATORS

Let's say we have a purple and a green data set. The data set is the result of a query. Using the SET operators, we can treat the rows in those two data sets as mathematical sets.

Sample tables

We will create two sample tables. Notice that columns in these tables are using compatible data types ( CHAR vs VARCHAR(1), TINYINT vs SMALLINT ).

1) CREATE TABLE aboveTable ( Letter CHAR, Number TINYINT );
2) INSERT INTO aboveTable ( Letter, Number )
VALUES ( 'A', 1 ), ( 'A', 1 ),
( 'B', 2 ), ( 'B', 2 );

1) CREATE TABLE belowTable ( Letter VARCHAR(1), Number SMALLINT );
2) INSERT INTO belowTable ( Letter, Number )
VALUES ( 'B', 2 ), ( 'B', 2 ),
( 'C', 3 ), ( 'C', 3 );

UNION ALL

UNION

INTERSECT

EXCEPT

INTERSECT and EXCEPT with duplicates

If we can use ALL with UNION, is it possible to use it with INTERSECT and EXCEPT? It is possible. This way, all of the duplicates will remain.

ORDER BY and SET operators

Column names and SET operations

Connecting More Than 2 Tables with Set Operators

How EXCEPT works

How EXCEPT ALL works

EXCEPT ALL with DISTINCT

UNION AND INTERSECT

Two Conclusions

1) SET operators don't have priority among them. They are applied from the top to the bottom. This is different than in some other databases where INTERSECTION has the top priority. This would be the result if we enforce priority of an INTERSECT operator.

2) SET operators without ALL specifiers will first remove duplicates from their operands. SET operators with ALL specifiers will leave their operands untouched. This is true not only for EXCEPT (like in previous examples), but also for UNION and INTERSECT. The message here is that removal of the duplicates is not conducted on the result, but on the operands, and after that SET operators will apply their logic.

Corresponding

THIS IS NOT WORKING IN MONETDB, although documentation claims that it works.

CORRESPONDING clause can fix this problem. CORRESPONDING clause will correctly pair columns so that UNION is successful. Unfortunately, statement on the right side will work in some other databases, but not in the MonetDB, so I will just stop explaining it.

You can read more about this clause on this blog:
https://blog.jooq.org/a-rarely-seen-but-useful-sql-feature-corresponding/ SELECT Number, Letter FROM aboveTable UNION ALL CORRESPONDING SELECT Letter, Number FROM belowTable;

013. MonetDB – SELECT from multiple tables

MonetDB / By Bizkapish / March 29, 2024 April 1, 2024

Sample tables

We will create these two tables in the MonetDB database:

We can create these two tables using these statements. The first statement will create the table. The second statement will populate that table with data. These are the same tables we have used in the previous blog post, so maybe you already have them in your database.

1) CREATE TABLE Employees ( EmployeeID INT, Employee VARCHAR(10 ) );
2) INSERT INTO Employees ( EmployeeID, Employee ) VALUES ( 1, 'Hugo' ), ( 2, 'Gabriel' ), ( 3, 'Aurora' ), ( 4, 'Adam' );

1) CREATE TABLE Sales ( SalesID INT, DateCol DATE, EmployeeID INT, Product CHAR(1), Qty INTEGER, Price INTEGER );
2) INSERT INTO Sales ( SalesID, DateCol, EmployeeID, Product, Qty, Price ) VALUES
     ( 1, '2024-03-05', 2, 'X', 10, 4)
   , ( 2, '2024-03-05', 3, 'X', 20, 5)
   , ( 3, '2024-03-05', 2, 'Y', 30, 6)
   , ( 4, '2024-03-06', 3, 'Y', 40, 7)
   , ( 5, '2024-03-06', 2, 'Z', 50, 8)
   , ( 6, '2024-03-06', 3, 'Z', 60, 9)

Crossjoin

Diagram

Diagram shows two tables that we want to join. There are three basic ways how we can join tables, INNER join, LEFT join, and FULL OUTER join.

INNER JOIN means only matching rows will remain after join. In LEFT JOIN, we will not exclude any row from the left table and such rows will be matched with null values in the right table. A FULL OUTER JOIN means that no one will be left out. All orphan rows will be padded using nulls.

INNER JOIN

Let's filter the CROSSJOIN table so that only the rows where the Emploiees.EmploieeID column is equal to the Sales.EmploieeID column will remain.

LEFT JOIN

FULL OUTER JOIN

I will add another Sale to the Sales table. We'll pretend that the former employee made that sale.
INSERT INTO SALES ( SalesID, DateCol, EmployeeID, Product, Qty, Price ) VALUES ( 0, '2024-03-04', 0, 'Q', 15, 3 );

Matching condition and aliases

The matching conditions can be more complex than in our examples. It can consist of several columns.

SELECT Tab1
INNER JOIN Tab2
ON Tab1.Col1 = Tab2.Col1
AND (Tab1.Col2 = 3
OR Tab1.Col2 > Tab2.Col3)

We can notice that an overly complex SELECT statement can lead to hard-to-read code. This is a case where we can use aliases wisely. In practice, we usually use short aliases, consisting of only one letter, such as "a" and "b".

SELECT Tab1 a INNER JOIN Tab2 b ON a.Col1 = b.Col1 AND (a.Col2 = 3 OR a.Col2 > b.Col3) The left table would be given the alias a and the right table would be given the alias b.
This makes things much more readable.

Join with more than two tables

We will add one more table to our database in order to explain JOIN with more than two tables.

The tables are connected in a chain manner. We can join as many tables as we want.

Special syntaxes

These special syntaxes are not often used. These syntaxes are based on the fact that tables are usually joined by columns with the same name and data type.

NATURAL JOIN

A NATURAL JOIN is an INNER JOIN between two tables. The condition is based on equality between columns that have the same name and the same data type. Let's say we have an INNER JOIN between the Emploees and Sales tables.

USING Clause

The USING clause also assumes that we have some columns in both tables that have the same name and are of the same type. With the USING clause, we can specify any type of join (INNER, LEFT, LEFT OUTER). Let's make a left join between the tables Employees and Sales.

outerQuery.Letter	outerQuery.Number		siblingNumber
A	1	SELECT Number FROM innerQuery WHERE Number = 1	=> Returns null
B	2	SELECT Number FROM innerQuery WHERE Number = 2	=> Returns 2