MonetDB

0530 Self Signed Encryption for MonetDB

MonetDB cannot encrypt its communication over the wire. This is left to specialized programs called "TLS termination proxies". In this way we get the superb protection and flexibility provided by these tools.

Clients must also have support for TLS. Both "mclient" and "pymonetdb" support TLS.

TLS (Transport Layer Security) is a critical internet protocol that encrypts and authenticates data in transit. TLS is widely tested and trusted, and it is used in HTTPS, email, and VPNs to prevent eavesdropping and tampering.

Most popular client tools (DBeaver, JDBC, Python libraries, etc.) already support TLS.

Purple Server

I will create a totally new server for this blog post. I will call it "Purple Server". This is the server on the new virtual machine.

I am currently the user "fff". I will add myself to the "monetdb" group. This group was created during the installation of MonetDB. Members of this group can start a database.

sudo adduser fff monetdb

For this to apply, we have to log out and then log in again.
Then I will start the monetdbd service, and I will create a database "tlsDB".

sudo systemctl start monetdbd
monetdb create tlsDB
monetdb release tlsDB
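We can confirm that the database exists and is no longer under maintenance (a quick check; the output format may vary between versions):
monetdb status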


This database will be created in the default folder "/var/monetdb5/dbfarm/".

The command below will make my server a full-fledged, systemd-controlled daemon. That means that MonetDB will automatically start when I start my computer.
systemctl enable monetdbd

Self-Signed Certificate

In this tutorial we will create the certificate ourselves, and we will use that certificate both on the server and on the client.

We will change the default port of MonetDB to 49999, because I want to use port 50000 for the TLS connection.
monetdbd set port=49999 /var/monetdb5/dbfarm

Internal communication will use 127.0.0.1:49999, and external encrypted communication will use 192.168.100.152:50000. The only way to access MonetDB is through the TLS proxy server.
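If monetdbd complains that it is running when we change this setting, we can stop it first and verify the new value afterwards (a sketch; the "get" subcommand is documented in the monetdbd manpage):
sudo systemctl stop monetdbd
sudo monetdbd set port=49999 /var/monetdb5/dbfarm
sudo monetdbd get port /var/monetdb5/dbfarm     # should report 49999
sudo systemctl start monetdbd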

Creation of a Self-Signed Certificate

At the location "/etc/ssl/private" we have a folder that is only accessible to root. We can use any other folder, but it is important that the folder has restrictive permissions and that only an admin can access it.
We cannot enter this folder with "sudo cd", because sudo doesn't work with the built-in command "cd". We will first become root with "sudo -i", and then run "cd /etc/ssl/private". That will make this folder our current folder.

Inside this folder I will make a script.
touch /etc/ssl/private/SelfCertScript.sh

I will open this script in nano. I will paste the code.

nano SelfCertScript.sh

This is the version where we use an IP address for the server.
I don't have a domain name, so I will use this version. =>
#!/bin/bash
set -euo pipefail

IP="192.168.100.152"
DAYS=90
KEY="selfsigned.key"
CRT="selfsigned.crt"

openssl req -x509 -newkey rsa:2048 -sha256 -nodes \
-keyout "$KEY" -out "$CRT" -days "$DAYS" \
-subj "/CN=$IP" \
-addext "subjectAltName=IP:$IP"

This is the version you use if you have a domain name.
#!/bin/bash
set -euo pipefail  

DOMAIN="dbhost.mymonetdb.org"
DAYS=90
KEY="selfsigned.key"
CRT="selfsigned.crt"  

openssl req -x509 -newkey rsa:2048 -sha256 -nodes \
-keyout "$KEY" -out "$CRT" -days "$DAYS" \
-subj "/CN=$DOMAIN" \
-addext " subjectAltName=DNS:$DOMAIN"
This is the version with both the IP address and the domain name.
#!/bin/bash
set -euo pipefail  

IP="192.168.100.152"
DOMAIN="dbhost.mymonetdb.org"
DAYS=90
KEY="selfsigned.key"
CRT="selfsigned.crt"  

openssl req -x509 -newkey rsa:2048 -sha256 -nodes \
-keyout "$KEY" -out "$CRT" -days "$DAYS" \
-subj "/CN=$DOMAIN" \
-addext "subjectAltName=DNS:$DOMAIN,IP:$IP"

https://dbhost.mymonetdb.org
https://192.168.100.152
If we use the IP in the script, then the clients must use the IP. If we use the domain name, then the clients must use the domain name.
If we use both in the script, then the client tool can connect with either the IP or the domain name.

We will now run our script.
bash ./SelfCertScript.sh
Two files will be created inside the private folder. "selfsigned.key" is our secret and must stay secure; this is why it is important to use a folder accessible only to root. "selfsigned.crt" is the file that we will send to clients so that they can access our database.

We can also tighten the permissions on these files.
sudo chmod 600 /etc/ssl/private/selfsigned.*

Only root now has read and write rights on these files.
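To double-check what we produced, we can print the certificate's subject, expiry date and SAN (a sketch; the -ext option requires OpenSSL 1.1.1 or newer):
sudo openssl x509 -in /etc/ssl/private/selfsigned.crt -noout -subject -enddate -ext subjectAltName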

Using Some Other Folder

We can also keep our script, key and certificate in some other folder.
First we will create a folder, and then we will make that folder accessible only to root.
"chmod 700" gives the owner (root) read, write and execute rights, and no rights to anyone else.
sudo mkdir /etc/ssl/private2
sudo chown root:root /etc/ssl/private2
sudo chmod 700 /etc/ssl/private2

Script Explanation

#!/bin/bash: Shebang; run this script with the Bash shell (not sh, zsh, etc.).
-e (errexit): exit immediately if any simple command returns a non-zero status (if there is an error).
-u (nounset): treat the use of unset variables as an error, and exit.
-o pipefail: in a pipeline "a | b | c", the pipeline's exit status is the first non-zero exit code among a, b, or c (instead of just c's).

Let's dissect this line.
openssl req -x509 -newkey rsa:2048 -sha256 -nodes -keyout "$KEY" -out "$CRT" -days "$DAYS" -subj "/CN=$DOMAIN" -addext "subjectAltName=DNS:$DOMAIN,IP:$IP"

openssl req: we start the self-signed certificate creation process.
-x509: create a self-signed certificate instead of a certificate signing request (CSR). A CSR is used for commercial certificates.
-newkey rsa:2048: generate a new RSA private key of 2048 bits.
-sha256: use SHA-256 as the hash algorithm.
-nodes: the private key will not be encrypted. This is useful for automation, because there is no need for a password.
-keyout "$KEY": the private key will be saved to this file.
-out "$CRT": the certificate will be saved to this file.
-days "$DAYS": the certificate will be valid for 90 days. Clients usually do not trust certificates with a longer lifespan.
-subj "/CN=$DOMAIN": the domain name (or IP address).
-addext "subjectAltName=DNS:$DOMAIN,IP:$IP": the same as above, but modern, because it can accept several IP addresses or domain names.

Stunnel

"Stunnel" is the name of a program that we will use as a "TLS Termination Proxy". We now have to install it and configure it.

apt install stunnel4
We install it. We are already root, so we don't need "sudo".

stunnel -version

which stunnel
We find its installation folder.

cd /etc/stunnel
touch monetdb.conf
We go to the stunnel configuration folder. There we create the configuration file.

We'll open this file in nano, and we will paste the code.  

nano monetdb.conf
foreground = yes

cert = /etc/ssl/private/selfsigned.crt
key  = /etc/ssl/private/selfsigned.key

[monetdb]
accept  = 0.0.0.0:50000
connect = 127.0.0.1:49999
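Because the configuration sets "foreground = yes", we can test it manually before turning it into a service; stunnel will log to the terminal, and Ctrl+C stops it:
stunnel /etc/stunnel/monetdb.conf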

Stunnel Systemd Service

The MonetDB service will always run automatically, because it is controlled by systemd. We want the same for stunnel. For stunnel to become a service, we first have to create a configuration file for that service (a systemd unit file).

First, we go to the folder where we have to place the systemd unit file.
cd /etc/systemd/system  

We create new file:
touch stunnel-monetdb.service  

We open this file in nano text editor:
nano stunnel-monetdb.service  

We paste our code into it:
[Unit]
Description=Stunnel TLS for MonetDB
After=network.target

[Service]
ExecStart=/usr/bin/stunnel /etc/stunnel/monetdb.conf
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
PrivateTmp=yes

[Install]
WantedBy=multi-user.target

Let's explain parts of this systemd unit file:

After=network.target: this is a prerequisite. Stunnel should only be started after the network becomes functional during system startup.
ExecStart=/usr/bin/stunnel /etc/stunnel/monetdb.conf: start the stunnel program with the specified configuration file.
ExecReload=/bin/kill -HUP $MAINPID: $MAINPID is the process ID of stunnel. "-HUP" means that stunnel should reload its configuration without restarting the process.
Restart=on-failure: restart stunnel if it crashes.
PrivateTmp=yes: instead of /tmp and /var/tmp, the service will use its own private temporary folders.
WantedBy=multi-user.target: during startup, stunnel should start after some basic services have already started.

Now we have to use this "stunnel-monetdb.service" file to register our new service with systemd:

sudo systemctl daemon-reexec
This is used to refresh systemd itself.
sudo systemctl daemon-reload
This reloads all of the systemd unit files; it will now include our "stunnel-monetdb.service" file.
sudo systemctl enable stunnel-monetdb.service
This makes the service start automatically during reboot.
sudo systemctl start stunnel-monetdb.service
This starts our service immediately, without waiting for the first reboot.

We can now check who is listening on port 50000; we will see that it is stunnel.

sudo ss -tnlp | grep 50000
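In the same way, we can check that monetdbd itself is listening on port 49999:
sudo ss -tnlp | grep 49999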

FYI: MonetDB Systemd Unit File

We can find the location of the MonetDB systemd unit file with this command:
systemctl status monetdbd
We can read its content like this:

systemctl cat monetdbd
[Unit]
Description=MonetDB database server daemon
Documentation=man:monetdbd https://www.monetdb.org/documentation/admin-guide/manpages/monetdbd/
After=network.target
[Service]
Type=forking
User=monetdb
Group=monetdb
ExecStart=/usr/bin/monetdbd-11.53.3 start /var/monetdb5/dbfarm
ExecStop=/usr/bin/monetdbd-11.53.3 stop /var/monetdb5/dbfarm
Restart=on-failure
PIDFile=/run/monetdb/merovingian.pid
PrivateDevices=no
ProtectSystem=full
ProtectHome=read-only
[Install]
WantedBy=multi-user.target

TLS Connection from the Local Computer

I will stop being root by typing "exit". I am running this on the Purple server, and the URL also points to the Purple server.
mclient -d monetdbs://192.168.100.152:50000/tlsDB
It won't work because mclient doesn't have access to the certificate.

We will copy the content of our certificate to the $HOME folder. We need a folder where mclient has access.
sudo cat /etc/ssl/private/selfsigned.crt > $HOME/selfsigned.crt

We will use that location to tell mclient where to find the certificate.
mclient -d "monetdbs://192.168.100.152:50000/tlsDB?cert=$HOME/selfsigned.crt"

Now, we have access.

If you continue to get the message "certificate verify failed", check your SelfCertScript.sh. Make sure that there are no invisible characters in it. After that, recreate your certificate and copy it again for mclient. Try restarting your computer a few times; I had problems each time, and restarting the computer helped.
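Another way to debug such problems is to test the TLS handshake directly with openssl, independently of mclient. With a healthy setup, the output should contain "Verify return code: 0 (ok)":
openssl s_client -connect 192.168.100.152:50000 -CAfile $HOME/selfsigned.crt </dev/null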

TLS Connection from the Remote Computer

On the other virtual machine, I will place "selfsigned.crt" into the home directory.

The same command will work here, too.

mclient -d "monetdbs://192.168.100.152:50000/tlsDB?cert=$HOME/selfsigned.crt"

How to use TLS with Pymonetdb

In this blog post "link", I explained how to install pymonetdb. In the meantime, there have been some changes in the newest versions of Ubuntu.

If we follow that old blog post, we'll get an error.
sudo apt install python3-pip     #install pip
pip install pymonetdb                   #install pymonetdb

While we can install pip, we will not be able to install pymonetdb this way. Ubuntu is trying to prevent us from installing packages into the global Python context; it wants us to use a virtual environment instead of messing with the global Python environment.

We can still install into the global environment in this way, but it will only work for packages that are inside the Ubuntu repository.
sudo apt install python3-pymonetdb

I will install Spyder IDE:
sudo apt install spyder
I will open this GUI program and I will run this code =>
import pymonetdb
connection = pymonetdb.connect("monetdbs://192.168.100.152:50000/tlsDB?cert=/home/fff/selfsigned.crt")
cursor = connection.cursor()
cursor.execute('SELECT 2')
print(cursor.fetchone())

We can also test the code in the command line.
python3 -c 'import pymonetdb;connection = pymonetdb.connect("monetdbs://192.168.100.152:50000/tlsDB?cert=/home/fff/selfsigned.crt" );cursor = connection.cursor();cursor.execute("SELECT 2");print(cursor.fetchone())'

We can also use syntax like this one:
connection = pymonetdb.connect("monetdbs://192.168.100.152:50000/tlsDB", cert="/home/fff/selfsigned.crt")
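Putting it together, here is a minimal sketch that uses the keyword-argument form and closes the connection when done:

import pymonetdb

# TLS connection to the Purple server; the certificate path is the one we copied earlier
connection = pymonetdb.connect(
    "monetdbs://192.168.100.152:50000/tlsDB",
    cert="/home/fff/selfsigned.crt")
try:
    cursor = connection.cursor()
    cursor.execute("SELECT 2")
    print(cursor.fetchone())   # prints (2,)
finally:
    connection.close()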

How to use TLS with ODBC

We will again use our Green server. We created that server in this blog post "link", and we installed ODBC drivers in it, in this other post "link".

In the mentioned post "0500 Proto_loaders, ODBC and COPY in MonetDB", we created the file "/etc/odbc.ini" with the credentials of the Blue server. We will now modify that file, so that it leads toward the Purple server, because we want to test TLS connection.

We'll change the credentials, and we will add two more properties: TLS and CERT.
FOR THE BLUE SERVER:
[DatabaseB]
Description = Connect to Blue Server
Driver = MonetDB
Server = 192.168.100.146
Port = 50000
Database = DatabaseB
User = monetdb
Password = monetdb
FOR THE PURPLE SERVER:
[tlsDB]
Description = Connect to Purple Server
Driver = MonetDB
Server = 192.168.100.152
Port = 50000
Database = tlsDB
User = monetdb
Password = monetdb
TLS = ON
CERT = /home/sima/selfsigned.crt

Now it is easy to connect to stunnel through ODBC driver manager.

isql -v tlsDB

How to use TLS with JDBC

In this blog post "link", we downloaded JDBC driver/client to the green server. We will use that client again, but this time with the connection string for TLS.

java -jar /home/sima/Desktop/jdbcclient.jre8.jar -u monetdb -d "jdbc:monetdbs://192.168.100.152:50000/tlsDB?cert=/home/sima/selfsigned.crt"

We will also connect through DBeaver and JDBC. Before my explanation, it would be wise to read how to use DBeaver and JDBC without TLS; it is explained on this blog "link". I will not repeat the whole story here, just the difference: in the connection, we have to provide the certificate.

We can create new connection (1), or we can edit the existing one (2).
We will fill the dialog with credentials for the Purple server (3). After that, we will go to the "Driver properties" tab (4). There we have to add our certificate. Now we can connect.

0520 CHECK, RETURNING and Other in MonetDB

Sample Table

CREATE TABLE tabProducts (
    Color VARCHAR(10),
    Size  VARCHAR(5),
    Qty   INT );
INSERT INTO tabProducts (Color, Size, Qty) VALUES
   ('Red',  'M',  10),
   ('Red',  'XXL', 10),
   ('Blue', 'XL', 30);

RETURNING Clause

Only the SELECT statement returns values. INSERT, UPDATE and DELETE just silently do their job, without any feedback.

If we execute statement:
UPDATE t1 SET Col1 = 'zzz' WHERE Id = 99;
We can afterward check rows that are updated:
SELECT * FROM t1 WHERE Id = 99;
If we could do both things with one statement, that would simplify things and reduce the strain on the database.

INSERT INTO tabProducts (Color, Size, Qty)
VALUES ( 'Blue', 'S', 40 ),
       ( 'Blue', 'S', 50 )
RETURNING Color, Size, Qty;
This INSERT statement will return values. The RETURNING clause will show us all the values that we have entered.

We can do the same thing with DELETE and UPDATE. Delete will return deleted rows, and UPDATE will return updated rows.

DELETE FROM tabProducts
WHERE Qty = 50
RETURNING Size, Qty;
UPDATE tabProducts
SET Size = 'M' WHERE Qty = 40
RETURNING Size, Color, Qty;

We can use expressions in the RETURNING clause.

UPDATE tabProducts
SET Qty = 60 WHERE Qty = 40
RETURNING COUNT( * ), SUM( Qty );
DELETE FROM tabProducts
WHERE Qty = 60
RETURNING Qty + 15;

INSERT INTO tabProducts VALUES( 'Blue', 'M', 100 )
RETURNING Color, SUM( Qty )
GROUP BY Color;
This will not work. We cannot use GROUP BY in the RETURNING clause. The RETURNING clause must remain simple.

Referencing Columns by Their Position

SELECT Color, Size, qty
FROM tabProducts
ORDER BY 1, 2;   --BY Color, Size
SELECT Color, SUM( qty )
FROM tabProducts
GROUP BY 1;   --BY Color

In MonetDB it is possible to reference columns by their position in the SELECT clause.

Referencing Set of Columns with the Keyword ALL

SELECT Color, qty, COUNT( Size )
FROM tabProducts
GROUP BY ALL;   --BY Color, qty
When using GROUP BY, we can only group by columns which are not aggregated. If we want to group by all of such columns, we can use "GROUP BY ALL".

SELECT Color, Size, qty
FROM tabProducts
ORDER BY ALL;   --BY Color, Size, qty
We can use ALL in the ORDER BY clause. This is the same as "ORDER BY Color, Size, qty". MonetDB will accept this syntax, but there is a bug and it does nothing; there will be no sorting.

Instead of the keyword ALL, we can use the star "*" sign.
SELECT Color, qty, COUNT( Size )
FROM tabProducts
GROUP BY *;   --BY Color, qty
SELECT Color, Size, qty
FROM tabProducts
ORDER BY *;   --BY Color, Size, qty

IS [NOT] DISTINCT FROM

Anything compared with NULL will return NULL.
SELECT 'null' = null;
SELECT null = null;
IS [NOT] DISTINCT FROM is a null-safe comparison operator. This operator will always return TRUE or FALSE, even if one of the operands is NULL.

SELECT NULL IS DISTINCT FROM NULL;
SELECT 'A' IS DISTINCT FROM NULL;
SELECT NULL IS NOT DISTINCT FROM NULL;
SELECT 'A' IS NOT DISTINCT FROM NULL;
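A small sketch of where this matters in practice: comparing two nullable columns in a WHERE clause, where a plain <> would silently drop rows in which either side is NULL:
SELECT Color, Size
FROM tabProducts
WHERE Color IS DISTINCT FROM Size;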

CHECK Constraint

A check constraint is a type of rule which specifies a condition (a boolean expression) that must be met by each row in a database table. This rule limits the acceptable values for data.

On our table I will add a check constraint that the quantity column must have values less than 1. This will fail because the existing data is already bigger than 1.
ALTER TABLE tabProducts ADD CONSTRAINT "QtyLess1" CHECK (qty < 1);

If we change our condition, so that qty must be less than 100, then it will succeed.
ALTER TABLE tabProducts ADD CONSTRAINT "QtyLess100" CHECK (qty < 100);
After that, we will try to insert the number 111 into the qty column =>
INSERT INTO
tabProducts( Color, Size, Qty )
VALUES ( 'Blue', 'XL', 111 );

This will fail because of the constraint (111 > 100).

UPDATE tabProducts SET qty = 111 WHERE qty = 30;
This also means that we cannot update a value in the qty column to a value that is bigger than 100.

We can only enter values that are smaller than 100.
UPDATE tabProducts SET qty = 99 WHERE qty = 30;
SELECT * FROM tabProducts;

How to Add Check Constraint?

We can add several constraints on the same column. We now have two constraints: "qty > 0" and "qty < 100".
ALTER TABLE tabProducts ADD CONSTRAINT "QtyGrt0" CHECK (qty > 0);
That is not efficient. It is much better to add both constraints with one statement. We can connect conditions with AND and OR.
ALTER TABLE tabProducts ADD CONSTRAINT QtyConstraints CHECK (qty > 0 AND qty < 100);

Constraints can combine several columns in the requirement expression:
ALTER TABLE tabProducts ADD CONSTRAINT CheckColorSize CHECK ( Color = 'Red' OR Size = 'XL' );

We can add constraint during the creation of a table.
CREATE TABLE tabOrders ( Price INT, Qty INT, CONSTRAINT ValidOrders CHECK ( Price > 10 AND Qty < 10 )  );

During table creation, we can add a constraint at the single-column level. The server will provide a default name for such a constraint.
CREATE TABLE tabOneColumn ( Col1 CHAR, Col2 INT CHECK ( Col2 < 999 ) );
The default name consists of the table name, the column name, and the keyword "check". In this example, that would be "tabonecolumn_col2_check".

Instead of using the default name for a single-column CHECK constraint, it is much better to give the constraint a specific name.
CREATE TABLE tabOneColumnNamed
( Col1 INT CONSTRAINT Col1Constraint CHECK ( Col1 < 999 ) );

Limitations on CHECK Constraints

A CHECK constraint can only refer to one row of a table. We cannot use aggregate functions, because that would break this limitation.
ALTER TABLE tabProducts ADD CONSTRAINT OverFlow CHECK ( SUM( Qty ) < 1000 );
This also means that a CHECK constraint cannot refer to tables other than the table on which it is defined.

A simple check on a single column has minimal impact on performance. We should avoid complex check conditions.

CHECK Constraints and Nulls

INSERT INTO
tabOneColumn ( Col1, Col2 )
VALUES ( null, null );
This INSERT statement will always succeed. CHECK constraints cannot check null values.
SELECT * FROM tabOneColumn;

This CHECK constraint will always return null, because nothing can be compared with a null.
ALTER TABLE tabOneColumn ADD CONSTRAINT ZZZ CHECK ( Col1 <> null );
CHECK will prevent entry of a row only if the condition returns FALSE, so this constraint is useless.
INSERT INTO tabOneColumn ( Col1, Col2 ) VALUES ( null, 33 );
SELECT * FROM tabOneColumn;
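If we actually want a constraint that rejects NULLs, we can make the condition null-safe, for example by checking IS NOT NULL explicitly (a sketch; the constraint name is arbitrary):
ALTER TABLE tabOneColumn ADD CONSTRAINT Col1NotNullCheck CHECK ( Col1 IS NOT NULL );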

How to Delete CHECK Constraints?

We can find our CHECK constraints in the view "information_schema.check_constraints".
SELECT * FROM information_schema.check_constraints WHERE table_name = 'tabonecolumn';

We can delete constraints like this:
– We can delete just the CHECK constraint.
– We can delete the whole table.

ALTER TABLE tabonecolumn DROP CONSTRAINT zzz;
DROP TABLE tabOneColumnNamed;
Now, our constraints are gone.

SELECT * FROM information_schema.check_constraints WHERE table_name = 'tabOneColumnNamed' or constraint_name = 'zzz';

LIKE and ILIKE Operators

LIKE is a pattern matching operator. It can help us to find patterns in a text. LIKE is based on two wildcard characters. Percent sign "%" will replace any set of consecutive characters. Underscore "_" will replace exactly one character. Here are some examples:

SELECT 'zzz' LIKE '%';   --true
SELECT 'zzz' LIKE '___';   --true
SELECT 'Azz' LIKE 'A%';   --true
SELECT 'AzzA' LIKE '%zz%';   --true
SELECT 'AzzA' LIKE 'A__A';   --true
SELECT 'Azz' LIKE 'Azz_';   --false

ILIKE is the case-insensitive version.
SELECT 'fff' LIKE 'F_F';   --false
SELECT 'fff' ILIKE 'F_F';   --true

We can negate LIKE with NOT.
SELECT 'M' NOT LIKE 'M';   --false
SELECT 'M' LIKE 'M';   --true

The default escape character is the backslash "\".
SELECT '%_' LIKE '\%\_';   --true
We can declare any other character to be the ESCAPE sign.
SELECT '%_' LIKE '#%#_' ESCAPE '#';   --true

Function regexp_replace

For more complex patterns, we can use the regexp_replace function. This function accepts the original string, a search pattern, and a string that will replace the found pattern. We will replace "m", followed by two characters, with "z".

SELECT regexp_replace( 'maaam', 'm.{2}', 'z' );

This function is case sensitive, except if we use the fourth, optional argument.
SELECT regexp_replace( 'maaam', 'M.{2}', 'z' );      --no match; returns 'maaam'
SELECT regexp_replace( 'maaam', 'M.{2}', 'z', 'i' ); --case-insensitive; the replacement happens

This function also accepts other modifiers (flags). I tested that it will accept "m,i,s,x,xx". Flag "xx" is the same as "x".

It is interesting that it will not accept the flag "g". It seems that this modifier is constantly turned on.
SELECT regexp_replace( 'SSS', 'S', 'P' );

SELECT regexp_replace('first\nfirst', '^first', '*', 'm') AS result;
SELECT regexp_replace('a\nb', 'a.b', 'X', 's') AS result;
SELECT regexp_replace('abc123', ' 1 2 3 ', '', 'x') AS result;
Here are examples that you can try with and without modifier (flag).

SELECT regexp_replace('Prisca Gbaguidi', '\\w+\\s\\w+', 'Mireille Gbaguidi');
--Mireille Gbaguidi
Backslashes have to be escaped.
SELECT regexp_replace('Prisca Gbaguidi', '(\\w+)(\\s)(\\w+)', '\\3 \\1');
--Gbaguidi Prisca
Regex with numbered capturing groups.

TRUNCATE and SERIAL Data Type

We will examine how TRUNCATE can affect the SERIAL data type.
CREATE TABLE Students ( ID SERIAL, Name VARCHAR(10) );
INSERT INTO students(name) VALUES ('Wiremu'), ('Hinewai');
SELECT * FROM Students;

We will truncate this table, but we will continue sequence.
TRUNCATE students CONTINUE IDENTITY;
INSERT INTO students(name) VALUES ('Wiremu'), ('Hinewai');
SELECT * FROM Students;

Default behavior is to restart sequence.
TRUNCATE students RESTART IDENTITY;
INSERT INTO students(name) VALUES ('Wiremu'), ('Hinewai');
SELECT * FROM Students;

0510 JDBC, Recursive CTEs, New Functions in MonetDB     

We will continue using the green and blue servers that we created in this post => link (or you can watch the video on YouTube => link). That was the post about distributed query processing.
monetdbd start /home/sima/monetdb/DBfarmG
mclient -u monetdb -d DatabaseG   
–password monetdb

monetdbd start /home/sima/monetdb/DBfarmB
mclient -u monetdb -d DatabaseB   
–password monetdb

JDBC

We will now connect to the blue MonetDB server, from the green server, through JDBC connector.

Installing Java

Let's see if we have Java installed.
java -version
We don't have it, so we can install it like this:
sudo apt install default-jre

Connecting With the JDBC Client

From this link:
https://www.monetdb.org/downloads/Java/  
Download this file "jdbcclient.jre8.jar".

This is a JAR file that includes a Java console client application, but also the driver. It is all in one. Inside the green server, run this command from the shell:

java -jar /home/sima/Desktop/jdbcclient.jre8.jar -h 192.168.100.146 -p 50000 -u monetdb -d DatabaseB

We are now connected. We can use this Java client in the same way as we use the "mclient" program.

SELECT * FROM SchemaGB.factB;

Using JDBC Connection with DBeaver

From this link:
https://www.monetdb.org/downloads/Java/
Download JDBC driver "monetdb-jdbc-12.0.jre8.jar".

We don't have DBeaver on the green server, so we have to install it. We will download the DBeaver ".deb" file with the wget command. Then we can install it.
cd /home/sima/Desktop
wget https://dbeaver.io/files/dbeaver-ce_latest_amd64.deb

sudo apt install ./dbeaver-ce_latest_amd64.deb

DBeaver is a GUI program. Click on the Database > Driver Manager menu. Click "New". In the new dialog, enter these values in the "Settings" tab:

Driver Name:        MonetDB_JDBC
Class Name:        org.monetdb.jdbc.MonetDriver
URL Template:     jdbc:monetdb://{host}:{port}/{database}

This is how "Settings" tab should look:
In the "Libraries" tab click on "Add File". Find "monetdb-jdbc-12.0.jre8.jar" driver file. This will add our driver file into DBeaver.

Close everything, and in the main window click on the icon for the New Connection. Find our driver with a search. Click Next, and in the new dialog enter these values:
Host:                192.168.100.146
Port:                50000
Database/Schema:     DatabaseB
Username:            monetdb
Password:            monetdb

This is how the dialog will look. When you click on the "Test" button you will get an error:
Invalid JDBC URL:  jdbc:monetdb://192.168.100.146:50000/DatabaseB  
This error occurs because DBeaver is trying to validate the "jdbc:monetdb://192.168.100.146:50000/DatabaseB" URL, or it is trying to add some hidden parameters or escape characters.

The solution is to change the dialog to manual input. DBeaver will then use the raw URL. After we click on the "Test" button, we will get a confirmation dialog.

We can now expand our connection, where we will find all of the schemas and tables, and we can run queries.

Recursive Common Table Expressions

In the blog post about common table expressions ( link ), I unintentionally lied that MonetDB doesn't support Recursive CTEs. But it does, and I will explain them now.

What is Recursion

When we need to take a shower, it is always difficult to set the right water temperature. To find it, we repeat the same action. We move the lever left and right until we find the perfect spot. With each movement of the lever, we get closer to the desired temperature.

Recursion is an iterative process of finding a solution. We repeat the same logic each time, but each time we are closer to the solution because we can base our logic on the information we have gained during previous iterations.

Let's look at this example. Our friend imagined a number between 1 and 10. We have to guess that number with the minimal number of questions. The best approach is to use binary logic, based on an elimination process where in each iteration we can remove half of the numbers.


Question: Is it bigger than 5?
Answer: Yes, it is.
Question: Is it bigger than 8?
Answer: No, it is not.
Question: Is it smaller than 7?
Answer: Yes, it is.
It has to be 6.

The only argument in our algorithm is the range of the possible numbers.

Our logic has two steps:
1) Does the range of possible numbers have only one number?
2) If it doesn't, ask the question to eliminate half of the numbers and reduce the range of possible numbers by half.

We can pack the second step into a function named "EliminateHalf". This function will return the range of all the possible numbers. We will call this function 3 times.
1) EliminateHalf (1,2,3,4,5,6,7,8,9,10) = (6,7,8,9,10)
2) EliminateHalf (6,7,8,9,10) = (6,7)
3) EliminateHalf (6,7) = (6)

We can nest these functions:
EliminateHalf( EliminateHalf( EliminateHalf( 1,2,3,4,5,6,7,8,9,10 ))) = 6

The problem is that we don't know how many nested functions we need. I will write pseudocode that nests as many functions as needed to get the final result.
Result = EliminateHalf( 1,2,3,4,5,6,7,8,9,10 )  # initial state
Do Until Count( Result ) = 1                    # are we finished?
    Result = EliminateHalf( Result )            # if not, continue
Loop

This is RECURSION. We broke the complex problem into small steps. Each step has the same logic. Each step is using arguments that are the result of the previous step. This is just one iterative process which brings us closer to the solution with each step.
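As a preview of the SQL syntax explained later in this post, here is a sketch of the guessing game as a recursive CTE, assuming the imagined number is 6; each row shows the remaining range of possible numbers:

WITH RECURSIVE guess(low, high) AS (
    SELECT 1, 10                                        --initial range
    UNION ALL
    SELECT CASE WHEN (low + high) / 2 < 6
                THEN (low + high) / 2 + 1 ELSE low END, --keep the upper half
           CASE WHEN (low + high) / 2 >= 6
                THEN (low + high) / 2 ELSE high END     --or keep the lower half
    FROM guess
    WHERE low < high )                                  --are we there yet
SELECT * FROM guess;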

You want an example from real life? The coach of a football team analyzes data from the previous game. After each analysis he changes the game of his team. He continues with this practice until his team starts winning.

The Structure of Recursion

The recursion structure always has four steps. The first step, the "initial state", is the problem that we want to solve. We will solve it by improving our arguments.
1) Set initial values for our arguments.           # initial state
2) Has our goal been achieved?                     # are we there yet (recursive part)
3) Improve our arguments by using some strategy.   # continue with effort (recursive part)
4) Repeat steps 2 and 3 until we reach the goal.   # be persistent

Linear and Tree Recursion

In linear recursion we only have two possible outcomes. We are either satisfied with the result or we will continue with our effort. For example, the coach can be satisfied with his team or he can continue introducing improvements.

Tree recursion is when we have several possible strategies to direct our effort. For example, the coach can change the team's game, or he can look for a position in some other team. If we create a diagram of his possible actions, we can get something like this:

1) Red path: Be satisfied with Real, then transfer to Arsenal, and start improving Arsenal's game.
2) Orange path: Be satisfied with Real, then transfer to Arsenal, and then to Barcelona, and start improving Barcelona's game.
3) Purple path: Leave Real, improve Liverpool for a while, and then transfer to Juventus to improve Juventus's game.

We can present this coach's career as folders and subfolders.
Each time he improves his team, he is going deeper into the subfolders ( Real -> Real+ ).
Each time he changes his job, he is moving to another folder at the same level ( Real+ -> Juventus ).

This is why this kind of recursion is called tree recursion: we have a hierarchy. This hierarchy is a reflection of the strategic decisions we have to make to reach our goal. Each decision will move us to another path, but we always use the same logic to decide.

Structure of The Recursive Common Table Expression

WITH RECURSIVE cte_name AS (
    SELECT ...           --initial state

    UNION ALL

    SELECT ...           --continue with improvements
    FROM cte_name        --get the previous state
    WHERE ...            --are we there yet
)
SELECT * FROM cte_name;  --return result
In its simplest form, a recursive CTE has two SELECT statements connected with UNION ALL. The first SELECT statement defines the initial state.

The recursive CTE will return all interim results connected with UNION ALL.

The second SELECT statement calculates the new state; it references the previous state by the name of the CTE. The WHERE clause in the second SELECT statement tells us when to stop.

WITH RECURSIVE numbers(n) AS
(    SELECT 1
     
    UNION ALL
    
    SELECT n + 1
    FROM numbers

    WHERE n < 4 )
SELECT * FROM numbers;
The first SELECT statement is called the ANCHOR member. The second one is the RECURSIVE member. The ANCHOR member will be executed only once. In this example, the RECURSIVE member will be executed several times, each time with a different value of "n", until the condition becomes false. We will apply UNION ALL on all of the interim results.

WITH RECURSIVE factorial(n, fact) AS (
    SELECT 1, 1
     
    UNION ALL
    

    SELECT n + 1, fact * (n + 1)
    FROM factorial
    WHERE n < 5 )
SELECT n, fact FROM factorial;
This is an example of how to calculate a factorial.


These two examples are for linear recursion.

Tree Recursion

This is our hierarchy. Our hierarchy can be presented with a self-referencing table.

We'll create such a table in MonetDB.
CREATE TABLE Folders ( Folder TEXT, Parent TEXT );

INSERT INTO Folders ( Folder, Parent ) VALUES
('Folder1', NULL), ('Folder2', NULL),
('Folder11', 'Folder1'), ('Folder12', 'Folder1'),
('Folder21', 'Folder2'), ('Folder211', 'Folder21');

We'll make our CTE:
WITH RECURSIVE Folder_paths AS (
    SELECT Folder, Folder AS Path
    FROM Folders
    WHERE Parent IS NULL

    UNION ALL

    SELECT f.Folder, fp.Path || '>' || f.Folder AS Path
    FROM Folders f  JOIN Folder_paths fp ON f.Parent = fp.Folder
)
SELECT * FROM Folder_paths ORDER BY path;
This is the initial state. This table represents the folders at the top level.

We'll join the original Folders table with this initial state. That will give us our new state.

We'll do it again, between the Folders table and the new state. This is an inner join.

We can now make the final result as a union between (1), (2), (3).

Tree recursion occurs when the ANCHOR and RECURSIVE members are SELECT statements that return tables with several rows. Those rows represent folders at the same level. We have (1), (2), (3) for the top (1), middle (2) and bottom (3) folders. First we get the top folders (initial state), then the middle folders (first recursion), and then the bottom folders (second recursion). Each recursion is used to collect the folders from the level below.

Recursive CTEs Caveats

1) The recursive and anchor members must match in number of columns and data types.

2) If we use UNION (instead of UNION ALL), recursion will stop immediately when we encounter a duplicate. The statement below will end immediately, although it doesn't have a WHERE clause. This is because the first recursion will return the number 1, which is a duplicate. The UNION operator will remove that duplicate, so our result will have only one row, and that row will have the original number 1.
WITH RECURSIVE numbers AS (
    SELECT 1
    UNION
    SELECT 1 FROM numbers )
SELECT * FROM numbers;

3) Don't use OUTER JOINs in the recursive member; the query will never end. Only an INNER join is acceptable.

4) MonetDB will not complain if we use aggregate or window functions in the recursive member. We can also use DISTINCT and GROUP BY in the recursive member. In MonetDB, we can use the CTE's name in the FROM clause, but we can also use it in a subquery. Some other servers don't allow this.

New Functions

DayName and MonthName Functions

SELECT DAYNAME('2025-07-12');   --Saturday
This function returns the name of a day in the week, according to the current locale set in the OS.
SELECT MONTHNAME('2025-07-12');   --July
This is a similar function that returns the name of a month.

Besides date arguments, we can also use a timestamp ('1987-09-23 11:40') or a time with a time zone ('15:35:02.002345+01:00').

Generate Series Functions

SELECT * FROM generate_series(1,9,2);   --1,3,5,7
This function will return numbers from 1 to 9 with step 2; the upper bound is excluded. The default step is 1.

SELECT * FROM generate_series('2025-01-01','2025-01-10',INTERVAL '5' DAY);
--2025-01-01, 2025-01-06
All dates from the range, but with a step of 5 days.
SELECT * FROM generate_series('2025-01-01','2025-05-10',INTERVAL '2' MONTH);
--2025-01-01, 2025-03-01
We can also get the months, with a step of 2 months.

We can list seconds or days between two timestamps.
SELECT * FROM generate_series('2025-01-01 01:40:00','2025-01-01 01:40:05', INTERVAL '3' SECOND);
--2025-01-01 01:40:00, 2025-01-01 01:40:03
SELECT * FROM generate_series('2025-01-01 01:40:00','2025-01-06 01:40:05', INTERVAL '3' DAY);
--2025-01-01 01:40:00, 2025-01-04 01:40:00

Generate Series functions will return a column with the name "value".
SELECT * FROM generate_series( 1,7,2);
SELECT * FROM generate_series('2025-01-01','2025-01-10',INTERVAL '5' DAY);
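Since the column is named "value", we can alias or filter it like any other column (a small sketch):
SELECT value AS n FROM generate_series(1, 7, 2) WHERE value > 1;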

0500 Proto_loaders, ODBC and COPY in MonetDB

We will continue using the green and blue databases that we created in this post => link (or you can watch the video on YouTube => link). That was the post about distributed query processing.
monetdbd start /home/sima/monetdb/DBfarmG
mclient -u monetdb -d DatabaseG
   –password monetdb

monetdbd start /home/sima/monetdb/DBfarmB
mclient -u monetdb -d DatabaseB
   –password monetdb

Read From CSV File With a file_loader Function

We have three files and two ".gz" archives.

  • "CSV" file is using commas. It also has commas at end of the rows with data. This is the only file that doesn't have file format extension.
  • "TSV.tsv" file is using tab as delimiter, but it also has double quotes around the strings.
  • "PSV.psv" file is using pipes and has a null in the "Letter" column.
  • Files with file format extension ".gz"are just PSV file compressed.
 

Files must be placed on the server. The return value of the "file_loader" function is a virtual table. We don't specify delimiters, wrappers or data types for the files; they are deduced automatically. We can read from CSV, TSV and PSV files, and also from ".gz", ".lz4", ".bz2" and ".xz" files.

file_loader Function Syntax

SELECT * FROM file_loader( '/home/abhishek/sample.csv' );
The only argument of our function is the full path to the file.

SELECT * FROM '/home/abhishek/sample.csv';
The shorter syntax is much better. We don't have to type the function name.

Experiments With the CSV File

We cannot read from files that don't have a file format extension.
SELECT * FROM '/home/sima/Desktop/CSV';

I will add a file format extension and then try again.

SELECT * FROM '/home/sima/Desktop/CSV.csv';
Our CSV file is now reduced to only one column. The whole row of data from the CSV file is imported into one column. This is the consequence of the commas at the end of the rows.

I will remove the surplus commas. Now our query works as expected.

Experiments With the TSV File

Import from the TSV file will work fine. The double-quote wrappers around strings will be considered part of the strings.
SELECT * FROM '/home/sima/Desktop/TSV.tsv';

We can try to trim double quotes, but our column will not be recognized.
SELECT TRIM( Letter, '"' ) FROM '/home/sima/Desktop/TSV.tsv';

Column names are case sensitive, so we have to place the column name inside double quotes. Only then will our query work.
SELECT TRIM( "Letter", '"' ) FROM '/home/sima/Desktop/TSV.tsv';

Experiments With the PSV File

In the PSV file we have a null. This null will be imported, but we don't know whether it is a null or the string "null".
SELECT * FROM '/home/sima/Desktop/PSV.psv';

We will test the null value. We can see that "null" is a string.
SELECT ISNULL( "Letter" ) FROM '/home/sima/Desktop/PSV.psv';

Experiments With the GZ files

We can easily read from the compressed PSV.psv file.
SELECT * FROM '/home/sima/Desktop/PSV.psv.gz';

If we compress our file as a "tar" tape archive format then file loader will not work.
SELECT * FROM '/home/sima/Desktop/PSV.psv.tar.gz';

Conclusion

We can conclude that the file_loader function is not as versatile as the COPY INTO statement, which is described in this blog post => link.

Read From Remote Database With a proto_loader Function

We have seen that we can log in to a MonetDB server that is on another computer. We also saw how to create remote tables. This time we will see how to read, ad hoc, tables that are on some other computer/server.

Testing Local MonetDB Server

This syntax will work on the local MonetDB server, too. We can use localhost to read from the local computer. I can run this statement on the green server. Recall that the local database, schema and table names are "DatabaseG", "schemagb" and "dimg".
SELECT ProdID FROM proto_loader( 'monetdb://127.0.0.1:50000/DatabaseG/schemagb/dimg' );

This IP address and port number are the defaults, so we can use the link without them.
Notice that the database name keeps its original capitalization, while the schema and table names must be written in lowercase.
SELECT * FROM proto_loader( 'monetdb:///DatabaseG/schemagb/dimg' );

We can also use the link with "mapi:" in front of it. In this case, we have to provide the IP and port number.

SELECT * FROM proto_loader( 'mapi:monetdb://127.0.0.1:50000/DatabaseG/schemagb/dimg' );

Testing Remote MonetDB Server

A more interesting ability is reading tables from a remote server. I will read a table from the blue server (before that, please start the blue server).

This statement will be executed on the green server and will return the whole table from the blue server.
SELECT * FROM proto_loader( 'monetdb://192.168.100.146:50000/DatabaseB/schemagb/factb' );

Syntax with "mapi:" will also work on the remote server.

SELECT * FROM proto_loader( 'mapi:monetdb://192.168.100.146:50000/DatabaseB/schemagb/dimb' );

Creation Of a Remote Table

We can test whether we can create a remote table using the syntax that starts with "monetdb://". On the blue server I will change the current schema and then create one table.

monetdbd start /home/sima/monetdb/DBfarmB
mclient -u monetdb -d DatabaseB
 –password monetdb
SET SCHEMA SchemaGB;
CREATE TABLE Test( Number INT );

On the green server, I will set current schema to SchemaGB, and I will try to connect to the Test table.
SET SCHEMA SchemaGB;
CREATE REMOTE TABLE Test( Number INT ) ON 'monetdb://192.168.100.146:50000/DatabaseB';

This will successfully create a remote table.
SELECT * FROM Test;

Connect to Any ODBC Database From MonetDB

ODBC Driver Manager

An ODBC driver manager is a middleware component that acts as an intermediary between applications and ODBC drivers, translating standardized ODBC API calls into driver-specific instructions. It allows applications to connect to different databases without needing to be rewritten for each one.

On the green server, we will install an ODBC driver manager. First check whether you already have one installed; just type "odbcinst" in the shell.

I don't have it, so I will install the "unixodbc" driver manager. After that, the "odbcinst" command will show us that the program is working.
sudo apt install unixodbc

This program will add the file "odbc.ini" to the "/etc" folder. This file is empty.

/etc/odbc.ini

MonetDB ODBC Driver

We can connect to any ODBC-capable server, but we will use this opportunity to see how to connect to a MonetDB server. We will use the MonetDB ODBC driver to connect to the blue server. This is the ODBC driver we need; we install it on the green server.

sudo apt install libmonetdb-client-odbc

This driver will add the "odbcinst.ini" file to the "/etc" folder.

/etc/odbcinst.ini
The driver itself will be in this folder:

/usr/lib/x86_64-linux-gnu/libMonetODBC.so

We can see a problem. Our driver file is in the "x86_64-linux-gnu" folder, but our "odbcinst.ini" file points toward the "/lib" folder. We must fix that. In the shell we run this command to start the text editor as sudo, and we fix the path.
sudo gedit /etc/odbcinst.ini
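After the fix, the MonetDB entry in "/etc/odbcinst.ini" should look roughly like this (a sketch; key names may differ slightly between driver versions):
[MonetDB]
Description = MonetDB ODBC driver
Driver = /usr/lib/x86_64-linux-gnu/libMonetODBC.so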

Testing ODBC Driver

First, we will add our credentials to "/etc/odbc.ini" file.

sudo nano /etc/odbc.ini

Don't use "gedit" graphical text editor program. Use nano. Only then it will work.
[DatabaseB]
Description = Connect to Blue Server
Driver = MonetDB

Server = 192.168.100.146
Port = 50000
Database = DatabaseB
User = monetdb
Password = monetdb

"isql" is console program that comes with "unixODBC". We can use it to test ODBC connection toward the blue server. We can connect to the blue server if we type this into green server shell:

isql -v DatabaseB

After connection, we can set our schema, and we can query our table:

SET SCHEMA SchemaGB;
SELECT * FROM factb;

proto_loader Function For ODBC

But our goal is to use the "proto_loader" function to directly fetch data into the MonetDB server on the green computer, from the blue server, with ODBC. For that we will install one more package.
sudo apt install monetdb-odbc-loader

Using the ODBC loader is still experimental. This functionality is NOT turned on by default; we will now turn it on. First, we exit "DatabaseG".
quit
monetdb stop DatabaseG

monetdb get all DatabaseG

We can now read the properties of DatabaseG. The property of interest is the last one:

loadmodules
We will change this property like this:
monetdb set loadmodules=odbc_loader DatabaseG
We can always turn it back like this:
monetdb set loadmodules=  DatabaseG

We will now log in to our database. This will automatically start the server and, during that, automatically load the "odbc_loader" module.
mclient -u monetdb -d DatabaseG    –password monetdb

Using proto_loader Function For ODBC

This is how we run an ODBC query directly from mclient. In this way we can connect to any ODBC-capable server, not just another MonetDB. We just have to prepare the ODBC driver for that other server.
SELECT * FROM
proto_loader('odbc:DSN=DatabaseB;QUERY=SELECT * FROM schemagb.factb');

All the magic happens on the green server. The "proto_loader" function will use the "odbc_loader" module (1) to send a standardized ODBC message to the "unixODBC manager" (2). This manager will use the MonetDB ODBC driver (3) to call the blue server (4).

It is also possible to provide all of the necessary parameters directly inside of the ODBC connection string:
SELECT * FROM proto_loader('odbc:DRIVER=/usr/lib/x86_64-linux-gnu/libMonetODBC.so;SERVER=192.168.100.146;PORT=50000;DATABASE=DatabaseB;UID=monetdb;PWD=monetdb;QUERY=SELECT * FROM schemagb.factb')
There is also a version that uses a DSN file. This version is for Windows only.
odbc:FILEDSN=<data source name>;[<ODBC connection parameters>;]QUERY=<SQL query>

Virtual Tables

Virtual tables are tables whose data is not physically stored in a MonetDB table. Views, merge tables and remote tables are virtual tables. Tables that we receive through the file_loader and proto_loader functions are also virtual tables. We will now see how to transform file_loader and proto_loader virtual tables into more permanent structures.

CREATE TABLE Based on the Loader Function

We can use CREATE TABLE AS to store the CSV file into a new table:
CREATE TABLE permanentCSV ( Number, Letter ) AS ( SELECT * FROM '/home/sima/Desktop/CSV.csv' );

Next, we will read from the permanentCSV table:
SELECT * FROM permanentCSV;

CREATE TEMPORARY TABLE Based on the Loader Function

CREATE LOCAL TEMPORARY TABLE temporaryFactB ( YearNum, Dates, ProdID, Qty ) AS
(  SELECT * FROM proto_loader('odbc:DSN=DatabaseB;QUERY=SELECT * FROM schemagb.factb') )
WITH DATA ON COMMIT PRESERVE ROWS;
We can also make a temporary table.

We can read from this temporary table.

SELECT * FROM temporaryFactB;

Bulk INSERT Based on the Loader Function

TRUNCATE temporaryFactB;
We can pull data from any other ODBC-capable server into our temporaryFactB table (which is now empty).
INSERT INTO temporaryFactB ( YearNum, Dates, ProdID, Qty )
SELECT * FROM proto_loader('odbc:DRIVER=/usr/lib/x86_64-linux-gnu/libMonetODBC.so;SERVER=192.168.100.146;PORT=50000;DATABASE=DatabaseB;UID=monetdb;PWD=monetdb;QUERY=SELECT * FROM schemagb.factb');

SELECT * FROM temporaryFactB; 

COPY command

We already talked about COPY INTO and COPY FROM statements ( blog1 and blog2; youtube1 and youtube2 ). We will now see some special syntaxes of these commands.

COPY FROM stdin

We will first create one empty table.
CREATE TABLE tabStdin( Number INT, Letter CHAR );

sql>COPY INTO tabStdin FROM STDIN;
more>1|a
more>2|b
more>

2 affected rows
sql>
We will then run this statement, which copies directly from mclient. After this statement, the prompt will become "more>". That means that MonetDB is expecting more data. We will type the first row "1|a", then the second row "2|b", and for the third row we will enter nothing. After we press Enter, we will exit and the prompt will become "sql>" again.

We can now read from our table. The values that we typed after the "more>" prompt are inside the table. The pipe was used as a delimiter between columns.
SELECT * FROM tabStdin;

sql>COPY 2 RECORDS INTO tabStdin FROM STDIN;
more>3|c
more>4|d
2 affected rows
sql>
If we provide the number of rows that we will COPY, we don't have to end the input with an empty line; after the specified number of rows, the COPY finishes on its own.

COPY INTO stdout

The statement below doesn't work correctly:
COPY SELECT * FROM tabStdin INTO STDOUT;
According to GitHub, this is a bug:
https://github.com/MonetDB/MonetDB/issues/7028

COPY FROM Csv, With DECIMAL Clause

When importing a CSV file, we can specify which symbols are used as the decimal point and the thousands separator. Let's say that we have a CSV file where the decimal point is a star "*", and the thousands separator is an underscore "_".
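For example, the file "CSV_file" might look like this (a hypothetical sample; the pipe is MonetDB's default column delimiter, and the OFFSET 2 below skips the header row):
Number|Letter
1_234*50|a
9*99|b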
I will create one table that accepts DECIMAL numbers.
CREATE TABLE tabDecimal
( Number DECIMAL, Letter CHAR );

COPY OFFSET 2 INTO tabDecimal
FROM '/home/sima/Desktop/CSV_file'( Number, Letter ) DECIMAL AS '*','_';
With the DECIMAL clause we can specify which decimal point and thousands separator our CSV file has.

If we now read from our table, we will find the numbers correctly recognized. In MonetDB, the default decimal point is a dot. There is no default sign for the thousands separator.

0490 Grouping Sets and Comments in MonetDB

Sample Table

CREATE TABLE tabSales( Continent VARCHAR(20), Subcontinent VARCHAR(20), Country VARCHAR(20),
                       State VARCHAR(30), Sales INT );

INSERT INTO tabSales VALUES ('America', 'North',   'Canada', 'Ontario',     1),
                            ('America', 'North',   'Canada', 'Quebec',      2),
                            ('America', 'North',   'US',     'California',  4),
                            ('America', 'North',   'US',     'Texas',       8),
                            ('America', 'Central', 'Mexico', 'Jalisco',     16);

SELECT * FROM tabSales;

The Problem

When we create a Pivot table, from the sample table, we will see all of the detail sales (1,2,4,8,16), but we will also see totals (3,12,16,31).

The question is, what query would return all of these numbers, both detail values and totals, if we use MonetDB.

This is one possible solution:

SELECT Country, State, Sales FROM tabSales
UNION ALL
SELECT Country, null, SUM( Sales ) FROM tabSales GROUP BY Country
UNION ALL
SELECT null, null, SUM( Sales ) FROM tabSales;
On the image, the rows are sorted so that the table looks like the pivot table.

The UNION ALL solution is bad for several reasons:
1) We have three queries to execute, and then we have to combine multiple result sets into one.
2) It is hard to read and modify a long UNION ALL query.
3) We have to be careful to properly align the columns.

This is the problem that can be solved by grouping sets.

Grouping Sets

"Grouping Sets" are much better and faster syntax to achieve the same goal.  

SELECT Country, State, SUM( Sales )
FROM tabSales
GROUP BY GROUPING SETS ( Country, State, ( ) );  


Empty parentheses are for the grand total. We'll get the same result, except the repetition of country names is reduced; instead of them we have nulls.

Look at what we get if we place parentheses around Country and State.

SELECT Country, State, SUM( Sales )
FROM tabSales
GROUP BY GROUPING SETS ( ( Country, State ), () );


Parentheses are there to define each group.

We can see the effect of parentheses better on this example.  

SELECT Continent, Subcontinent, SUM( Sales )
FROM tabSales
GROUP BY GROUPING SETS ( ( Continent, Subcontinent ), Continent );

Continent can be used by itself, but it can be also used in conjunction with Subcontinent to define a group. 

It is now clear that each element inside GROUPING SETS is a separate definition of a group. Each group can be defined by one column, like Continent, or by several columns placed inside parentheses ( Continent, Subcontinent ).

These two examples, which return the same result, show the logic and brevity of grouping sets.
SELECT Col1, Col2, SUM( Sales )
FROM Table
GROUP BY GROUPING SETS
    (  ( Col1, Col2 ), Col1 );
SELECT Col1, Col2, Sales FROM Table
UNION ALL
SELECT Col1, null, SUM( Sales )
   FROM Table
   GROUP BY Col1;

Rollup

SELECT Continent, Subcontinent, Country, SUM( Sales )
FROM tabSales

GROUP BY ROLLUP( Continent, Subcontinent, Country );
ROLLUP( a, b, c ) is the same as the grouping sets "( a, b, c ), ( a, b ), ( a ), ()". This is a way to get a hierarchy of the columns. ROLLUP will give us all of the numbers that we need to create a pivot table.
For ROLLUP, the order of the columns is important.

ROLLUP( a, b, c )                ROLLUP( c, b, a )

( a, b, c )                      ( c, b, a )
( a, b )                         ( c, b )
( a )                            ( c )
( )                              ( )

Similar to GROUPING SETS, ROLLUP can also create combinations of columns by using parentheses.
 
SELECT Subcontinent, Country, State, SUM( Sales )
FROM tabSales
GROUP BY ROLLUP( ( Subcontinent, Country ), State );

CUBE

CUBE works similarly to ROLLUP, but has a different logic. CUBE will give us all of the possible combinations. CUBE( a, b, c ) will give us 2^3 grouping sets: "(a,b,c), (a,b), (a,c), (b,c), (a), (b), (c) and ()".

SELECT Subcontinent, Country, SUM( Sales )
FROM tabSales GROUP BY CUBE( Subcontinent, Country );
Because we have only 2 columns inside CUBE in our example, the number of combinations is 2^2 = 4: "(a,b), (b), (a), ()".

We can also define groups by using parentheses.  
SELECT Subcontinent, Country, SUM( Sales ) FROM tabSales
GROUP BY CUBE( ( Subcontinent, Country ) );

We now have only one element, so we have only two ( 2^1 ) groups: "(a), ()".

Addition and Multiplication in Grouping Sets

This is addition:

( a, b )       +       ( c )       =       ( a, b )
                                           ( c )
This is multiplication. Multiplication is a cross join between individual values.
                                      a1bc1
( a )            ( b, c )             a1bc2
  a1      *        bc1       =        a1bc3
  a2               bc2                a2bc1
                   bc3                a2bc2
                                      a2bc3

The syntax for addition is like this: everything inside the GROUPING SETS parentheses will be added together. In this example we will add ( Subcontinent ) + ( Country ) + ( ).

SELECT Subcontinent, Country, SUM( Sales )
FROM tabSales
GROUP BY GROUPING SETS ( Subcontinent, ROLLUP( Country ) );

So, if we create GROUPING SETS like this, this will be addition.
GROUPING SETS ( Continent, ROLLUP( Continent, Subcontinent ), CUBE( Country, State ), () )

Addition can easily create duplicates:
SELECT Continent, SUM( Sales ) FROM tabSales
GROUP BY GROUPING SETS ( CUBE( Continent ), () );
CUBE will create a grand total, but we will also get a grand total from the "( )" element.

This is the syntax for multiplication. This time we will have commas between the GROUPING SETS, ROLLUPs, CUBEs and individual elements.
GROUPING SETS ( Continent ), ROLLUP( Continent, Subcontinent ), CUBE( Country, State ), (),Country

This example will give us 2 x 2 = 4 rows. ROLLUP will give us America, and "( )". GROUPING SETS will give us "North" and "Central". Then we combine them 2 x 2.  

SELECT Continent, Subcontinent, SUM( Sales ) FROM tabSales
GROUP BY ROLLUP( Continent ), GROUPING SETS ( Subcontinent );
Multiplication can also easily create duplicates:
SELECT Continent, SUM( Sales ) FROM tabSales
GROUP BY GROUPING SETS ( Continent, () ), GROUPING SETS ( Continent, () );

Indicator Function – GROUPING

The GROUPING function will inform us which rows are subtotals or the grand total. In such rows, some columns have nulls because they are consolidated. The GROUPING function takes a column as its argument, and it returns the result only for that column.

SELECT Continent, GROUPING( Continent ) AS ContiGroup, Subcontinent, GROUPING( Subcontinent ) AS SubcontiGroup , SUM( Sales )
FROM tabSales
GROUP BY ROLLUP ( Continent, Subcontinent );

This function is important because it helps us to distinguish between subtotal nulls and missing-data nulls.

Formatting with COALESCE and Sort

This is not a good-looking table. Let's fix it.

SELECT Subcontinent, State, SUM( Sales )
FROM tabSales
GROUP BY ROLLUP( Subcontinent, State );

COALESCE will help us to eliminate the NULLs:

SELECT COALESCE( Subcontinent, 'Grand' ) AS Subcontinent
     , COALESCE( State, 'Total' ) AS State
     , SUM( Sales ) AS Sales FROM tabSales
GROUP BY ROLLUP( Subcontinent, State );

With GROUPING function, we can create columns that will help us to sort the table.

SELECT COALESCE( Subcontinent, 'Grand' ) AS Subcontinent
     , COALESCE( State, 'Total' ) AS State
     , SUM( Sales ) AS Sales
     , GROUPING( Subcontinent ) AS SubcSort
     , GROUPING( State ) AS StateSort 
FROM tabSales GROUP BY ROLLUP( Subcontinent, State )
ORDER BY SubcSort, Subcontinent, StateSort;

These auxiliary columns ( SubcSort and StateSort ) can be easily eliminated by wrapping everything with "SELECT Subcontinent, State, Sales".

Comments

Sample Table and Function

Let's create two tables and a function.

CREATE TABLE tabComment( Number INTEGER );
CREATE TEMPORARY TABLE tabTemporary( Number INTEGER );

CREATE OR REPLACE FUNCTION funcComment( Arg1 INTEGER )
RETURNS INTEGER
BEGIN
     RETURN 2;
END;

Comments on Database Objects

We can create comments that are tied to database objects. Comments convey information about that object.
COMMENT ON TABLE tabComment IS 'tabComment description';
COMMENT ON COLUMN tabComment.Number IS 'Number column description';
COMMENT ON FUNCTION funcComment IS 'funcComment description';
COMMENT ON SCHEMA sys IS 'sys schema description';

We will then find the IDs of our database objects:
SELECT * FROM sys.tables WHERE name = 'tabcomment';
15876
SELECT * FROM sys.columns WHERE table_id = 15876;
15875
SELECT * FROM sys.functions WHERE name = 'funccomment';
15881
SELECT * FROM sys.schemas WHERE name = 'sys';

2000
All of these IDs can be found in the system table "sys.comments" together with their comments.
SELECT * FROM sys.comments
WHERE Id IN ( 15876, 15875, 15881, 2000 );


Deleting a Comment

If we delete an object, its comment will be deleted.
DROP TABLE tabComment;
SELECT * FROM sys.comments WHERE Id = 15876;

We can delete a comment by setting it to NULL or an empty string.
COMMENT ON SCHEMA sys IS null;
SELECT * FROM sys.comments WHERE Id = 2000;

If a function is overloaded then we have to provide the full signature.
COMMENT ON FUNCTION funcComment( INTEGER ) IS '';
SELECT * FROM sys.comments WHERE Id = 15881;

Persistent Database Objects

There are other database objects that we can place a comment on. They are all persistent database objects.
COMMENT ON VIEW view_name IS 'Comment';
COMMENT ON INDEX index_name IS 'Comment';
COMMENT ON SEQUENCE sequence_name IS 'Comment';
COMMENT ON PROCEDURE procedure_name IS 'Comment';

COMMENT ON AGGREGATE aggregate_name IS 'Comment';
COMMENT ON LOADER loader_name IS 'Comment';

We cannot create a comment on a temporary object.
COMMENT ON TABLE tabTemporary IS 'tabTemporary description';