bizkapish

0540 Backing up the MonetDB Database

MonetDB / By Bizkapish / August 10, 2025 March 21, 2026

Database Schema ( as a file )

A database schema is a list of all database objects and their attributes, written in a formal language. Since we already have a ubiquitous formal language for creating database objects, SQL, we should write our schema in SQL.

Qualities of the Good Schemas

Database objects should be created in the proper order. For example, tables should be created before the views that are based on those tables. Here is the recommended order of statements:

a) CREATE SCHEMAS
b) CREATE SEQUENCES
c) CREATE TABLES
d) ALTER TABLES for FOREIGN KEYS
e) CREATE INDEXES

f) CREATE VIEWS
g) CREATE FUNCTIONS and PROCEDURES
h) CREATE TRIGGERS
i) INSERT INTO data
j) GRANT/REVOKE

When running a schema file, we should wrap it in a transaction to avoid incomplete creation.
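The effect of the transaction wrapper can be sketched in Python. This is only an illustration under a loud assumption: it uses Python's built-in sqlite3 module as a stand-in for MonetDB, and the table name and the deliberately broken second statement are hypothetical. The point is that when one statement fails, the objects created before it are rolled back too.

```python
import sqlite3

# Hypothetical schema script. The second statement is deliberately broken:
# it creates an index on a table that does not exist.
SCHEMA_STATEMENTS = [
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
    "CREATE INDEX idx_missing ON no_such_table (id)",
]

def run_schema(conn, statements):
    """Run all statements in one transaction; undo everything on error."""
    conn.execute("BEGIN")
    try:
        for statement in statements:
            conn.execute(statement)
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        return False
    conn.execute("COMMIT")
    return True

def table_names(conn):
    """List the tables that currently exist in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [name for (name,) in rows]

# isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
ok = run_schema(conn, SCHEMA_STATEMENTS)
print(ok, table_names(conn))
```

Without the transaction, the "customers" table would stay behind after the failure, and a second attempt to run the file would fail earlier, on the CREATE TABLE statement.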

Credentials

Msqldump

Msqldump is a console program used to create a schema from an existing MonetDB database. The resulting file can be used for backups or to migrate the database to another MonetDB server. With some manual tweaking of this file, we can use it to migrate database to a server other than MonetDB (perhaps Postgres or MySQL).

By default, instead of INSERT INTO statements, the dump will contain "COPY FROM stdin" statements. This makes restoring the database faster.
We can use the "-e" switch to add the NO ESCAPE clause to the COPY statement.
msqldump -d voc -e > /home/fffovde/Desktop/voc_db.sql

For returning INSERT INTO statements, we should use "-N" switch.
msqldump -d voc -N > /home/fffovde/Desktop/voc_db.sql

Partial Backup of Database

Location of A Schema File

Credentials in the Command

By providing the IP address and the port number, we can back up the database locally, or we can pull it from a remote server, even if it is protected with TLS.

Other Msqldump Options

We will run these commands on the "voc" server.

Environment Variables

Environment variables are global variables that can be accessed by any process. They are key-value pairs that provide a way for processes to communicate with each other and the operating system.

DOTMONETDBFILE Environment Variable

If we want to make our Msqldump commands shorter, we can place the values for some of the options into a file. DOTMONETDBFILE is an environment variable that holds the path to that file.

In the temporary folder I already have the file ".monetdb". It has the username and password. This is the file we moved before. I will add the IP address and port number of the green server to this file.
nano /tmp/.monetdb
user=monetdb
password=monetdb
host=192.168.100.145
port=50000
I will set my DOTMONETDBFILE environment variable to this file. This will be valid for one session.
export DOTMONETDBFILE=/tmp/.monetdb

Default Folders for .monetdb File

DOTMONETDBFILE is only needed if we want to keep ".monetdb" in a non-standard folder. For standard folders, we don't need DOTMONETDBFILE. We just have to place the file into one of the standard folders, which is how we used the ".monetdb" file before we talked about environment variables.
1) Current directory.
2) Path inside of the $XDG_CONFIG_HOME variable (if it exists).
3) $HOME directory.

Msqldump will first look for a ".monetdb" in the current directory, then in $XDG_CONFIG_HOME, and at the end in the $HOME ( /home/fffovde ).

Ignoring Default Folders

If we set DOTMONETDBFILE to an empty string, Msqldump will ignore all of the configuration files, and will always ask for credentials.

# export DOTMONETDBFILE=""
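The lookup rules described in this section can be summarized in a small Python sketch. It is a model of the text above, not MonetDB's own code: the function name "resolve_dotmonetdb" is hypothetical, and the file name used inside $XDG_CONFIG_HOME is an assumption taken from the description here, so check the msqldump manual for the exact name.

```python
import posixpath

def resolve_dotmonetdb(env, cwd, exists):
    """Return the configuration file that would be read, or None.

    env    -- dictionary of environment variables
    cwd    -- current working directory
    exists -- function that tells whether a path exists
    """
    if "DOTMONETDBFILE" in env:
        # Set to a path: use that file. Set but empty: ignore all files.
        return env["DOTMONETDBFILE"] or None
    candidates = [posixpath.join(cwd, ".monetdb")]            # 1) current directory
    if env.get("XDG_CONFIG_HOME"):
        # 2) assumed file name inside $XDG_CONFIG_HOME
        candidates.append(posixpath.join(env["XDG_CONFIG_HOME"], ".monetdb"))
    if env.get("HOME"):
        candidates.append(posixpath.join(env["HOME"], ".monetdb"))  # 3) $HOME
    for path in candidates:
        if exists(path):
            return path
    return None

# Usage: only the file in $HOME exists.
files = {"/home/fffovde/.monetdb"}
print(resolve_dotmonetdb({"HOME": "/home/fffovde"}, "/tmp", files.__contains__))
```

Passing the "exists" function as an argument keeps the sketch independent of the real file system, so the order of the checks is easy to try out.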

From here you can download "voc" database sql dump file (schema file).

voc_dump Download

0530 Self Signed TLS with Stunnel for MonetDB

MonetDB / By Bizkapish / August 3, 2025 March 21, 2026

Purple Server

I will create a totally new server for this blog post. I will call it "Purple Server". This is the server on the new virtual machine. We will use this server in some of the later videos.

I am currently the user "fff". I will add myself to "monetdb" group. This group was created after the installation of the MonetDB. People from this group can start a database.

sudo adduser fff monetdb

For this to apply, we have to log out and then log back in. Then, I will start the monetdbd service, and I will create a database "tlsDB".

sudo systemctl start monetdbd
monetdb create tlsDB
monetdb release tlsDB

This database will be created in the default folder "/var/monetdb5/dbfarm/".

The command below will make my server a full-fledged daemon controlled by systemd. That means that MonetDB will automatically start when I start my computer.
systemctl enable monetdbd

Self-Signed Certificate

In this tutorial we will create certificate ourselves, and we will use that certificate both on the server and on the client.

We will change the default port of MonetDB to 49999. This is because I want to use port 50000 for the TLS connection.
monetdbd set port=49999 /var/monetdb5/dbfarm

Creation of a Self-Signed Certificate

Inside of this folder I will make a script.
touch /etc/ssl/private/SelfCertScript.sh

I will open this script in nano. I will paste the code.

nano SelfCertScript.sh

This is the version where we use the IP address of the server. I don't have a domain name, so I will use this version.
#!/bin/bash
set -euo pipefail
IP="192.168.100.152"
DAYS=90
KEY="selfsigned.key"
CRT="selfsigned.crt"
openssl req -x509 -newkey rsa:2048 -sha256 -nodes \
-keyout "$KEY" -out "$CRT" -days "$DAYS" \
-subj "/CN=$IP" \
-addext "subjectAltName=IP:$IP"

This is the version you use if you have a domain name.
#!/bin/bash
set -euo pipefail
DOMAIN="dbhost.mymonetdb.org"
DAYS=90
KEY="selfsigned.key"
CRT="selfsigned.crt"
openssl req -x509 -newkey rsa:2048 -sha256 -nodes \
-keyout "$KEY" -out "$CRT" -days "$DAYS" \
-subj "/CN=$DOMAIN" \
-addext "subjectAltName=DNS:$DOMAIN"

This is the version with both.
#!/bin/bash
set -euo pipefail
IP="192.168.100.152"
DOMAIN="dbhost.mymonetdb.org"
DAYS=90
KEY="selfsigned.key"
CRT="selfsigned.crt"
openssl req -x509 -newkey rsa:2048 -sha256 -nodes \
-keyout "$KEY" -out "$CRT" -days "$DAYS" \
-subj "/CN=$DOMAIN" \
-addext "subjectAltName=DNS:$DOMAIN,IP:$IP"

https://dbhost.mymonetdb.org
https://192.168.100.152

If we use IP in the script, then the clients must use IP. If we use domain name, then the clients must use domain name.
If we use both in the script, then the client tool can connect both with the IP or with the domain name.

We can also change permissions on the files.
sudo chmod 600 /etc/ssl/private/selfsigned.*
Now only root has read and write rights on our files.

Using Some Other Folder

We can also keep our script, key and certificate in some other folder.
First, we will create a folder, and then we can make that folder accessible only to root.
"chmod 700" gives read/write/execute rights to the owner only.
sudo mkdir /etc/ssl/private2
sudo chown root:root /etc/ssl/private2
sudo chmod 700 /etc/ssl/private2

Script Explanation

`#!/bin/bash`	Shebang: run this script with the Bash shell (not sh, zsh, etc.).
`-e (errexit)`	Exit immediately if any simple command returns a non-zero status (if there is an error).
`-u (nounset)`	Treat the use of an unset variable as an error, and exit. A variable that is set to an empty string is still allowed.
`-o pipefail`	In a pipeline `a | b | c`, the pipeline's exit status is that of the last (rightmost) command that exits with a non-zero code, or zero if all commands succeed (instead of just the status of `c`).

Let's dissect this line.
openssl req -x509 -newkey rsa:2048 -sha256 -nodes -keyout "$KEY" -out "$CRT" -days "$DAYS" -subj "/CN=$DOMAIN" -addext "subjectAltName=DNS:$DOMAIN,IP:$IP"

`openssl req`	We start self-signed certificate creation process.
`-x509`	Create a self-signed certificate instead of a certificate signing request (CSR). CSR is for commercial certificates.
`-newkey rsa:2048`	Generate a new RSA private key of 2048 bits.
`-sha256`	Use SHA-256 as the hash algorithm.
`-nodes`	Private key will not be encrypted. This is useful for automation, because there is no need for a password.
`-keyout "$KEY"`	Private key will be saved to this file.
`-out "$CRT"`	Certificate will be saved to this file.
`-days "$DAYS"`	Certificate will be valid for 90 days. Clients usually do not trust certificates with a longer lifespan.
`-subj "/CN=$IP"`	Subject of the certificate. The common name (CN) is the IP address (or domain name).
`-addext "subjectAltName=IP:$IP"`	The same as above, but modern, because it can accept several IP addresses or domains.

Stunnel

"Stunnel" is the name of a program that we will use as a "TLS Termination Proxy". We now have to install it and configure it.

apt install stunnel4
We install it. We are already root, so we don't need "sudo".

stunnel -version

which stunnel
We find its installation folder.

cd /etc/stunnel
touch monetdb.conf
We go to the stunnel folder. There we create the configuration file.

We'll open this file in nano, and we will paste the code.

nano monetdb.conf
foreground = yes
cert = /etc/ssl/private/selfsigned.crt
key = /etc/ssl/private/selfsigned.key
[monetdb]
accept = 0.0.0.0:50000
connect = 127.0.0.1:49999

Stunnel Systemd Service

MonetDB service will always run automatically, because it is controlled by systemd. We want the same for stunnel. For stunnel to become a service, first we have to create configuration file for that service (systemd unit file).

First, we go to folder where we have to place systemd unit file.
cd /etc/systemd/system

We create new file:
touch stunnel-monetdb.service

We open this file in nano text editor:
nano stunnel-monetdb.service

We paste our code into it:
[Unit]
Description=Stunnel TLS for MonetDB
After=network.target

[Service]
ExecStart=/usr/bin/stunnel /etc/stunnel/monetdb.conf
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
PrivateTmp=yes

[Install]
WantedBy=multi-user.target

Let's explain parts of this systemd unit file:

`After=network.target`	This is a prerequisite. Stunnel should only be started after the network becomes functional during system startup.
`ExecStart=/usr/bin/stunnel /etc/stunnel/monetdb.conf`	Start stunnel program, with specified configuration file.
`ExecReload=/bin/kill -HUP $MAINPID`	$MAINPID is the process ID of stunnel. "-HUP" sends the hangup signal, which tells stunnel to reload its configuration without restarting the process.
`Restart=on-failure`	Restart stunnel if it crashes.
`PrivateTmp=yes`	Instead of `/tmp` and `/var/tmp`, the service will use its own private temporary folders.
`WantedBy=multi-user.target`	When the service is enabled, stunnel is started as part of the normal multi-user boot, after the basic services have started.

Now we have to use this "stunnel-monetdb.service" file to register our new service with systemd:

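`sudo systemctl daemon-reexec`	Re-executes the systemd manager itself. This is usually not required, because daemon-reload is enough to pick up a new unit file.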
`sudo systemctl daemon-reload`	Reload all of the "systemd unit files". Now, it will include our "stunnel-monetdb.service" file.
`sudo systemctl enable stunnel-monetdb.service`	To have the service start automatically during reboot.
`sudo systemctl start stunnel-monetdb.service`	Start our service immediately, without waiting for the first reboot.

We can now check who is listening on port 50000. We will see that stunnel is listening on port 50000.

sudo ss -tnlp | grep 50000

FYI: MonetDB Systemd Unit File

TLS Connection from the Local Computer

I will exit being root with "exit". I am running this on the purple server. URL is of the purple server.
mclient -d monetdbs://192.168.100.152:50000/tlsDB
It won't work because mclient doesn't have access to the certificate.

We will copy the content of our certificate to the $HOME folder. We need a folder that mclient can access.
sudo cat /etc/ssl/private/selfsigned.crt >$HOME/selfsigned.crt

If you continue to get message "certificate verify failed", check your SelfCertScript.sh. Make sure that there are no invisible characters in it. After that recreate your certificate and copy it again for mclient. Try to restart your computer a few times. I had problems each time, and restarting of the computer helped.

TLS Connection from the Remote Computer

On the other virtual machine, I will place "selfsigned.crt" into home directory.
ls -alh selfsigned.crt

How to use TLS with Pymonetdb

We will now see how to connect to TLS protected database from python. First, we will install pymonetdb.

While we can install pip, we will not be able to install pymonetdb with it. Ubuntu tries to prevent us from installing packages into the global Python environment. Ubuntu wants us to use a virtual environment instead.

We can still install into global environment in this way. This will only work for packages that are inside of the ubuntu repository.
sudo apt install python3-pymonetdb

I will install Spyder IDE:
sudo apt install spyder
I will open this GUI program and I will run this code:
import pymonetdb
connection = pymonetdb.connect("monetdbs://192.168.100.152:50000/tlsDB?cert=/home/fff/selfsigned.crt")
cursor = connection.cursor()
cursor.execute('SELECT 2')
print(cursor.fetchone())

We can also test the code in the command line.
python3 -c 'import pymonetdb;connection = pymonetdb.connect("monetdbs://192.168.100.152:50000/tlsDB?cert=/home/fff/selfsigned.crt" );cursor = connection.cursor();cursor.execute("SELECT 2");print(cursor.fetchone())'

We can also use syntax like this one:
connection = pymonetdb.connect("monetdbs://192.168.100.152:50000/tlsDB", cert="/home/fff/selfsigned.crt")

How to use TLS with ODBC

We will again use our Green server. We created that server in this blog post "link", and we installed ODBC drivers in it, in this other post "link".

In the mentioned post "0500 Proto_loaders, ODBC and COPY in MonetDB", we created the file "/etc/odbc.ini" with the credentials of the Blue server. We will now modify that file, so that it leads toward the Purple server, because we want to test TLS connection.

We'll change the credentials, and we will add two more properties: TLS and CERT.

FOR THE BLUE SERVER:
[DatabaseB]
Description = Connect to Blue Server
Driver = MonetDB
Server = 192.168.100.146
Port = 50000
Database = DatabaseB
User = monetdb
Password = monetdb

FOR THE PURPLE SERVER:
[tlsDB]
Description = Connect to Purple Server
Driver = MonetDB
Server = 192.168.100.152
Port = 50000
Database = tlsDB
User = monetdb
Password = monetdb
TLS = ON
CERT = /home/sima/selfsigned.crt

How to use TLS with JDBC

In this blog post "link", we downloaded JDBC driver/client to the green server. We will use that client again, but this time with the connection string for TLS.

We will also connect through DBeaver and JDBC. Before my explanation, it would be wise to read how to use DBeaver and JDBC without TLS. It is explained on this blog "link". I will not repeat here the whole story, just the difference. The difference is in the connection, we have to provide certificate.

Including Certificate Inside of the URL

We can run this command on the purple and the green server, and we would get the same result:
openssl x509 -in /home/sima/selfsigned.crt -outform DER | openssl dgst -sha256 -r

This command takes our certificate, transforms it into DER format, and then calculates the SHA256 hash of that format in HEX digits.
b89c338234850f8def5d4612e6c868cc5f85fe22e6d6a6b5acf8a7d17a15d764

For this demonstration, we just need the first 16 digits. So we would complete command like this:
openssl x509 -in /home/sima/selfsigned.crt -outform DER | openssl dgst -sha256 -r | cut -c1-16
b89c338234850f8d

We can now use those first 16 digits inside of our URL:
mclient -d "monetdbs://192.168.100.152:50000/tlsDB?certhash=sha256:b89c338234850f8d"
This way we can also connect to MonetDB.
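The same calculation can be sketched in Python with the standard library only. The function name "certhash_url" is hypothetical, and the "certificate" below is a placeholder made of dummy bytes, so the printed hash has nothing to do with the real one. For a real certificate we would pass the text of "selfsigned.crt", which has to start with the BEGIN CERTIFICATE line.

```python
import base64
import hashlib
import ssl

def certhash_url(pem_text, host, port, database, digits=16):
    """Build a monetdbs:// URL that identifies the certificate by its hash."""
    der = ssl.PEM_cert_to_DER_cert(pem_text)         # like: openssl x509 -outform DER
    fingerprint = hashlib.sha256(der).hexdigest()    # like: openssl dgst -sha256
    return f"monetdbs://{host}:{port}/{database}?certhash=sha256:{fingerprint[:digits]}"

# Placeholder input: base64 of dummy bytes wrapped in PEM header and footer.
dummy_der = b"not a real certificate"
dummy_pem = (
    "-----BEGIN CERTIFICATE-----\n"
    + base64.encodebytes(dummy_der).decode("ascii")
    + "-----END CERTIFICATE-----\n"
)
print(certhash_url(dummy_pem, "192.168.100.152", 50000, "tlsDB"))
```

The conversion to DER matters because the hash is calculated over the binary form of the certificate, not over the text of the PEM file.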

0520 CHECK, RETURNING and Other in MonetDB

MonetDB / By Bizkapish / July 19, 2025 March 21, 2026

Sample Table

CREATE TABLE tabProducts (
Color VARCHAR(10),
Size VARCHAR(5),
Qty INT );
INSERT INTO tabProducts (Color, Size, Qty) VALUES
('Red', 'M', 10),
('Red', 'XXL', 10),
('Blue', 'XL', 30);

RETURNING Clause

Only SELECT statement returns some values. INSERT, UPDATE, DELETE just silently do their job, without any feedback.

If we execute the statement:
UPDATE t1 SET Col1 = 'zzz' WHERE Id = 99;
We can afterward check the rows that were updated:
SELECT * FROM t1 WHERE Id = 99;
If we could do both things with one statement, that would simplify things and reduce the strain on the database.

We can do the same thing with DELETE and UPDATE. Delete will return deleted rows, and UPDATE will return updated rows.

We can use expressions in the RETURNING clause.

INSERT INTO tabProducts VALUES( 'Blue', 'M', 100 ) RETURNING Color, SUM( Qty ) GROUP BY Color;
This will not work. We cannot use GROUP BY in the RETURNING clause. The RETURNING clause must remain simple.

Referencing Columns by Their Position

Referencing Set of Columns with the Keyword ALL

Instead of the keyword ALL, we can use the star "*" sign.
SELECT Color, qty, COUNT( Size )
FROM tabProducts GROUP BY *; --BY Color, qty
SELECT Color, Size, qty
FROM tabProducts ORDER BY *; --BY Color, Size, qty

IS [NOT] DISTINCT FROM

Anything compared with NULL will return NULL.
SELECT 'null' = null;
SELECT null = null;
IS [NOT] DISTINCT FROM is a null-safe comparison operator. This operator will always return TRUE or FALSE, even if one of the operands is NULL.

`SELECT NULL IS DISTINCT FROM NULL;`		`SELECT 'A' IS DISTINCT FROM NULL;`
`SELECT NULL IS NOT DISTINCT FROM NULL;`		`SELECT 'A' IS NOT DISTINCT FROM NULL;`
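The difference between "=" and IS DISTINCT FROM can be modeled in a few lines of Python, with None playing the role of NULL. This is only a sketch of the three-valued logic described above, and the function names are hypothetical.

```python
def sql_equals(a, b):
    """SQL '=': the result is NULL (None) when either operand is NULL."""
    if a is None or b is None:
        return None
    return a == b

def is_distinct_from(a, b):
    """SQL 'IS DISTINCT FROM': always True or False, never NULL."""
    if a is None or b is None:
        # Two NULLs are not distinct; NULL and a value are distinct.
        return (a is None) != (b is None)
    return a != b

def is_not_distinct_from(a, b):
    """SQL 'IS NOT DISTINCT FROM': the negation of the function above."""
    return not is_distinct_from(a, b)

for a, b in [(None, None), ("A", None), ("A", "A")]:
    print(a, b, sql_equals(a, b), is_distinct_from(a, b), is_not_distinct_from(a, b))
```

Two NULLs are treated as "the same" by this operator, which is why it is useful in joins and filters on nullable columns.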

CHECK Constraint

A check constraint is a type of rule which specifies a condition ( boolean expression ) that must be met by each row in a database table. This rule limits acceptable values for data.

If we change our condition, so that qty must be less than 100, then it will succeed.
ALTER TABLE tabProducts ADD CONSTRAINT "QtyLess100" CHECK (qty < 100);
After that, we will try to insert the number 111 into the qty column:
INSERT INTO tabProducts( Color, Size, Qty ) VALUES ( 'Blue', 'XL', 111 );

This will fail because of the constraint (111 is not less than 100).

UPDATE tabProducts SET qty = 111 WHERE qty = 30;
This also means that we cannot update the value in the qty column to a value of 100 or more.

How to Add Check Constraint?

We can add several constraints on the same column. We now have two constraints, that "qty > 0" and "qty < 100".
ALTER TABLE tabProducts ADD CONSTRAINT "QtyGrt0" CHECK (qty > 0);

That is not efficient. It is much better to add both constraints with one statement. We can connect conditions with AND, OR.
ALTER TABLE tabProducts ADD CONSTRAINT QtyConstraints CHECK (qty > 0 AND qty < 100);

Constraints can combine several columns in the requirement expression:
ALTER TABLE tabProducts ADD CONSTRAINT CheckColorSize CHECK ( Color = 'Red' OR Size = 'XL' );

We can add constraint during the creation of a table.
CREATE TABLE tabOrders ( Price INT, Qty INT, CONSTRAINT ValidOrders CHECK ( Price > 10 AND Qty < 10 ) );

During table creation, we can add a constraint that is at the single column level. The server will provide a default name for such a constraint.
CREATE TABLE tabOneColumn ( Col1 CHAR, Col2 INT CHECK ( Col2 < 999 ) );
The default name consists of the table name, the column name, and the keyword "check". In this example, that would be "tabonecolumn_col2_check".

Instead of using the default name for a single-column CHECK constraint, it is much better to give the constraint a specific name.
CREATE TABLE tabOneColumnNamed ( Col1 INT CONSTRAINT Col1Constraint CHECK ( Col1 < 999 ) );

Limitations on CHECK Constraints

CHECK constraint can only refer to one row of a table. We can not use aggregation functions, because that would break such limitation.
ALTER TABLE tabProducts ADD CONSTRAINT OverFlow CHECK ( SUM( Qty ) < 1000 );
This also means that a CHECK constraint cannot refer to tables other than the table on which it is defined.

A simple check on a single column has minimal impact on performance. We should avoid complex check conditions.

CHECK Constraints and Nulls

INSERT INTO tabOneColumn ( Col1, Col2 ) VALUES ( null, null );
This INSERT statement will always succeed. When a value is NULL, the CHECK condition evaluates to NULL (unknown) rather than FALSE, so the row is accepted.
SELECT * FROM tabOneColumn;
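The behavior of a CHECK constraint with ordinary values and with NULL can be tried out with a short Python script. This is a sketch under a loud assumption: it uses Python's built-in sqlite3 module instead of MonetDB, because SQLite treats a NULL check result the same way. The helper "try_insert" is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tabOneColumn (Col1 TEXT, Col2 INT CHECK (Col2 < 999))")

def try_insert(col2):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        conn.execute(
            "INSERT INTO tabOneColumn (Col1, Col2) VALUES (?, ?)", ("x", col2)
        )
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(5))      # the condition Col2 < 999 is true
print(try_insert(1000))   # the condition is false, the row is rejected
print(try_insert(None))   # the condition is NULL, the row is accepted
```

If NULL values must be rejected as well, the usual approach is to combine the CHECK constraint with a NOT NULL constraint on the same column.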

How to Delete CHECK Constraints?

We can delete constraints like this:
– We can delete just the CHECK constraint.
– We can delete the whole table.
ALTER TABLE tabonecolumn DROP CONSTRAINT zzz;
DROP TABLE tabOneColumnNamed;
Now, our constraints are gone.

SELECT * FROM information_schema.check_constraints WHERE table_name = 'tabOneColumnNamed' or constraint_name = 'zzz';

LIKE and ILIKE Operators

LIKE is a pattern matching operator. It can help us to find patterns in a text. LIKE is based on two wildcard characters. The percent sign "%" matches any sequence of consecutive characters, including an empty one. The underscore "_" matches exactly one character. Here are some examples:

SELECT 'zzz' LIKE '%'; --true
SELECT 'zzz' LIKE '___'; --true
SELECT 'Azz' LIKE 'A%'; --true
SELECT 'AzzA' LIKE '%zz%'; --true
SELECT 'AzzA' LIKE 'A__A'; --true
SELECT 'Azz' LIKE 'Azz_'; --false

ILIKE is the case insensitive version.
SELECT 'fff' LIKE 'F_F'; --false
SELECT 'fff' ILIKE 'F_F'; --true

We can negate LIKE with NOT.
SELECT 'M' NOT LIKE 'M'; --false
SELECT 'M' LIKE 'M'; --true

Default escape character is backslash "\".	`SELECT '%_' LIKE '\%\_';` --true
We can declare any other character to be the ESCAPE sign.	`SELECT '%_' LIKE '#%#_' ESCAPE '#';` --true
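The rules for "%", "_" and the escape character can be expressed as a translation into a regular expression. Below is a minimal Python sketch of that idea. The function "like" is hypothetical and only models the rules described above; it is not how MonetDB implements the operator.

```python
import re

def like(value, pattern, escape="\\", ignore_case=False):
    """Evaluate a SQL LIKE pattern against a string with a regular expression."""
    regex = []
    chars = iter(pattern)
    for ch in chars:
        if ch == escape:
            # The character after the escape sign is matched literally.
            regex.append(re.escape(next(chars, "")))
        elif ch == "%":
            regex.append(".*")   # any sequence of characters, including none
        elif ch == "_":
            regex.append(".")    # exactly one character
        else:
            regex.append(re.escape(ch))
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch("".join(regex), value, flags) is not None

print(like("AzzA", "A__A"))                    # one character per underscore
print(like("Azz", "Azz_"))                     # the underscore needs a character
print(like("fff", "F_F", ignore_case=True))    # behaves like ILIKE
print(like("%_", "#%#_", escape="#"))          # escaped wildcards are literal
```

The whole string has to match the pattern, which is why the sketch uses a full match and not a search.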

Function regexp_replace

For more complex patterns, we can use the regexp_replace function. This function accepts the original string, a search pattern, and a string that will replace the found pattern. We will replace "m", followed by any two characters, with "z".

SELECT regexp_replace( 'maaam', 'm.{2}', 'z' );

This function is case sensitive, except if we use the fourth, optional argument.
SELECT regexp_replace( 'maaam', 'M.{2}', 'z' ); -- not replaced
SELECT regexp_replace( 'maaam', 'M.{2}', 'z', 'i' ); --replaced

This function also accepts other modifiers (flags). I tested that it will accept "m,i,s,x,xx". Flag "xx" is the same as "x".

It is interesting that it will not accept the flag "g". It seems that this modifier is constantly turned on.
SELECT regexp_replace( 'SSS', 'S', 'P' );

Here are examples that you can try with and without the modifier (flag).
SELECT regexp_replace('first\nfirst', '^first', '*', 'm') AS result;
SELECT regexp_replace('a\nb', 'a.b', 'X', 's') AS result;
SELECT regexp_replace('abc123', ' 1 2 3 ', '', 'x') AS result;

`SELECT regexp_replace('Prisca Gbaguidi', '\\w+\\s\\w+', 'Mireille Gbaguidi');` `--`Mireille Gbaguidi	Backslashes have to be escaped.
`SELECT regexp_replace('Prisca Gbaguidi', '(\\w+)(\\s)(\\w+)', '\\3 \\1');` `--`Gbaguidi Prisca	Regex with numbered capturing groups.
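For comparison, here is how the same replacements look with Python's "re" module. This is an analogy and not MonetDB's implementation: the helper "regexp_replace" and its flag mapping are hypothetical, and Python's regular expression dialect can differ from the one in MonetDB in some details. Like the always-on "g" behavior described above, re.sub replaces every match.

```python
import re

# Assumed mapping between the flag letters from the text and Python's flags.
FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL, "x": re.VERBOSE}

def regexp_replace(text, pattern, replacement, flags=""):
    """Replace every match of the pattern in the text."""
    combined = 0
    for letter in flags:
        combined |= FLAG_MAP[letter]
    return re.sub(pattern, replacement, text, flags=combined)

print(regexp_replace("maaam", "m.{2}", "z"))
print(regexp_replace("maaam", "M.{2}", "z"))        # case sensitive, no match
print(regexp_replace("maaam", "M.{2}", "z", "i"))   # case insensitive
print(regexp_replace("SSS", "S", "P"))              # all matches are replaced
# Numbered capturing groups are written as \1, \3 in a Python raw string.
print(regexp_replace("Prisca Gbaguidi", r"(\w+)(\s)(\w+)", r"\3 \1"))
```

In a Python raw string the backslashes do not have to be doubled, which is the main visible difference from the SQL examples above.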

TRUNCATE and SERIAL Data Type

0510 JDBC, Recursive CTEs, New Functions in MonetDB

MonetDB / By Bizkapish / July 16, 2025 March 21, 2026

We will continue using the green and blue servers that we created in this post => link ( or you can watch the video on YouTube => link ). This was the post about distributed query processing.
monetdbd start /home/sima/monetdb/DBfarmG
mclient -u monetdb -d DatabaseG --password monetdb

monetdbd start /home/sima/monetdb/DBfarmB
mclient -u monetdb -d DatabaseB --password monetdb

JDBC

We will now connect to the blue MonetDB server, from the green server, through JDBC connector.

Installing Java

Let's see if we have Java installed. `java -version`
We don't have it, so we can install it like this: `sudo apt install default-jre`

Connecting With the JDBC Client

From this link:
https://www.monetdb.org/downloads/Java/
Download this file "jdbcclient.jre8.jar".

This is a JAR file that includes the Java console client application, but also the driver. It is all in one. On the green server, run this command from the shell:

java -jar /home/sima/Desktop/jdbcclient.jre8.jar -h 192.168.100.146 -p 50000 -u monetdb -d DatabaseB

Using JDBC Connection with DBeaver

From this link:
https://www.monetdb.org/downloads/Java/
Download the JDBC driver "monetdb-jdbc-12.0.jre8.jar".

We don't have DBeaver on the green server, so we have to install it. We will download the DBeaver ".deb" file with the wget command. Then we can install it.
cd /home/sima/Desktop
wget https://dbeaver.io/files/dbeaver-ce_latest_amd64.deb
sudo apt install ./dbeaver-ce_latest_amd64.deb

Recursive Common Table Expressions

What is Recursion

Recursion is an iterative process of finding a solution. We repeat the same logic each time, but each time we are closer to the solution because we can base our logic on the information we have gained during previous iterations.

Let's look at this example. Our friend imagined a number between 1 and 10. We have to guess that number with the minimal number of questions with YES/NO answers. The best approach is to use binary logic, based on an elimination process where in each iteration we can remove half of the numbers.

Question:
Is it bigger than 5?

Answer:
Yes, it is.

Question:
Is it bigger than 8?

Answer:
No, it is not.

Question:
Is it smaller than 7?

Answer:
Yes, it is.

It has to be 6.

The only argument in our algorithm is the range of the possible numbers.

Our logic has two steps:
1) Does the range of possible numbers have only one number?
2) If it doesn't, ask a question to eliminate half of the numbers and reduce the range of possible numbers by half.

We can pack the second step into a function named "EliminateHalf". This function will return the range of all the possible numbers. We will call this function 3 times.

1) EliminateHalf (1,2,3,4,5,6,7,8,9,10) = (6,7,8,9,10)
2) EliminateHalf (6,7,8,9,10) = (6,7)
3) EliminateHalf (6,7) = (6)

We can nest these functions:
EliminateHalf(EliminateHalf(EliminateHalf(1,2,3,4,5,6,7,8,9,10))) = 6

The problem is that, sometimes, we don't know how many nested functions we need. I will create pseudocode that will nest as many functions as needed to get the final result.
Result = 1,2,3,4,5,6,7,8,9,10 # initial state
Do Until Count( Result ) = 1 # are we finished
Result = EliminateHalf( Result ) # if not, continue
Loop

This is RECURSION. We broke the complex problem into small steps. Each step has the same logic. Each step is using arguments that are the result of the previous step. This is just one iterative process which brings us closer to the solution with each step.
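The guessing loop can be written as a short runnable Python sketch. The function names follow the pseudocode, but the way each question is chosen (split the list in the middle) is my own assumption, so the questions can differ from the ones in the dialogue above.

```python
def eliminate_half(candidates, secret):
    """Ask one yes/no question and keep the half that contains the secret."""
    middle = len(candidates) // 2
    lower, upper = candidates[:middle], candidates[middle:]
    # The question is: "Is it bigger than the largest number in the lower half?"
    return upper if secret > lower[-1] else lower

def guess(secret, candidates):
    """Repeat the same step until only one candidate is left."""
    questions = 0
    result = list(candidates)                     # initial state
    while len(result) != 1:                       # are we finished
        result = eliminate_half(result, secret)   # if not, continue
        questions += 1
    return result[0], questions

print(guess(6, range(1, 11)))
```

For the secret number 6 the loop goes through (6,7,8,9,10), then (6,7), then (6), so it needs three questions, just like the three calls of EliminateHalf.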

Do you want an example from real life? The coach of a football team analyzes data from the previous game. After each analysis he changes the way his team plays. He continues with this practice until his team starts winning.

The Structure of Recursion

A recursion structure always has four steps. The first step, the "initial state", is the problem that we want to solve. We will solve it by improving our statistics.

1) Set initial values for our arguments. # initial state
2) Has our goal been achieved? # are we there yet # recursive part
3) Improve our arguments by using some strategy. # continue with effort # recursive part
4) Repeat steps 2 and 3 until we reach the goal. # be persistent

Linear and Tree Recursion

In linear recursion we only have two possible outcomes. We are either satisfied with the result or we will continue with our effort. For example, the coach can be satisfied with his team or he can continue introducing improvements.

Tree Recursion is when we have several possible strategies to direct our effort. For example, the coach can change the team's game, or he can look for position in some other team. If we create a diagram of his possible actions we can get something like this:

Structure of The Recursive Common Table Expression

WITH RECURSIVE cte_name AS (
SELECT ... --initial state
UNION ALL
SELECT ... --continue with improvements
FROM cte_name --get the previous state
WHERE ... --are we there yet
)
SELECT * FROM cte_name; --return result

In the simplest form, a recursive CTE has two SELECT statements connected with UNION ALL. The first SELECT statement defines the initial state.

RECURSIVE CTE will return all interim results connected with UNION ALL.

Second select statement will calculate the new status. It will reference the previous status by the name of the CTE.
WHERE in second select statement will tell us when to stop.
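Here is a small runnable instance of this structure. It is a sketch under a loud assumption: the query is executed with Python's built-in sqlite3 module instead of MonetDB, and the CTE name "halving" and the numbers are made up for the illustration. The initial state is 80, each step halves the previous value, and the WHERE clause stops the recursion.

```python
import sqlite3

QUERY = """
WITH RECURSIVE halving(n) AS (
    SELECT 80          -- initial state
    UNION ALL
    SELECT n / 2       -- continue with improvements
    FROM halving       -- get the previous state
    WHERE n > 10       -- are we there yet
)
SELECT n FROM halving
"""

conn = sqlite3.connect(":memory:")
rows = [n for (n,) in conn.execute(QUERY)]
print(rows)
```

The result contains the initial state and all interim results, one row per step, which matches the remark above that the recursive CTE returns all interim results connected with UNION ALL.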

Tree Recursion

Tree recursion occurs when the ANCHOR and RECURSIVE members are select statements that return tables with several rows. Those rows represent folders at the same level. We have (1), (2), (3) for top (1), middle (2) and bottom (3) folders. First, we get the top folders (initial state), then the middle folders (first recursion), and then the bottom folders (second recursion). Each recursion is used to collect the folders from the level below.

Recursive CTEs Caveats

1) Recursive and anchor member must match in columns number and data types.

3) Don't use OUTER JOINS in the recursive member. The query will never end. Only INNER join is acceptable.

4) MonetDB will not complain if we use aggregate or window functions in recursive member. We can also use DISTINCT and GROUP BY in the recursive member. In MonetDB, we can use CTE's name in the FROM clause, but we can also use it in subquery. Some other servers don't allow this.

New Functions

DayName and MonthName Functions

`SELECT DAYNAME('2025-07-12');`	Saturday	This function returns a name of a day in a week according to the current locale, set in the OS.
`SELECT MONTHNAME('2025-07-12');`	July	This is similar function that is returning the name of a month.

Beside date arguments, we can also use timestamp ('1987-09-23 11:40') or timestamp TZ ('15:35:02.002345+01:00').

Generate Series Functions

`SELECT * FROM generate_series(1,9,2);`	1,3,5,7	This function returns numbers from 1 up to, but not including, 9, with step 2. The default step is 1.

`SELECT * FROM generate_series('2025-01-01','2025-01-10',INTERVAL '5' DAY);`	2025-01-01,2025-01-06	All dates from the range, but with the step of 5 days.
`SELECT * FROM generate_series('2025-01-01','2025-05-10',INTERVAL '2' MONTH);`	2025-01-01, 2025-03-01	We can also get the months with the step of 2 months.

We can list seconds or days between two timestamps.
`SELECT * FROM generate_series('2025-01-01 01:40:00','2025-01-01 1:40:05', INTERVAL '3' SECOND);`	2025-01-01 01:40:00 2025-01-01 01:40:03
`SELECT * FROM generate_series('2025-01-01 01:40:00','2025-01-06 1:40:05', INTERVAL '3' DAY);`	2025-01-01 01:40:00 2025-01-04 01:40:00
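The logic behind these results can be modeled with a tiny Python generator. This is a sketch, not MonetDB's implementation: it assumes a positive step and that the upper bound is not included, which is what the outputs above suggest. Month steps are left out because Python's timedelta has no month unit.

```python
from datetime import date, datetime, timedelta

def generate_series(start, stop, step):
    """Yield start, start + step, ... while the value is still below stop."""
    value = start
    while value < stop:
        yield value
        value = value + step

print(list(generate_series(1, 9, 2)))
print([d.isoformat() for d in
       generate_series(date(2025, 1, 1), date(2025, 1, 10), timedelta(days=5))])
print([t.isoformat(sep=" ") for t in
       generate_series(datetime(2025, 1, 1, 1, 40, 0),
                       datetime(2025, 1, 1, 1, 40, 5),
                       timedelta(seconds=3))])
```

The same function works for numbers, dates and timestamps, because it only needs the "less than" comparison and the addition of a step.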

0500 Proto_loaders, ODBC and COPY in MonetDB

MonetDB / By Bizkapish / July 12, 2025 March 21, 2026

We will continue using the green and blue databases that we created in this post => link ( or you can watch the video on YouTube => link ). This was the post about distributed query processing.
monetdbd start /home/sima/monetdb/DBfarmG
mclient -u monetdb -d DatabaseG --password monetdb

monetdbd start /home/sima/monetdb/DBfarmB
mclient -u monetdb -d DatabaseB --password monetdb

Read From CSV File With a file_loader Function

We have three files and two ".gz" archives.

The "CSV" file uses commas. It also has commas at the end of the rows with data. This is the only file that doesn't have a file format extension.
The "TSV.tsv" file uses tab as the delimiter, but it also has double quotes around the strings.
The "PSV.psv" file uses pipes and has a null in the "Letter" column.
Files with the file format extension ".gz" are just the PSV file compressed.

Files must be placed on the server. Returned value of the "file_loader" function is virtual table. We don't specify delimiters, wrappers and data types for the files. They are deduced automatically. We can read from CSV, TSV and PSV files, and also ".gz,.lz4,.bz2 and .xz" files.

file_loader Function Syntax

SELECT * FROM file_loader( '/home/abhishek/sample.csv' );
The only argument of our function is the full path to the file.

SELECT * FROM '/home/abhishek/sample.csv';
The shorter syntax is much better. We don't have to type the function name.

Experiments With the CSV File

We can not read from files that don't have file format extension.
SELECT * FROM '/home/sima/Desktop/CSV';

Experiments With the TSV File

We can try to trim double quotes, but our column will not be recognized.
SELECT TRIM( Letter, '"' ) FROM '/home/sima/Desktop/TSV.tsv';

Column names are case sensitive, so we have to place the column name inside double quotes. Only then will our query work.
SELECT TRIM( "Letter", '"' ) FROM '/home/sima/Desktop/TSV.tsv';

Experiments With the PSV File

Experiments With the GZ files

If we pack our file into a "tar" archive before compressing it, the file loader will not work.
SELECT * FROM '/home/sima/Desktop/PSV.psv.tar.gz';

Conclusion

We can conclude that the file_loader function is not as versatile as the COPY INTO statement, which is described in this blog post => link.

Read From Remote Database With a proto_loader Function

We have seen that we can log in to a MonetDB server that is on another computer. We also saw how to create remote tables. This time we will see how to read, ad hoc, tables that are on some other computer/server.

Testing Local MonetDB Server

Testing Remote MonetDB Server

A more interesting thing is the ability to read tables from a remote server. I will read a table from the blue server ( before that, please start the blue server ).

Creation Of a Remote Table

We can test whether we can create a remote table using the syntax that starts with "monetdb://". On the blue server I will change the current schema and then I will create one table.

monetdbd start /home/sima/monetdb/DBfarmB
mclient -u monetdb -d DatabaseB --password monetdb
SET SCHEMA SchemaGB;
CREATE TABLE Test( Number INT );

Connect to Any ODBC Database From MonetDB

ODBC Driver Manager

On the green server, we will install the ODBC Driver Manager. First check whether you already have it installed. Just type "odbcinst" in the shell.

MonetDB ODBC Driver

We can connect to any ODBC capable server, but we will use this opportunity to see how to connect to a MonetDB server. We will use the MonetDB ODBC driver to connect to the blue server. This is the ODBC driver we need. We install it on the green server.

sudo apt install libmonetdb-client-odbc

Testing ODBC Driver

proto_loader Function For ODBC

But our goal is to use the "proto_loader" function to fetch data directly into the MonetDB server on the green computer, from the blue server, with ODBC. For that we will install one more package.
sudo apt install monetdb-odbc-loader

Using the ODBC loader is still experimental. This functionality is NOT turned on by default. We will now turn it on. First, we exit "DatabaseG".
quit
monetdb stop DatabaseG

We will now log in to our database. This will automatically start the server. During startup, the "odbc-loader" module will be loaded automatically.
mclient -u monetdb -d DatabaseG --password monetdb

Using proto_loader Function For ODBC

It is also possible to provide all of the necessary parameters directly inside of the ODBC connection string:

SELECT * FROM proto_loader('odbc:DRIVER=/usr/lib/x86_64-linux-gnu/libMonetODBC.so;SERVER=192.168.100.146;PORT=50000;DATABASE=DatabaseB;UID=monetdb;PWD=monetdb;QUERY=SELECT * FROM schemagb.factb')
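The connection string above is just a list of "key=value" pairs separated by semicolons, with QUERY at the end. The Python sketch below assembles such a string and wraps it in a proto_loader call. The helper names "odbc_loader_uri" and "proto_loader_select" are hypothetical. Values that contain a semicolon are not handled.

```python
def odbc_loader_uri(query: str, **params: str) -> str:
    """Join ODBC key=value pairs and a trailing QUERY into an 'odbc:' string.

    Hypothetical helper; keyword order is preserved, QUERY always goes last.
    """
    parts = [f"{key}={value}" for key, value in params.items()]
    parts.append(f"QUERY={query}")
    return "odbc:" + ";".join(parts)


def proto_loader_select(uri: str) -> str:
    """Wrap the URI in a SELECT, doubling single quotes as SQL string literals require."""
    escaped = uri.replace("'", "''")
    return f"SELECT * FROM proto_loader('{escaped}');"


# Same parameters as in the statement above.
uri = odbc_loader_uri(
    "SELECT * FROM schemagb.factb",
    DRIVER="/usr/lib/x86_64-linux-gnu/libMonetODBC.so",
    SERVER="192.168.100.146",
    PORT="50000",
    DATABASE="DatabaseB",
    UID="monetdb",
    PWD="monetdb",
)
print(proto_loader_select(uri))
```

Keeping the password out of source code (for example by reading it from the environment) would be a sensible next step, but that is outside of this sketch.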

There is also a version that uses a DSN file. This version is for Windows only.
odbc:FILEDSN=<data source name>;[<ODBC connection parameters>;]QUERY=<SQL query>

Virtual Tables

Virtual tables are tables whose data is not physically stored in a MonetDB table. Views, merge tables and remote tables are virtual tables. Tables that we receive through the file_loader and proto_loader functions are also virtual tables. We will now see how to transform file_loader and proto_loader virtual tables into more permanent structures.

CREATE TABLE Based on the Loader Function

We can use CREATE TABLE AS to store a CSV file in a new table:
CREATE TABLE permanentCSV ( Number, Letter ) AS ( SELECT * FROM '/home/sima/Desktop/CSV.csv' );

CREATE TEMPORARY TABLE Based on the Loader Function

We can also make a temporary table.
CREATE LOCAL TEMPORARY TABLE temporaryFactB ( YearNum, Dates, ProdID, Qty ) AS
( SELECT * FROM proto_loader('odbc:DSN=DatabaseB;QUERY=SELECT * FROM schemagb.factb') )
WITH DATA ON COMMIT PRESERVE ROWS;

Bulk INSERT Based on the Loader Function

TRUNCATE temporaryFactB;
We can pull data from any other ODBC capable server into our temporary FactB table (which is now empty).

INSERT INTO temporaryFactB ( YearNum, Dates, ProdID, Qty )
SELECT * FROM proto_loader('odbc:DRIVER=/usr/lib/x86_64-linux-gnu/libMonetODBC.so;SERVER=192.168.100.146;PORT=50000;DATABASE=DatabaseB;UID=monetdb;PWD=monetdb;QUERY=SELECT * FROM schemagb.factb');

COPY command

We already talked about the COPY INTO and COPY FROM statements ( blog1 and blog2; youtube1 and youtube2 ). We will now see some special forms of these commands.

COPY FROM stdin

We will first create one empty table.
CREATE TABLE tabStdin( Number INT, Letter CHAR );

COPY INTO stdout

COPY FROM Csv, With DECIMAL Clause

With the DECIMAL clause we can specify which decimal point and which thousands separator our CSV uses.
COPY OFFSET 2 INTO tabDecimal
FROM '/home/sima/Desktop/CSV_file'( Number, Letter ) DECIMAL AS '*','_';
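To make the effect of DECIMAL AS '*','_' concrete, here is a small Python sketch that reads a number written with "*" as the decimal point and "_" as the thousands separator. The function name "parse_decimal" and the sample value are made up. The sketch only mimics the idea and says nothing about how MonetDB parses the file internally.

```python
from decimal import Decimal


def parse_decimal(text: str, decimal_point: str = ".", thousands: str = "") -> Decimal:
    """Read a number that uses custom decimal point and thousands separator characters."""
    cleaned = text.strip()
    if thousands:
        # Thousands separators carry no value, so they are dropped first.
        cleaned = cleaned.replace(thousands, "")
    # Only after that is the custom decimal point replaced with the standard one.
    cleaned = cleaned.replace(decimal_point, ".")
    return Decimal(cleaned)


# Hypothetical value, written the way DECIMAL AS '*','_' expects it.
print(parse_decimal("1_234*5", decimal_point="*", thousands="_"))
```

The order of the two replacements matters: with the European style "1.234,5", the dot has to be removed before the comma is turned into a dot.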

Option "-f" means that we will back up only functions. `msqldump -d voc -f > /home/fffovde/Desktop/voc_db.sql`
With the "-t" option, we can back up only one table. We should provide the fully qualified name of the table, because the default schema for the administrator is sys ( we are logged in as administrator ). `msqldump -d voc -t voc.invoices > /home/fffovde/Desktop/voc_db.sql`
With wildcards we can back up a set of tables with similar names. The code below would return only the tables "passengers", "seafarers" and "soldiers". `msqldump -d voc -t voc.%ers > /home/fffovde/Desktop/voc_db.sql`
With the uppercase "-D" switch, we will export the database without data. `msqldump -D voc > /home/fffovde/Desktop/voc_db.sql`
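The wildcard in "-t voc.%ers" can be understood with a small Python sketch. It assumes that the pattern follows SQL LIKE rules, where "%" matches any run of characters and "_" matches exactly one character; the handling of "_" is an assumption, since the example above uses only "%". The function names are hypothetical, and the list contains only table names mentioned in this post.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE style pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # any sequence of characters, including none
        elif ch == "_":
            parts.append(".")   # exactly one character (assumed LIKE semantics)
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matching_tables(pattern: str, names: list) -> list:
    """Return the names that match the whole pattern."""
    regex = like_to_regex(pattern)
    return [name for name in names if regex.fullmatch(name)]


tables = ["voc.invoices", "voc.passengers", "voc.seafarers", "voc.soldiers"]
print(matching_tables("voc.%ers", tables))
```

Here "voc.invoices" is left out because its name does not end in "ers".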

I will now jump to the green server. I explained how to create the green server in this blog post "link". I am using this server because it is set to accept external connections. The green server has IP address "192.168.100.145" and database DatabaseG.
I will run this command on the green server, so I am just doing a local backup. `msqldump -h 192.168.100.145 -p 50000 -u monetdb -d DatabaseG -o /home/sima/Desktop/DatabaseG_db.sql`
I will now run the same command from the "voc" server. This will back up "DatabaseG" from the green server onto the "voc" server. `msqldump -h 192.168.100.145 -p 50000 -u monetdb -d DatabaseG -o /home/fffovde/Desktop/DatabaseG_db.sql`
In the previous blog post "link", I created the purple server, which is protected with TLS. The IP address of this server is 192.168.100.152, and its database is "tlsDB". I will now run this command on the "voc" server to back up the database from the purple server. `msqldump -u monetdb -d 'monetdbs://192.168.100.152:50000/tlsDB?cert=/home/fffovde/selfsigned.crt' -o /home/fffovde/Desktop/tlsDB_db.sql`
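If we want to script these backups, the commands above can be assembled from their parts. The Python sketch below only builds and prints the command line; it does not run msqldump. It uses only the switches that appear in the commands above ( -h, -p, -u, -d, -o ). The function name "msqldump_args" is hypothetical.

```python
import shlex


def msqldump_args(database, output, user=None, host=None, port=None):
    """Build an msqldump argument list from the switches shown above."""
    args = ["msqldump"]
    if host is not None:
        args += ["-h", host]
    if port is not None:
        args += ["-p", str(port)]
    if user is not None:
        args += ["-u", user]
    # The database can be a plain name or a full "monetdbs://" URL.
    args += ["-d", database, "-o", output]
    return args


remote = msqldump_args(
    "DatabaseG",
    "/home/fffovde/Desktop/DatabaseG_db.sql",
    user="monetdb",
    host="192.168.100.145",
    port=50000,
)
tls = msqldump_args(
    "monetdbs://192.168.100.152:50000/tlsDB?cert=/home/fffovde/selfsigned.crt",
    "/home/fffovde/Desktop/tlsDB_db.sql",
    user="monetdb",
)
# shlex.join adds shell quoting where needed, for example around the TLS URL.
print(shlex.join(remote))
print(shlex.join(tls))
```

The list form can be passed to "subprocess.run" as is, which avoids shell quoting problems with the "?" in the TLS URL.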