MonetDB Archives

0520 CHECK, RETURNING and Other in MonetDB

MonetDB / By Bizkapish / July 19, 2025 July 27, 2025

Sample Table

CREATE TABLE tabProducts (
Color VARCHAR(10),
Size VARCHAR(5),
Qty INT );
INSERT INTO tabProducts (Color, Size, Qty) VALUES
('Red', 'M', 10),
('Red', 'XXL', 10),
('Blue', 'XL', 30);

RETURNING Clause

Normally, only a SELECT statement returns rows. INSERT, UPDATE and DELETE do their job silently and only report how many rows were affected.

If we execute this statement:
UPDATE t1 SET Col1 = 'zzz' WHERE Id = 99;
we can afterwards check the rows that were updated:
SELECT * FROM t1 WHERE Id = 99;
If we could do both things with one statement, that would simplify things and reduce the strain on the database. The RETURNING clause does exactly that: an INSERT statement with RETURNING returns the rows it has just inserted.

We can do the same thing with DELETE and UPDATE. DELETE will return the deleted rows, and UPDATE will return the updated rows.
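A minimal sketch of all three statements on tabProducts, assuming a MonetDB release with RETURNING support (the one used in this post). The 'Green' row is a made-up value, and the last statement deletes it again, so the sample table ends up unchanged.

```sql
-- Insert a row and get it back in the same step
INSERT INTO tabProducts ( Color, Size, Qty ) VALUES ( 'Green', 'S', 5 ) RETURNING *;

-- Update the row; only the listed columns of the updated rows are returned
UPDATE tabProducts SET Qty = 6 WHERE Color = 'Green' RETURNING Color, Qty;

-- Delete the row; the deleted rows are returned
DELETE FROM tabProducts WHERE Color = 'Green' RETURNING *;
```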

We can use expressions in the RETURNING clause.
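A short sketch of expressions in RETURNING, assuming the same tabProducts table. The 'Blue', 'S' row is a hypothetical value that the DELETE removes again.

```sql
-- Function calls and arithmetic are allowed in RETURNING
INSERT INTO tabProducts ( Color, Size, Qty ) VALUES ( 'Blue', 'S', 7 )
RETURNING UPPER( Color ) AS ColorUpper, Qty * 2 AS DoubleQty;     -- BLUE | 14

-- String concatenation works as well
DELETE FROM tabProducts WHERE Color = 'Blue' AND Size = 'S'
RETURNING Color || ' ' || Size AS Product;                        -- Blue S
```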

INSERT INTO tabProducts VALUES( 'Blue', 'M', 100 ) RETURNING Color, SUM( Qty ) GROUP BY Color;
This will not work. We cannot use GROUP BY in the RETURNING clause. The RETURNING clause must remain simple.

Referencing Columns by Their Position
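A minimal sketch, assuming the MonetDB release used in this post accepts column positions in GROUP BY as well as in ORDER BY. The numbers refer to the columns of the SELECT list, counted from 1.

```sql
-- 1 = Color, 2 = TotalQty
SELECT Color, SUM( Qty ) AS TotalQty
FROM tabProducts
GROUP BY 1          -- same as GROUP BY Color
ORDER BY 2 DESC;    -- same as ORDER BY TotalQty DESC
```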

Referencing Set of Columns with the Keyword ALL
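A minimal sketch of the ALL keyword on our sample table. GROUP BY ALL groups by every non-aggregated column of the SELECT list, and ORDER BY ALL sorts by every column of the SELECT list, from left to right. This matches the equivalences given for the star form.

```sql
SELECT Color, Size, SUM( Qty ) AS TotalQty
FROM tabProducts
GROUP BY ALL;       -- same as GROUP BY Color, Size

SELECT Color, Size, Qty
FROM tabProducts
ORDER BY ALL;       -- same as ORDER BY Color, Size, Qty
```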

Instead of the keyword ALL, we can use the star "*" sign.
SELECT Color, qty, COUNT( Size ) FROM tabProducts GROUP BY *;   --same as GROUP BY Color, qty
SELECT Color, Size, qty FROM tabProducts ORDER BY *;            --same as ORDER BY Color, Size, qty

IS [NOT] DISTINCT FROM

Anything compared with NULL returns NULL.
SELECT 'null' = null;   --null
SELECT null = null;     --null
IS [NOT] DISTINCT FROM is a null-safe comparison operator. It always returns TRUE or FALSE, even if one of the operands is NULL.

SELECT NULL IS DISTINCT FROM NULL;       --false
SELECT 'A' IS DISTINCT FROM NULL;        --true
SELECT NULL IS NOT DISTINCT FROM NULL;   --true
SELECT 'A' IS NOT DISTINCT FROM NULL;    --false
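A small sketch of why this matters in practice, assuming MonetDB accepts an inline VALUES list in FROM (the Size values are hypothetical). With "<>", the NULL row produces NULL and would be dropped by a WHERE clause, while IS DISTINCT FROM returns TRUE for it.

```sql
SELECT Size,
       Size <> 'M'               AS NotEqual,     -- M: false, XL: true, NULL: null
       Size IS DISTINCT FROM 'M' AS IsDistinct    -- M: false, XL: true, NULL: true
FROM ( VALUES ( 'M' ), ( 'XL' ), ( NULL ) ) AS t( Size );
```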

CHECK Constraint

A check constraint is a type of rule which specifies a condition ( boolean expression ) that must be met by each row in a database table. This rule limits acceptable values for data.

If we change our condition so that qty must be less than 100, adding the constraint succeeds, because every existing row satisfies it.
ALTER TABLE tabProducts ADD CONSTRAINT "QtyLess100" CHECK (qty < 100);
After that, we will try to insert the number 111 into the qty column:
INSERT INTO tabProducts( Color, Size, Qty ) VALUES ( 'Blue', 'XL', 111 );

This will fail because of the constraint (111 is not less than 100).

UPDATE tabProducts SET qty = 111 WHERE qty = 30;
This also means that we cannot update a value in the qty column to 100 or more.

How to Add Check Constraint?

We can add several constraints on the same column. After the next statement, we have two constraints on qty: "qty > 0" and "qty < 100".
ALTER TABLE tabProducts ADD CONSTRAINT "QtyGrt0" CHECK (qty > 0);

That is not efficient. It is much better to add both conditions with one statement. We can connect conditions with AND and OR.
ALTER TABLE tabProducts ADD CONSTRAINT QtyConstraints CHECK (qty > 0 AND qty < 100);

Constraints can combine several columns in the requirement expression:
ALTER TABLE tabProducts ADD CONSTRAINT CheckColorSize CHECK ( Color = 'Red' OR Size = 'XL' );
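To see the multi-column rule in action, here are two test rows with hypothetical values. A row passes when at least one side of the OR is true.

```sql
-- Passes: Color = 'Red' is true, even though Size is not 'XL'
INSERT INTO tabProducts ( Color, Size, Qty ) VALUES ( 'Red', 'S', 5 );

-- Fails: Color is not 'Red' and Size is not 'XL', so the condition is FALSE
INSERT INTO tabProducts ( Color, Size, Qty ) VALUES ( 'Blue', 'M', 5 );
```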

We can also add a constraint during the creation of a table.
CREATE TABLE tabOrders ( Price INT, Qty INT, CONSTRAINT ValidOrders CHECK ( Price > 10 AND Qty < 10 ) );

During table creation, we can also add a constraint at the level of a single column. The server will give such a constraint a default name.
CREATE TABLE tabOneColumn ( Col1 CHAR, Col2 INT CHECK ( Col2 < 999 ) );
The default name consists of the table name, the column name and the keyword "check". In this example, that is "tabonecolumn_col2_check".

Instead of using the default name for a single-column CHECK constraint, it is much better to give the constraint a specific name.
CREATE TABLE tabOneColumnNamed ( Col1 INT CONSTRAINT Col1Constraint CHECK ( Col1 < 999 ) );

Limitations on CHECK Constraints

A CHECK constraint can only refer to one row of a table. We cannot use aggregate functions, because they would break this limitation. This statement will fail:
ALTER TABLE tabProducts ADD CONSTRAINT OverFlow CHECK ( SUM( Qty ) < 1000 );
This also means that a CHECK constraint cannot refer to tables other than the table on which it is defined.

A simple check on a single column has minimal impact on performance. Every INSERT and UPDATE has to evaluate the condition, so we should avoid complex check conditions.

CHECK Constraints and Nulls

INSERT INTO tabOneColumn ( Col1, Col2 ) VALUES ( null, null );
This INSERT statement will always succeed. The condition null < 999 evaluates to NULL (unknown), and a CHECK constraint rejects a row only when its condition is FALSE.
SELECT * FROM tabOneColumn;
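If NULLs must be rejected, the column needs NOT NULL, or the condition itself must test for NULL. A sketch with a hypothetical table tabNotNull:

```sql
CREATE TABLE tabNotNull (
    Col1 INT NOT NULL CHECK ( Col1 < 999 ),            -- NOT NULL rejects nulls
    Col2 INT CHECK ( Col2 IS NOT NULL AND Col2 < 999 )  -- a null makes this condition FALSE
);

INSERT INTO tabNotNull VALUES ( 5, 5 );       -- succeeds
INSERT INTO tabNotNull VALUES ( null, 5 );    -- fails because of NOT NULL
INSERT INTO tabNotNull VALUES ( 5, null );    -- fails: null IS NOT NULL is FALSE
```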

How to Delete CHECK Constraints?

We can delete constraints like this:
- We can delete just the CHECK constraint.
- We can delete the whole table.
ALTER TABLE tabonecolumn DROP CONSTRAINT zzz;
DROP TABLE tabOneColumnNamed;
Now, our constraints are gone. We can verify this in information_schema.check_constraints. Unquoted identifiers are stored in lowercase, so we search for 'tabonecolumnnamed':

SELECT * FROM information_schema.check_constraints WHERE table_name = 'tabonecolumnnamed' OR constraint_name = 'zzz';

LIKE and ILIKE Operators

LIKE is a pattern matching operator. It helps us find patterns in text. LIKE is based on two wildcard characters. The percent sign "%" matches any sequence of zero or more characters. The underscore "_" matches exactly one character. Here are some examples:

SELECT 'zzz' LIKE '%';        --true
SELECT 'zzz' LIKE '___';      --true
SELECT 'Azz' LIKE 'A%';       --true
SELECT 'AzzA' LIKE '%zz%';    --true
SELECT 'AzzA' LIKE 'A__A';    --true
SELECT 'Azz' LIKE 'Azz_';     --false

ILIKE is the case-insensitive version.
SELECT 'fff' LIKE 'F_F';      --false
SELECT 'fff' ILIKE 'F_F';     --true

We can negate LIKE with NOT.
SELECT 'M' NOT LIKE 'M';      --false
SELECT 'M' LIKE 'M';          --true

The default escape character is the backslash "\".
SELECT '%_' LIKE '\%\_';      --true
We can declare any other character to be the ESCAPE character.
SELECT '%_' LIKE '#%#_' ESCAPE '#';   --true
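The same operators work on table columns. A short sketch against tabProducts:

```sql
SELECT * FROM tabProducts WHERE Size LIKE 'X%';       -- sizes starting with X, such as XL and XXL
SELECT * FROM tabProducts WHERE Color ILIKE 'r%';     -- 'Red', whatever the case of the pattern
SELECT * FROM tabProducts WHERE Size NOT LIKE '_';    -- sizes that are not exactly one character long
```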

Function regexp_replace

For more complex patterns, we can use the regexp_replace function. This function accepts the original string, a search pattern, and a string that will replace the found pattern. We will replace "m", followed by any two characters, with "z".

SELECT regexp_replace( 'maaam', 'm.{2}', 'z' );   --zam

This function is case sensitive, unless we use the fourth, optional argument.
SELECT regexp_replace( 'maaam', 'M.{2}', 'z' );        --maaam (no match)
SELECT regexp_replace( 'maaam', 'M.{2}', 'z', 'i' );   --zam

This function also accepts other modifiers (flags). I tested that it accepts "m", "i", "s", "x" and "xx". The flag "xx" is the same as "x".

Interestingly, it does not accept the flag "g". It seems that this modifier is always turned on, so all matches are replaced:
SELECT regexp_replace( 'SSS', 'S', 'P' );   --PPP

Here are examples that you can try with and without the modifier (flag):
SELECT regexp_replace('first\nfirst', '^first', '*', 'm') AS result;
SELECT regexp_replace('a\nb', 'a.b', 'X', 's') AS result;
SELECT regexp_replace('abc123', ' 1 2 3 ', '', 'x') AS result;

SELECT regexp_replace('Prisca Gbaguidi', '\\w+\\s\\w+', 'Mireille Gbaguidi');   --Mireille Gbaguidi
Backslashes have to be escaped.
SELECT regexp_replace('Prisca Gbaguidi', '(\\w+)(\\s)(\\w+)', '\\3 \\1');   --Gbaguidi Prisca
This regex uses numbered capturing groups: \\3 and \\1 insert the third and the first group.
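Because all matches are replaced, regexp_replace is handy for cleaning up whitespace. A sketch on literal strings, relying on the always-on global behavior described above:

```sql
-- Collapse runs of spaces into a single space
SELECT regexp_replace( 'a   b    c', ' +', ' ' );       -- a b c

-- Remove leading and trailing spaces with one pattern
SELECT regexp_replace( '  abc  ', '^ +| +$', '' );      -- abc
```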

TRUNCATE and SERIAL Data Type

0510 JDBC, Recursive CTEs, New Functions in MonetDB

MonetDB / By Bizkapish / July 16, 2025 September 18, 2025

We will continue using the green and blue servers that we created in this post => link (or you can watch the video on YouTube => link). That was the post about distributed query processing.
monetdbd start /home/sima/monetdb/DBfarmG
mclient -u monetdb -d DatabaseG      # password: monetdb

monetdbd start /home/sima/monetdb/DBfarmB
mclient -u monetdb -d DatabaseB      # password: monetdb

JDBC

We will now connect to the blue MonetDB server from the green server through the JDBC connector.

Installing Java

Let's see if we have Java installed. `java -version`
We don't have it, so we can install it like this: `sudo apt install default-jre`

Connecting With the JDBC Client

Download the file "jdbcclient.jre8.jar" from this link:
https://www.monetdb.org/downloads/Java/

This JAR file includes the Java console client application and also the driver; it is all in one. On the green server, run this command from the shell:

java -jar /home/sima/Desktop/jdbcclient.jre8.jar -h 192.168.100.146 -p 50000 -u monetdb -d DatabaseB

Using JDBC Connection with DBeaver

Download the JDBC driver "monetdb-jdbc-12.0.jre8.jar" from this link:
https://www.monetdb.org/downloads/Java/

We don't have DBeaver on the green server, so we have to install it. We will download the DBeaver ".deb" file with the wget command, and then install it.
cd /home/sima/Desktop
wget https://dbeaver.io/files/dbeaver-ce_latest_amd64.deb
sudo apt install ./dbeaver-ce_latest_amd64.deb

Recursive Common Table Expressions

In the blog post about common table expressions ( link ), I mistakenly wrote that MonetDB doesn't support recursive CTEs. But it does, and I will explain them now.

What is Recursion

Recursion is an iterative process of finding a solution. We repeat the same logic each time, but each time we are closer to the solution because we can base our logic on the information we have gained during previous iterations.

Let's look at this example. Our friend imagined a number between 1 and 10. We have to guess that number with the minimal number of questions. The best approach is to use binary logic, based on an elimination process where in each iteration we can remove half of the numbers.

Question:
Is it bigger than 5?

Answer:
Yes, it is.

Question:
Is it bigger than 7?

Answer:
No, it is not.

Question:
Is it smaller than 7?

Answer:
Yes, it is.

It has to be 6.

The only argument in our algorithm is the range of possible numbers.

Our logic has two steps:
1) Does the range of possible numbers contain only one number?
2) If it doesn't, ask a question that eliminates half of the numbers, and so reduce the range of possible numbers by half.

We can pack the second step into a function named "EliminateHalf". This function returns the range of the remaining possible numbers. We will call this function 3 times.

1) EliminateHalf (1,2,3,4,5,6,7,8,9,10) = (6,7,8,9,10)
2) EliminateHalf (6,7,8,9,10) = (6,7)
3) EliminateHalf (6,7) = (6)

We can nest these functions:
EliminateHalf( EliminateHalf( EliminateHalf( 1,2,3,4,5,6,7,8,9,10 ) ) ) = 6

The problem is that we don't know how many nested calls we need. I will write pseudocode that nests as many calls as needed to get the final result.
Result = EliminateHalf( 1,2,3,4,5,6,7,8,9,10 )   # initial state
Do Until Count( Result ) = 1                     # are we finished?
    Result = EliminateHalf( Result )             # if not, continue
Loop

This is RECURSION. We broke the complex problem into small steps. Each step has the same logic. Each step is using arguments that are the result of the previous step. This is just one iterative process which brings us closer to the solution with each step.

Here is an example from real life. A football coach analyzes data from the previous game. After each analysis, he changes his team's game. He continues with this practice until his team starts winning.

The Structure of Recursion

A recursion structure always has four steps. The first step, the "initial state", is the problem that we want to solve. We will solve it by improving our statistics.

1) Set initial values for our arguments. (initial state)
2) Has our goal been achieved? (are we there yet; recursive part)
3) Improve our arguments by using some strategy. (continue with effort; recursive part)
4) Repeat steps 2 and 3 until we reach the goal. (be persistent)

Linear and Tree Recursion

In linear recursion we only have two possible outcomes. We are either satisfied with the result or we will continue with our effort. For example, the coach can be satisfied with his team or he can continue introducing improvements.

Tree recursion is when we have several possible strategies to direct our effort. For example, the coach can change the team's game, or he can look for a position in some other team. A diagram of his possible actions would branch like a tree.

Structure of The Recursive Common Table Expression

WITH RECURSIVE cte_name AS (
    SELECT ...          --initial state
    UNION ALL
    SELECT ...          --continue with improvements
    FROM cte_name       --get the previous state
    WHERE ...           --are we there yet?
)
SELECT * FROM cte_name; --return the result

In the simplest form, a recursive CTE has two SELECT statements connected with UNION ALL. The first SELECT statement defines the initial state.

A recursive CTE returns all interim results, connected with UNION ALL.

The second SELECT statement calculates the new state. It references the previous state by the name of the CTE. The WHERE clause in the second SELECT statement tells us when to stop.
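To connect this with the guessing game from the beginning of the post, here is a sketch of a recursive CTE that halves the range 1 to 10 until one number is left. The secret number 6 is hard-coded as an assumption; it stands in for our friend's answers. The CAST calls keep the anchor and recursive members at the same INT type, and the midpoint relies on integer division of INT values.

```sql
WITH RECURSIVE guess ( lo, hi, questions ) AS (
    -- initial state: the whole range, no questions asked yet
    SELECT CAST( 1 AS INT ), CAST( 10 AS INT ), CAST( 0 AS INT )
    UNION ALL
    -- "Is it bigger than the middle?" keeps the upper or the lower half
    SELECT CAST( CASE WHEN 6 > ( lo + hi ) / 2 THEN ( lo + hi ) / 2 + 1 ELSE lo END AS INT ),
           CAST( CASE WHEN 6 > ( lo + hi ) / 2 THEN hi ELSE ( lo + hi ) / 2 END AS INT ),
           CAST( questions + 1 AS INT )
    FROM guess          -- the previous state
    WHERE lo < hi       -- are we there yet?
)
SELECT * FROM guess;
```

Each row is one iteration. The range shrinks step by step, and the last row, where lo = hi = 6, holds the answer.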

Tree Recursion

Tree recursion occurs when the ANCHOR and RECURSIVE members are SELECT statements that return tables with several rows. Those rows represent folders at the same level. We have (1), (2) and (3) for the top (1), middle (2) and bottom (3) folders. First, we get the top folders (initial state), then the middle folders (first recursion), and then the bottom folders (second recursion). Each recursion collects the folders from the level below.
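A minimal sketch of such a folder walk, using a hypothetical table tabFolders where ParentID is NULL for the top folders. There is no WHERE clause in the recursive member: the recursion stops by itself when the INNER JOIN finds no more child folders.

```sql
CREATE TABLE tabFolders ( FolderID INT, ParentID INT, FolderName VARCHAR(20) );
INSERT INTO tabFolders VALUES
    ( 1, NULL, 'Top1' ), ( 2, NULL, 'Top2' ),                        -- level 1
    ( 3, 1, 'Middle1' ), ( 4, 1, 'Middle2' ), ( 5, 2, 'Middle3' ),   -- level 2
    ( 6, 3, 'Bottom1' ), ( 7, 5, 'Bottom2' );                        -- level 3

WITH RECURSIVE folders ( FolderID, FolderName, Lvl ) AS (
    SELECT FolderID, FolderName, CAST( 1 AS INT )             -- anchor: all top folders
    FROM tabFolders
    WHERE ParentID IS NULL
    UNION ALL
    SELECT t.FolderID, t.FolderName, CAST( f.Lvl + 1 AS INT ) -- children of the previous level
    FROM tabFolders t
    INNER JOIN folders f ON t.ParentID = f.FolderID
)
SELECT * FROM folders ORDER BY Lvl, FolderID;
```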

Recursive CTEs Caveats

1) The recursive and anchor members must match in the number of columns and in their data types.

2) Don't use OUTER JOINs in the recursive member; the query will never end. Only INNER JOIN is acceptable.

3) MonetDB will not complain if we use aggregate or window functions in the recursive member. We can also use DISTINCT and GROUP BY in the recursive member. In MonetDB, we can use the CTE's name in the FROM clause, but we can also use it in a subquery. Some other servers don't allow this.

New Functions

DayName and MonthName Functions

SELECT DAYNAME('2025-07-12');     --Saturday
This function returns the name of the day of the week, according to the current locale set in the OS.
SELECT MONTHNAME('2025-07-12');   --July
This is a similar function that returns the name of the month.

Besides date arguments, we can also use a timestamp ('1987-09-23 11:40') or a timestamp TZ ('15:35:02.002345+01:00').

Generate Series Functions

SELECT * FROM generate_series(1,9,2);   --1,3,5,7
This function returns numbers from 1 up to, but not including, 9, with step 2. The default step is 1.

SELECT * FROM generate_series('2025-01-01','2025-01-10',INTERVAL '5' DAY);   --2025-01-01, 2025-01-06
All dates from the range, but with a step of 5 days.
SELECT * FROM generate_series('2025-01-01','2025-05-10',INTERVAL '2' MONTH);   --2025-01-01, 2025-03-01
We can also get months, with a step of 2 months.

We can list seconds or days between two timestamps.
SELECT * FROM generate_series('2025-01-01 01:40:00','2025-01-01 1:40:05', INTERVAL '3' SECOND);   --2025-01-01 01:40:00, 2025-01-01 01:40:03
SELECT * FROM generate_series('2025-01-01 01:40:00','2025-01-06 1:40:05', INTERVAL '3' DAY);   --2025-01-01 01:40:00, 2025-01-04 01:40:00
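The new functions combine well. A sketch that lists a range of days with their names, assuming the output column of generate_series is named "value" (as in MonetDB's documentation) and using the same string-literal form as the examples above:

```sql
-- One row per day, with the weekday and month names next to each date
SELECT value            AS Dates,
       DAYNAME( value )   AS WeekDay,
       MONTHNAME( value ) AS MonthOfYear
FROM generate_series( '2025-07-12', '2025-07-19', INTERVAL '1' DAY );
```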

0500 Proto_loaders, ODBC and COPY in MonetDB

MonetDB / By Bizkapish / July 12, 2025 July 22, 2025

We will continue using the green and blue databases that we created in this post => link (or you can watch the video on YouTube => link). That was the post about distributed query processing.
monetdbd start /home/sima/monetdb/DBfarmG
mclient -u monetdb -d DatabaseG      # password: monetdb

monetdbd start /home/sima/monetdb/DBfarmB
mclient -u monetdb -d DatabaseB      # password: monetdb

Read From CSV File With a file_loader Function

We have three files and two ".gz" archives.

"CSV" uses commas. It also has commas at the end of the data rows. This is the only file without a file format extension.
"TSV.tsv" uses tabs as delimiters, and it also has double quotes around the strings.
"PSV.psv" uses pipes and has a null in the "Letter" column.
The files with the ".gz" extension are just the PSV file, compressed.

The files must be placed on the server. The value returned by the "file_loader" function is a virtual table. We don't specify delimiters, quote characters or data types for the files; they are deduced automatically. We can read CSV, TSV and PSV files, and also ".gz", ".lz4", ".bz2" and ".xz" files.

file_loader Function Syntax

SELECT * FROM file_loader( '/home/abhishek/sample.csv' );
The only argument of our function is the full path to the file.

SELECT * FROM '/home/abhishek/sample.csv';
This shorter syntax is much better: we don't have to type the function name.

Experiments With the CSV File

We cannot read from files that don't have a file format extension. This query fails:
SELECT * FROM '/home/sima/Desktop/CSV';

Experiments With the TSV File

We can try to trim the double quotes, but our column will not be recognized:
SELECT TRIM( Letter, '"' ) FROM '/home/sima/Desktop/TSV.tsv';

Column names are case sensitive. An unquoted identifier such as Letter is converted to lowercase, so it does not match the column "Letter" from the file. We have to place the column name inside double quotes; only then does our query work:
SELECT TRIM( "Letter", '"' ) FROM '/home/sima/Desktop/TSV.tsv';

Experiments With the PSV File

Experiments With the GZ files

If we pack our file into a "tar" (tape archive) before compressing it, the file loader will not work.
SELECT * FROM '/home/sima/Desktop/PSV.psv.tar.gz';

Conclusion

We can conclude that the file_loader function is not as versatile as the COPY INTO statement, which is described in this blog post => link.

Read From Remote Database With a proto_loader Function

We have seen that we can log in to a MonetDB server that is on another computer. We also saw how to create remote tables. This time, we will see how to read ad hoc from tables that are on another computer/server.

Testing Local MonetDB Server

Testing Remote MonetDB Server

A more interesting feature is the ability to read tables from a remote server. I will read a table from the blue server (before that, please start the blue server).

Creation Of a Remote Table

We can test whether we can create a remote table using the syntax that starts with "monetdb://". On the blue server, I will change the current schema and then create one table.

monetdbd start /home/sima/monetdb/DBfarmB
mclient -u monetdb -d DatabaseB      # password: monetdb
SET SCHEMA SchemaGB;
CREATE TABLE Test( Number INT );

Connect to Any ODBC Database From MonetDB

ODBC Driver Manager

On the green server, we will install the ODBC Driver Manager. First, check whether you already have it installed: just type "odbcinst" in the shell.

MonetDB ODBC Driver

We can connect to any ODBC-capable server, but we will use this opportunity to see how to connect to a MonetDB server. We will use the MonetDB ODBC driver to connect to the blue server. This is the ODBC driver we need; we install it on the green server.

sudo apt install libmonetdb-client-odbc

Testing ODBC Driver

proto_loader Function For ODBC

But our goal is to use the "proto_loader" function to fetch data directly into the MonetDB server on the green computer, from the blue server, through ODBC. For that, we will install one more package.
sudo apt install monetdb-odbc-loader

Using the ODBC loader is still experimental. This functionality is NOT turned on by default. We will now turn it on. First, we exit "DatabaseG" and stop it.
quit
monetdb stop DatabaseG

We will now log in to our database. This will automatically start the server, and during that, the "odbc-loader" module will be loaded automatically.
mclient -u monetdb -d DatabaseG      # password: monetdb

Using proto_loader Function For ODBC

It is also possible to provide all of the necessary parameters directly inside of the ODBC connection string:

SELECT * FROM proto_loader('odbc:DRIVER=/usr/lib/x86_64-linux-gnu/libMonetODBC.so;SERVER=192.168.100.146;PORT=50000;DATABASE=DatabaseB;UID=monetdb;PWD=monetdb;QUERY=SELECT * FROM schemagb.factb')

There is also a version that uses a file DSN. This version is for Windows only.
odbc:FILEDSN=<data source name>;[<ODBC connection parameters>;]QUERY=<SQL query>

Virtual Tables

Virtual tables are tables whose data is not physically stored in a MonetDB table. Views, merge tables and remote tables are virtual tables. Tables that we receive through the file_loader and proto_loader functions are also virtual tables. We will now see how to turn file_loader and proto_loader virtual tables into more permanent structures.

CREATE TABLE Based on the Loader Function

We can use CREATE TABLE AS to store a CSV file in a new table:
CREATE TABLE permanentCSV ( Number, Letter ) AS ( SELECT * FROM '/home/sima/Desktop/CSV.csv' );

CREATE TEMPORARY TABLE Based on the Loader Function

We can also make a temporary table.
CREATE LOCAL TEMPORARY TABLE temporaryFactB ( YearNum, Dates, ProdID, Qty ) AS
( SELECT * FROM proto_loader('odbc:DSN=DatabaseB;QUERY=SELECT * FROM schemagb.factb') )
WITH DATA ON COMMIT PRESERVE ROWS;

Bulk INSERT Based on the Loader Function

TRUNCATE temporaryFactB;
Now our temporaryFactB table is empty, and we can pull data into it from any other ODBC-capable server:

INSERT INTO temporaryFactB ( YearNum, Dates, ProdID, Qty )
SELECT * FROM proto_loader('odbc:DRIVER=/usr/lib/x86_64-linux-gnu/libMonetODBC.so;SERVER=192.168.100.146;PORT=50000;DATABASE=DatabaseB;UID=monetdb;PWD=monetdb;QUERY=SELECT * FROM schemagb.factb');

COPY command

We already talked about COPY INTO and COPY FROM statements ( blog1 and blog2; youtube1 and youtube2 ). We will now see some special syntaxes of these commands.

COPY FROM stdin

We will first create one empty table.
CREATE TABLE tabStdin( Number INT, Letter CHAR );
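A sketch of loading rows directly from the mclient session, assuming MonetDB's COPY ... FROM STDIN form with a record count. The two data lines (hypothetical values) are typed right after the statement, and "2 RECORDS" tells the server how many lines to read.

```sql
-- The two lines after the statement are data, not SQL
COPY 2 RECORDS INTO tabStdin FROM STDIN USING DELIMITERS ',';
1,a
2,b
```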

COPY INTO stdout
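The opposite direction sends a query result to the client console instead of a file. A sketch, assuming the tabStdin table created above:

```sql
-- Export the result of any query to the client, with "|" between the fields
COPY SELECT * FROM tabStdin INTO STDOUT USING DELIMITERS '|';
```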

COPY FROM Csv, With DECIMAL Clause

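COPY OFFSET 2 INTO tabDecimal
FROM '/home/sima/Desktop/CSV_file'( Number, Letter ) DECIMAL AS '*','_';
With the DECIMAL clause, we specify which decimal point and thousands separator our CSV file uses: here "*" is the decimal point and "_" is the thousands separator. OFFSET 2 starts reading at the second line, so the header line is skipped.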

0490 Grouping Sets and Comments in MonetDB

MonetDB / By Bizkapish / June 29, 2025 July 5, 2025

Sample Table

CREATE TABLE tabSales( Continent VARCHAR(20), Subcontinent VARCHAR(20), Country VARCHAR(20),
                       State VARCHAR(30), Sales INT );

The Problem

The UNION ALL solution is bad for several reasons:
1) We have three queries to execute, and then we have to combine several result sets into one.
2) It is hard to read and modify a long UNION ALL query.
3) We have to be careful to align the columns properly.

This is the problem that can be solved by grouping sets.

Grouping Sets

It is now clear that each element inside GROUPING SETS is a separate definition of a group. Each group can be defined by one column (Continent), or by several columns placed inside parentheses (Continent, Subcontinent).

These two queries return the same result, and they show the logic and brevity of grouping sets.
SELECT Col1, Col2, SUM( Sales ) FROM Table GROUP BY GROUPING SETS ( ( Col1, Col2 ), Col1 );

SELECT Col1, Col2, SUM( Sales ) FROM Table GROUP BY Col1, Col2
UNION ALL
SELECT Col1, null, SUM( Sales ) FROM Table GROUP BY Col1;

Rollup
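ROLLUP is a shorthand for a hierarchy of grouping sets: it removes columns from the right, one at a time, down to the grand total (). A sketch on tabSales; both queries should return the same groups.

```sql
SELECT Continent, Subcontinent, SUM( Sales ) AS Sales
FROM tabSales
GROUP BY ROLLUP( Continent, Subcontinent );

-- The same groups written as GROUPING SETS
SELECT Continent, Subcontinent, SUM( Sales ) AS Sales
FROM tabSales
GROUP BY GROUPING SETS ( ( Continent, Subcontinent ), ( Continent ), () );
```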

CUBE
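CUBE generates every combination of its columns, including the grand total. A sketch on tabSales; both queries should return the same groups.

```sql
SELECT Country, State, SUM( Sales ) AS Sales
FROM tabSales
GROUP BY CUBE( Country, State );

-- The same groups written as GROUPING SETS: 2 columns give 2 * 2 = 4 groups
SELECT Country, State, SUM( Sales ) AS Sales
FROM tabSales
GROUP BY GROUPING SETS ( ( Country, State ), ( Country ), ( State ), () );
```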

Addition and Multiplication in Grouping Sets

This is addition:
( a, b ) + ( c ) = ( a, b ), ( c )
This is multiplication. Multiplication is a cross join between the individual values. If ( a ) has the values a1 and a2, and ( b, c ) has the values bc1, bc2 and bc3, then:
( a ) * ( b, c ) = a1bc1, a1bc2, a1bc3, a2bc1, a2bc2, a2bc3

So, if we create GROUPING SETS like this, it is addition:
GROUP BY GROUPING SETS ( Continent, ROLLUP( Continent, Subcontinent ), CUBE( Country, State ), () )

This is the syntax for multiplication. This time, we place commas between GROUPING SETS, ROLLUPs, CUBEs and individual elements at the top level of GROUP BY.
GROUP BY GROUPING SETS ( Continent ), ROLLUP( Continent, Subcontinent ), CUBE( Country, State ), (), Country
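A smaller case makes the multiplication easier to check. A plain column multiplied by a ROLLUP adds that column to every grouping set of the ROLLUP, so these two queries should return the same groups:

```sql
-- ( Continent ) * ROLLUP( Subcontinent, Country )
SELECT Continent, Subcontinent, Country, SUM( Sales ) AS Sales
FROM tabSales
GROUP BY Continent, ROLLUP( Subcontinent, Country );

-- ROLLUP( Subcontinent, Country ) = ( Subcontinent, Country ), ( Subcontinent ), ()
-- multiplied by ( Continent ), this gives three grouping sets:
SELECT Continent, Subcontinent, Country, SUM( Sales ) AS Sales
FROM tabSales
GROUP BY GROUPING SETS ( ( Continent, Subcontinent, Country ), ( Continent, Subcontinent ), ( Continent ) );
```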

Indicator Function – GROUPING
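GROUPING( column ) returns 1 when the column is NULL because it was rolled up into a subtotal or total row, and 0 when the row is grouped by that column. This lets us tell subtotal rows apart from real NULLs in the data. A sketch on tabSales:

```sql
SELECT Continent, Subcontinent, SUM( Sales ) AS Sales,
       GROUPING( Continent )    AS gContinent,      -- 1 only in the grand total row
       GROUPING( Subcontinent ) AS gSubcontinent    -- 1 in continent subtotals and in the grand total
FROM tabSales
GROUP BY ROLLUP( Continent, Subcontinent );
```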

Formatting with COALESCE and Sort

With the GROUPING function, we can create columns that help us sort the table.

These auxiliary columns ( SubcSort and StateSort ) can be easily eliminated by wrapping everything with "SELECT Subcontinent, State, Sales".
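A sketch of the whole pattern on tabSales, assuming a ROLLUP over Subcontinent and State. The label columns get names that differ from the grouped columns (SubcLabel and StateLabel are hypothetical names), so they cannot be confused with the grouped columns; the outer SELECT renames them back and drops the auxiliary sort columns.

```sql
SELECT SubcLabel AS Subcontinent, StateLabel AS State, Sales
FROM (
    SELECT COALESCE( Subcontinent, 'All subcontinents' ) AS SubcLabel,   -- label for the grand total
           COALESCE( State, 'Subtotal' )                 AS StateLabel,  -- label for subtotals
           SUM( Sales )                                  AS Sales,
           GROUPING( Subcontinent )                      AS SubcSort,    -- 1 only for the grand total
           GROUPING( State )                             AS StateSort    -- 1 for subtotals and the grand total
    FROM tabSales
    GROUP BY ROLLUP( Subcontinent, State )
) AS t
ORDER BY SubcSort, SubcLabel, StateSort, StateLabel;
```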

Comments

Sample Table and Function

Let's create two tables and a function.

CREATE TABLE tabComment( Number INTEGER );
CREATE TEMPORARY TABLE tabTemporary( Number INTEGER );
CREATE OR REPLACE FUNCTION funcComment( Arg1 INTEGER ) RETURNS INTEGER BEGIN RETURN 2; END;

Comments on Database Objects

We can create comments that are tied to database objects. Comments convey information about those objects.
COMMENT ON TABLE tabComment IS 'tabComment description';
COMMENT ON COLUMN tabComment.Number IS 'Number column description';
COMMENT ON FUNCTION funcComment IS 'funcComment description';
COMMENT ON SCHEMA sys IS 'sys schema description';
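The object ids used in the sys.comments queries of this section come from our own database. A sketch of how to look them up, assuming the system tables sys.comments (id, remark) and sys.tables (id, name):

```sql
-- All comments, identified only by the id of the commented object
SELECT * FROM sys.comments;

-- Comments on tables, together with the table names
SELECT t.id, t.name AS table_name, c.remark
FROM sys.comments c
INNER JOIN sys.tables t ON t.id = c.id;
```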

Deleting a Comment

If we delete an object, its comment is deleted too.
DROP TABLE tabComment;
SELECT * FROM sys.comments WHERE Id = 15876;

We can delete a comment by setting it to NULL or to an empty string.
COMMENT ON SCHEMA sys IS null;
SELECT * FROM sys.comments WHERE Id = 2000;

If a function is overloaded, we have to provide its full signature.
COMMENT ON FUNCTION funcComment( INTEGER ) IS '';
SELECT * FROM sys.comments WHERE Id = 15881;

Persistent Database Objects

There are other database objects that we can place a comment on. They are all persistent database objects.
COMMENT ON VIEW view_name IS 'Comment';
COMMENT ON INDEX index_name IS 'Comment';
COMMENT ON SEQUENCE sequence_name IS 'Comment';
COMMENT ON PROCEDURE procedure_name IS 'Comment';
COMMENT ON AGGREGATE aggregate_name IS 'Comment';
COMMENT ON LOADER loader_name IS 'Comment';

We cannot create a comment on a temporary object. This statement fails:
COMMENT ON TABLE tabTemporary IS 'tabTemporary description';

0480 Distributed Queries in MonetDB

MonetDB / By Bizkapish / June 23, 2025 June 29, 2025

Note: Before reading this blog post, you should read the article about merge tables in MonetDB ( article ), or you can watch it on YouTube ( video ).
Advice: I suggest you follow this blog post as strict instructions. Any freestyling could mean that you will have problems creating remote tables.

Visual Presentation

The next step is to create remote tables. Remote tables are references on one server to tables on another server. The purpose of remote tables is to let each server know about all of the tables in the system and how those tables are connected. This allows each server to create a query execution plan, even though that query uses tables from both servers.

Green server: tables DimG and FactG; remote tables DimB and FactB.
Blue server: tables DimB and FactB; remote tables DimG and FactG.

Green Server Setup

In this part, we will create the MonetDB folder and one database, and we will log in as the administrator.
mkdir /home/sima/monetdb/DBfarmG             # create monetdb folder
monetdbd create /home/sima/monetdb/DBfarmG   # initialize monetdb folder
monetdbd start /home/sima/monetdb/DBfarmG    # start monetdbd daemon
monetdb create DatabaseG                     # create database "DatabaseG"
monetdb release DatabaseG                    # make "DatabaseG" available
mclient -u monetdb -d DatabaseG              # log in as administrator to "DatabaseG" (password "monetdb")

As the administrator, we will create a new role, schema and user. The schema will have that role as its authorization. The user will have that schema as the default schema and that role as the default role. That means the user will be able to use all of the objects in this schema.

CREATE ROLE RoleGB; 
CREATE SCHEMA SchemaGB AUTHORIZATION RoleGB; 
CREATE USER UserGB WITH PASSWORD 'gb' NAME 'Distributed User' SCHEMA SchemaGB DEFAULT ROLE RoleGB; 
quit

Letters "GB" mean that this role, schema and user will exist in both the green and the blue server. In order for one user to access tables from both servers, his account must be present in both servers.

Then we will log in as the new user "usergb". We will create two tables, "DimG" and "FactG", and fill them with data. At the start of this blog post, we already saw the content of these tables.
mclient -u usergb -d DatabaseG   # be careful with upper and lower case: the user name is "usergb"
CREATE TABLE FactG ( YearNum INT, Dates DATE, Prodid INT, Qty INT );
INSERT INTO FactG VALUES ( 2025, '2025-01-01', 11, 5), ( 2025, '2025-01-02', 11, 10), ( 2025, '2025-01-03', 22, 15), ( 2025, '2025-01-04', 22, 20);
CREATE TABLE DimG( ProdID INT, ProdName VARCHAR(50) );
INSERT INTO DimG VALUES (11, 'product11'), (22, 'product22'), (33, 'product33');
quit

We can check who is listening on port 50000:
ss -tulnp | grep 50000

Below you can see some commands that you can use to configure your firewall so that you can control communication between servers. I will not teach you how to manage the firewall. I will just show you how I configured it. If you have any problems with the firewall, you can reset all the rules with "sudo ufw reset", and then you can disable the firewall with "sudo ufw disable".

Testing Remote Access to the Green Server

Blue Server Setup

There is nothing different about preparing the blue server. I will just repeat the same commands, but with different identifiers.

mkdir /home/sima/monetdb/                    # create monetdb folder
monetdbd create /home/sima/monetdb/DBfarmB   # initialize monetdb folder
monetdbd start /home/sima/monetdb/DBfarmB    # start monetdbd daemon
monetdb create DatabaseB                     # create database "DatabaseB"
monetdb release DatabaseB                    # make "DatabaseB" available
mclient -u monetdb -d DatabaseB              # log in as administrator to "DatabaseB"

CREATE ROLE RoleGB;   --all identifiers for the privileges are the same as on the green server
CREATE SCHEMA SchemaGB AUTHORIZATION RoleGB;
CREATE USER UserGB WITH PASSWORD 'gb' NAME 'Distributed User' SCHEMA SchemaGB DEFAULT ROLE RoleGB;
quit

mclient -u usergb -d DatabaseB   # now we log in as the user
CREATE TABLE FactB ( YearNum INT, Dates DATE, Prodid INT, Qty INT );
INSERT INTO FactB VALUES (2026, '2026-01-01', 11, 105), (2026, '2026-01-02', 11, 110), (2026, '2026-01-03', 33, 115), (2026, '2026-01-04', 33, 120);
CREATE TABLE DimB ( ProdID INT, ProdName VARCHAR(50) );
INSERT INTO DimB VALUES (11, 'product11'), (22, 'product22'), (33, 'product33');
quit

monetdbd stop /home/sima/monetdb/DBfarmB
monetdbd get all /home/sima/monetdb/DBfarmB
monetdbd set listenaddr=0.0.0.0 /home/sima/monetdb/DBfarmB   # make the server available from the network
monetdbd start /home/sima/monetdb/DBfarmB
ss -tulnp | grep 50000

sudo ufw enable                   # if needed, we can set up the firewall
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 50000/tcp
sudo ufw status

We can log in to the blue server from the green server (password "gb").
mclient -h 192.168.100.146 -p 50000 -u usergb -d DatabaseB

Preparing REMOTE, REPLICA and MERGE Tables in the Green Server

On the green server, we will create REMOTE tables that are just references to tables on the blue server.
mclient -u monetdb -d DatabaseG   # only the admin can create REMOTE tables (password "monetdb")
SET SCHEMA SchemaGB;   --we must be in the same schema as the physical tables on the blue server
CREATE REMOTE TABLE DimB( ProdID INT, ProdName VARCHAR(50) ) on 'mapi:monetdb://192.168.100.146:50000/DatabaseB';   --IP address of the blue server
CREATE REMOTE TABLE FactB( YearNum INT, Dates DATE, ProdID INT, Qty INT ) on 'mapi:monetdb://192.168.100.146:50000/DatabaseB';

Tables "DimG" and "DimB" are identical. We must tell MonetDB that they are "replicas"; for queries, MonetDB can then use either of these tables interchangeably.
CREATE REPLICA TABLE Dim( prodid INT, prodname VARCHAR(50) );
ALTER TABLE Dim ADD TABLE DimG;
ALTER TABLE Dim ADD TABLE DimB;

CREATE MERGE TABLE Fact( YearNum INT, Dates DATE, ProdID INT, Qty INT );
ALTER TABLE Fact ADD TABLE FactG;
ALTER TABLE Fact ADD TABLE FactB;
Tables "FactG" and "FactB" are just partitions of the merge table. This is why you should read the article or watch the video about merge tables.

Preparing REMOTE, REPLICA and MERGE Tables in the Blue Server

I will again repeat all of the steps, but this time for the blue server.

mclient -u monetdb -d DatabaseB   # only the admin can create REMOTE tables (password "monetdb")
SET SCHEMA SchemaGB;   --we must be in the same schema as the physical tables on the green server
CREATE REMOTE TABLE DimG( ProdID INT, ProdName VARCHAR(50) ) on 'mapi:monetdb://192.168.100.145:50000/DatabaseG';
CREATE REMOTE TABLE FactG( YearNum INT, Dates DATE, ProdID INT, Qty INT ) on 'mapi:monetdb://192.168.100.145:50000/DatabaseG';

For performance reasons, MonetDB does not check whether the replica tables are indeed identical. This responsibility is left to the database users.
CREATE REPLICA TABLE Dim( ProdID INT, ProdName VARCHAR(50) );
ALTER TABLE Dim ADD TABLE DimG;
ALTER TABLE Dim ADD TABLE DimB;

CREATE MERGE TABLE Fact( YearNum INT, Dates DATE, ProdID INT, Qty INT );
ALTER TABLE Fact ADD TABLE FactG;
ALTER TABLE Fact ADD TABLE FactB;

System Tables
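A sketch for checking what we created, assuming the system tables sys.tables, sys.schemas and sys.table_types. It lists every table in our schema together with its kind (regular, remote, replica or merge table):

```sql
SELECT t.name, tt.table_type_name
FROM sys.tables t
INNER JOIN sys.schemas s      ON s.id = t.schema_id
INNER JOIN sys.table_types tt ON tt.table_type_id = t.type
WHERE s.name = 'schemagb'     -- unquoted identifiers are stored in lowercase
ORDER BY t.name;
```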

The Fruit of Our Labor
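A sketch of a query that uses both servers, assuming it is run as usergb (whose default schema is SchemaGB) on either the green or the blue server. The merge table Fact combines the local and the remote fact table, and the replica table Dim can be served from either copy. The expected totals are computed from the sample rows inserted above.

```sql
-- mclient -u usergb -d DatabaseG   (or DatabaseB on the blue server)
SELECT d.ProdName, SUM( f.Qty ) AS TotalQty
FROM Fact f
INNER JOIN Dim d ON d.ProdID = f.ProdID
GROUP BY d.ProdName
ORDER BY d.ProdName;
-- product11 | 230   (5 + 10 + 105 + 110)
-- product22 |  35   (15 + 20)
-- product33 | 235   (115 + 120)
```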

Load Balancer

The question is how to divide users between two (or more) servers to equalize the load. The easiest way is to have all users born from January to June use the green server, and all users born from July to December use the blue server.

A more professional way is to use a load balancer, such as HAProxy (link). This program directs users to a server that is currently not under heavy load.