0510 JDBC, Recursive CTEs, New Functions in MonetDB     

We will continue using the green and blue databases that we created in this post => link ( or you can watch the video on YouTube => link ). That was the post about distributed query processing.

monetdbd start /home/sima/monetdb/DBfarmG
mclient -u monetdb -d DatabaseG --password monetdb

monetdbd start /home/sima/monetdb/DBfarmB
mclient -u monetdb -d DatabaseB --password monetdb

JDBC

We will now connect to the blue MonetDB server, from the green server, through the JDBC connector.

Installing Java

Let's see if we have Java installed.
java -version
We don't have it, so we can install it like this:
sudo apt install default-jre

Connecting With the JDBC Client

From this link:
https://www.monetdb.org/downloads/Java/  
Download this file "jdbcclient.jre8.jar".

This is a JAR file that includes the Java console client application, but also the driver; it is all in one. On the green server, run this command from the shell:

java -jar /home/sima/Desktop/jdbcclient.jre8.jar -h 192.168.100.146 -p 50000 -u monetdb -d DatabaseB

We are now connected. We can use this Java client in the same way as we use the "mclient" program.

Using JDBC Connection with DBeaver

From this link:
https://www.monetdb.org/downloads/Java/
Download JDBC driver "monetdb-jdbc-12.0.jre8.jar".

We don't have DBeaver on the green server, so we have to install it. We will download the DBeaver ".deb" file with the wget command and then install it.
cd /home/sima/Desktop
wget https://dbeaver.io/files/dbeaver-ce_latest_amd64.deb

sudo apt install ./dbeaver-ce_latest_amd64.deb

DBeaver is a GUI program.
Click on
Database > Driver Manager menu.
Click "New".
In the new dialog enter these values in "Settings" tab:

Driver Name:        MonetDB_JDBC
Class Name:        org.monetdb.jdbc.MonetDriver
URL Template:     jdbc:monetdb://{host}:{port}/{database}

This is how the "Settings" tab should look.
In the "Libraries" tab, click "Add File" and find the "monetdb-jdbc-12.0.jre8.jar" driver file. This will add our driver file to DBeaver.

Close everything, and in the main window click on the New Connection icon. Find our driver with a search. Click Next, and in the new dialog enter these values:
Host:                192.168.100.146
Port:                50000
Database/Schema:     DatabaseB
Username:            monetdb
Password:            monetdb

This is how the dialog will look. When you click the "Test" button, you will get an error:
Invalid JDBC URL:  jdbc:monetdb://192.168.100.146:50000/DatabaseB  
This error occurs because DBeaver is trying to validate the
"jdbc:monetdb://192.168.100.146:50000/DatabaseB" URL, or it is trying to add some hidden parameters or escape characters.

The solution is to switch the dialog to manual input. DBeaver will then use the raw URL. After we click the "Test" button, we will get a confirmation dialog.

We can now expand our connection to find all of the schemas and tables, and we can run queries.

Recursive Common Table Expressions

In the blog post about common table expressions ( link ), I unintentionally lied that MonetDB doesn't support Recursive CTEs. But it does, and I will explain them now.

What is Recursion

When we need to take a shower, it is always difficult to set the right water temperature. To find it, we repeat the same action. We move the lever left and right until we find the perfect spot. With each movement of the lever, we get closer to the desired temperature.

Recursion is an iterative process of finding a solution. We repeat the same logic each time, but each time we are closer to the solution because we can base our logic on the information we have gained during previous iterations.

Let's look at this example. Our friend imagined a number between 1 and 10. We have to guess that number with the minimal number of questions. The best approach is to use binary logic, based on an elimination process where in each iteration we can remove half of the numbers.


Question:
Is it bigger than 5?  

Answer:
Yes, it is.

Question:
Is it bigger than 8?

Answer:
No, it is not.  
Question:
Is it smaller than 7?

Answer:
Yes, it is.  
It has to be 6.

The only argument in our algorithm is the range of the possible numbers.

Our logic has two steps:
1) Does the range of possible numbers contain only one number?
2) If it doesn't, ask a question that eliminates half of the numbers, reducing the range of possible numbers by half.

We can pack the second step into a function named "EliminateHalf". This function will return the range of all remaining possible numbers. We will call this function 3 times:
1) EliminateHalf (1,2,3,4,5,6,7,8,9,10) = (6,7,8,9,10)
2) EliminateHalf (6,7,8,9,10) = (6,7)
3) EliminateHalf (6,7) = (6)

We can nest these functions:
EliminateHalf(EliminateHalf(
   EliminateHalf(
        1,2,3,4,5,6,7,8,9,10))) = 6

The problem is that we don't know how many nested functions we need. Here is pseudo code that nests as many functions as needed to get the final result.

Result = EliminateHalf( 1,2,3,4,5,6,7,8,9,10 )  # initial state
Do Until Count( Result ) = 1                    # are we finished
    Result = EliminateHalf( Result )            # if not, continue
Loop
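The pseudo code above can be sketched in Python. This is just an illustration of the idea; "eliminate_half" and "guess" are names invented for this sketch, not MonetDB functions.

```python
def eliminate_half(candidates, is_bigger_than):
    # One recursive step: ask whether the secret number is bigger
    # than the middle candidate, and keep only the surviving half.
    pivot = candidates[(len(candidates) - 1) // 2]
    if is_bigger_than(pivot):
        return [n for n in candidates if n > pivot]
    return [n for n in candidates if n <= pivot]

def guess(candidates, secret):
    result = candidates                   # initial state
    while len(result) > 1:                # are we finished?
        result = eliminate_half(result, lambda pivot: secret > pivot)
    return result[0]

print(guess(list(range(1, 11)), 6))  # → 6
```

Each call to eliminate_half works only with the result of the previous call, which is exactly the pattern that recursive CTEs follow.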

This is RECURSION. We broke the complex problem into small steps. Each step has the same logic. Each step is using arguments that are the result of the previous step. This is just one iterative process which brings us closer to the solution with each step.

Do you want an example from real life? The coach of a football team analyzes data from the previous game. After each analysis, he changes his team's game. He continues with this practice until his team starts winning.

The Structure of Recursion

Recursion's structure always has four steps. The first step, the "initial state", is the problem that we want to solve. We will solve it by iteratively improving our arguments.
1) Set initial values for our arguments.              # initial state
2) Has our goal been achieved?                        # are we there yet        # recursive part
3) Improve our arguments by using some strategy.      # continue with effort    # recursive part
4) Repeat steps 2 and 3 until we reach the goal.      # be persistent

Linear and Tree Recursion

In linear recursion we only have two possible outcomes. We are either satisfied with the result or we will continue with our effort. For example, the coach can be satisfied with his team or he can continue introducing improvements.

Tree recursion is when we have several possible strategies to direct our effort. For example, the coach can change the team's game, or he can look for a position with some other team. If we create a diagram of his possible actions, we get something like this:

1) Red path: Be satisfied with Real, then transfer to Arsenal and start improving Arsenal's game.
2) Orange path: Be satisfied with Real, then transfer to Arsenal, then to Barcelona, and start improving Barcelona's game.
3) Purple path: Leave Real, improve Liverpool for a while, and then transfer to Juventus to improve Juventus' game.

We can present this coach's career as folders and subfolders.
Each time he improves his team, he goes deeper into the subfolders ( Real → Real+ ).
Each time he changes his job, he moves to another folder at the same level ( Real+ → Juventus ).

This is why this kind of recursion is called tree recursion: we have a hierarchy. This hierarchy is a reflection of the strategic decisions we have to make to reach our goal. Each decision moves us to another path, but we always use the same logic to decide.

Structure of The Recursive Common Table Expression

WITH RECURSIVE cte_name AS (
    SELECT ...           -- initial state

    UNION ALL

    SELECT ...           -- continue with improvements
    FROM cte_name        -- get the previous state
    WHERE ...            -- are we there yet
)
SELECT * FROM cte_name;  -- return result

In its simplest form, a recursive CTE has two SELECT statements connected with UNION ALL. The first SELECT statement defines the initial state.

A recursive CTE will return all of the interim results connected with UNION ALL.

The second SELECT statement calculates the new state. It references the previous state by the name of the CTE. The WHERE clause in the second SELECT statement tells us when to stop.

WITH RECURSIVE numbers(n) AS
(    SELECT 1
     
    UNION ALL
    
    SELECT n + 1
    FROM numbers

    WHERE n < 4 )
SELECT * FROM numbers;
The first SELECT statement is called the ANCHOR member. The second one is the RECURSIVE member. The ANCHOR member is executed only once. In this example, the RECURSIVE member is executed several times, each time with a different value of "n", until the condition becomes false. UNION ALL is then applied to all of the interim results.
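We can mimic how this CTE is evaluated with a small Python sketch (illustrative only): keep the last interim result, apply the recursive member to it, and UNION ALL everything together.

```python
rows = [1]      # ANCHOR member: SELECT 1
last = [1]      # the previous state that the RECURSIVE member reads

while last:
    # RECURSIVE member: SELECT n + 1 FROM numbers WHERE n < 4
    last = [n + 1 for n in last if n < 4]
    rows += last        # UNION ALL of all interim results

print(rows)  # → [1, 2, 3, 4]
```

The loop stops when the WHERE condition filters out every row, i.e. when the recursive member returns an empty result.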

WITH RECURSIVE factorial(n, fact) AS (
    SELECT 1, 1
     
    UNION ALL
    

    SELECT n + 1, fact * (n + 1)
    FROM factorial
    WHERE n < 5 )
SELECT n, fact FROM factorial;
This is an example of how to calculate a factorial.
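The same evaluation pattern, sketched in Python for the factorial example (each interim row carries both "n" and the running factorial):

```python
rows = [(1, 1)]     # ANCHOR member: SELECT 1, 1
last = [(1, 1)]

while last:
    # RECURSIVE member: SELECT n + 1, fact * (n + 1) FROM factorial WHERE n < 5
    last = [(n + 1, fact * (n + 1)) for (n, fact) in last if n < 5]
    rows += last

print(rows)  # → [(1, 1), (2, 2), (3, 6), (4, 24), (5, 120)]
```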


These two examples are for linear recursion.

Tree Recursion

This is our hierarchy. It can be presented with a self-referencing table. We'll create such a table in MonetDB.
CREATE TABLE Folders ( Folder TEXT, Parent TEXT );

INSERT INTO Folders ( Folder, Parent ) VALUES
('Folder1', NULL), ('Folder2', NULL),
('Folder11', 'Folder1'), ('Folder12', 'Folder1'),
('Folder21', 'Folder2'), ('Folder211', 'Folder21');

We'll make our CTE:
WITH RECURSIVE Folder_paths AS (
    SELECT Folder, Folder AS Path
    FROM Folders
    WHERE Parent IS NULL

    UNION ALL

    SELECT f.Folder, fp.Path || '>' || f.Folder AS Path
    FROM Folders f  JOIN Folder_paths fp ON f.Parent = fp.Folder
)
SELECT * FROM Folder_paths ORDER BY path;
This is the initial state. This table represents the folders at the top level.

We'll join the original Folders table with this initial state. That will give us our new state.

We'll do it again, between the Folders table and the new state. This is an inner join.

We can now make the final result as a union between (1), (2), (3).

Tree recursion occurs when the ANCHOR and RECURSIVE members are SELECT statements that return tables with several rows. Those rows represent folders at the same level. We have (1), (2), (3) for the top (1), middle (2) and bottom (3) folders. First we get the top folders (initial state), then the middle folders (first recursion), and then the bottom folders (second recursion). Each recursion collects the folders from the level below.
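The tree recursion above can also be mimicked in Python (an illustrative sketch using the same Folders data): each pass joins the Folders list to the previous level and extends the paths.

```python
folders = [('Folder1', None), ('Folder2', None),
           ('Folder11', 'Folder1'), ('Folder12', 'Folder1'),
           ('Folder21', 'Folder2'), ('Folder211', 'Folder21')]

# ANCHOR member: top-level folders, path = folder name
level = [(folder, folder) for (folder, parent) in folders if parent is None]
paths = list(level)

while level:
    # RECURSIVE member: join Folders to the previous level on Parent = Folder
    level = [(child, path + '>' + child)
             for (child, parent) in folders
             for (folder, path) in level
             if parent == folder]
    paths += level

for folder, path in sorted(paths, key=lambda row: row[1]):
    print(path)
```

The first pass collects the top folders, the second pass the middle folders, and the third pass the bottom folders, exactly as described above.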

Recursive CTEs Caveats

1) The recursive and anchor members must match in number of columns and in data types.

2) If we use UNION ( instead of UNION ALL ), the recursion will stop as soon as we encounter a duplicate. The statement below will end immediately, although it doesn't have a WHERE clause. This is because the first recursion returns the number 1, which is a duplicate. The UNION operator removes that duplicate, so our result will have only one row, containing the original number 1.
WITH RECURSIVE numbers AS (
    SELECT 1
    UNION
    SELECT 1 FROM numbers )
SELECT * FROM numbers;

3) Don't use OUTER JOINs in the recursive member; the query will never end. Only an INNER join is acceptable.

4) MonetDB will not complain if we use aggregate or window functions in the recursive member. We can also use DISTINCT and GROUP BY there. In MonetDB, we can use the CTE's name in the FROM clause, but we can also use it in a subquery. Some other servers don't allow this.

New Functions

DayName and MonthName Functions

SELECT DAYNAME('2025-07-12');    -- Saturday
This function returns the name of the day of the week, according to the current locale set in the OS.

SELECT MONTHNAME('2025-07-12');    -- July
This is a similar function that returns the name of the month.

Besides date arguments, we can also use a timestamp ('1987-09-23 11:40') or a timestamp with time zone ('15:35:02.002345+01:00').

Generate Series Functions

SELECT * FROM generate_series(1,9,2);    -- 1,3,5,7
This function returns the numbers from 1 to 9 (the upper bound is excluded), with a step of 2. The default step is 1.

SELECT * FROM
generate_series('2025-01-01','2025-01-10',INTERVAL '5' DAY);
-- 2025-01-01, 2025-01-06
All dates from the range, but with a step of 5 days.

SELECT * FROM
generate_series('2025-01-01','2025-05-10',INTERVAL '2' MONTH);
-- 2025-01-01, 2025-03-01
We can also get the months with a step of 2 months.

We can list the seconds or days between two timestamps.
SELECT * FROM generate_series('2025-01-01 01:40:00','2025-01-01 1:40:05', INTERVAL '3' SECOND);
-- 2025-01-01 01:40:00
-- 2025-01-01 01:40:03

SELECT * FROM generate_series('2025-01-01 01:40:00','2025-01-06 1:40:05', INTERVAL '3' DAY);
-- 2025-01-01
-- 2025-01-04

Generate Series functions will return a column with the name "value".
SELECT * FROM generate_series( 1,7,2);
SELECT * FROM generate_series('2025-01-01','2025-01-10',INTERVAL '5' DAY);
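Judging by the outputs above, the upper bound of generate_series is exclusive. A Python sketch of those semantics (illustrative, not MonetDB's implementation):

```python
from datetime import date, timedelta

def generate_series(first, limit, step):
    # Yield values from `first` up to, but not including, `limit`.
    # Works for anything that supports + and < (ints, dates with timedelta).
    value = first
    while value < limit:
        yield value
        value = value + step

print(list(generate_series(1, 9, 2)))
# → [1, 3, 5, 7]
print(list(generate_series(date(2025, 1, 1), date(2025, 1, 10), timedelta(days=5))))
# → [datetime.date(2025, 1, 1), datetime.date(2025, 1, 6)]
```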

0500 Proto_loaders, ODBC and COPY in MonetDB

We will continue using the green and blue databases that we created in this post => link ( or you can watch the video on YouTube => link ). That was the post about distributed query processing.

monetdbd start /home/sima/monetdb/DBfarmG
mclient -u monetdb -d DatabaseG --password monetdb

monetdbd start /home/sima/monetdb/DBfarmB
mclient -u monetdb -d DatabaseB --password monetdb

Read From CSV File With a file_loader Function

We have three files and two ".gz" archives.

  • The "CSV" file uses commas. It also has commas at the end of the data rows. This is the only file that doesn't have a file format extension.
  • "TSV.tsv" file is using tab as delimiter, but it also has double quotes around the strings.
  • "PSV.psv" file is using pipes and has a null in the "Letter" column.
  • Files with the ".gz" file format extension are just the PSV file compressed.
 

Files must be placed on the server. The value returned by the "file_loader" function is a virtual table. We don't specify delimiters, wrappers or data types for the files; they are deduced automatically. We can read from CSV, TSV and PSV files, and also from ".gz", ".lz4", ".bz2" and ".xz" files.

file_loader Function Syntax

SELECT * FROM file_loader( '/home/abhishek/sample.csv' );
The only argument of our function is the full path to the file.

SELECT * FROM '/home/abhishek/sample.csv';
The shorter syntax is much better; we don't have to type the function name.

Experiments With the CSV File

We cannot read from files that don't have a file format extension.
SELECT * FROM '/home/sima/Desktop/CSV';

I will add a file format extension and then try again.

SELECT * FROM '/home/sima/Desktop/CSV.csv';
Our CSV file is now reduced to only one column.
The whole row of data from the CSV file is imported into one column.
This is the consequence of the commas at the end of the row.

I will remove the surplus commas. Now our query works as expected.

Experiments With the TSV File

The import from the TSV file will work fine. The double-quote wrappers around the strings will be considered part of the string.
SELECT * FROM '/home/sima/Desktop/TSV.tsv';

We can try to trim double quotes, but our column will not be recognized.
SELECT TRIM( Letter, '"' ) FROM '/home/sima/Desktop/TSV.tsv';

Column names are case sensitive, so we have to place them inside double quotes. Only then will our query work.
SELECT TRIM( "Letter", '"' ) FROM '/home/sima/Desktop/TSV.tsv';

Experiments With the PSV File

In the PSV file we have a null. This null will be imported, but we don't know whether it is a null or the string "null".
SELECT * FROM '/home/sima/Desktop/PSV.psv';

We will test the null value. We can see that "null" is a string.
SELECT ISNULL( "Letter" ) FROM '/home/sima/Desktop/PSV.psv';

Experiments With the GZ files

We can easily read from the compressed PSV.psv file.
SELECT * FROM '/home/sima/Desktop/PSV.psv.gz';

If we compress our file as a "tar" tape archive, then the file loader will not work.
SELECT * FROM '/home/sima/Desktop/PSV.psv.tar.gz';

Conclusion

We can conclude that the file_loader function is not as versatile as the COPY INTO statement, which is described in this blog post => link.

Read From Remote Database With a proto_loader Function

We have seen that we can log in to a MonetDB server that is on another computer. We have also seen how to create remote tables. This time we will see how to read, ad hoc, tables that are on some other computer/server.

Testing Local MonetDB Server

This syntax will work on the local MonetDB server, too. We can use localhost to read from the local computer. I can run this statement on the green server. We can recall that local database, schema and table names are "DatabaseG", "schemagb" and "dimg".
SELECT ProdID FROM proto_loader( 'monetdb://127.0.0.1:50000/DatabaseG/schemagb/dimg' );

This IP address and port number are the defaults, so we can use the URL without them.
Notice that the database name must keep its uppercase letters, while the schema and table names must be lowercase.
SELECT * FROM proto_loader( 'monetdb:///DatabaseG/schemagb/dimg' );

We can also use the URL that has "mapi:" in front of it. In this case, we have to provide the IP and port number.

SELECT * FROM proto_loader( 'mapi:monetdb://192.168.100.145:50000/DatabaseG/schemagb/dimg' );

Testing Remote MonetDB Server

More interesting is the ability to read tables from a remote server. I will read a table from the blue server ( before that, please start the blue server ).

This statement will be executed on the green server and will return the whole table from the blue server.
SELECT * FROM proto_loader( 'monetdb://192.168.100.146:50000/DatabaseB/schemagb/factb' );

Syntax with "mapi:" will also work on the remote server.

SELECT * FROM proto_loader( 'mapi:monetdb://192.168.100.146:50000/DatabaseB/schemagb/dimb' );

Creation Of a Remote Table

We can test whether we can create a remote table using the syntax that starts with "monetdb://". On the blue server I will change the current schema and then create one table.

monetdbd start /home/sima/monetdb/DBfarmB
mclient -u monetdb -d DatabaseB --password monetdb
SET SCHEMA SchemaGB;
CREATE TABLE Test( Number INT );

On the green server, I will set current schema to SchemaGB, and I will try to connect to the Test table.
SET SCHEMA SchemaGB;
CREATE REMOTE TABLE Test( Number INT ) ON 'monetdb://192.168.100.146:50000/DatabaseB';

This will successfully create a remote table.
SELECT * FROM Test;

Connect to Any ODBC Database From MonetDB

ODBC Driver Manager

An ODBC driver manager is a middleware component that acts as an intermediary between applications and ODBC drivers, translating standardized ODBC API calls into driver-specific instructions. It allows applications to connect to different databases without needing to be rewritten for each one.

On the green server, we will install an ODBC driver manager. First check whether you already have it installed: just type "odbcinst" in the shell.

I don't have it, so I will install the "unixodbc" driver manager. After that, the "odbcinst" command will show us that the program is working.
sudo apt install unixodbc

This program will add the file "odbc.ini" to the "/etc" folder. This file is empty.

/etc/odbc.ini

MonetDB ODBC Driver

We can connect to any ODBC-capable server, but we will use this opportunity to see how to connect to a MonetDB server. We will use the MonetDB ODBC driver to connect to the blue server. This is the ODBC driver we need; we install it on the green server.

sudo apt install libmonetdb-client-odbc

This driver will add the "odbcinst.ini" file to the "/etc" folder.

/etc/odbcinst.ini
The driver itself will be in this folder:

/usr/lib/x86_64-linux-gnu/libMonetODBC.so

We can see a problem. Our file is in the "x86_64-linux-gnu" folder, but our "odbcinst.ini" file points to the "/lib" folder. We must fix that. In the shell, we run this command to start the text editor as sudo, and we fix our path.
sudo gedit /etc/odbcinst.ini

Testing ODBC Driver

First, we will add our credentials to "/etc/odbc.ini" file.

sudo nano /etc/odbc.ini

Don't use the "gedit" graphical text editor program; use nano. Only then will it work.
[DatabaseB]
Description = Connect to Blue Server
Driver = MonetDB
Server = 192.168.100.146
Port = 50000
Database = DatabaseB
User = monetdb
Password = monetdb

"isql" is a console program that comes with "unixODBC". We can use it to test the ODBC connection to the blue server. We can connect to the blue server if we type this into the green server's shell:

isql -v DatabaseB

After connection, we can set our schema, and we can query our table:

SET SCHEMA SchemaGB;
SELECT * FROM factb;

proto_loader Function For ODBC

But our goal is to use the "proto_loader" function to directly fetch data, over ODBC, from the blue server into the MonetDB server on the green computer. For that we will install one more package.

sudo apt install monetdb-odbc-loader

Using the ODBC loader is still experimental, so this functionality is NOT turned on by default. We will now turn it on. First, we exit "DatabaseG".

quit
monetdb stop DatabaseG

monetdb get all DatabaseG

We can now read the properties of DatabaseG. The property of interest is the last one: "loadmodules".
We will change this property like this:
monetdb set loadmodules=odbc_loader DatabaseG
We can always turn it back like this:
monetdb set loadmodules=  DatabaseG

We will now log in to our database. This will automatically start the server and, during that, automatically load the "odbc_loader" module.

mclient -u monetdb -d DatabaseG --password monetdb

Using proto_loader Function For ODBC

This is how we run an ODBC query directly from mclient. In this way we can connect to any ODBC-capable server, not just another MonetDB. We just have to prepare the ODBC driver for that other server.
SELECT * FROM
proto_loader('odbc:DSN=DatabaseB;QUERY=SELECT * FROM schemagb.factb');

All the magic is on the green server. "proto_loader" function will use "odbc_loader" (1) module to send standardized ODBC message to "unixODBC manager" (2). This manager will use MonetDB ODBC driver (3) to call the Blue server (4).

It is also possible to provide all of the necessary parameters directly inside of the ODBC connection string:
SELECT * FROM proto_loader('odbc:DRIVER=/usr/lib/x86_64-linux-gnu/libMonetODBC.so;SERVER=192.168.100.146;PORT=50000;DATABASE=DatabaseB;UID=monetdb;PWD=monetdb;QUERY=SELECT * FROM schemagb.factb')
There is also a version that uses a DSN file. That version is for Windows only.
odbc:FILEDSN=<data source name>;[<ODBC connection parameters>;]QUERY=<SQL query>
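The "odbc:" source is just a string of semicolon-separated key=value pairs followed by the QUERY. A tiny Python helper for assembling it (a hypothetical convenience, not part of MonetDB):

```python
def odbc_source(query, **params):
    # Build a proto_loader 'odbc:' string; the keys mirror the
    # DSN= / DRIVER= / SERVER= / ... pairs shown above.
    pairs = ';'.join(f'{key}={value}' for key, value in params.items())
    return f'odbc:{pairs};QUERY={query}'

print(odbc_source('SELECT * FROM schemagb.factb', DSN='DatabaseB'))
# → odbc:DSN=DatabaseB;QUERY=SELECT * FROM schemagb.factb
```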

Virtual Tables

Virtual tables are tables whose data is not physically stored in a MonetDB table. Views, merge tables and remote tables are virtual tables. Tables that we receive through the file_loader and proto_loader functions are also virtual tables. We will now see how to transform file_loader and proto_loader virtual tables into more permanent structures.

CREATE TABLE Based on the Loader Function

We can use CREATE TABLE AS to store CSV file into new table:
CREATE TABLE permanentCSV ( Number, Letter ) AS ( SELECT * FROM '/home/sima/Desktop/CSV.csv' );

Next, we will read from the permanentCSV table:
SELECT * FROM permanentCSV;

CREATE TEMPORARY TABLE Based on the Loader Function

CREATE LOCAL TEMPORARY TABLE temporaryFactB ( YearNum, Dates, ProdID, Qty ) AS
(  SELECT * FROM proto_loader('odbc:DSN=DatabaseB;QUERY=SELECT * FROM schemagb.factb') )
WITH DATA ON COMMIT PRESERVE ROWS;
We can also make a temporary table.

We can read from this temporary table.

SELECT * FROM temporaryFactB;

Bulk INSERT Based on the Loader Function

We can pull data from any other ODBC-capable server into our temporaryFactB table (which is now empty).

TRUNCATE temporaryFactB;
INSERT INTO temporaryFactB ( YearNum, Dates, ProdID, Qty )
SELECT * FROM proto_loader('odbc:DRIVER=/usr/lib/x86_64-linux-gnu/libMonetODBC.so;SERVER=192.168.100.146;PORT=50000;DATABASE=DatabaseB;UID=monetdb;PWD=monetdb;QUERY=SELECT * FROM schemagb.factb');

SELECT * FROM temporaryFactB; 

COPY command

We already talked about COPY INTO and COPY FROM statements ( blog1 and blog2; youtube1 and youtube2 ). We will now see some special syntaxes of these commands.

COPY FROM stdin

We will first create one empty table.

CREATE TABLE tabStdin( Number INT, Letter CHAR );

sql>COPY INTO tabStdin FROM STDIN;
more>1|a
more>2|b
more>

2 affected rows
sql>
We then run this statement, which copies data directly from mclient. After this statement, the prompt becomes "more>". That means that MonetDB is expecting more data. We type the first row "1|a", then the second row "2|b", and for the third row we enter nothing. After we press Enter, we exit and the prompt becomes "sql>" again.

We can now read from our table. The values that we typed after the "more>" prompt are inside the table. The pipe was used as a delimiter between the columns.
SELECT * FROM tabStdin;

sql>COPY 2 RECORDS INTO tabStdin FROM STDIN;
more>3|c
more>4|d
2 affected rows
sql>
If we provide the number of rows that we will COPY, then we just have to press Enter twice.

COPY INTO stdout

The statement below doesn't work correctly:
COPY SELECT * FROM tabStdin INTO STDOUT;
According to GitHub, this is a known bug:
https://github.com/MonetDB/MonetDB/issues/7028

COPY FROM Csv, With DECIMAL Clause

When importing a CSV file, we can specify which symbols are used as the decimal and thousands separators. Let's say that we have a CSV file where the decimal point is a star "*" and the thousands separator is an underscore "_".
I will create one table that accepts DECIMAL numbers.
CREATE TABLE tabDecimal
( Number DECIMAL, Letter CHAR );

COPY OFFSET 2 INTO tabDecimal
FROM '/home/sima/Desktop/CSV_file'( Number, Letter ) DECIMAL AS '*','_';
With the DECIMAL clause we can specify which decimal point and thousands separator our CSV file has.

If we now read from our table, we will find the numbers correctly recognized. In MonetDB, the default decimal point is a dot; there is no default sign for the thousands separator.
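The effect of the DECIMAL clause can be illustrated in Python: remove the thousands separator, then replace the custom decimal point with a dot before converting (a sketch of the idea, not MonetDB's code):

```python
from decimal import Decimal

def parse_decimal(text, dec_point='*', thousands='_'):
    # e.g. '1_234*56' becomes Decimal('1234.56')
    return Decimal(text.replace(thousands, '').replace(dec_point, '.'))

print(parse_decimal('1_234*56'))  # → 1234.56
```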

0490 Grouping Sets and Comments in MonetDB

Sample Table

CREATE TABLE tabSales( Continent VARCHAR(20), Subcontinent  VARCHAR(20), Country VARCHAR(20),
                       State     VARCHAR(30), Sales   INT          );

INSERT INTO tabSales VALUES ('America', 'North',   'Canada', 'Ontario',     1),
                            ('America', 'North',   'Canada', 'Quebec',      2),
                            ('America', 'North',   'US',     'California',  4),
                            ('America', 'North',   'US',     'Texas',       8),
                            ('America', 'Central', 'Mexico', 'Jalisco',     16);

SELECT * FROM tabSales;

The Problem

When we create a Pivot table, from the sample table, we will see all of the detail sales (1,2,4,8,16), but we will also see totals (3,12,16,31).

The question is: what query would return all of these numbers, both the detail values and the totals, in MonetDB?

This is one possible solution:

SELECT Country, State, Sales FROM tabSales
UNION ALL
SELECT Country, null, SUM( Sales ) FROM tabSales GROUP BY Country
UNION ALL
SELECT null, null, SUM( Sales ) FROM tabSales;
In the image, the rows are sorted so that the table looks like the pivot table.

The UNION ALL solution is bad for several reasons:
1) We have to execute three queries and then combine multiple result sets into one.
2) A long UNION ALL query is hard to read and modify.
3) We have to be careful to properly align the columns.

This is the problem that can be solved by grouping sets.

Grouping Sets

"Grouping sets" are a much better and faster syntax to achieve the same goal.

SELECT Country, State, SUM( Sales )
FROM tabSales
GROUP BY GROUPING SETS ( Country, State, ( ) );  


The empty parentheses are for the grand total.
We'll get the same result, except that the repetition of country names is reduced; instead of them we have nulls.

Look at what we get if we place parentheses around Country and State.

SELECT Country, State, SUM( Sales )
FROM tabSales
GROUP BY GROUPING SETS ( ( Country, State ), () );


Parentheses are there to define each group.

We can see the effect of parentheses better on this example.  

SELECT Continent, Subcontinent, SUM( Sales )
FROM tabSales
GROUP BY GROUPING SETS ( ( Continent, Subcontinent ), Continent );

Continent can be used by itself, but it can also be used in conjunction with Subcontinent to define a group.

It is now clear that each element inside GROUPING SETS is a separate definition of a group. Each group can be defined by one column  > Continent <, or by several columns placed inside of the parentheses ( Continent, Subcontinent ).

These two examples, which would return the same result, show the logic and brevity of grouping sets.

SELECT Col1, Col2, SUM( Sales )
FROM Table
GROUP BY GROUPING SETS
    (  ( Col1, Col2 ), Col1 );

SELECT Col1, Col2, Sales FROM Table
UNION ALL
SELECT Col1, null, SUM( Sales )
   FROM Table
   GROUP BY Col1;

Rollup

SELECT Continent, Subcontinent, Country, SUM( Sales )
FROM tabSales

GROUP BY ROLLUP( Continent, Subcontinent, Country );
ROLLUP( a, b, c ) is the same as the grouping sets "( a, b, c ), ( a, b ), ( a ), ()". This is a way to get a hierarchy of the columns. ROLLUP will give us all of the numbers that we need to create a pivot table.
For ROLLUP, the order of the columns is important.

ROLLUP( a, b, c )                ROLLUP( c, b, a )

( a, b, c )                      ( c, b, a )
( a, b )                         ( c, b )
( a )                            ( c )
( )                              ( )
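The expansion of ROLLUP into grouping sets can be expressed in a few lines of Python (illustrative only): take ever-shorter prefixes of the column list, down to the empty set.

```python
def rollup(cols):
    # ROLLUP(a, b, c) → (a, b, c), (a, b), (a), ()
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]

print(rollup(['a', 'b', 'c']))
# → [('a', 'b', 'c'), ('a', 'b'), ('a',), ()]
```

Because only prefixes are used, the order of the columns matters, as the table above shows.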

Similar to GROUPING SETS, ROLLUP can also create combinations of columns by using parentheses.
 
SELECT Subcontinent, Country, State, SUM( Sales )
FROM tabSales
GROUP BY ROLLUP( ( Subcontinent, Country ), State );

CUBE

CUBE works similarly to ROLLUP, but has a different logic. CUBE will give us all of the possible combinations. CUBE( a, b, c ) will give us 2^3 grouping sets: "(a,b,c), (a,b), (a,c), (b,c), (a), (b), (c) and ()".

SELECT Subcontinent, Country, SUM( Sales )
FROM tabSales GROUP BY CUBE( Subcontinent, Country );
Because we have only 2 columns inside CUBE in our example, the number of combinations is 2^2 = 4: "(a,b), (b), (a), ()".

We can also define groups by using parentheses.  
SELECT Subcontinent, Country, SUM( Sales ) FROM tabSales
GROUP BY CUBE( ( Subcontinent, Country ) );

We now have only one element, so we get only two ( 2^1 ) groups: "(a), ()".
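CUBE's expansion is the power set of its elements, which we can sketch in Python (illustrative only):

```python
from itertools import combinations

def cube(cols):
    # CUBE(a, b, c) → all 2^n subsets, from the full set down to ()
    sets = []
    for size in range(len(cols), -1, -1):
        sets.extend(combinations(cols, size))
    return sets

print(cube(['a', 'b']))
# → [('a', 'b'), ('a',), ('b',), ()]
print(len(cube(['a', 'b', 'c'])))  # → 8
```

With one combined element, as in CUBE( ( Subcontinent, Country ) ), the power set has just two members: the element itself and ().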

Addition and Multiplication in Grouping Sets

This is addition:

( a, b )       +       ( c )       =       ( a, b )
                                           ( c )
This is multiplication. Multiplication is a cross join between the individual values.
                                      a1bc1
( a )            ( b, c )             a1bc2
  a1      *        bc1       =        a1bc3
  a2               bc2                a2bc1
                   bc3                a2bc2
                                      a2bc3

The syntax for addition looks like this: everything inside the GROUPING SETS parentheses is added together. In this example we will add ( Subcontinent ) + ( Country ) + ( ).

SELECT Subcontinent, Country, SUM( Sales )
FROM tabSales
GROUP BY GROUPING SETS ( Subcontinent, ROLLUP( Country ) );

So, if we create GROUPING SETS like this, this will be addition.
GROUPING SETS ( Continent, ROLLUP( Continent, Subcontinent ), CUBE( Country, State ), () )

Addition can easily create duplicates:
SELECT Continent, SUM( Sales ) FROM tabSales
GROUP BY GROUPING SETS ( CUBE( Continent ), () );
CUBE will create a grand total, but we will also get a grand total from the "( )" element.

This is the syntax for multiplication. This time we have commas between the GROUPING SETS, ROLLUPs, CUBEs and individual elements.
GROUPING SETS ( Continent ), ROLLUP( Continent, Subcontinent ), CUBE( Country, State ), ( ), Country

This example will give us 2 x 2 = 4 rows. ROLLUP will give us "America" and "( )". GROUPING SETS will give us "North" and "Central". Then we combine them, 2 x 2.

SELECT Continent, Subcontinent, SUM( Sales ) FROM tabSales
GROUP BY ROLLUP( Continent ), GROUPING SETS ( Subcontinent );
Multiplication can also easily create duplicates:
SELECT Continent, SUM( Sales ) FROM tabSales
GROUP BY GROUPING SETS ( Continent, () ), GROUPING SETS ( Continent, () );
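The addition and multiplication rules above can be sketched in Python: addition concatenates the lists of grouping sets, while multiplication takes their cross product. This is a simplified model of the semantics, not MonetDB code:

```python
from itertools import product

def add(*clauses):
    # Addition: everything inside one GROUPING SETS ( ... ) is concatenated.
    return [gs for clause in clauses for gs in clause]

def multiply(*clauses):
    # Multiplication: comma-separated clauses are cross-joined.
    return [tuple(col for gs in combo for col in gs) for combo in product(*clauses)]

rollup_continent = [("Continent",), ()]   # ROLLUP( Continent )
sets_subcontinent = [("Subcontinent",)]   # GROUPING SETS ( Subcontinent )

# GROUP BY ROLLUP( Continent ), GROUPING SETS ( Subcontinent )
print(multiply(rollup_continent, sets_subcontinent))
# [('Continent', 'Subcontinent'), ('Subcontinent',)]

# GROUPING SETS ( CUBE( Continent ), () ) -- addition creates a duplicate ()
cube_continent = [("Continent",), ()]
print(add(cube_continent, [()]))
# [('Continent',), (), ()]
```

The duplicate "( )" in the last result is exactly the duplicated grand total shown in the SQL examples above.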

Indicator Function – GROUPING

The GROUPING function tells us which rows are subtotals / the grand total. In such rows, some columns contain nulls because they are consolidated. The GROUPING function takes one column as its argument and returns the result only for that column.

SELECT Continent, GROUPING( Continent ) AS ContiGroup, Subcontinent, GROUPING( Subcontinent ) AS SubcontiGroup , SUM( Sales )
FROM tabSales
GROUP BY ROLLUP ( Continent, Subcontinent );

This function is important because it helps us distinguish between subtotal nulls and missing-data nulls.

Formatting with COALESCE and Sort

This table does not look good. Let's fix it.

SELECT Subcontinent, State, SUM( Sales )
FROM tabSales
GROUP BY ROLLUP( Subcontinent, State );

COALESCE will help us eliminate the NULLs:

SELECT COALESCE( Subcontinent, 'Grand' ) AS Subcontinent
     , COALESCE( State, 'Total' ) AS State
     , SUM( Sales ) AS Sales FROM tabSales
GROUP BY ROLLUP( Subcontinent, State );

With the GROUPING function, we can create columns that will help us sort the table.

SELECT COALESCE( Subcontinent, 'Grand' ) AS Subcontinent
     , COALESCE( State, 'Total' ) AS State
     , SUM( Sales ) AS Sales
     , GROUPING( Subcontinent ) AS SubcSort
     , GROUPING( State ) AS StateSort 
FROM tabSales GROUP BY ROLLUP( Subcontinent, State )
ORDER BY SubcSort, Subcontinent, StateSort;

These auxiliary columns ( SubcSort and StateSort ) can easily be eliminated by wrapping everything in "SELECT Subcontinent, State, Sales".
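The same COALESCE-and-sort trick can be emulated in Python on a tiny, made-up data set, just to show the mechanics:

```python
# Emulate ROLLUP( Subcontinent, State ) with COALESCE labels and GROUPING sort keys.
rows = [  # (Subcontinent, State, Sales) -- illustrative data
    ("North", "Alabama", 10), ("North", "Idaho", 20), ("Central", "Panama", 30),
]

totals = {}
for sub, state, sales in rows:
    # Each detail row contributes to its group, its subtotal, and the grand total.
    for key in [(sub, state), (sub, None), (None, None)]:
        totals[key] = totals.get(key, 0) + sales

result = sorted(
    # GROUPING(col) is 1 when the column is consolidated (NULL), otherwise 0.
    ((sub or "Grand", state or "Total", sales, int(sub is None), int(state is None))
     for (sub, state), sales in totals.items()),
    key=lambda r: (r[3], r[0], r[4]))  # ORDER BY SubcSort, Subcontinent, StateSort

for sub, state, sales, _, _ in result:
    print(sub, state, sales)
```

Each subcontinent's details are followed by its subtotal, and the grand total sinks to the bottom, exactly as in the SQL version.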

Comments

Sample Table and Function

Let's create two tables and a function.

CREATE TABLE tabComment( Number INTEGER );
CREATE TEMPORARY TABLE tabTemporary( Number INTEGER );

CREATE OR REPLACE FUNCTION funcComment( Arg1 INTEGER )
RETURNS INTEGER
BEGIN
     RETURN 2;
END;

Comments on Database Objects

We can create comments that are tied to database objects. A comment conveys information about its object.
COMMENT ON TABLE tabComment IS 'tabComment description';
COMMENT ON COLUMN tabComment.Number IS 'Number column description';
COMMENT ON FUNCTION funcComment IS 'funcComment description';
COMMENT ON SCHEMA sys IS 'sys schema description';

We will then find IDs of our database objects:
SELECT * FROM sys.tables WHERE name = 'tabcomment';
15876
SELECT * FROM sys.columns WHERE table_id = 15876;
15875
SELECT * FROM sys.functions WHERE name = 'funccomment';
15881
SELECT * FROM sys.schemas WHERE name = 'sys';
2000
All of these IDs can be found in the system table "sys.comments" together with their comments.
SELECT * FROM sys.comments
WHERE Id IN ( 15876, 15875, 15881, 2000 );


Deleting a Comment

If we delete an object, its comment will be deleted.
DROP TABLE tabComment;
SELECT * FROM sys.comments WHERE Id = 15876;

We can delete a comment by setting it to NULL or an empty string.
COMMENT ON SCHEMA sys IS null;
SELECT * FROM sys.comments WHERE Id = 2000;

If a function is overloaded then we have to provide the full signature.
COMMENT ON FUNCTION funcComment( INTEGER ) IS '';
SELECT * FROM sys.comments WHERE Id = 15881;

Persistent Database Objects

There are other database objects that we can place a comment on. They are all persistent database objects.
COMMENT ON VIEW view_name IS 'Comment';
COMMENT ON INDEX index_name IS 'Comment';
COMMENT ON SEQUENCE sequence_name IS 'Comment';
COMMENT ON PROCEDURE procedure_name IS 'Comment';

COMMENT ON AGGREGATE aggregate_name IS 'Comment';
COMMENT ON LOADER loader_name IS 'Comment';

We cannot create a comment on a temporary object.
COMMENT ON TABLE tabTemporary IS 'tabTemporary description';

0480 Distributed Queries in MonetDB

Note: Before reading this blog post, you should read the article about merge tables in MonetDB ( article ), or you can watch it on YouTube ( video ).
Advice: I suggest you follow this blog post as strict instructions. Any freestyling could mean that you will have problems creating the remote tables.

Visual Presentation

In this case, we will use two MonetDB instances that are placed on two different computers. Both of them will have a dimension table with products, and that table will have the same data on both computers (DimG and DimB on the image). I will use the (G)reen and (B)lue colors to differentiate the servers.

The fact table will be divided into two parts. FactG will be on the green server, and FactB will be on the blue server. They will hold different data.

The next step is to create remote tables. Remote tables are references on one server to tables on another server. The purpose of remote tables is to let each server know about all of the tables in the system and how those tables are connected. This allows each server to create a query execution plan, even though that query uses tables from both servers.

Green server                 Blue server
Table: DimG                  Table: DimB
Remote table: DimB           Remote table: DimG
Table: FactG                 Table: FactB
Remote table: FactB          Remote table: FactG

As a user, we can now connect to one server (it doesn't matter which one) and we can execute a query that will use tables from both servers.

The computer we connect to will create an execution plan for the query. This plan will usually assume that intermediate results (1,2) will be calculated on each server. That way we can divide processing between the servers and increase parallelism.

The intermediate results will then be collected by the master computer (the one that creates the execution plan) and transformed into the final result (3).

The whole purpose of distributed queries is to divide the work between the computers in the cluster and increase performance.
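The scatter-gather flow described above can be sketched in Python: each server computes a partial aggregate over its own fact partition, and the master merges the intermediate results. The data matches the example tables; the code itself is only an illustration of the idea:

```python
from collections import Counter

def partial_sum(fact_rows):
    # Steps 1 and 2: each server computes SUM(Qty) GROUP BY ProdID locally.
    acc = Counter()
    for prod_id, qty in fact_rows:
        acc[prod_id] += qty
    return acc

fact_g = [(11, 5), (11, 10), (22, 15), (22, 20)]       # FactG on the green server
fact_b = [(11, 105), (11, 110), (33, 115), (33, 120)]  # FactB on the blue server

# Step 3: the master collects the intermediate results and merges them.
final = partial_sum(fact_g) + partial_sum(fact_b)
print(dict(final))  # {11: 230, 22: 35, 33: 235}
```

Because SUM is decomposable, the master only has to add up small per-server dictionaries instead of shipping all of the fact rows over the network.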

Green Server Setup

mkdir /home/sima/monetdb/DBfarmG             –create monetdb folder
monetdbd create /home/sima/monetdb/DBfarmG   –initialize monetdb folder
monetdbd start /home/sima/monetdb/DBfarmG    –start monetdbd daemon
monetdb create DatabaseG                     –create database "DatabaseG"
monetdb release DatabaseG                    –make "DatabaseG" available
mclient -u monetdb -d DatabaseG              –log in as administrator to "DatabaseG" (pass="monetdb")
In this part, we will create the MonetDB folder and
one database, and we will log in as administrator.

As admin, we will create a new role, schema and user. The schema will have that role as its authorization. The user will have that schema as his default schema, and that role as his default role. That means the user will be able to use all of the objects in this schema.

CREATE ROLE RoleGB;
CREATE SCHEMA SchemaGB AUTHORIZATION RoleGB;
CREATE USER UserGB WITH PASSWORD 'gb' NAME 'Distributed User' SCHEMA SchemaGB DEFAULT ROLE RoleGB;
quit
The letters "GB" mean that this role, schema and user will exist on both the green and the blue server. In order for one user to access tables from both servers, his account must be present on both servers.

mclient -u usergb -d DatabaseG       –be careful about upper and lower case letters
CREATE TABLE FactG ( YearNum INT, Dates DATE, Prodid INT, Qty INT );
INSERT INTO FactG VALUES ( 2025, '2025-01-01', 11, 5), ( 2025, '2025-01-02', 11, 10)
                       , ( 2025, '2025-01-03', 22, 15), ( 2025, '2025-01-04', 22, 20);
CREATE TABLE DimG( ProdID INT, ProdName VARCHAR(50) );
INSERT INTO DimG VALUES (11, 'product11'), (22, 'product22'), (33, 'product33');
quit
Then, we will log in as the new user "usergb".
We will create two tables, "DimG" and "FactG",
and we will fill them with data.

At the start of this blog post, we already saw the
content of these tables.

monetdbd stop /home/sima/monetdb/DBfarmG
monetdbd get all /home/sima/monetdb/DBfarmG
monetdbd set listenaddr=0.0.0.0 /home/sima/monetdb/DBfarmG


We are back in the shell. I will stop the MonetDB daemon because I want to change one setting. That setting is "listenaddr". It is currently "localhost", which means the server is reachable only from the local computer.

We will change that value to "0.0.0.0", which means the server will accept connections on all network interfaces.

We will then start our MonetDB folder again:
monetdbd start /home/sima/monetdb/DBfarmG


We can check who is listening on port 50000.
ss -tulnp | grep 50000           

Below you can see some commands that you can use to configure your firewall so that you can control communication between servers. I will not teach you how to manage the firewall. I will just show you how I configured it. If you have any problems with the firewall, you can reset all the rules with "sudo ufw reset", and then you can disable the firewall with "sudo ufw disable".

I will now set firewall by running these
commands in the shell.

"ufw" means "uncomplicated firewall".


sudo ufw enable                      –the firewall will be permanently enabled. It is disabled by default.
sudo ufw default deny incoming       –no one can call us
sudo ufw default allow outgoing      –we can call anyone
–below is the command that will open incoming traffic on port 50000 for the tcp protocol.
–it will also limit the range of IP addresses to the IP addresses on my local network.

sudo ufw allow from 192.168.100.0/24 to any port 50000 proto tcp
–I don't know whether your local network is using the same range of IP addresses, so I will delete this rule.
sudo ufw delete allow from 192.168.100.0/24 to any port 50000 proto tcp
–I will create a simpler rule that limits only the port number and the tcp protocol.
sudo ufw allow 50000/tcp

sudo ufw status          –we can check the status of the firewall.

This step is just for the people who are following this tutorial with linux virtual machines inside of the Virtual Box. Go to Settings > Network > Adapter 1, and choose the "Bridged Adapter" option. This will include virtual machines into the local network. This step is needed so that two virtual machines can communicate over the network.

Testing Remote Access to the Green Server

In the green server shell, run this command. This is how you can find the IP address of a server.
ip addr | grep inet

Now you can go to the blue server, and from there you can run this code.
mclient -h 192.168.100.145 -p 50000 -u usergb -d DatabaseG
— password is "gb"
This is how we can log in to the remote green server over the network.

Blue Server Setup

There is nothing different in preparing the blue server. I will just repeat the same commands with different identifiers.

mkdir /home/sima/monetdb/                    –create monetdb folder
monetdbd create /home/sima/monetdb/DBfarmB   –initialize monetdb folder
monetdbd start /home/sima/monetdb/DBfarmB    –start monetdbd daemon
monetdb create DatabaseB                     –create database "DatabaseB"
monetdb release DatabaseB                    –make "DatabaseB" available
mclient -u monetdb -d DatabaseB              –log in as administrator to "DatabaseB"
CREATE ROLE RoleGB;           –all identifiers are the same for the privileges, as for the green server
CREATE SCHEMA SchemaGB AUTHORIZATION RoleGB;
CREATE USER UserGB WITH PASSWORD 'gb' NAME 'Distributed User' SCHEMA SchemaGB DEFAULT ROLE RoleGB;
quit
mclient -u usergb -d DatabaseB          –now we login as the user
CREATE TABLE FactB ( YearNum INT, Dates DATE, Prodid INT, Qty INT );
INSERT INTO FactB VALUES (2026, '2026-01-01', 11, 105), (2026, '2026-01-02', 11, 110)                       
                                              , (2026, '2026-01-03', 33, 115), (2026, '2026-01-04', 33, 120);
CREATE TABLE DimB (  ProdID INT, ProdName VARCHAR(50) );
INSERT INTO DimB VALUES (11, 'product11'), (22, 'product22'), (33, 'product33');
quit
monetdbd stop /home/sima/monetdb/DBfarmB
monetdbd get all /home/sima/monetdb/DBfarmB
monetdbd set listenaddr=0.0.0.0 /home/sima/monetdb/DBfarmB   –we make the server available from the network
monetdbd start /home/sima/monetdb/DBfarmB

ss -tulnp | grep 50000         
sudo ufw enable                    –if needed, we can set firewall
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 50000/tcp
sudo ufw status

We can log in to the blue server from the green server (password="gb").
mclient -h 192.168.100.146 -p 50000 -u usergb -d DatabaseB

Preparing REMOTE, REPLICA and MERGE Tables in the Green Server

In the green server, we will create REMOTE tables that are just references to tables in the blue server.
mclient -u monetdb -d DatabaseG     –only admin can create REMOTE tables (pass="monetdb")
SET SCHEMA SchemaGB;                –we must be in the same schema as the physical tables in the blue server  
CREATE REMOTE TABLE DimB( ProdID INT, ProdName VARCHAR(50) ) on 'mapi:monetdb://192.168.100.146:50000/DatabaseB';   –ip address of the blue server
CREATE REMOTE TABLE FactB( YearNum INT, Dates DATE, ProdID INT, Qty INT ) on 'mapi:monetdb://192.168.100.146:50000/DatabaseB';

CREATE REPLICA TABLE Dim( prodid INT, prodname VARCHAR(50) );
ALTER TABLE Dim ADD TABLE DimG;
ALTER TABLE Dim ADD TABLE DimB;
Tables "DimG" and "DimB" are completely identical. We must notify MonetDB that they are "replicas". For queries, MonetDB can use these tables interchangeably.

CREATE MERGE TABLE Fact( YearNum INT, Dates DATE, ProdID INT, Qty INT );
ALTER TABLE Fact ADD TABLE FactG;
ALTER TABLE Fact ADD TABLE FactB;
Tables "FactG" and "FactB" are just partitions of the merge table. This is why you should first read the article / watch the video about merge tables.

We can now query these two new tables, Dim and Fact. The Dim table is unchanged, but the Fact table is the UNION of FactG and FactB.
SELECT * FROM Dim;
SELECT * FROM Fact;

Preparing REMOTE, REPLICA and MERGE Tables in the Blue Server

I will again repeat all of the steps, but this time for the blue server.

mclient -u monetdb -d DatabaseB     –only admin can create REMOTE tables. (pass="monetdb")
SET SCHEMA SchemaGB;                –we must be in the same schema as the physical tables in the blue server
CREATE REMOTE TABLE DimG( ProdID INT, ProdName VARCHAR(50) ) on 'mapi:monetdb://192.168.100.145:50000/DatabaseG';  
CREATE REMOTE TABLE FactG( YearNum INT, Dates DATE, ProdID INT, Qty INT ) on 'mapi:monetdb://192.168.100.145:50000/DatabaseG';

CREATE REPLICA TABLE Dim( ProdID INT, ProdName VARCHAR(50) );
ALTER TABLE Dim ADD TABLE DimG;
ALTER TABLE Dim ADD TABLE DimB;
For performance reasons, MonetDB does not check if the replica tables are indeed identical. This responsibility is left to the database users.

CREATE MERGE TABLE Fact( YearNum INT, Dates DATE, ProdID INT, Qty INT );
ALTER TABLE Fact ADD TABLE FactG;
ALTER TABLE Fact ADD TABLE FactB;
 

SELECT * FROM Dim;
SELECT * FROM Fact;


Horizontally dividing one table between computers in a cluster is called "sharding".

System Tables

In "sys.tables" we can find all our tables. Based on the system table "sys.table_types" we can see that we have regular (0), remote (5), merge (3) and replica tables (6).
SELECT * FROM sys.tables WHERE name IN ( 'factg', 'factb', 'fact', 'dimg', 'dimb', 'dim' );
SELECT * FROM sys.table_types;

The dependencies inside the "fact" and "dim" tables can be seen in the system view "sys.dependencies_vw".
SELECT * FROM sys.dependencies_vw WHERE depend_type = 2;

The Fruit of Our Labor

In the green server, I will now log in as "UserGB". This user has privileges on all of the objects in "SchemaGB". He can query the "Dim" and "Fact" tables.
mclient -u usergb -d DatabaseG   –password "gb"  
SELECT ProdName, SUM( Qty ) AS Qty
FROM Dim INNER JOIN Fact
    ON Dim.ProdID = Fact.ProdID
GROUP BY ProdName;
Finally, we can run a query that will use the power of the two computers and two MonetDB servers.

It is the same for the blue server.
mclient -u usergb -d DatabaseB   –password "gb"  
SELECT ProdName, SUM( Qty ) AS Qty
FROM Dim INNER JOIN Fact
    ON Dim.ProdID = Fact.ProdID
GROUP BY ProdName;
We finally have a working distributed query processing system.

Load Balancer

The question is how to divide users between two (or more) servers to equalize the load. The easiest way is to have all users born from January to June use the green server, and all users born from July to December use the blue server.

A more professional way is to use a load balancer, something like HAProxy (link). This program will direct users to the server that is currently under the least load.
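A "least loaded" policy like the one HAProxy can apply is easy to sketch. This is a toy model, not HAProxy's implementation:

```python
# Route each new connection to the server with the fewest active connections.
active = {"green": 0, "blue": 0}

def route(active_connections):
    return min(active_connections, key=active_connections.get)

for user in ["ann", "bob", "cyd", "dan"]:
    server = route(active)
    active[server] += 1
    print(user, "->", server)
```

With equal server capacity this degenerates into round-robin, but unlike the birth-month rule it automatically adapts when one server becomes slower or busier.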

0470 Users and Privileges in MonetDB part2

We will start with a blank state. I will delete the users that were created in Part 1.
DROP USER newUser;
DROP USER mnm;

Initial State

mclient -u monetdb -d voc #password "monetdb"
I will create three users. I won't define their default schemas, so each of them will get a schema named after them. We also have the admin user "monetdb".

CREATE USER a_user WITH PASSWORD '1' NAME 'a';
CREATE USER b_user WITH PASSWORD '1' NAME 'b';
CREATE USER c_user WITH PASSWORD '1' NAME 'c';

I will now login as "a_user" and I will create a table and a function.
They will be created in the schema "a_user".
mclient -u a_user -d voc    #password "1"
CREATE TABLE a_user.a_table ( a_number INT PRIMARY KEY, a_letter CHAR );
INSERT INTO a_user.a_table ( a_number, a_letter ) VALUES ( 1, 'a' );
CREATE OR REPLACE FUNCTION a_user.a_Fx()
RETURNS INTEGER
BEGIN
     RETURN 2;
END;

I will create one table in the "b_user" schema.
mclient -u b_user -d voc    #password "1"
CREATE TABLE b_user.b_table ( b_number INT, b_letter CHAR );
INSERT INTO b_user.b_table ( b_number, b_letter ) VALUES ( 1, 'b' );

Privileges vs Roles

A privilege is a permission to perform one action (SELECT, UPDATE…). A role is a named collection of such privileges.
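This distinction can be modeled in a few lines of Python: a privilege is a single (action, object) pair, and a role is a set of such pairs. The names below are illustrative:

```python
# A privilege is permission for one action on one object; a role bundles privileges.
role1 = {("SELECT", "a_user.a_table"), ("INSERT", "a_user.a_table")}

def allowed(user_roles, action, obj):
    # A user may act if any of his roles contains the matching privilege.
    return any((action, obj) in role for role in user_roles)

b_user_roles = [role1]
print(allowed(b_user_roles, "SELECT", "a_user.a_table"))  # True
print(allowed(b_user_roles, "DELETE", "a_user.a_table"))  # False
```

Granting a role to a user is then just adding one set to the user's list, instead of granting every privilege one by one.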

Grant Privileges to a User

Grant and Revoke all Privileges on a Table

"b_user" cannot use the table "a_table".

"a_user" will grant "b_user" the privilege to use the table "a_table".
"FROM CURRENT_USER" is the default and can be omitted. It means that the privilege is given by the CURRENT_USER. The other option is that the privilege is given by a role. We'll see that later.
mclient -u a_user -d voc    #password "1"
GRANT ALL ON a_user.a_table TO b_user FROM CURRENT_USER;

We will now log in as a b_user and we will try to use "a_table".
mclient -u b_user -d voc    #password "1"
SELECT * FROM a_user.a_table;

"b_user" cannot alter the table. Altering a table is reserved for the owner of the table. "b_user" can only use DML statements. In MonetDB, the owner of a table is the owner of the schema that the table is in.
ALTER TABLE a_user.a_table RENAME COLUMN a_number TO zzz;

mclient -u a_user -d voc    #password "1"
REVOKE ALL ON TABLE a_user.a_table FROM b_user;
This is how we can revoke privileges given to "b_user".

Grant and Revoke Separate Privileges

I will again give privileges to "b_user" for the table "a_table".
GRANT SELECT ( a_number ), INSERT, UPDATE ( a_number, a_letter ) ON a_user.a_table TO b_user FROM CURRENT_USER;
This time, instead of "ALL", I will give each privilege separately. I will give him the SELECT, INSERT and UPDATE privileges. Besides those, we can give him the DELETE, TRUNCATE, and REFERENCES privileges. Any combination is allowed.
Notice that for SELECT, UPDATE and REFERENCES we can limit our permission to a list of columns.

I will then login as a "b_user" and I will try SELECT and DELETE on "a_table".
mclient -u b_user -d voc    #password "1"

SELECT * FROM a_user.a_table;                    –successful, but only 1 column
SELECT a_number, a_letter FROM a_user.a_table;   –unsuccessful
DELETE FROM a_user.a_table;                      –unsuccessful

mclient -u a_user -d voc    #password "1"
REVOKE ALL ON TABLE a_user.a_table FROM b_user;
REVOKE SELECT ON TABLE a_user.a_table FROM b_user;
I will try to revoke the privileges as a group.

I will also try to revoke the SELECT privilege for all of the columns. Both statements will run without an error.

mclient -u b_user -d voc    #password "1"
SELECT * FROM a_user.a_table;
The revoking didn't work. SELECT, SELECT ( column ) and ALL are all different privileges. They are given and revoked separately.

mclient -u a_user -d voc    #password "1"
REVOKE SELECT ( a_number ), INSERT, UPDATE ( a_number, a_letter ) ON a_user.a_table FROM b_user;
Privileges must be defined in REVOKE
in the same way as in GRANT.

mclient -u b_user -d voc    #password "1"
SELECT * FROM a_user.a_table;
Now, it worked.

Grant and Revoke Reference Privileges

"b_user" can reference a table that belongs to "a_user". We didn't even have to grant him the right. This is a bug; it shouldn't be allowed. The GRANT and REVOKE statements are totally useless for the REFERENCES privilege, because this privilege is always given.

ALTER TABLE b_user.b_table ADD CONSTRAINT ReferenceToA_Table FOREIGN KEY ( b_number ) REFERENCES a_user.a_table ( a_number );

Grant and Revoke Function Execution Rights

I will grant execution rights on a function to "b_user".
mclient -u a_user -d voc    #password "1"
GRANT EXECUTE ON FUNCTION a_user.a_Fx() TO b_user;

"b_user" can now call a function.
mclient -u b_user -d voc    #password "1"
SELECT a_user.a_Fx();

mclient -u a_user -d voc    #password "1"
REVOKE EXECUTE ON FUNCTION a_user.a_Fx() FROM b_user;
I will revoke function execution rights.

This will also work for procedures.
-- GRANT EXECUTE ON PROCEDURE procTest() TO b_user;
-- REVOKE EXECUTE ON PROCEDURE  procTest() FROM b_user;

Grant and Revoke COPY Rights

In this blog post "https://bizkapish.com/monetdb/0350-exporting-data-and-binary-files/" we can see how to export and import files on the client computer by using Python. That right is automatically given to all of the users.

mclient -u monetdb -d voc    #password "monetdb"
GRANT COPY INTO, COPY FROM TO a_user;
We have to be the administrator to grant someone the rights to import and export data, when the file is on the server.

mclient -u a_user -d voc    #password "1"
COPY SELECT * FROM a_user.a_table INTO '/home/fffovde/Desktop/a_table.csv';
Now, "a_user" can export the table
to a CSV file (and import from one).

Grant to PUBLIC

There is a special role "PUBLIC". Everyone has this role.
GRANT SELECT ON a_user.a_table TO PUBLIC;

mclient -u c_user -d voc    #password "1"
SELECT * FROM a_user.a_table;
"c_user" now has SELECT rights on the "a_table". Just like everyone else.

mclient -u a_user -d voc    #password "1"
REVOKE SELECT ON a_user.a_table FROM PUBLIC;
I will then revoke the rights from the PUBLIC role.

WITH GRANT OPTION

Here is a visual representation of what will happen in the next example.

I will now grant a grantable ( delegable ) right to the user "b_user".
GRANT SELECT ON a_user.a_table TO b_user WITH GRANT OPTION;

mclient -u b_user -d voc #password "1"
SELECT * FROM a_user.a_table;
"b_user" received the right to SELECT from the "a_table". But he also received the right to give that right to someone else.

"b_user" can now give SELECT right to the user "c_user".
GRANT SELECT ON a_user.a_table TO c_user WITH GRANT OPTION;

So, we have two rights, right to read and right to grant.

mclient -u a_user -d voc    #password "1"
REVOKE GRANT OPTION FOR SELECT ON a_user.a_table FROM b_user;
REVOKE                  SELECT ON a_user.a_table FROM b_user;
In MonetDB, it is not possible to revoke these two rights separately. Statements like these will remove all of the rights from "b_user".

mclient -u b_user -d voc    #password "1"
GRANT SELECT ON a_user.a_table TO voc;

mclient -u c_user -d voc    #password "1"
SELECT * FROM a_user.a_table;
GRANT SELECT ON a_user.a_table TO voc WITH GRANT OPTION;
The interesting thing is that the user "c_user" will keep both of his rights. He can still read from "a_table", and he can still grant that right further.

How do we revoke these rights now? "c_user" can revoke the rights he gave to "voc".
REVOKE GRANT OPTION FOR SELECT ON a_user.a_table FROM voc;
But what about "c_user"? Who can revoke his rights?

mclient -u b_user -d voc    #password "1"
REVOKE GRANT OPTION FOR SELECT ON a_user.a_table FROM c_user;
"b_user" cannot do it anymore.

mclient -u a_user -d voc    #password "1"
REVOKE GRANT OPTION FOR SELECT ON a_user.a_table FROM c_user;
"a_user" has to do it.

This means that "a_user" can revoke the rights on "a_table" from anyone, at any time. As you can see, giving grantable rights can create confusion, where we no longer know who has which rights. I think we should avoid using this option.

Grant Privileges to Roles

Grant Privileges to a User Through a Role

As admin I will create a role.
mclient -u monetdb -d voc  #password "monetdb"
CREATE ROLE Role1;
GRANT SELECT ON a_user.a_table TO Role1;

GRANT Role1 TO b_user;
I will then log in as "b_user". He will be able to
read from the "a_user.a_table".
mclient -u b_user -d voc    #password "1"
SET ROLE Role1;

SELECT * FROM a_user.a_table;

We saw how to grant rights to a role, how to grant the role to a user, and how the user can activate the role with the "SET ROLE" statement.

The Problem: Anyone Can Delete a Role

mclient -u c_user -d voc    #password "1"
DROP ROLE Role1;
Any user, including "c_user", can delete a role.

Just as anyone can reference any table (we saw that earlier), anyone can delete a role. This is also a bug; a database shouldn't work like this.

Create Another Administrator

The administrator privileges are contained in the predefined role "sysadmin". Anyone who takes that role will have administrative rights.

mclient -u monetdb -d voc    #password "monetdb"
GRANT sysadmin TO c_user;
We will then jump to be "c_user".
mclient -u c_user -d voc    #password "1"

"c_user" will try to create a new user.
CREATE USER d_user WITH PASSWORD '1' NAME 'd';
He will fail.
He first has to start using this new role.
SET ROLE sysadmin;
Now it is working.

If we try to create a new role, we will fail.
CREATE ROLE ZZZ;
We are failing because the default clause is "WITH ADMIN CURRENT_USER". We have to change it to "WITH ADMIN CURRENT_ROLE" (the sysadmin role).
CREATE ROLE ZZZ WITH ADMIN CURRENT_ROLE;

This is where we can see the difference between "WITH ADMIN CURRENT_USER" and "WITH ADMIN CURRENT_ROLE".

In the sys.auths table we can see that "c_user" created "d_user", because he has the "sysadmin" role. The role "zzz" was created by the "sysadmin" role.
SELECT * FROM sys.user_role;
SELECT * FROM sys.auths WHERE id IN ( 2, 15601, 15709, 15710 );

Grant a Role

We are still "c_user" and we will grant a role to "a_user".
GRANT ZZZ TO a_user;
We will fail.
Just like previously, we have to grant a role as a CURRENT_ROLE.
GRANT ZZZ TO a_user FROM CURRENT_ROLE;
Now it works.

"a_user" will now be able to use the "ZZZ" role, but he will not be able to grant that role further to "b_user". I want to change that.

I will revoke "ZZZ" from "a_user".
REVOKE ZZZ FROM a_user FROM CURRENT_ROLE;
Then, I will grant him the same role, but with ADMIN rights for that role.
GRANT ZZZ TO a_user WITH ADMIN OPTION FROM CURRENT_ROLE;
We'll now jump to be "a_user".
mclient -u a_user -d voc    #password "1"
SET ROLE ZZZ;
GRANT ZZZ TO b_user;
This works. Now "b_user" also has "ZZZ" role assigned.

Role Hierarchy

I will create two roles. Then I will create a hierarchy between them.

mclient -u monetdb -d voc   
CREATE ROLE ParentRole;
CREATE ROLE ChildRole;
I will now grant "ChildRole" to "ParentRole", with ADMIN OPTION.

GRANT ChildRole TO ParentRole WITH ADMIN OPTION;
This is how we create hierarchy. "ParentRole" can delegate "ChildRole".

I will then give some privilege to the ChildRole, just so we can test it.
GRANT SELECT ON b_user.b_table TO ChildRole;

In the next step, I will grant only the ParentRole to "c_user".
GRANT ParentRole TO c_user;

We haven't granted the ChildRole to "c_user", but he will still be able to use it by utilizing the hierarchy.
mclient -u c_user -d voc    #password "1"
SET ROLE ParentRole;
GRANT ChildRole TO c_user FROM CURRENT_ROLE;
SET ROLE ChildRole;
SELECT * FROM b_user.b_table;
So, "c_user" will SET the role "ParentRole", and then he will use "WITH ADMIN OPTION" right to delegate "ChildRole" from the "ParentRole" to himself.

After the grant, he can use SET ROLE ChildRole to get access to the table "b_user.b_table".
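The effect of the hierarchy can be sketched as a graph walk: by repeatedly using the ADMIN OPTION, a holder of "ParentRole" can reach every role granted below it. This is a simplified model of the outcome, not MonetDB's actual grant mechanics:

```python
# Roles granted to roles form a graph; a user can eventually reach all of them.
role_grants = {"ParentRole": {"ChildRole"}}  # GRANT ChildRole TO ParentRole

def reachable_roles(direct_roles, role_grants):
    # Depth-first walk over the role graph starting from the directly held roles.
    seen, stack = set(), list(direct_roles)
    while stack:
        role = stack.pop()
        if role not in seen:
            seen.add(role)
            stack.extend(role_grants.get(role, ()))
    return seen

# c_user was granted only ParentRole, but ChildRole is reachable through it.
print(reachable_roles({"ParentRole"}, role_grants))
```

In MonetDB the user must still issue the GRANT and SET ROLE statements himself, but this walk shows which roles are ultimately within his reach.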

System Tables

The system table "sys.privilege_codes" contains a code for every possible combination of the SELECT, UPDATE, INSERT, DELETE, TRUNCATE and GRANT privileges.

SELECT * FROM sys.privilege_codes ORDER BY privilege_code_id;  

In the table "sys.privileges" we can find the list of granted privileges. On the image we can see that "b_user" gave "c_user" grantable SELECT rights on "a_table".
SELECT * FROM sys.privileges WHERE obj_id = 15658;