0390 Unlogged tables in MonetDB

MonetDB / By Bizkapish / March 24, 2025 March 28, 2025

Write Ahead Log

Unlogged Tables

Unlogged tables are just like normal database tables, except that they do not use the write-ahead log (WAL).

Normal tables are written like:

Unlogged tables are skipping WAL:

Unlogged tables are almost like normal tables:
1) They are written to disk.
2) After normal shutdown of a system, content of unlogged tables will be preserved.
3) They have transactions, but their transactions do not use the WAL. Their transactions exist only in RAM.
4) Content of these tables is available to all of the users, just like for normal tables.

The only difference is in the case of a system crash. After the crash, the content of normal tables will be restored to a consistent state by using the WAL. Unlogged tables will be truncated during the recovery. The server cannot guarantee the consistency of unlogged tables without the WAL, so it will delete their content. This is why unlogged tables should be used only for temporary and re-creatable data.

Writing to unlogged tables can be several times faster than writing to normal tables, because each change skips the extra write to the WAL. We are sacrificing the reliability of unlogged tables for better performance.
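The crash behavior described above can be pictured with a small toy model in Python. This is only a sketch, not how MonetDB is implemented: the class `ToyServer`, its methods and the table name "LogTab" are hypothetical, and the `wal` list simply stands in for a log that is assumed to survive a crash.

```python
class ToyServer:
    """Toy model only: the 'wal' list stands in for the log that survives a crash."""

    def __init__(self):
        self.wal = []           # (table, row) entries, assumed to survive a crash
        self.tables = {}        # table name -> rows
        self.unlogged = set()   # names of unlogged tables

    def create_table(self, name, unlogged=False):
        self.tables[name] = []
        if unlogged:
            self.unlogged.add(name)

    def insert(self, name, row):
        if name not in self.unlogged:
            self.wal.append((name, row))   # extra write that unlogged tables skip
        self.tables[name].append(row)

    def crash_and_recover(self):
        # Every table starts empty; only logged changes can be replayed.
        self.tables = {name: [] for name in self.tables}
        for name, row in self.wal:
            self.tables[name].append(row)


server = ToyServer()
server.create_table("LogTab")
server.create_table("UnLogTab", unlogged=True)
server.insert("LogTab", 1)
server.insert("UnLogTab", 1)
server.crash_and_recover()
print(server.tables)
```

In this model the logged table gets its row back after the simulated crash, while the unlogged table still exists but is empty.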

Unlogged Tables in MonetDB

I will log in as administrator and change the current schema to "voc". The password is "monetdb".
monetdbd start /home/fffovde/DBfarm1
mclient -u monetdb -d voc
SET SCHEMA voc;

If we quit our session and log in again, we will still be able to use our unlogged table.
quit
mclient -u monetdb -d voc
SELECT * FROM voc.UnLogTab;
Both the data and the table structure will be preserved. An unlogged table can last through several sessions.

Sample Table

What Happens After the Crash?

Difference Between MonetDB and Some Other Servers

Unlike some other databases, MonetDB doesn't have the ability to transform unlogged tables into logged tables and vice versa. These statements will not work:
ALTER TABLE Tab1 SET UNLOGGED;
ALTER TABLE UnLogTab SET LOGGED;

System Tables

We can find information about unlogged tables in the sys.tables and sys.statistics.

0380 Merge Tables in MonetDB

MonetDB / By Bizkapish / March 16, 2025 March 19, 2025

Why Partitioning?

Benefits and Drawbacks of Partitioning

Queries are faster. Instead of scanning the entire table, we will scan only the necessary partitions. The database is smart enough to discard partitions that do not have a relevant date. This is called "partition pruning". For example, to see sales only for the year 2024, we can query only the 2024 partition and ignore all the others.
Rebuilding indexes, updating statistics, and vacuuming are easier on individual partitions.
Dropping, archiving, backing up, partition swapping, can be done on one part of the table. We can treat the parts of the table separately.
Partitions can be processed in parallel, on different CPU cores. Partitions can be on different storage disks.
Partitions with older/stable data can be compressed and can have multiple indexes. It is the opposite for the most recent data.

Partitioning is only really useful when we have really large tables. Large tables are those with over 100 million rows. The biggest benefit is in maintaining such large tables. It is questionable whether partitioning will improve query speeds. This will only happen if queries exclusively touch some of the partitions and not others. If there is a discrepancy between how users discriminate the data and how we have defined our partitions, we could reduce performance rather than improve it.
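To make partition pruning concrete, here is a minimal Python sketch. It is not MonetDB code: the partition layout (one list of rows per year), the sample figures and the function name `sales_for_year` are all hypothetical. The point is only that a query restricted to one year touches one partition and ignores the rest.

```python
# Toy model of partition pruning: one list of rows per year (hypothetical data).
partitions = {
    2022: [("2022-03-01", 100), ("2022-07-15", 250)],
    2023: [("2023-01-10", 300)],
    2024: [("2024-02-02", 120), ("2024-11-30", 80)],
}


def sales_for_year(year):
    """Sum sales for one year, scanning only the matching partition."""
    scanned = partitions.get(year, [])   # all other partitions are skipped
    return sum(amount for _, amount in scanned), len(scanned)


total, rows_scanned = sales_for_year(2024)
all_rows = sum(len(rows) for rows in partitions.values())
print(f"total={total}, scanned {rows_scanned} of {all_rows} rows")
```

If the filter did not line up with the partitioning column (for example, a filter by customer), every partition would have to be scanned and the benefit would disappear.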

Simple Start

First, we will create merge table. It is not possible to query this table until we add some partitions to it.

CREATE MERGE TABLE Merg ( Letter VARCHAR(10), Number INT ); SELECT * FROM Merg;

CREATE TABLE Tab1 ( Letter VARCHAR(10), Number INT );
INSERT INTO Tab1 (Letter, Number) VALUES ('A', 50), ('A', 60);
CREATE TABLE Tab2 ( Letter VARCHAR(10), Number INT );
INSERT INTO Tab2 (Letter, Number) VALUES ('B', 150), ('B', 160);

We can see that merge tables are similar to union queries. UNION queries are verbose, while merge table queries are short and simple. UNION queries are more computationally intensive and use more memory. A merge table can effectively use indexes that are set up over individual partition tables.

On the other hand, UNION queries are necessary when the base tables have different structures that require transformation.

System Tables and Removing Partitions

This system table will show us partitions of our merge table. ID of the merge table is 11077.

Problem With Simple Approach

If we try INSERT, UPDATE, DELETE, TRUNCATE on the merge table, we will fail.
UPDATE Merg SET Number = 170 WHERE Letter = 'B';

We will delete our merge table, because we want to create it in a way that will allow INSERT, UPDATE, DELETE, TRUNCATE.
DROP TABLE Merg;

This time we will provide merge table with a rule by which merge table will differentiate between partitions.

CREATE MERGE TABLE Merg ( Letter VARCHAR(10), Number INT ) PARTITION BY VALUES ON ( Letter );

Only when I truthfully declare my partition as defined by the "A" in the "Letter" column, will my partition be accepted.
ALTER TABLE Merg ADD TABLE Tab1 AS PARTITION IN ( 'A' );

But what if I have another table that only has "A" in the "Letter" column? I will create such a table, and I will try to add it to the merge table.

CREATE TABLE Tab3 AS ( SELECT * FROM Tab1 ) WITH DATA;
ALTER TABLE Merg ADD TABLE Tab3 AS PARTITION IN ( 'A' );

Now we have a conflict. Definitions of partitions have to be unique. "Tab3" will be rejected.

Partition With Multiple Values in the Letter Column

I will add one more row in the "Tab2" table. After that I will add "Tab2" to the merge table.

Let's Try Modifying "Merg" Table Directly

Let us now try to INSERT a row directly into "Merg" table.

`INSERT INTO Merg ( Letter, Number ) VALUES ( 'Z', 999 );` There is no "Z" partition, so this INSERT will be rejected.
`INSERT INTO Merg ( Letter, Number ) VALUES ( 'A', 70 );` Success! "Merg" now knows where to insert a new row (into "Tab1").
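The routing rule behind these two INSERT statements can be sketched in Python. This is a toy model, not MonetDB's implementation: the class `ValuePartitionedTable` and its methods are hypothetical, and the values assigned to "Tab2" ('B' and 'C') are an assumption made for illustration.

```python
class ValuePartitionedTable:
    """Toy model of a merge table declared with PARTITION BY VALUES."""

    def __init__(self):
        self.value_to_partition = {}   # partition value -> partition name
        self.rows = {}                 # partition name -> list of rows

    def add_partition(self, name, values):
        # A value may be claimed by only one partition.
        taken = [v for v in values if v in self.value_to_partition]
        if taken:
            raise ValueError(f"values already claimed by another partition: {taken}")
        for v in values:
            self.value_to_partition[v] = name
        self.rows[name] = []

    def insert(self, letter, number):
        # Route the row by its partitioning column; reject unknown values.
        if letter not in self.value_to_partition:
            raise ValueError(f"no partition accepts {letter!r}")
        target = self.value_to_partition[letter]
        self.rows[target].append((letter, number))
        return target


merg = ValuePartitionedTable()
merg.add_partition("Tab1", ["A"])
merg.add_partition("Tab2", ["B", "C"])   # assumed values for Tab2
print(merg.insert("A", 70))              # routed to Tab1
try:
    merg.insert("Z", 999)
except ValueError as err:
    print("rejected:", err)
```

The same lookup explains why a second partition declared for 'A' has to be rejected: the merge table could no longer decide where an 'A' row belongs.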

Let's update this new row.
UPDATE Merg SET Number = 71 WHERE Letter = 'A' AND Number = 70;

Let's delete this new row.
DELETE FROM Merg WHERE Letter = 'A' AND Number = 71;

But what if I modify "Tab1" directly? Will that confuse the "Merg" table?
UPDATE Tab1 SET Letter = 'Z';
As we can see, the merge table is protected from the rule violation.

Redefining A Partition

We have new records with the letter "Z", but we have only a few of them. I want to add them to "Tab1" partition. We know that "Tab1" will reject them.

In order to avoid that, I will redefine "Tab1" to accept "Z" record.
ALTER TABLE Merg SET TABLE Tab1 AS PARTITION IN ( 'A', 'Z' );

Now let's insert a "Z" record. It will end up in "Tab1".
INSERT INTO Merg ( Letter, Number ) VALUES ( 'Z', 70 );
"Merg" table will now accept "Z" record.

Other Ways How to Define Partitioning Rule

Partition By Range

CREATE MERGE TABLE MergRange ( Letter VARCHAR(10), Number INT ) PARTITION BY RANGE ON ( Number );

We would like to add "Tab1" and "Tab2" to this new "MergRange". The problem is that one table cannot be part of several merge tables, so this attempt is rejected:

ALTER TABLE MergRange ADD TABLE Tab1 AS PARTITION FROM 1 TO 100;

We will first remove "Tab1" and "Tab2" from the "Merg" table, and then we will add them to the "MergRange" table.

ALTER TABLE Merg DROP TABLE Tab1;
ALTER TABLE Merg DROP TABLE Tab2;
ALTER TABLE MergRange ADD TABLE Tab1 AS PARTITION FROM 1 TO 100;
ALTER TABLE MergRange ADD TABLE Tab2 AS PARTITION FROM 101 TO 200;

Partition By Value Expression

So far, we have only defined partitions using a single column. Now we will use expression to define partitions. Expression "Letter || CAST( Number AS VARCHAR(10))" says that columns "Number" and "Letter", together, define partition.

CREATE MERGE TABLE MergExpression ( Letter VARCHAR(10), Number INT ) PARTITION BY VALUES USING ( Letter || CAST( Number AS VARCHAR(10) ) );

We will remove "Tab1" and "Tab2" from the merge table "MergRange". Then, we will add them to the "MergExpression" table.

ALTER TABLE MergRange DROP TABLE Tab1;
ALTER TABLE MergRange DROP TABLE Tab2;
ALTER TABLE MergExpression ADD TABLE Tab1 AS PARTITION IN ( 'A50', 'A60', 'Z70' );
ALTER TABLE MergExpression ADD TABLE Tab2 AS PARTITION IN ( 'B150', 'B160', 'C170' );
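To see where values such as 'A50' come from, the expression can be mimicked in Python. This is only an illustration: the function name `partition_key` is hypothetical, and the third row of "Tab2" is inferred from the 'C170' entry in the partition list rather than shown in a statement.

```python
# Rows as (Letter, Number); Tab2's third row is inferred from the 'C170' entry.
tab1 = [("A", 50), ("A", 60), ("Z", 70)]
tab2 = [("B", 150), ("B", 160), ("C", 170)]


def partition_key(letter, number):
    """Python equivalent of: Letter || CAST( Number AS VARCHAR(10) )"""
    return letter + str(number)


tab1_keys = [partition_key(*row) for row in tab1]
tab2_keys = [partition_key(*row) for row in tab2]
print(tab1_keys)
print(tab2_keys)
```

Every row produces exactly one key, and each key has to appear in the IN list of the partition that holds the row.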

Partition by Range Expression

It is also possible to use an expression to calculate the value that will be used to determine range membership.

CREATE MERGE TABLE MergRangeExpression ( Letter VARCHAR(10), Number INT ) PARTITION BY RANGE USING ( Number + char_length( Letter ) );

Again, we will untie our tables from the previous merge table, and then we will add them to the MergRangeExpression table.

ALTER TABLE MergExpression DROP TABLE Tab1;
ALTER TABLE MergExpression DROP TABLE Tab2;
ALTER TABLE MergRangeExpression ADD TABLE Tab1 AS PARTITION FROM 1 TO 100;
ALTER TABLE MergRangeExpression ADD TABLE Tab2 AS PARTITION FROM 101 TO 200;
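The range expression can be checked the same way with a short Python sketch. The helper names `range_key` and `find_partition` are hypothetical, and the treatment of the bounds (lower bound included, upper bound excluded) is an assumption that should be verified against the MonetDB documentation.

```python
def range_key(letter, number):
    """Python equivalent of: Number + char_length( Letter )"""
    return number + len(letter)


# Ranges as declared in the ALTER TABLE statements.
ranges = {"Tab1": (1, 100), "Tab2": (101, 200)}


def find_partition(letter, number):
    key = range_key(letter, number)
    for name, (low, high) in ranges.items():
        # Assumption: lower bound inclusive, upper bound exclusive.
        if low <= key < high:
            return name
    return None   # no partition accepts this row


print(find_partition("A", 50))    # key is 50 + 1 = 51
print(find_partition("B", 150))   # key is 150 + 1 = 151
```

With single-letter values the expression just shifts each number by one, so the rows of "Tab1" and "Tab2" still fall inside the ranges declared for them.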

Partition By NULLS

We have to declare which partition will hold the nulls. All NULLs have to be placed into exactly one partition.

ALTER TABLE Merg ADD TABLE Tab1 AS PARTITION IN ( 'A' ) WITH NULL VALUES;
In this case all nulls would belong to partition "Tab1".

ALTER TABLE Merg ADD TABLE Tab2 AS PARTITION FROM 1 TO 9 WITH NULL VALUES;
All nulls belong to partition "Tab2".

ALTER TABLE Merg ADD TABLE Tab3 AS PARTITION FOR NULL VALUES;
In this case, all nulls belong to partition "Tab3".

PARTITION System Tables

When we define a partitioning rule (when we use the PARTITION clause), that rule will be registered in these system tables.

System table "sys.range_partitions" is used when partitioning is made by the ranges ( 1-100, 101-200 ).

Merge Table Based on Another Table

It is possible to take the definition of a merge table from some other table. WITH NO DATA is mandatory.
CREATE MERGE TABLE MergAS ( Letter, Number ) AS ( SELECT * FROM Tab1 ) WITH NO DATA;

0370 Temporary Tables in MonetDB

MonetDB / By Bizkapish / March 13, 2025 April 16, 2025

Sample Table

We will log in as administrator, but we will set the "voc" schema as the default. The password is "monetdb".
mclient -u monetdb -d voc
SET SCHEMA voc;

Temporary Tables

After the execution of the query, if the result is not saved in a table or sent to an application, the result of a query will be discarded. Queries are transient, but tables are permanent. Tables will permanently save data stored in them. Between queries and tables, we have temporary data structures called temporary tables. These structures are used to store session-specific data that only needs to exist for a short duration.

Creation of a Local Temporary Table

We will create a temporary table that will exist only during one session. Such temporary tables are called LOCAL temporary tables. The default behavior of temporary tables is to lose their content at the end of each transaction. We can prevent that with the ON COMMIT PRESERVE ROWS option. We don't want the temporary table to be emptied at the end of the transaction because we want to observe the behavior of the table.

We can not create temporary table in some schema other than "tmp".
CREATE LOCAL TEMPORARY TABLE voc.tempTable2 ( Number INTEGER PRIMARY KEY );

We can not create permanent objects in "tmp".
SET SCHEMA tmp; CREATE TABLE ZZZ ( Number INTEGER ); SET SCHEMA voc;

Usage of a Temporary Table

We can NOT alter our temporary table. `ALTER TABLE tempTable ADD COLUMN Letter VARCHAR(10);`
`ALTER TABLE tempTable DROP CONSTRAINT temptable_number_pk;`

It is not possible to create foreign key constraint on the permTable if it references tempTable.

ALTER TABLE voc.permTable ADD CONSTRAINT FromTempTableConstraint FOREIGN KEY ( Number ) REFERENCES tmp.tempTable ( Number );

Info About Temporary Tables

We can NOT get statistics about our temporary table.
SELECT * FROM sys.statistics( 'tmp','temptable');

Visibility of Local Temporary Table

We can terminate our "tempTable" by dropping it explicitly, even before the end of the session.
DROP TABLE tempTable;

ON COMMIT DELETE ROWS

"ON COMMIT DELETE ROWS" subclause means that after each transaction, data will be deleted. This is default behavior.

AS SELECT

We can create temporary table based on some other table. Base table can be temporary or normal table.

CREATE LOCAL TEMPORARY TABLE tempTable2 ( Number ) AS ( SELECT Number FROM tempTable );
CREATE LOCAL TEMPORARY TABLE tempTable3 ( Number ) AS ( SELECT Number FROM voc.permTable ) ON COMMIT PRESERVE ROWS;

Global Temporary Tables

Global temporary tables are somewhere between normal tables and local tables. Their definition ( columns and data types ) is permanent. Name of the global table has to be unique in the "tmp" schema. Only users with authorization over "tmp" schema can create global temporary tables. In our example, administrator "monetdb" can create global temporary tables, but "voc" user can not.

The thing that makes these tables temporary is their data. All the rows of a global temporary table will be deleted after each transaction (with ON COMMIT DELETE ROWS) or at the end of the session (with ON COMMIT PRESERVE ROWS).

While definition of the global temporary tables is shared, data is not. Data placed in the global table by one user can not be seen by another user. So, global temporary table is a playground where each user can play with his own data.

Global temporary tables have similar characteristics as local temporary tables. We can use SELECT, DELETE, UPDATE. We can export them to CSV file. We can NOT alter global tables. We can create views on them. So, everything is the same as for local temporary tables.

Creation of the Global Temporary Table

We create global temporary table with similar statement as for the local temporary tables.

CREATE GLOBAL TEMPORARY TABLE globTable ( Number INTEGER PRIMARY KEY ) ON COMMIT PRESERVE ROWS;

This will fail for the "voc" user who doesn't have enough privileges over "tmp" schema.

Privileged users can successfully create a global temporary table, but not if a table with that name already exists. It is not possible for two users to create global tables with the same name.

Visibility of Global Temporary Table

If we try to read our table from the session of the "voc" user, we will see an empty table. This shows us that the definition of the table is shared, but the data is not.
SELECT * FROM globTable;

When to Use Temporary Tables

You can create an excerpt from some big table. After that, you can run your queries on that smaller table, instead of the big one.
Because temporary tables are volatile and their data is isolated between users, they are great for experiments.
Temporary tables should not be used as an intermediate step in queries. In that case, it is much wiser to use a CTE.

0360 Loader Functions in MonetDB

MonetDB / By Bizkapish / February 22, 2025 February 23, 2025

Monetdb-Python3 Integration Package

Previously, we installed MonetDB with two packages, monetdb5-sql and monetdb-client.
sudo apt install monetdb5-sql monetdb-client

For Python, we will need one more package. monetdb-python3 is an integration package that allows MonetDB to interact with Python.
sudo apt install monetdb-python3

I have the version 11.51.7 of MonetDB server. `monetdb --version`
I can install the latest version of monetdb-python3: `sudo apt install monetdb-python3`. Or, I can install a specific version of monetdb-python3: `sudo apt install monetdb-python3=11.51.7`

Enabling Embedded Python

I will first start monetdb daemon:

monetdbd start /home/fffovde/DBfarm1

Now we can log in to our database. I will log in as an administrator, although that is not required: any user can create a LOADER function. The password is "monetdb".
mclient -u monetdb -d voc

Python LOADER Function

Instead of the python lists, we can also use NumPy arrays. Instead of [1, 2, 3, 4, 5], we can use np.array( [1, 2, 3, 4, 5] ). NumPy arrays are faster.

Using LOADER Function

It is also possible to add data to an existing table. I will first truncate myLoaderTable and then I will append new data to an existing table.

TRUNCATE myLoaderTable;
COPY LOADER INTO myLoaderTable FROM myloader();

Using a Parameter in a LOADER function

With python we can pull data from anywhere, from any database or file. Here is an example where we will read data from a JSON file.

CREATE LOADER json_loader(filename STRING) LANGUAGE PYTHON {
import json
f = open(filename)
_emit.emit(json.load(f))
f.close()
};
This is how we can create a LOADER function that will read from our JSON file. This time we are using an argument for our function. This argument is of the STRING data type. STRING is an alias for the CLOB data type in MonetDB.

The json module is a built-in Python 3 module.
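Since the loader passes the result of json.load() straight to the emitter, the shape of the JSON file matters. The sketch below assumes that the emitter accepts a dictionary with one entry per column, each holding a list of values of the same length. The column names "Number" and "Letter" and the file name are hypothetical and would have to match the target table.

```python
import json
import os
import tempfile

# Hypothetical column names; they should match the columns of the target table.
columns = {"Number": [1, 2, 3], "Letter": ["A", "B", "C"]}

path = os.path.join(tempfile.mkdtemp(), "loader_input.json")
with open(path, "w") as f:
    json.dump(columns, f)

# This mirrors what the loader body does before calling _emit.emit().
with open(path) as f:
    loaded = json.load(f)

# Every column must supply the same number of values.
lengths = {name: len(values) for name, values in loaded.items()}
assert len(set(lengths.values())) == 1, lengths
print(lengths)
```

Running a check like this outside the database is a cheap way to catch a malformed file before the LOADER function sees it.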

Missing Columns

TRUNCATE myLoaderTable;

Delete LOADER function

We can always delete this function with the DROP LOADER FUNCTION statement.
DROP LOADER FUNCTION sys.myloader2;

0350 Exporting Data and Binary Files in MonetDB

MonetDB / By Bizkapish / February 15, 2025 February 22, 2025

Sample Table

Exporting Data from MonetDB to CSV

We saw how to import data from CSV file. For exporting to CSV file, we will again use COPY INTO statement.

Default string wrappers are not the same when writing and reading strings. They are double quotes when writing, and empty strings when reading.

We cannot overwrite an existing file with the same name. That would return an error.

Exporting Compressed CSV File

If our file name has one of the extensions .xz, .bz2, .gz or .lz4, the resulting file will be compressed. The compression ratio is best with .xz ( .xz > .bz2 > .gz > .lz4 ). Better compression takes more time.
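The trade-off between the formats can be tried out with Python's standard library, which covers three of the four (lz4 is not part of the standard library). The CSV content below is made up, and the actual sizes depend heavily on the data, so this is only a way to experiment, not a benchmark.

```python
import bz2
import gzip
import lzma

# Hypothetical CSV content, repeated so the compressors have something to work with.
csv_bytes = b"".join(b"%d,letter_%d,2025-01-01\n" % (i, i % 7) for i in range(5000))

compressors = {
    ".xz": lzma.compress,
    ".bz2": bz2.compress,
    ".gz": gzip.compress,
}

# Compress the same bytes with each format and record the resulting size.
sizes = {ext: len(fn(csv_bytes)) for ext, fn in compressors.items()}
print("raw:", len(csv_bytes))
for ext, size in sizes.items():
    print(ext, size)
```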

Exporting CSV File on the Client

I will demonstrate exporting a CSV file on the client by using Python. We will use the code below. This is the same code used for uploading files, with two distinctions. Instead of "set_uploader" we are using "set_downloader". The COPY INTO statement is also different.
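A minimal sketch of such a script follows. The connection settings are the ones used elsewhere in this text, while the helper names `build_export_statement` and `export_csv_on_client` and the file name "tblCSV.csv" are hypothetical. It assumes the pymonetdb driver is installed and that the target directory exists.

```python
def build_export_statement(table, filename):
    """Build a COPY ... INTO ... ON CLIENT statement (names are placeholders)."""
    return f"COPY SELECT * FROM {table} INTO '{filename}' ON CLIENT;"


def export_csv_on_client(directory, table, filename):
    import pymonetdb  # imported here so the helper above works without the driver

    connection = pymonetdb.connect(username="monetdb", password="monetdb",
                                   hostname="localhost", database="voc")
    try:
        # The handler only allows files inside `directory` to be written.
        connection.set_downloader(pymonetdb.SafeDirectoryHandler(directory))
        cursor = connection.cursor()
        cursor.execute(build_export_statement(table, filename))
        cursor.close()
    finally:
        connection.close()


print(build_export_statement("tblCSV", "tblCSV.csv"))
```

A call such as export_csv_on_client("/home/fffovde/Desktop", "tblCSV", "tblCSV.csv") would then ask the server to send the rows to the client, which writes them into the given directory.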

Binary Files

MonetDB can import/export data even faster than from/to CSV files. For that we can use binary files.

Little Endian vs Big Endian

Let's say that we have the hexadecimal number 0x12345678. Every two hexadecimal digits represent one byte:

12 => 00010010

34 => 00110100

56 => 01010110

78 => 01111000

There is also the term Native Endian. That is the preferred byte order of the system MonetDB is running on. If your system is using an AMD or Intel CPU, it is using Little Endian. ARM CPUs can support both byte orders, but they are almost always run as Little Endian.
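The two byte orders, and the native order of the machine, can also be inspected from Python. This short sketch uses only the standard library and the same number, 0x12345678.

```python
import sys

value = 0x12345678

big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")

print("big endian:   ", big.hex(" "))      # most significant byte first
print("little endian:", little.hex(" "))   # least significant byte first
print("bits per byte:", [format(b, "08b") for b in big])
print("native order: ", sys.byteorder)
```

Big Endian stores the bytes as 12 34 56 78, Little Endian stores the same number as 78 56 34 12.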

We can check the Endianness of our system.
lscpu | grep "Byte Order"

Exporting Binary Files

MonetDB can export data into MonetDB custom binary files. Each table column will become a separate file.
COPY SELECT * FROM tblCSV INTO LITTLE ENDIAN BINARY '/home/fffovde/Desktop/Letter', '/home/fffovde/Desktop/Number', '/home/fffovde/Desktop/Calendar' ON SERVER;

Instead of "Little Endian" we can use "Big Endian" or "Native Endian". Instead of "ON SERVER", we can use "ON CLIENT". For exporting data on the client, we can reuse the same python script shown above.
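To get a feeling for what such per-column files contain, here is a small Python sketch that writes two of them by hand. It rests on assumptions about the file layout that should be checked against the MonetDB documentation on binary bulk loading: INT values are stored back to back as 4-byte integers in the chosen byte order, and strings are stored as UTF-8 text with each value terminated by a zero byte. The sample values are made up.

```python
import os
import struct
import tempfile

numbers = [1, 2, 3]            # hypothetical values for an INT column
letters = ["A", "B", "C"]      # hypothetical values for a VARCHAR column

out_dir = tempfile.mkdtemp()

# INT column: consecutive 4-byte signed integers, little endian ("<i").
number_path = os.path.join(out_dir, "Number")
with open(number_path, "wb") as f:
    for n in numbers:
        f.write(struct.pack("<i", n))

# String column: UTF-8 text, each value followed by a zero byte (assumed layout).
letter_path = os.path.join(out_dir, "Letter")
with open(letter_path, "wb") as f:
    for s in letters:
        f.write(s.encode("utf-8") + b"\x00")

print(os.path.getsize(number_path), os.path.getsize(letter_path))
```

There are no delimiters, quotes or text-to-number conversions in the numeric file, which is a large part of why binary import and export can be faster than CSV.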

Loading Binary Files to MonetDB

Exported binary files can be imported in any MonetDB database. Before import, we can empty database table "TRUNCATE tblCSV;".

TRUNCATE tblCSV;


COPY LITTLE ENDIAN BINARY INTO tblCSV FROM
'/home/fffovde/Desktop/Letter',
'/home/fffovde/Desktop/Number',
'/home/fffovde/Desktop/Calendar' ON SERVER;
We don't have to declare the columns of the database table. In that case, we just have to make sure that the order and number of files is the same as the order and number of the columns in the table.

Loading Binary Files to MonetDB on Client

TRUNCATE tblCSV;

import pymonetdb
connection = pymonetdb.connect(username="monetdb", password="monetdb", hostname="localhost", database="voc")

handler = pymonetdb.SafeDirectoryHandler("/home/fffovde/Desktop")
connection.set_uploader(handler)

cursor = connection.cursor()
cursor.execute("COPY LITTLE ENDIAN BINARY INTO tblCSV FROM '/home/fffovde/Desktop/Letter', '/home/fffovde/Desktop/Number', '/home/fffovde/Desktop/Calendar' ON CLIENT;")
connection.commit()

cursor.close()
connection.close()

For uploading binary files to MonetDB from the client, we are using a Python script similar to the one we used for CSV files.