0370 Temporary Tables in MonetDB

Sample Table

We will log in as administrators, but we will set the VOC schema as the default. The password is "monetdb".
mclient -u monetdb -d voc
SET SCHEMA voc;

CREATE TABLE permTable ( Number INTEGER );

INSERT INTO permTable ( Number ) VALUES ( 1 ), ( 2 );

Temporary Tables

After a query executes, its result is discarded unless it is saved in a table or sent to an application. Queries are transient, but tables are permanent; tables persist the data stored in them. Between queries and tables, we have temporary data structures called temporary tables. These structures are used to store session-specific data that only needs to exist for a short time.

Creation of a Local Temporary Table

We will create a temporary table that will exist only during one session. Such temporary tables are called LOCAL temporary tables. The default behavior of temporary tables is to lose their content at the end of each transaction. We can prevent that with the option ON COMMIT PRESERVE ROWS. We don't want the temporary table to be emptied at the end of the transaction, because we want to observe the behavior of the table.

CREATE LOCAL TEMPORARY TABLE tempTable ( Number INTEGER PRIMARY KEY  ) ON COMMIT PRESERVE ROWS;
INSERT INTO tempTable ( Number ) VALUES ( 3 ), ( 4 );
SELECT * FROM tempTable;

Temporary tables are created in the "tmp" schema, but we don't have to prefix their names with "tmp".
Both statements below will work the same.
SELECT * FROM tempTable;
SELECT * FROM tmp.tempTable;

We cannot create a temporary table in a schema other than "tmp". The statement below will fail.
CREATE LOCAL TEMPORARY TABLE voc.tempTable2
( Number INTEGER PRIMARY KEY );

A permanent table and a temporary table can have the same name, because they are in different schemas. I will create a temporary table with the name "permTable".
CREATE LOCAL TEMPORARY TABLE permTable ( Number INTEGER );
If we read from the "permTable" without specifying schema, we will get the temporary table.
SELECT * FROM permTable;
To read from the permanent table with the same name, we have to use the fully qualified name.
SELECT * FROM voc.permTable;
So the temporary table takes priority, although our current schema is "voc" (and not "tmp").
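We can confirm that there are now two tables with this name, in different schemas; a minimal check against the system catalog (name and schema_id are existing columns of sys.tables):
SELECT name, schema_id FROM sys.tables WHERE name = 'permtable';
This should return two rows, one for each schema.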


We cannot create permanent objects in the "tmp" schema.
SET SCHEMA tmp;
CREATE TABLE ZZZ ( Number INTEGER );
SET SCHEMA voc;

Usage of a Temporary Table

It is possible to create queries that combine temporary and normal tables.  
SELECT * FROM voc.permTable JOIN tempTable ON permTable.Number <= tempTable.Number;

We can have constraints on the temporary table. In this case we have a PRIMARY KEY constraint, so inserting a duplicate value will be rejected.
INSERT INTO tempTable ( Number ) VALUES ( 3 );

It is possible to export data from a temporary table into a CSV file.
COPY SELECT * FROM tempTable INTO '/home/fffovde/Desktop/tblCSV';
It is possible to import data from a CSV file into a temporary table.
COPY INTO tempTable ( Number ) FROM '/home/fffovde/Desktop/tblCSV'( Number );

It is possible to use UPDATE and DELETE on temporary tables.

UPDATE tempTable SET Number = 6 WHERE Number = 5;
DELETE FROM tempTable WHERE Number = 6;

We can NOT alter our temporary table.
ALTER TABLE tempTable ADD COLUMN Letter VARCHAR(10);
ALTER TABLE tempTable DROP CONSTRAINT temptable_number_pk;

It is possible to create a view on a temporary table.
CREATE VIEW voc.View1 ( Number ) AS
    SELECT * FROM tmp.tempTable;
SELECT * FROM View1;

It is not possible to create a foreign key constraint on permTable that references tempTable.
ALTER TABLE voc.permTable ADD CONSTRAINT FromTempTableConstraint FOREIGN KEY ( Number ) REFERENCES tmp.tempTable ( Number );

Info About Temporary Tables

We can NOT get statistics about our temporary table.
SELECT * FROM sys.statistics( 'tmp','temptable');

We can find our temporary table in the system catalog.
SELECT * FROM sys.tables WHERE Name = 'temptable';

Visibility of Local Temporary Table

I will log in to MonetDB from another instance of mclient, as "voc" user (password "voc").
mclient -u voc -d voc 
I will try to read local table "tempTable".
SELECT * FROM tempTable;
This will not work, because local temporary tables are visible only in the session in which they were created. Other users cannot see these tables.

As the user "voc" I will create temporary table with the same name -"tempTable". This will be successful. Each user can have its own local temp table.
CREATE LOCAL TEMPORARY TABLE tempTable ( Number INTEGER );

Even if we don't explicitly drop the table, it will disappear after we log out of the current session. I'll log out of the "monetdb" user session, and after I log back in, "tempTable" will no longer exist.
quit
mclient -u monetdb -d voc
SELECT * FROM tempTable;

We can terminate our "tempTable" by dropping it explicitly, even before the end of the session.DROP TABLE tempTable;

ON COMMIT DELETE ROWS

"ON COMMIT DELETE ROWS" subclause means that after each transaction, data will be deleted. This is default behavior.

CREATE LOCAL TEMPORARY TABLE tempTable ( Number INTEGER PRIMARY KEY  ) ON COMMIT DELETE ROWS;
INSERT INTO tempTable ( Number ) VALUES ( 3 ), ( 4 );
SELECT * FROM tempTable;

In this case, the temporary table is useful only inside a transaction.
START TRANSACTION;
INSERT INTO tempTable ( Number ) VALUES ( 3 ), ( 4 );
SELECT * FROM tempTable;

COMMIT;

AS SELECT

We can create a temporary table based on another table. The base table can be a temporary or a normal table.

CREATE LOCAL TEMPORARY TABLE tempTable2 ( Number )
AS ( SELECT Number FROM tempTable );
CREATE LOCAL TEMPORARY TABLE tempTable3 ( Number )
AS (SELECT Number FROM voc.permTable ) ON COMMIT PRESERVE ROWS;

By default, "WITH DATA" is used. If we read from tempTable3, we will see the inherited values.
SELECT * FROM tempTable3;

The "WITH NO DATA" subclause will make us not to inherit data.
CREATE LOCAL TEMPORARY TABLE tempTable4 ( Number )
AS ( SELECT Number FROM voc.permTable ) WITH NO DATA ON COMMIT PRESERVE ROWS;
SELECT * FROM tempTable4;

Global Temporary Tables

Global temporary tables are somewhere between normal tables and local temporary tables. Their definition (columns and data types) is permanent. The name of a global table has to be unique in the "tmp" schema. Only users with authorization over the "tmp" schema can create global temporary tables. In our example, the administrator "monetdb" can create global temporary tables, but the "voc" user cannot.

The thing that makes these tables temporary is their data. All rows of a global temporary table will be deleted after each transaction (with ON COMMIT DELETE ROWS, the default) or at the end of the session (with ON COMMIT PRESERVE ROWS).

While the definition of a global temporary table is shared, its data is not. Data placed in the global table by one user cannot be seen by another user. So, a global temporary table is a playground where each user can play with their own data.

Global temporary tables have similar characteristics to local temporary tables. We can use SELECT, DELETE and UPDATE on them. We can export them to a CSV file. We can NOT alter global tables. We can create views on them. So, everything is the same as for local temporary tables.

It is possible to get statistics about global tables.
SELECT * FROM sys.statistics( 'tmp', 'globtable' );

Creation of the Global Temporary Table

We create a global temporary table with a statement similar to the one for local temporary tables.

CREATE GLOBAL TEMPORARY TABLE globTable ( Number INTEGER PRIMARY KEY ) ON COMMIT PRESERVE ROWS;
This will fail for the "voc" user who doesn't have enough privileges over "tmp" schema.
Privileged users can successfully create global temporary table, but not if the table with such name already exist. It is not possible for two users to create global tables with the same names.

Visibility of Global Temporary Table

We will insert some data in our global temporary table.
INSERT INTO globTable ( Number ) VALUES ( 5 ), ( 6 );
SELECT * FROM globTable;

If we try to read our table from the session of the "voc" user, we will see an empty table. This shows us that the definition of the table is shared, but the data is not.
SELECT * FROM globTable;

Although "voc" user can not create global table, it can use tables created by others. "Voc" user can play with his own data.
INSERT INTO globTable ( Number ) VALUES ( 7 ), ( 8 );
SELECT * FROM globTable;
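As mentioned earlier, the usual statements also work on the global table. A minimal sketch, run from the administrator's session, which holds the values 5 and 6 (the export file name is just an example):

UPDATE globTable SET Number = 9 WHERE Number = 6;
DELETE FROM globTable WHERE Number = 9;
COPY SELECT * FROM globTable INTO '/home/fffovde/Desktop/globTable.csv';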

When to Use Temporary Tables

  1. You can create an excerpt from some big table. After that, you can run your queries on that smaller table, instead of the big one (see the sketch after this list).
  2. Because temporary tables are volatile and data is isolated between users, temporary tables are great for experiments.
  3. Temporary tables should not be used as an intermediate step in queries. In that case, it is much wiser to use a CTE.
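Here is a minimal sketch of points 1 and 3, reusing the voc.permTable sample table from the beginning of this post:

-- Point 1: a small, session-only excerpt of a bigger table
CREATE LOCAL TEMPORARY TABLE tempExcerpt ( Number )
AS ( SELECT Number FROM voc.permTable WHERE Number > 1 ) ON COMMIT PRESERVE ROWS;
SELECT * FROM tempExcerpt;

-- Point 3: for a one-off intermediate step, a CTE is usually the better choice
WITH filtered AS ( SELECT Number FROM voc.permTable WHERE Number > 1 )
SELECT * FROM filtered;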

0360 Loader Functions in MonetDB

Loader functions are UDFs (user-defined functions) that are used to insert data from various data sources. Currently, we can only write these functions in the Python programming language.

The idea is to be able to read from different data sources by using the power of Python.

Monetdb-Python3 Integration Package

Previously, we installed MonetDB with two packages: monetdb5-sql and monetdb-client.

For Python, we will need one more package. MonetDB-Python3 is the integration package that allows MonetDB to interact with Python.
sudo apt install monetdb5-sql monetdb-client


sudo apt install monetdb-python3

The command sudo apt list --all-versions monetdb-python3 will show us that we have 8 different versions of this package in the repository.
The command sudo apt show -a monetdb-python3 will tell us that for MonetDB version 11.51.7, we should install version 11.51.7 of monetdb-python3. We should always match the versions if we can.

I have the version 11.51.7 of MonetDB server.
monetdb --version
I can install the latest version of monetdb-python3:
sudo apt install monetdb-python3
Or, I can install a specific version of monetdb-python3:
sudo apt install monetdb-python3=11.51.7

Enabling Embedded Python

I will first start the MonetDB daemon:

monetdbd start /home/fffovde/DBfarm1
We have to enable the Python integration for each database. By typing monetdb get all voc, we can list the properties of the voc database. We can see that the "embedpy3" setting is "no". We will change that.
name   prop               source     value
voc    name               -          voc
voc    type               default    database
voc    shared             default    yes
voc    nthreads           default    4
voc    ncopyintothreads   default    4
voc    optpipe            default    default_pipe
voc    readonly           default    no
voc    embedr             default    no
voc    embedpy3           local      no
voc    embedc             default    no
voc    listenaddr         default    <unset>
voc    nclients           default    64
voc    dbextra            default    <unset>
voc    memmaxsize         default    <unset>
voc    vmmaxsize          default    <unset>
voc    raw_strings        default    <unset>
voc    loadmodules        default    <unset>

We will stop the database if it is running. Then we will change the setting, and after that we will start our database again.
monetdb stop voc
monetdb set embedpy3=true voc
monetdb start voc

We have changed the embedpy3 property to "yes".

Now we can log in to our database. I will log in as an administrator, although that is not needed; any user can create a LOADER function.
mclient -u monetdb -d voc
Password: monetdb

Python LOADER Function

CREATE LOADER myloader() LANGUAGE PYTHON {
    _emit.emit( { 'Col1': [ "A", "B" ], 'Col2': [ 1, 2 ] } )
    _emit.emit( { 'Col1': [ "C", "D" ], 'Col2': [ 3, 4 ] } )
};
This statement will create a LOADER function. Columns are defined as Python lists. Each list, together with the name of a column, is placed inside a Python dictionary.
We are using the function "_emit.emit" to divide our inserts into chunks. In this way we can preserve memory. After inserting the first chunk (A1, B2), we can delete it from memory, and we can continue inserting the second chunk (C3, D4).

Instead of the python lists, we can also use NumPy arrays. Instead of [1, 2, 3, 4, 5], we can use np.array( [1, 2, 3, 4, 5] ). NumPy arrays are faster.
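Here is a minimal sketch of the same loader with a NumPy array for the numeric column (assuming NumPy is available to the embedded Python; "myloader_np" is a hypothetical name):

CREATE LOADER myloader_np() LANGUAGE PYTHON {
    import numpy as np
    # the string column stays a Python list, the numeric column becomes a NumPy array
    _emit.emit( { 'Col1': [ "A", "B", "C", "D" ], 'Col2': np.array( [ 1, 2, 3, 4 ] ) } )
};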

LOADER functions are of type 7, so we can list them with "SELECT * FROM sys.functions WHERE type = 7;". We can also notice that our function belongs to schema 2000 (the sys schema), because that is the default schema for administrators (I am logged in as an administrator). Creation of LOADER functions is not limited to administrators; every user can create a LOADER function.
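A quick way to check this (id, name, schema_id and type are existing columns of sys.functions):
SELECT id, name, schema_id, type FROM sys.functions WHERE type = 7;
For our loader, schema_id should be 2000.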

Using LOADER Function

We can create a table from our LOADER function. Columns and data types will be deduced automatically.
CREATE TABLE myLoaderTable FROM LOADER myloader();
SELECT * FROM myLoaderTable;

It is also possible to add data to an existing table. I will first truncate myLoaderTable and then append new data to it.
TRUNCATE myLoaderTable;
COPY LOADER INTO myLoaderTable FROM myloader();

Using a Parameter in a LOADER function

With Python we can pull data from anywhere, from any database or file. Here is an example where we will read data from a JSON file.

We have a JSON file with the name "File.json":
{ "Col1": ["A","B","C","D"], "Col2": [1,2,3,4] }

CREATE LOADER json_loader(filename STRING) LANGUAGE PYTHON {
    import json
    f = open(filename)
    _emit.emit(json.load(f))
    f.close()
};
This is how we can create a LOADER function that will read from our JSON file. This time we are using an argument for our function. This argument is of the STRING data type; STRING is an alias for the CLOB data type in MonetDB.

The json module is a built-in Python 3 module.

We can truncate the previous results and import from the JSON file.

TRUNCATE myLoaderTable;

COPY LOADER INTO myLoaderTable FROM json_loader('/home/fffovde/Desktop/File.json');
SELECT * FROM myLoaderTable;
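The parameterized loader can also be combined with the CREATE TABLE ... FROM LOADER form shown earlier; a sketch ("myJsonTable" is a hypothetical name):

CREATE TABLE myJsonTable FROM LOADER json_loader('/home/fffovde/Desktop/File.json');
SELECT * FROM myJsonTable;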

Missing Columns

TRUNCATE myLoaderTable;

During the import, missing columns will be filled with NULLs.
CREATE LOADER myloader2() LANGUAGE PYTHON {
    _emit.emit( { 'Col1': [ "A", "B", "C", "D" ] } )
};
COPY LOADER INTO myLoaderTable FROM myloader2();
SELECT * FROM myLoaderTable;

Delete LOADER function

We can always delete this function with the DROP LOADER FUNCTION statement.
DROP LOADER FUNCTION sys.myloader2;

0350 Exporting Data and Binary Files in MonetDB

Sample Table

CREATE TABLE tblCSV ( Letter VARCHAR(10), Number REAL, Calendar DATE );  

INSERT INTO tblCSV (Letter, Number, Calendar)    
        VALUES ('A', 3.82, null), ('B', 4.83, '2025-01-02'), ('C', 5.64, '2025-01-03');

Exporting Data from MonetDB to CSV

We saw how to import data from a CSV file. To export data to a CSV file, we will again use the COPY INTO statement.

COPY SELECT * FROM tblCSV INTO '/home/fffovde/Desktop/tblCSV';

By default, we'll get a text file in UTF-8 encoding (1). Line endings are going to be LF (2). Column separators are pipes (3). The default string wrapper is double quotes (4). Nulls will be represented as >null< (5).
Notice that I have omitted the "ON SERVER" subclause; "ON SERVER" is the default.

The default string wrappers are not the same when writing and reading strings. They are double quotes when writing, and no wrapper at all when reading.
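To illustrate, this is how the file exported with the default settings could be read back, spelling those defaults out explicitly (a sketch; the exported file has no header, so no OFFSET is needed):

COPY INTO tblCSV ( Letter, Number, Calendar )
FROM '/home/fffovde/Desktop/tblCSV' ON SERVER
USING DELIMITERS '|', E'\n', '"'
NULL AS 'null';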

We can change the default settings, as in this example. The column separator is the TAB character (1). The string wrapper is a star (2). NULLs are represented with >Nil< (3).
COPY SELECT * FROM tblCSV INTO '/home/fffovde/Desktop/tblCSV' ON SERVER
USING DELIMITERS E'\t', E'\n', '*'
NULL AS 'Nil';

We cannot overwrite an existing file with the same name; that would return an error.

Exporting Compressed CSV File

If our file name has the extension .xz, .bz2, .gz or .lz4, then the resulting file will be compressed. The compression ratio is best when using .xz ( .xz > .bz2 > .gz > .lz4 ), but better compression needs more time.

Let's use the strongest compression (.xz).
COPY SELECT * FROM tblCSV INTO '/home/fffovde/Desktop/tblCSV.xz';  

Our file is reduced from 648 to 108 bytes.

Exporting CSV File on the Client

I will demonstrate exporting a CSV file on the client, by using Python. We will use the code below. This is the same code used for uploading files, with two distinctions: instead of "set_uploader" we are using "set_downloader", and the COPY INTO statement is also different.

import pymonetdb 
connection = pymonetdb.connect(username="monetdb", password="monetdb", hostname="localhost", database="voc")   

handler = pymonetdb.SafeDirectoryHandler("/home/fffovde/Desktop")
connection.set_downloader(handler)                                   

cursor = connection.cursor() 
cursor.execute("COPY SELECT * FROM tblCSV INTO '/home/fffovde/Desktop/tblCSV.bz2' ON CLIENT;")
connection.commit()   

cursor.close()
connection.close()
Our file will be saved in a compressed state:

Binary Files

MonetDB can import/export data even faster than from/to CSV files. For that we can use binary files.

Little Endian vs Big Endian

Let's say that we have the hexadecimal number 0x12345678. Every two digits represent one byte:

12 => 00010010
34 => 00110100
56 => 01010110
78 => 01111000

When we write our number to memory, we can write it in two ways.
– We can write the bytes in the order 12 34 56 78. This is called Big Endian.
– We can write them in the reverse order 78 56 34 12. This is called Little Endian.

This distinction is important when writing binary files.

There is also the term Native Endian. That is the preferred byte order of the system MonetDB is running on. If your system uses an AMD, ARM, or Intel CPU, then it is using Little Endian.

We can check the Endianness of our system.
lscpu | grep "Byte Order"

Exporting Binary Files

MonetDB can export data into MonetDB custom binary files. Each table column will become a separate file.
COPY SELECT * FROM tblCSV INTO LITTLE ENDIAN BINARY
    '/home/fffovde/Desktop/Letter',
    '/home/fffovde/Desktop/Number',
    '/home/fffovde/Desktop/Calendar' ON SERVER;

As a result, we'll get three binary files. One file for each column.

Instead of "Little Endian" we can use "Big Endian" or "Native Endian". Instead of "ON SERVER", we can use "ON CLIENT". For exporting data on the client, we can reuse the same python script shown above.

Loading Binary Files to MonetDB

Exported binary files can be imported into any MonetDB database. Before the import, we can empty the database table with "TRUNCATE tblCSV;".

COPY LITTLE ENDIAN BINARY INTO tblCSV( Letter, Number, Calendar ) FROM
    '/home/fffovde/Desktop/Letter',
    '/home/fffovde/Desktop/Number',
    '/home/fffovde/Desktop/Calendar' ON SERVER;

TRUNCATE tblCSV;

It is possible to load fewer than three columns. In that case the "Calendar" column will remain empty.
COPY LITTLE ENDIAN BINARY INTO tblCSV( Letter, Number ) FROM
    '/home/fffovde/Desktop/Letter',
    '/home/fffovde/Desktop/Number' ON SERVER;

TRUNCATE tblCSV;

COPY LITTLE ENDIAN BINARY INTO tblCSV FROM
     '/home/fffovde/Desktop/Letter',
     '/home/fffovde/Desktop/Number',
     '/home/fffovde/Desktop/Calendar' ON SERVER;           
We don't have to declare the columns of the database table. In that case, we just have to make sure that the order and number of files is the same as the order and number of columns in the table.

Loading Binary Files to MonetDB on Client

TRUNCATE tblCSV;

import pymonetdb 
connection = pymonetdb.connect(username="monetdb", password="monetdb", hostname="localhost", database="voc")   

handler = pymonetdb.SafeDirectoryHandler("/home/fffovde/Desktop")
connection.set_uploader(handler)

cursor = connection.cursor() 
cursor.execute("COPY LITTLE ENDIAN BINARY INTO tblCSV FROM '/home/fffovde/Desktop/Letter', '/home/fffovde/Desktop/Number', '/home/fffovde/Desktop/Calendar' ON CLIENT;")
connection.commit()   

cursor.close()
connection.close()
To upload binary files to MonetDB from the client, we use a Python script similar to the one we used for CSV files.

0340 Loading from CSV files into MonetDB

Importing data from CSV files into MonetDB is much faster than using "INSERT INTO". One reason is that CSV data can be loaded as is, with no need for parsing as with "INSERT INTO". The other reason is that a CSV file can be imported using several CPU cores at once, while "INSERT INTO" can only use one thread.

Sample table and CSV file

To use the COPY INTO statement, we have to log in as an administrator (the password is monetdb):
mclient -u monetdb -d voc;

In order to import data, we first need a table where we will place the data.
CREATE TABLE tblCSV ( Letter VARCHAR(10) DEFAULT 'Z', Number REAL, Calendar DATE );

We'll start with this CSV file. The line ending can be \n (Linux) (1) or \r\n (Windows), it doesn't matter. The file has to be encoded as UTF-8 (2). In our example, the separator between columns is a comma (3). Strings are wrapped inside double quotes (4).
Letter,Number,Calendar
"A",3.82,2025-01-01
"B",4.83,2025-01-02
"C",5.64,2025-01-03

Reading from CSV File to MonetDB table

We will read our CSV file into the tblCSV table with the statement below. The user executing this statement must have the privilege to read the file.

• OFFSET 2 will skip the first row. The first row contains the header. The header already exists in our MonetDB table, so we do not import it.
• tblCSV( Letter, Number, Calendar ) – this part is the table and the columns into which we are importing data.
• '/home/fffovde/Desktop/tblCSV.csv'( Letter, Number, Calendar ) ON SERVER – the location of our file.
• ',' – this is the separator between columns.
• E'\n' – this is the delimiter between rows. '\n' is the symbol for LF (line feed).
• '"' – strings are wrapped with double quotes.

COPY OFFSET 2 INTO tblCSV( Letter, Number, Calendar )
FROM '/home/fffovde/Desktop/tblCSV.csv'( Letter, Number, Calendar )
ON SERVER
USING DELIMITERS ',', E'\n', '"';

SELECT * FROM tblCSV;
This is now our table.

Number of Records and Default Options

Before each insert, I will delete all the existing rows. Deleting all rows can be done with the TRUNCATE statement:
TRUNCATE tblCSV;

• 2 RECORDS – we can limit our import to only two rows (after the header). If we know in advance how many rows we will import, MonetDB can reserve space on disk in advance, which increases performance.

We can also notice that we do not have to provide column names. In that case, we have to make sure that the columns have the same order in the file as in the database table.
COPY 2 RECORDS OFFSET 2 INTO tblCSV
FROM '/home/fffovde/Desktop/tblCSV.csv' ON SERVER USING DELIMITERS ',';

There is also no need to always define the record delimiter and the string wrapper. MonetDB will assume that the record delimiter is either \n or \r\n. It will also assume that no string wrapper is used. In the image above, the double quotes are now considered part of the string.

Column Names

TRUNCATE tblCSV;

If we are providing the column names of the CSV file, we always have to provide all of them, but we can give them fake names. In the example below, we are importing only the column "Number". The other two columns will not be imported. Because the "Letter" column has the default value "Z", all the values in that column will be "Z". The "Calendar" column will be filled with NULLs.

COPY OFFSET 2 INTO tblCSV( Number )
FROM '/home/fffovde/Desktop/tblCSV.csv'( Dummy, Number, Dummy2 )

ON SERVER
USING DELIMITERS ',';

TRUNCATE tblCSV;

For the next example, I will add one more column to my CSV file ("Color"), and I will place the Number column before the Letter column. I will change the column separator to a semicolon, and the record delimiter to CRLF. I will remove the string wrappers.

Colors, Number, Letter, Calendar

If we provide column names, we can use that fact to put the columns back in the correct order.
COPY OFFSET 2 INTO tblCSV( Letter, Number, Calendar )
FROM '/home/fffovde/Desktop/tblCSV.csv'( Color, Number, Letter, Calendar )
ON SERVER
USING DELIMITERS ';', E'\n';

– This time, we are using a semicolon as the separator between columns.
– We have not provided a string wrapper, so MonetDB will take the data as is.
– The record delimiter is defined as LF; it doesn't matter that our file uses the CRLF delimiter, the symbol "\n" will work in both cases.
SELECT * FROM tblCSV;

TRUNCATE tblCSV;

If we don't provide column names for the CSV file, MonetDB will assume that the CSV file has the same number of columns as the target table. In the example below, MonetDB assumes that our CSV file has three columns, and it will raise an error that one column is left over.

COPY OFFSET 2 INTO tblCSV( Letter, Number )
FROM '/home/fffovde/Desktop/tblCSV.csv' --x3
ON SERVER
USING DELIMITERS ';';

Column Separators and NULL-s

TRUNCATE tblCSV;

This time we will remove the header, and we will use the pipe "|" as the column separator.
In the first row, I will place one surplus pipe at the end of the record (1).

We will take only two records. We don't have to skip the header, because there is no header in this CSV file. Our separators are pipes. Pipes are the default, so we can omit the "USING" clause altogether.
COPY 2 RECORDS INTO tblCSV( Letter, Number, Calendar )
FROM '/home/fffovde/Desktop/tblCSV.csv'( Color, Number, Letter, Calendar )
ON SERVER;

The lone pipe (1), from the picture above, won't cause us any problems.
SELECT * FROM tblCSV;

TRUNCATE tblCSV;

We will replace the column delimiter with a dot. The dot is also used as the decimal point. When a dot is used as a decimal point, we have to place an escape character before it. The escape character is the backslash (1).

COPY 2 RECORDS INTO tblCSV( Letter, Number, Calendar )
FROM '/home/fffovde/Desktop/tblCSV.csv'( Color, Number, Letter, Calendar )
ON SERVER
USING DELIMITERS '.';

We can see that the escape characters did their job of distinguishing decimal points.
SELECT * FROM tblCSV;

TRUNCATE tblCSV;

It is also possible to have the string wrapper character inside of a string. We can escape such a character by doubling it (1), or by placing an escape character before it (2).

Notice, also, that I have placed the word NULL in the third record (3). By default, this will be considered a null in MonetDB.

COPY OFFSET 0 INTO tblCSV( Letter, Number, Calendar )
FROM '/home/fffovde/Desktop/tblCSV.csv'( Color, Number, Letter, Calendar )
ON SERVER
USING DELIMITERS ',', E'\n', '"';

If we offset zero or one row, that means that we are reading the data from the start.

TRUNCATE tblCSV;

Sometimes we want to have a backslash inside of a string as a literal. In that case we use the "NO ESCAPE" subclause (1).

NULL represents the null value by default. It is possible to use any other representation (2).

COPY OFFSET 0 INTO tblCSV( Letter, Number, Calendar )
FROM '/home/fffovde/Desktop/tblCSV.csv'( Color, Number, Letter, Calendar )
ON SERVER
USING DELIMITERS ',', E'\n', '"'
NO ESCAPE       --backslash is considered literal (1)
NULL AS '';     --NULL is represented by empty string (2)

Fixed Width Columns

TRUNCATE tblCSV;

Besides the comma-separated format, we can also use the fixed-width format. In this CSV file we have three columns (Letter, Number, Calendar), but each column has a fixed width. Letter is 4 characters wide, Number is 7 characters wide, and Calendar is 10 characters wide.

COPY INTO tblCSV
FROM '/home/fffovde/Desktop/tblCSV.csv' ON SERVER
FWF ( 4, 7, 10 );

We are using the FWF subclause to inform MonetDB about the column widths.
Spaces in the columns will be stripped and only the values will be imported.

Error Handling

TRUNCATE tblCSV;

This CSV file has an error that will prevent the import. There is a letter ("W") in the Number column, in the second row.

COPY INTO tblCSV
FROM '/home/fffovde/Desktop/tblCSV.csv'

ON SERVER
FWF ( 4, 7, 10 );
We will get an error for this statement, and no rows will be imported.

COPY INTO tblCSV
FROM '/home/fffovde/Desktop/tblCSV.csv'

ON SERVER
BEST EFFORT

FWF ( 4, 7, 10 );
The solution is to use the subclause "BEST EFFORT". Only the first and third rows are now imported; the second row is not imported.

We can find the rows that were not imported by reading from this table:
SELECT * FROM sys.rejects;
This table will preserve its content until the end of the session, or until the procedure "CALL sys.clearrejects();" is executed.

Loading Several CSV Files at Once

TRUNCATE tblCSV;

We will import three CSV files at once. We will skip the first record in each file, and then read only one row. From each file we will import only one record, so in total we will insert three records.

COPY 1 RECORDS OFFSET 2 INTO tblCSV
FROM ( '/home/fffovde/Desktop/tblCSV1.csv'
     , '/home/fffovde/Desktop/tblCSV2.csv'
     , '/home/fffovde/Desktop/tblCSV3.csv' ) ON SERVER;

Loading Data on the Client Side

TRUNCATE tblCSV;

It is also possible to upload a CSV file from the client. We will see how to do this by using Python.

import pymonetdb  
connection = pymonetdb.connect(username="monetdb", password="monetdb", hostname="localhost", database="voc")  

handler = pymonetdb.SafeDirectoryHandler("/home/fffovde/Desktop")  # designate this directory as safe
connection.set_uploader(handler)                                   # files placed in this folder can be uploaded

cursor = connection.cursor()  
cursor.execute("COPY INTO tblCSV FROM '/home/fffovde/Desktop/tblCSV.csv' ON CLIENT BEST EFFORT FWF ( 4, 7, 10 );")
connection.commit()  

cursor.close()
connection.close()

The important thing is to register a folder on the client computer as the safe folder:
handler = pymonetdb.SafeDirectoryHandler("/home/fffovde/Desktop")
After that, we assign that folder as the folder from which we can upload files.
connection.set_uploader(handler)
Then we just execute our statement.
This is the result in our database table:


You can download CSV files from here:

0330 Loading Data Using SQL in MonetDB and Timing

Sample Table

We will start mclient with the timer turned on. The timer will measure the time needed to execute each query.

mclient --timer="clock" -u voc -d voc

Then we will create the sample table:

CREATE TABLE tblSample ( Letter VARCHAR(10) UNIQUE
                       , Number INT PRIMARY KEY
                       , Calendar DATE NOT NULL );  


INSERT INTO tblSample ( Letter, Number, Calendar )                  
               VALUES ( 'A', 1, '2025-02-01' )                       
                    , ( 'B', 2, '2025-02-02' )                       
                    , ( 'C', 3, '2025-02-03' );  


SELECT * FROM tblSample;
We can see that the query timing is 0.421 milliseconds.

Inserting Data with INSERT INTO

We can insert data into MonetDB from an application by sending INSERT INTO statements. This works great if we don't load a lot of rows. If we are using a lot of consecutive INSERT statements, we can run into performance issues.

INSERT INTO tblSample ( Letter, Number, Calendar ) VALUES ( 'D', 4, '2024-02-04' );
··· 1.000.000 X ···
INSERT INTO tblSample ( Letter, Number, Calendar ) VALUES ( 'ZZZZZZZ', 1000004, '2240-02-04' );

We know that in MonetDB we can insert several rows with one INSERT statement. This will not save us, because we shouldn't use more than 20 rows per INSERT statement; if we use more than that, performance will decrease.

INSERT INTO tblSample ( Letter, Number, Calendar ) VALUES ( 'D', 4, '2024-02-05' )
                                                        , ( 'E', 5, '2024-02-06' )
                                                        ··· 17 X ··· 
                                                        , ( 'F', 6, '2024-02-07' );    --20 rows at most

We can improve performance by following these five pieces of advice:

  1. Disable autocommit. Autocommit will commit the data after each INSERT statement. If we can avoid that, we can speed things up.
  2. We should prepare our statement. That means that our statement will be parsed and cached once. After that, each consecutive INSERT query will use the same statement, just with different parameters.
  3. Use batch processing. Instead of sending a million INSERT statements, we can send 100 batches of 10.000 INSERT statements. This will reduce the communication latency between the application and MonetDB, reduce memory usage, and minimize locking of the table.
  4. We should disable the optimizer. The optimizer can speed up more complex statements, but there is nothing it can improve in a simple INSERT statement.
  5. We can temporarily disable table constraints like primary key, foreign key or unique. We can restore those constraints after the import.

SQL benchmark

We'll insert one row with one INSERT INTO statement. Then we'll see if we can noticeably increase the speed by following the tips above.
INSERT INTO tblSample ( Letter, Number, Calendar ) VALUES ( 'D', 4, '2024-02-05' );

First, we will disable our optimizer.
SET sys.optimizer = 'minimal_pipe';

In order to delete the constraints, we have to find out their names. We can do that from the system tables.

SELECT T.id, T.name, C.id, C.name, K.key_id, K.key_name
FROM sys.tables  T INNER JOIN sys.columns C      
      ON T.id  = C.table_id
INNER JOIN dependency_columns_on_keys K      
      ON C.id = K.column_id AND C.table_id = K.table_id
WHERE T.name IN ( 'tblsample' );

Our constraints will be temporarily removed.
ALTER TABLE tblSample DROP CONSTRAINT tblsample_number_pkey;
ALTER TABLE tblSample DROP CONSTRAINT tblsample_letter_unique;

We will not remove the NOT NULL constraint, because that constraint does not hurt performance.

Now we will start a transaction, to disable autocommit.
START TRANSACTION;

Then we will prepare our statement. Zero is the ID of the prepared statement.
PREPARE INSERT INTO tblSample ( Letter, Number, Calendar ) VALUES ( ?, ?, ? );

After all this, we will again check the timing of our INSERT statement. We are now faster.
EXECUTE 0( 'E', 5, '2024-02-06');

The last thing is to change everything back the way it was.
COMMIT;                                          -- finish transaction
DEALLOCATE PREPARE ALL;                          -- delete prepared statement
SET sys.optimizer = 'default_pipe';              -- turn on optimizer
ALTER TABLE tblSample ADD UNIQUE ( Letter );     -- bring back unique constraint
ALTER TABLE tblSample ADD PRIMARY KEY ( Number );-- bring back primary key constraint

Python Benchmark

We will now try INSERT with a Python script. In the blog post "Connect to MonetDB from Python" we have already seen how to use Python with MonetDB. Below is the script we will use now. This time we will insert 10 rows of data.

import pymonetdb
import time

connection = pymonetdb.connect(username="voc", password="voc", hostname="localhost", database="voc")
cursor = connection.cursor()
insert_statements = [
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('F', 6, '2024-02-06');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('G', 7, '2024-02-07');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('H', 8, '2024-02-08');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('I', 9, '2024-02-09');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('J', 10, '2024-02-10');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('K', 11, '2024-02-11');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('L', 12, '2024-02-12');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('M', 13, '2024-02-13');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('N', 14, '2024-02-14');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('O', 15, '2024-02-15');",
    ]
overall_start_time = time.time()
for i, sql in enumerate(insert_statements, start=1):
       cursor.execute(sql)
connection.commit()
overall_end_time = time.time()
total_time = overall_end_time - overall_start_time
print(f"\n⏳ Total execution time for all inserts: {total_time:.6f} seconds")
cursor.close()
connection.close()
Total execution time is 0.008114 seconds.

Python Benchmark With Accelerations

We can speed up our Python script by using all of the advice mentioned above. This is how our Python script looks now:

import pymonetdb
import time

connection = pymonetdb.connect(username="voc", password="voc", hostname="localhost", database="voc")
cursor = connection.cursor()
overall_start_time = time.time()
cursor.execute("SET sys.optimizer = 'default_pipe';")
cursor.execute("ALTER TABLE tblSample DROP CONSTRAINT tblsample_number_pkey;")
cursor.execute("ALTER TABLE tblSample DROP CONSTRAINT tblsample_letter_unique;")
sql = "INSERT INTO tblSample (Letter, Number, Calendar) VALUES (%s, %s, %s);"
data = [    ('P', 16, '2024-02-16'),
            ('Q', 17, '2024-02-17'),
            ('R', 18, '2024-02-18'),
            ('S', 19, '2024-02-19'),
            ('T', 20, '2024-02-20'),
            ('U', 21, '2024-02-21'),
            ('V', 22, '2024-02-22'),
            ('W', 23, '2024-02-23'),
            ('X', 24, '2024-02-24'),
            ('Y', 25, '2024-02-25')
        ]
overall_start_time = time.time()
cursor.executemany(sql, data)
connection.commit()
overall_end_time = time.time()
cursor.execute("DEALLOCATE PREPARE ALL;")
cursor.execute("SET sys.optimizer = 'default_pipe';")
cursor.execute("ALTER TABLE tblSample ADD PRIMARY KEY ( Number )")
cursor.execute("ALTER TABLE tblSample ADD UNIQUE ( Letter );")
connection.commit()
total_time = overall_end_time - overall_start_time
print(f"\n⏳ Total execution time for all inserts: {total_time:.6f} seconds")
cursor.close()
connection.close()

We don't have to explicitly start a new transaction, pymonetdb will do that automatically.

Now our timing is 0.004584 seconds.

Timing

How to Measure Time of the Query Execution

mclient --timer="clock" -u voc -d vocTimer can be "none" (default), "clock" or "performance".

Below are the results for these three modes of the timer switch.

mclient --timer="none" -u voc -d voc mclient --timer="clock" -u voc -d voc mclient --timer="performance" -u voc -d voc

When we use "performance", we get 4 results. "SQL" is time used for parsing. "Opt" is time used for optimizing statement. "Run" is time used for running the statement. "Clk" is total time used.

Query History

Data about executed statements is kept in two tables. Those two tables can be returned with the functions "sys.querylog_catalog()" and "sys.querylog_calls()". In order to work with those tables, we have to log in as an administrator.


mclient -u monetdb -d voc
Password is "monetdb".

Data saved in these two tables is persistent between sessions. We can use the procedure "sys.querylog_empty()" to clear the content of those two tables.

CALL sys.querylog_empty(); -- procedures are started with the CALL keyword

In the current session we can start logging with the procedure "querylog_enable()".

CALL querylog_enable();

After that, I will run statement "SELECT * FROM voc.tblSample;" three times.

This will now be the content of the "querylog_catalog()" table.
SELECT * FROM querylog_catalog();

"Owner" is the user who started the query at "defined" time.

We can also read from the "querylog_calls()" table.

SELECT * FROM querylog_calls();

In this table we can see the "start" time and the "end" time of our query.  
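To see the query text next to its calls, the two functions can be joined on their id column; a minimal sketch:

SELECT c.query, l.*
FROM querylog_catalog() c
JOIN querylog_calls() l ON c.id = l.id;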

We can stop the logging of queries before the end of the session with:

CALL querylog_disable();

Threshold

Each time we enable logging, our logging tables will become bigger and bigger. This can make searching for a query troublesome. In order to control the number of statements that will be logged, we can use the "threshold" argument.

CALL  querylog_enable(5000);

The threshold will limit the logged statements to only those whose execution time is longer than 5000 milliseconds. This allows us to perform profiling, to find the queries that are sucking up our resources the most.