0350 Exporting Data and Binary Files in MonetDB

Sample Table

CREATE TABLE tblCSV ( Letter VARCHAR(10), Number REAL, Calendar DATE );  

INSERT INTO tblCSV (Letter, Number, Calendar)    
        VALUES ('A', 3.82, null), ('B', 4.83, '2025-01-02'), ('C', 5.64, '2025-01-03');

Exporting Data from MonetDB to CSV

We saw how to import data from CSV file. For exporting to CSV file, we will again use COPY INTO statement.

COPY SELECT * FROM tblCSV INTO '/home/fffovde/Desktop/tblCSV';

By default, we'll get textual file, in UTF-8 encoding (1). Line endings are going to be LF (2). Column separators are pipes (3). Default string wrappers will be double quotes (4). Nulls will be represented as a >null< (5).
Notice that I have omitted "ON SERVER" subclause. Subclause "ON SERVER" is the default.

Default string wrappers are not the same when writing and reading strings. They are double quotes when writing, and empty strings when reading.

We can change default settings, like in this example. Column separator is TAB sign (1). String wrapper is a star (2). NULLs are represented with >Nil< (3).
COPY SELECT * FROM tblCSV INTO '/home/fffovde/Desktop/tblCSV' ON SERVER
USING DELIMITERS E'\t', E'\n', '*'
NULL AS 'Nil';

We can not overwrite  the existing file with the same name. That would return the error.

Exporting Compressed CSV File

If our file name has extensions  .xz  .bz2  .gz  .lz4, then the result file will be compressed. Compression level is the best when using .xz ( .xz > .bz2 > .gz > .lz4 ). For better compression, the more time is needed.

Let's use extreme zip compression (.xz).
COPY SELECT * FROM tblCSV INTO '/home/fffovde/Desktop/tblCSV.xz';  

Our file is reduced from 648 to 108 bytes.

Exporting CSV File on the Client

I will demonstrate exporting of a CSV file, on the client, by using python. We will use the code bellow. This is the same code used for uploading files, with two distinctions. Instead "set_uploader" we are using "set_downloader". COPY INTO statement is also different.

import pymonetdb 
connection = pymonetdb.connect(username="monetdb", password="monetdb", hostname="localhost", database="voc")   

handler = pymonetdb.SafeDirectoryHandler("/home/fffovde/Desktop")
connection.set_downloader(handler)                                   

cursor = connection.cursor() 
cursor.execute("COPY SELECT * FROM tblCSV INTO '/home/fffovde/Desktop/tblCSV.bz2' ON CLIENT;")
connection.commit()   

cursor.close()
connection.close()
Our file will
be saved in
compressed state:

Binary Files

MonetDB can import/export data even faster than from/to CSV files. For that we can use binary files.

Little Endian vs Big Endian

Let's say that we have hexadecimal number 0x12345678. Every two figures represent one byte:

12 => 0000110034 => 0010001056 => 0011100078 => 01001110

When we write our number to memory, we can write it in two ways.
– We can write it in order 12345678. This is called Big Endian.
– We can write it in the reverse order 78563412. This is called Little Endian.  

This distinction is important when writing binary files.

There is also a term Native Endian. That is the preferred byte order of the system MonetDB is running on. If your system is using AMD, ARM, or Intel CPU, then your system is using Little Endian.

We can check the Endianness of our system.
lscpu | grep "Byte Order"

Exporting Binary Files

MonetDB can export data into MonetDB custom binary files. Each table column will become a separate file.COPY SELECT * FROM tblCSV INTO LITTLE ENDIAN BINARY
      '/home/fffovde/Desktop/Letter',
      '/home/fffovde/Desktop/Number',       '/home/fffovde/Desktop/Calendar' ON SERVER;

As a result, we'll get three binary files. One file for each column.

Instead of "Little Endian" we can use "Big Endian" or "Native Endian". Instead of "ON SERVER", we can use "ON CLIENT". For exporting data on the client, we can reuse the same python script shown above.

Loading Binary Files to MonetDB

Exported binary files can be imported in any MonetDB database. Before import, we can empty database table "TRUNCATE tblCSV;".

COPY LITTLE ENDIAN BINARY INTO tblCSV( Letter, Number, Calendar ) FROM       '/home/fffovde/Desktop/Letter',       '/home/fffovde/Desktop/Number',       '/home/fffovde/Desktop/Calendar' ON SERVER;

TRUNCATE tblCSV;

It is possible to load less then three columns. We can see that "Calendar" column is empty.
COPY LITTLE ENDIAN BINARY INTO tblCSV( Letter, Number ) FROM
      '/home/fffovde/Desktop/Letter',
      '/home/fffovde/Desktop/Number' ON SERVER;

TRUNCATE tblCSV;

COPY LITTLE ENDIAN BINARY INTO tblCSV FROM
        '/home/fffovde/Desktop/Letter',                '/home/fffovde/Desktop/Number',                          '/home/fffovde/Desktop/Calendar' ON SERVER;           
We don't have to declare columns of the database table. In that case, we just have to make sure that order and number of files is the same as the order and number of the columns in the table.

Loading Binary Files to MonetDB on Client

TRUNCATE tblCSV;

import pymonetdb 
connection = pymonetdb.connect(username="monetdb", password="monetdb", hostname="localhost", database="voc")   

handler = pymonetdb.SafeDirectoryHandler("/home/fffovde/Desktop") connection.set_uploader(handler)                                   

cursor = connection.cursor() 
cursor.execute("COPY LITTLE ENDIAN BINARY INTO tblCSV FROM '/home/fffovde/Desktop/Letter', '/home/fffovde/Desktop/Number', '/home/fffovde/Desktop/Calendar' ON CLIENT;")
connection.commit()   

cursor.close()
connection.close()
For uploading binary files to MonetDB, from the client, we are using the similar python script that we have used for CSV files.

0340 Loading from CSV files into MonetDB

Importing data from CSV files into MonetDB is much faster than using "INSERT INTO". The reason is, CSV can be loaded as is, no need for parsing like for "INSERT INTO". The other reason is because CSV file can be imported by using several CPU cores at once, while "INSERT INTO" can only use one thread.

Sample table and CSV file

To use COPY INTO statement, we have to log in as an administrator (password is monetdb):
mclient -u monetdb -d voc;

In order to import data, first we need to have a table where we will place data.
CREATE TABLE tblCSV ( Letter VARCHAR(10) DEFAULT 'Z', Number REAL, Calendar DATE );

We'll start with this CSV file. Line ending can be \n (Linux) (1) or \r\n (Windows), it doesn't matter. File has to be encoded as UTF-8 (2).  In our example, separator between columns is comma (3). Strings are wrapped inside double quotes (4).
Letter,Number,Calendar
"A",3.82,2025-01-01
"B",4.83,2025-01-02
"C",5.64,2025-01-03

Reading from CSV File to MonetDB table

We will read our CSV file into tblCSV table by this statement bellow. User executing this statement must have privilege to read the file.

• OFFSET 2 will skip the first row. In the first row we have the header. Header already exists in our MonetDB table, so we do not import it.
• tblCSV( Letter, Number, Calendar ) – this part is the table and the columns in which we are importing data.
• '/home/fffovde/Desktop/tblCSV.csv'( Letter, Number, Calendar ) ON SERVER – location of our file.
',' – this is separator between columns.
• E'\n' – this is delimiter between rows. '\n' is a symbol for LF (line feed).
'"' – strings are wrapped with double quotes.

COPY OFFSET 2 INTO tblCSV( Letter, Number, Calendar )
FROM '/home/fffovde/Desktop/tblCSV.csv'( Letter, Number, Calendar )
ON SERVER
USING DELIMITERS ',', E'\n', '"';

SELECT * FROM tblCSV;
This is now our table.

Number of Records and Default Options

Before each insert, I will delete all the existing rows. Deletion of all of the rows can be done with TRUNCATE statement:
TRUNCATE tblCSV;

• 2 RECORDS – we can limit our import to only two rows (after the header). If we know in advance how many rows we will import, that will allow MonetDB to reserve space on disk in advance, so it will increase performance.

We can also notice, that we do not have to provide column names. In that case, we have to make sure that columns have the same order in the file, as in the database table.COPY 2 RECORDS OFFSET 2 INTO tblCSV
FROM '/home/fffovde/Desktop/tblCSV.csv' ON SERVER USING DELIMITERS ',';

There is also no need to always define columns separator and string wrapper. MonetDB will assume that columns separator is either \n or \r\n. It will also assume that nothing is used as a string wrapper. On the image above, double quotes are now considered as a part of a string.

Column Names

TRUNCATE tblCSV;

If we are providing column names of the CSV file, we always have to provide all of the column names, but we can give them fake names. In the example bellow we are importing only the column "Number". Other two columns will not be imported. Because "Letter" column has a default value "Z", all the values in that column will be "Z". "Calendar" column will be filled with nulls.

COPY OFFSET 2 INTO tblCSV( Number )
FROM '/home/fffovde/Desktop/tblCSV.csv'( Dummy, Number, Dummy2 )

ON SERVER
USING DELIMITERS ',';

TRUNCATE tblCSV;

For next example I will add one more column to my CSV file ("Color"), and I will place Number column before Letter column. I will change columns separator to semicolon, and record delimiters to CRLF. I will remove string wrapers.

Colors, Number, Letter, Calendar

If we provide column names, we can use that fact to reorder the columns in correct order.
COPY OFFSET 2 INTO tblCSV( Letter, Number, Calendar )
FROM '/home/fffovde/Desktop/tblCSV.csv'( Color, Number, Letter, Calendar )
ON SERVER
USING DELIMITERS ';', E'\n';

– This time, we are using semicolon as a separator between columns.
– We have not provided string wrappers, so MonetDB will take data as is.
– Column delimiter is defined as LF, it doesn't matter that our file is using CRLF delimiter. Symbol "\n" will work in both cases.
SELECT * FROM tblCSV;

TRUNCATE tblCSV;

If we don't provide column names for the CSV file, MonetDB will assume that CSV file has the same number of columns as the target table. In the example bellow, MonetDB assumes that our CSV file has three columns. MonetDB will raise an error that one column is leftover.

COPY OFFSET 2 INTO tblCSV( Letter, Number )
FROM '/home/fffovde/Desktop/tblCSV.csv' --x3
ON SERVER
USING DELIMITERS ';';

Column Separators and NULL-s

TRUNCATE tblCSV;

This time we will remove header, we will use pipe "|" as columns separator.
In the first row, I will place one surplus pipe at the end of the record (1).

We will take only two records. We don't have to skip header, because there is no header in the CSV file. Our separators are pipes. They are default, so we can omit "USING" clause altogether.
COPY 2 RECORDS INTO tblCSV( Letter, Number, Calendar )
FROM '/home/fffovde/Desktop/tblCSV.csv'( Color, Number, Letter, Calendar )
ON SERVER;

The lone pipe (1), from the picture above, won't cause us any problems.
SELECT * FROM tblCSV;

TRUNCATE tblCSV;

We will replace column delimiters with dots. Dot is also used as a decimal point. When we use a dot as a decimal point, we have to place an ESCAPE sign before it. ESCAPE SIGN is backslash (1).

COPY 2 RECORDS INTO tblCSV( Letter, Number, Calendar )
FROM '/home/fffovde/Desktop/tblCSV.csv'( Color, Number, Letter, Calendar )
ON SERVER
USING DELIMITERS '.';

We can see that the escape characters did their job of distinguishing decimal points.
SELECT * FROM tblCSV;

TRUNCATE tblCSV;

It is also possible to have string wrapper sign inside of a string. We can escape such sign by doubling it (1), or by placing an escape sign before it (2).

Notice, also, that I have placed the word NULL in the third record (3). By default this will be considered as a "null" in MonetDB

COPY OFFSET 0 INTO tblCSV( Letter, Number, Calendar )
FROM '/home/fffovde/Desktop/tblCSV.csv'( Color, Number, Letter, Calendar )
ON SERVER
USING DELIMITERS ',', E'\n', '"';

If we offset zero or 1 row, that means that we are reading data from the start.

TRUNCATE tblCSV;

Sometimes we want to have backslash inside of a string as a literal. In that case we use "NO ESCAPE" subclause (1).

NULL is representing null value by default. It is possible to use any other representation (2).

COPY OFFSET 0 INTO tblCSV( Letter, Number, Calendar )
FROM '/home/fffovde/Desktop/tblCSV.csv'( Color, Number, Letter, Calendar )
ON SERVER
USING DELIMITERS ',', E'\n', '"'
NO ESCAPE       --backslash is considered literal (1)
NULL AS '';     --NULL is represented by empty string (2)

Fixed Width Columns

TRUNCATE tblCSV;

Beside using comma separated format, we can also use fixed width format. In this CSV file we have three columns – Letter, Number, Calendar, but each column has the fixed width. Letter is 4 characters wide, Number is 7 characters wide, Calendar is 10 characters wide.

COPY INTO tblCSV
FROM '/home/fffovde/Desktop/tblCSV.csv' ON SERVER
FWF ( 4, 7, 10 );

We are using FWF subclause to inform MonetDB about columns width.
Spaces in the columns will be stripped and only values will be imported.

Error Handling

TRUNCATE tblCSV;

This CSV file has an error that will prevent import. There is a letter in a number column (W), in the second row.

COPY INTO tblCSV
FROM '/home/fffovde/Desktop/tblCSV.csv'

ON SERVER
FWF ( 4, 7, 10 );
We will get an error, for this statement, and we will not import any rows.

COPY INTO tblCSV
FROM '/home/fffovde/Desktop/tblCSV.csv'

ON SERVER
BEST EFFORT

FWF ( 4, 7, 10 );
Solution is to use subclause "BEST EFFORT".
Only first and third rows are now imported.
Second row is not imported.

We can find the rows that are not imported by reading from the table:
SELECT * FROM sys.rejects;
This table will preserve its content until the end of a session, or until procedure statement "CALL sys.clearrejects();" is executed.

Loading Several CSV Files at Once

TRUNCATE tblCSV;

We will import three CSV files at once. We will offset the first record in each file, and then we will read only one next row. From each file we will import only one record, so in total we will insert three records.

COPY 1 RECORDS OFFSET 2 INTO tblCSV
FROM ( '/home/fffovde/Desktop/tblCSV1.csv'      

          , '/home/fffovde/Desktop/tblCSV2.csv'      
          , '/home/fffovde/Desktop/tblCSV3.csv' ) ON SERVER;

Loading Data on the Client Side

TRUNCATE tblCSV;

It is also possible to upload CSV file from the client. We will see how to do this by using python.

import pymonetdb  
connection = pymonetdb.connect(username="monetdb", password="monetdb", hostname="localhost", database="voc")  

handler = pymonetdb.SafeDirectoryHandler("/home/fffovde/Desktop") –this will designate this directory as safe connection.set_uploader(handler)                                  –we can place our files into this folder to upload them

cursor = connection.cursor()  
cursor.execute("COPY INTO tblCSV FROM '/home/fffovde/Desktop/tblCSV.csv' ON CLIENT BEST EFFORT FWF ( 4, 7, 10 );")
connection.commit()  

cursor.close()
connection.close()

Important thing is to register some folder from the client computer as the safe folder:
handler = pymonetdb.SafeDirectoryHandler("/home/fffovde/Desktop")
After that, we assign that folder as the folder from which we can upload files.
connection.set_uploader(handler)
Then we just execute our statement.
This is the result in our database table:


You can download CSV files from here:

0330 Loading Data Using  SQL in MonetDB and Timing

Sample Table

We will start mclient with the timer turned on. The timer will measure the time to execute the query.

mclient --timer="clock" -u voc -d voc

Then we will create the sample table:

CREATE TABLE tblSample ( Letter VARCHAR(10) UNIQUE
                       , Number INT PRIMARY KEY
                       , Calendar DATE NOT NULL );  


INSERT INTO tblSample ( Letter, Number, Calendar )                  
               VALUES ( 'A', 1, '2025-02-01' )                       
                    , ( 'B', 2, '2025-02-02' )                       
                    , ( 'C', 3, '2025-02-03' );  


SELECT * FROM tblSample;   We can see that the query timing is 0.421 milliseconds.

Inserting Data with INSERT INTO

We can insert data through some application to MonetDB, by sending INSERT INTO statements. This will work great if we don' t load a lot of rows. If we are USING a lot of consecutive INSERT statements then we can have performance issues.

INSERT INTO tblSample ( Letter, Number, Calendar ) VALUES ( 'D', 4, '2024-02-04' );
··· 1.000.000 X ···
INSERT INTO tblSample ( Letter, Number, Calendar ) VALUES ( 'ZZZZZZZ', 1000004, '2240-02-04' );

We know that in MonetDB, we can insert several rows with the one INSERT statement. This will not save us, because we shouldn't use more than 20 rows per one INSERT STATEMENT. If we use more than that, we will decrease performance.

INSERT INTO tblSample ( Letter, Number, Calendar ) VALUES ( 'D', 4, '2024-02-05' )
                                                        , ( 'E', 5, '2024-02-06' )
                                                        ··· 17 X ··· 
                                                        , ( 'F', 6, '2024-02-07' );    --20 rows at most

We can improve performance by following these 5 advice:

  1. Disable autocommit.  Autocommit will commit the data after each INSERT statement. If we can avoid that, we can speed up things.
  2. We should prepare our statement. That means that our statement will be parsed and cached once. After that, each consecutive INSERT query will use the same statement, just with another parameters.
  3. Use batch processing. Instead of sending a million INSERT statements, we can sent 100 batches of 10.000 INSERT statements. This will reduce communication latency between application and MonetDB, it will reduce memory usage and will minimize locking of a table.
  4. We should disable optimizer. Optimizer can speed up more complex statements, but there is nothing that can be improved for the simple INSERT statement.
  5. We can temporarily disable table constraints like primary key, foreign key or unique. We can restore those constraints after the import.

SQL benchmark

We'll insert one row, with one INSERT INTO statement. Then we'll see if we can noticeably increase the speed by following the tips above.
INSERT INTO tblSample ( Letter, Number, Calendar ) VALUES ( 'D', 4, '2024-02-05' );

First, we will disable our optimizer.
SET sys.optimizer = 'minimal_pipe';

In order to delete constraints, we have to found out their names. We can do that from the system tables.

SELECT T.id, T.name, C.id, C.name, K.key_id, K.key_name
FROM sys.tables  T INNER JOIN sys.columns C      
      ON T.id  = C.table_id
INNER JOIN dependency_columns_on_keys K      
      ON C.id = K.column_id AND C.table_id = K.table_id
WHERE T.name IN ( 'tblsample' );

Our constraints will be temporarily removed.
ALTER TABLE tblSample DROP CONSTRAINT tblsample_number_pkey;
ALTER TABLE tblSample DROP CONSTRAINT tblsample_letter_unique;

We will not remove constraint "NOT NULL", because that constraint will not restrain performance.

START TRANSACTION;Now, we will start a transaction to disable autocommit.

Then we will prepare our statement. Zero is an ID of a prepared statement.
PREPARE INSERT INTO tblSample ( Letter, Number, Calendar ) VALUES ( ?, ?, ? );

After all this, we will again check the timing of our INSERT statement. We are now faster.
EXECUTE 0( 'E', 5, '2024-02-06');

The last thing is that we have to change
everything the way it was.
COMMIT;                                          -- finish transaction
DEALLOCATE PREPARE ALL;                          -- delete prepared statement
SET sys.optimizer = 'default_pipe';              -- turn on optimizer
ALTER TABLE tblSample ADD UNIQUE ( Letter );     -- bring back unique constraint
ALTER TABLE tblSample ADD PRIMARY KEY ( Number );-- bring back primary key constraint

Python Benchmark

We will now try INSERT with python script. In the blog post  "Connect to MonetDB from Python" we have already saw how to use python with MonetDB. Bellow is the script we will use now. This time we will insert 10 rows of data.

import pymonetdb
import time

connection = pymonetdb.connect(username="voc", password="voc", hostname="localhost", database="voc")
cursor = connection.cursor()
insert_statements = [
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('F', 6, '2024-02-06');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('G', 7, '2024-02-07');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('H', 8, '2024-02-08');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('I', 9, '2024-02-09');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('J', 10, '2024-02-10');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('K', 11, '2024-02-11');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('L', 12, '2024-02-12');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('M', 13, '2024-02-13');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('N', 14, '2024-02-14');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('O', 15, '2024-02-15');",
    ]
overall_start_time = time.time()
for i, sql in enumerate(insert_statements, start=1):
       cursor.execute(sql)
connection.commit()
overall_end_time = time.time()
total_time = overall_end_time - overall_start_time
print(f"\n⏳ Total execution time for all inserts: {total_time:.6f} seconds")
cursor.close()
connection.close()
Total execution time is 0.008114 seconds.

Python Benchmark With Accelerations

We can speed up our Python script by using all of the advice mentioned in the text above. This is how our python procedure now looks like:

import pymonetdb
import time

connection = pymonetdb.connect(username="voc", password="voc", hostname="localhost", database="voc")
cursor = connection.cursor()
overall_start_time = time.time()
cursor.execute("SET sys.optimizer = 'default_pipe';")
cursor.execute("ALTER TABLE tblSample DROP CONSTRAINT tblsample_number_pkey;")
cursor.execute("ALTER TABLE tblSample DROP CONSTRAINT tblsample_letter_unique;")
sql = "INSERT INTO tblSample (Letter, Number, Calendar) VALUES (%s, %s, %s);"
data = [    ('P', 16, '2024-02-16'),
            ('Q', 17, '2024-02-17'),
            ('R', 18, '2024-02-18'),
            ('S', 19, '2024-02-19'),
            ('T', 20, '2024-02-20'),
            ('U', 21, '2024-02-21'),
            ('V', 22, '2024-02-22'),
            ('W', 23, '2024-02-23'),
            ('X', 24, '2024-02-24'),
            ('Y', 25, '2024-02-25')
        ]
overall_start_time = time.time()
cursor.executemany(sql, data)
connection.commit()
overall_end_time = time.time()
cursor.execute("DEALLOCATE PREPARE ALL;")
cursor.execute("SET sys.optimizer = 'default_pipe';")
cursor.execute("ALTER TABLE tblSample ADD PRIMARY KEY ( Number )")
cursor.execute("ALTER TABLE tblSample ADD UNIQUE ( Letter );")
connection.commit()
total_time = overall_end_time - overall_start_time
print(f"\n⏳ Total execution time for all inserts: {total_time:.6f} seconds")
cursor.close()
connection.close()

We don't have to explicitly start a new transaction, pymonetdb will do that automatically.

Now our timing is 0.004584 seconds.

Timing

How to Measure Time of the Query Execution

mclient --timer="clock" -u voc -d vocTimer can be "none" (default), "clock" or "performance".

Bellow we can results for these three modes of the timer switch.

mclient --timer="none" -u voc -d voc mclient --timer="clock" -u voc -d voc mclient --timer="performance" -u voc -d voc

When we use "performance", we get 4 results. "SQL" is time used for parsing. "Opt" is time used for optimizing statement. "Run" is time used for running the statement. "Clk" is total time used.

Query History

Data about executed statements is kept in the two tables. Those two tables can be returned with functions " sys.querylog_catalog()" and "sys.querylog_calls()". In order to work with those tables we have to login as administrator.


mclient -u monetdb -d voc
Password is "monetdb".

Data saved in these two tables is persistent between sessions. We can use procedure "sys.querylog_empty()" to clear content from those two tables.

CALL sys.querylog_empty(); –procedures are started with the "call" keyword

In the current session we can start logging with the procedure "querylog_enable()".

CALL querylog_enable();

After that, I will run statement "SELECT * FROM voc.tblSample;" three times.

This will be, now, the content of the "querylog_catalog()" table.
SELECT * FROM querylog_catalog();

"Owner" is the user who started the query at "defined" time.

We can also read from the "querylog_calls()" table.

SELECT * FROM querylog_calls();

In this table we can see the "start" time and the "end" time of our query.  

We can stop logging or queries before the end of the session with:

CALL querylog_disable();

Threshold

Each time we enable logging, our logging tables will become bigger and bigger. This can make the search for a query troublesome. In order to control amount od statements that will be logged, we can use "threshold" argument.

CALL  querylog_enable(5000);

The threshold will limit the logged statements to only those whose execution time is longer than 5000 milliseconds. This allows us to perform profiling, to find the queries that are sucking up our resources the most.

0025 SystemD, Monetdbd, Mserver5: Clarification

Computer Process

When we start one program (e.g., Excel), we'll start one process.

Process occupies part of a CPU, part of memory and can only see files that are opened by that process. In that way, process is like a small virtual machine. Excel process is working inside of such small virtual machine and can only access workbooks that we have opened inside of that Excel (e.g., Book1.xlsx and Book2.xlsx). The purpose of a process is to isolate one program from everything else.

Processes, Daemons and Services

We can divide processes into 4 groups:

  • User process is a process with a friendly interface toward human users ( like Notepad.exe ).
  • Daemon process is a process with a friendly interface toward other programs ( "Windows Time" (w32time) service ).
  • Service is a daemon which provides interface to some essential functionality. For example, Apache server is a service because through it we get our web pages. In the background, Apache server will pull needed data from the MySQL database which is another daemon. In this setup, Apache server is a service daemon, and MySQL database is a normal daemon.
  • Some processes are in between. Microsoft Outlook is a user process dominantly, but it also has an VBA api, so it is partially a daemon.

We can see that the major difference between these processes is stemming from their indented usage. Usually, daemons will start when the OS starts and they will work in the background without users' interaction. On the other side, daemons can also be started manually and users can interact with them.  We have already saw an example of such, manually started, daemon:

monetdbd start /home/fffovde/DBfarm1
monetdbd get all /home/fffovde/DBfarm1

Official Processes

In the "real world" other definitions are used. Something is a daemon or service only if it is officially registered with the operating system. Only processes registered in "Systemd" on Linux systems are called Daemons. In Windows, only processes registered in SCM (Service Control Manager) are called Services.

Systemd and SCM are operating system components used to automatically start official daemons and services in the correct order, control resource usage, access rights, log, and restart failed services. From an OS perspective, this is the only way to distinguish between ordinary and special processes. Since Windows and Linux use such definitions, most people will also use those definitions.

Two Ways to Manually Start Monetdbd Daemon

We will first check that Monetdbd daemon is not working. "pgrep" command is "process grep". We are searching the process by its name. Pgrep should return ID of a monetdbd process, but our process is not active.pgrep monetdbd
 

We will then start our server using systemctl. "Systemctl" is a console program used to control "Systemd". We already saw that systemd is OS component used to control daemons.
sudo systemctl start monetdbd
"Pgrep" command will confirm that our MonetDB server is now working.
pgrep monetdbd

We can also start monetdbd daemon with the "start" command. This will fail because our daemon is already working.
monetdbd start /home/fffovde/DBfarm1

The explanation is that when we open monetdbd with systemctl, two things will happen:

  1. Systemctl will start the daemon, but not "/home/fffovde/Dbfarm1", this server will not start.  This is because systemctl will start the default server. The default server is on the location "/var/monetdb5/dbfarm".
  2. The server "/etc/monetdb5/dbfarm" will occupy the port 50.000. That is why we are getting an error "cannot remove socket files".
We will now just stop the monetdbd daemon using systemctl to release the port 50.000.
sudo systemctl stop monetdbd
Only after that we will be able to start our other server manually.
monetdbd start /home/fffovde/DBfarm1
pgrep monetdbd

Mserver5

When our database is opened, then the process Mserver5 is working. Currently our database is not opened. We can check that with: pgrep mserver5.

There are two ways how to start database. One is to use monetdb console program. Just like systemctl console program is used to control system daemon systemd, in the same way monetdb console program is used to control monetdb daemon monetdbd.
monetdb start voc
pgrep mserver5

The other way is to just call database from mclient. I will first stop the database with "monetdb stop voc", and mserver5 will close.
pgrep mserver5   #we can see that mserver5 is stopped
I'm closing the database just to show you that we can also open the database with the mclient.
mclient -u voc -d voc

We can exit mclient applicaton with "quit". We will only close client application, but the database will remain open. We can check that with pgrep command and we will see that our database is still opened:
pgrep mserver5

When we login to the voc database with mclient, at that moment the database will be opened if it is not already open. When our database is opened, process Mserver5 is also working. This is because Mserver5 process is OUR DATABASE. This process will perform all processing on request of clients for a database voc. Mclient console application is used to send our queries to this Mserver5 process.

Starting the Monetdbd Daemon When the Computer Starts up

We can not start "/home/fffovde/DBfarm1" server automatically when the computer boot up. We can only do that with the default server "/var/monetdb5/dbfarm". First we will stop "/home/fffovde/DBfarm1" server to release the port 50.000:

monetdbd stop /home/fffovde/DBfarm1

Then we will start "/var/monetdb5/dbfarm" server.
systemctl start monetdbd

Next, we will create a new database, and we will open mclient application to test it:
monetdb create newDB
monetdb release newDB #make database accessible
mclient -u monetdb -d newDB  


(password is monetdb for the admin group monetdb)  

Now that we have confirmed that our database is working, we will make our "/var/monetdb5/dbfarm" to a full fledged systemctl controled daemon. This will make our server to open automatically after each system boot. 
systemctl enable monetdbd

Now we can exit everything and we can reboot our computer.

After restarting, our server will open automagically:
pgrep monetdbd
mclient -u monetdb -d newDB

Avoid Entering Credentials Every Time

Inside of our home folder "/home/fffovde", we can create textual file ".monetdb". Inside of this file we can type our password and username:
user=monetdb
password=monetdb
After that we don' have to type our credentials any more. We just have to type this, and we are in.
mclient -d newDB

Running Two Servers at the Same Time

Server "/var/monetdb5/dbfarm" is blocking the port 50.000. While this server is opened, we will not be able to use server "/home/fffovde/Dbfarm1" because that other server is also using the port 50.000.

First, we will change the port of the "DBfarm1" server:
monetdbd set port=50001 /home/fffovde/DBfarm1

Now, we are able to start manually our DBfarm1 server:
monetdbd start /home/fffovde/DBfarm1
For mclient application we have to provide our new port number:
mclient -p50001 -u voc -d voc

Cleaning up

First, I will exit the mclient application with the quit command.
I will close and disable "/var/monetdb5/dbfarm" server. Disabling means that the server will no longer open automatically.
systemctl stop monetdbd
systemctl disable monetdbd

I can now delete newDB database. I delete it just like any other folder:
sudo rm -rf /var/monetdb5/dbfarm/newDB
I will change port number of "/home/fffovde/DBfarm1" server back to 50.000.
monetdbd set port=50000 /home/fffovde/DBfarm1
Then, I will close "/home/fffovde/DBfarm1" server.
monetdbd stop /home/fffovde/DBfarm1


After this I will continue to use "/home/fffovde/DBfarm1" and "voc" database. I will allway use commands:
monetdbd start /home/fffovde/DBfarm1
mclient -u voc -d voc

Summary

We can open the server manually or automatically. To open it manually, we use "monetdbd start". For automatic startup we use "systemctl". We can only automatically open the server "/var/monetdb5/dbfarm". If we use two servers at the same time, they must have different port numbers.

In MonetDB, the databases are quite independent. In Microsoft SQL server things are different. SQL server can have many databases that are sharing many resources. They are sharing logins, tempdb, resource pools, memory settings, collation. Backups, replication, and monitoring tools (like SQL Agent jobs) are often configured at the server level.

Because of that it is correct to say that Mserver5 process is not just a database, it is a whole server. Monetdbd is not really a server. That daemon is only a managing tool for Mserver5 processes. Almost the only thing databases within the same dbfarm folder share is their port number. So, "voc" and "newDB" are monetdb servers. "/home/fffovde/DBfarm1" and "/var/monetdb5/dbfarm" are just folders for those servers.

Monetdb is console application used by a user to interact with montdbd daemon.

0320 Constraints and Altering of Tables in MonetDB

We will start this session as a admin. The only reason for this is because
I want to show that we can move table to some other schema.
Password is "monetdb".
mclient -u monetdb -d voc

Sample Tables

CREATE TABLE tblTurnover ( ID INT PRIMARY KEY,  SubID INT
                        , IDwithNULL INT, Turnover INT DEFAULT 100 NOT NULL );

INSERT INTO tblTurnover ( ID, SubID, IDwithNULL, Turnover )
VALUES ( 1, 1, 1, 101 ), ( 2, 2, 2, 102 ), ( 3, 3, NULL, 103 );
CREATE TABLE tblCost ( ID INT UNIQUE, SubID INT
                     , IDwithNULL INT, Cost INT );  

INSERT INTO tblCost ( ID, SubID, IDwithNULL, Cost )
VALUES (1, 1, 1, 50), (2, 2, NULL, 50), (3, 3, NULL, 50);

Altering Our Tables

                               This is how we can change the name of our table.
ALTER TABLE tblTurnover RENAME TO tblIncome; SELECT * FROM tblIncome;
                               We can change the name of a column.
ALTER TABLE tblIncome RENAME COLUMN Turnover to Income;
SELECT * FROM tblIncome;
                               We can alter our column to accept nulls.
ALTER TABLE tblIncome ALTER COLUMN Income SET NULL;
Beside READ ONLY and READ WRITE, we also have INSERT ONLY.
ALTER TABLE tblCost SET INSERT ONLY;
                     If we try to delete a row from INSERT ONLY table, that will be prohibited.
                               We can change default value for some column:
ALTER TABLE tblIncome ALTER COLUMN Income SET DEFAULT 200;
                               We can delete DEFAULT value for Income column:
ALTER TABLE tblIncome ALTER COLUMN Income DROP DEFAULT;
                               We can add new column in tblCost table.
ALTER TABLE tblCost ADD COLUMN NewColumn BIGINT DEFAULT 88;
SELECT * FROM tblCost;                                

                               If we hadn't provided default value, then this new column would be filled with nulls.
We will delete our new column:
ALTER TABLE tblCost DROP COLUMN NewColumn;
Both of our tables are created in VOC schema. We can move  tblCost table to sys schema
, and we can move it back:
ALTER TABLE tblCost SET SCHEMA voc;
ALTER TABLE tblCost SET SCHEMA sys;
We can check with "\d" command that our table is transferred to some other schema.

Removing Constraints

If want to remove some constraint (primary key, unique constraint, foreign key), we have to know the name of that constraint. In our example we have placed PRIMARY KEY constraint on the tblIncome table, column ID, and UNIQUE constraint on the tblCost table, column ID . We will first read from system table sys.tables to find what are IDs of our tables:

SELECT *
FROM sys.tables
WHERE name IN ('tblcost', 'tblincome');

Now that we know that our tables have IDs 13938 and 13946, we will search for our columns in the table sys.columns. Our columns have IDs 13932 and 13940.

SELECT *
FROM sys.columns
WHERE table_id IN ( 13938, 13946 ) AND name = 'id';

Next step is to find names of our constraints. We will find them in the "dependency_columns_on_keys"  table. We will filter this table by our tables and columns.

SELECT * FROM dependency_columns_on_keys
WHERE table_id IN ( 13938, 13946 )
AND column_id IN ( 13940, 13932 );

We can now create a query that will return the names of the constraints on the specific columns:

SELECT T.id, T.name, C.id, C.name, K.key_id, K.key_name
FROM sys.tables  T INNER JOIN sys.columns C
      ON T.id  = C.table_id
INNER JOIN dependency_columns_on_keys K
      ON C.id = K.column_id AND C.table_id = K.table_id
WHERE T.name IN ( 'tblincome', 'tblcost' ) AND C.name = 'id';
                          Now that we know our constraints names, we can delete them:
ALTER TABLE tblIncome DROP CONSTRAINT tblturnover_id_pkey;
ALTER TABLE tblCost DROP CONSTRAINT tblcost_id_unique;

Constraint on Several columns

When we want to make constraint that encompass several columns, then we have to make a constraint on a table itself. In that case, our CREATE TABLE statement would be like:
CREATE TABLE Tab1 ( Col1 VARCHAR(10), Col2 VARCHAR(10), Col3 VARCHAR(10), PRIMARY KEY ( Col1, Col2 ), UNIQUE ( Col2, Col3 ));

We will now create PRIMARY KEY and UNIQUE constraints on our tables by using ALTER TABLE statement:
ALTER TABLE tblIncome ADD PRIMARY KEY ( ID, SubID );
ALTER TABLE tblCost ADD UNIQUE ( ID, SubID );

    If we now run the statement that will return constraints names, we will get four rows as a result:
SELECT T.id, T.name, C.id, C.name, K.key_id, K.key_name
FROM sys.tables  T INNER JOIN sys.columns C
               ON T.id  = C.table_id
INNER JOIN dependency_columns_on_keys K
               ON C.id = K.column_id AND C.table_id = K.table_id
WHERE T.name IN ( 'tblincome', 'tblcost' ) AND C.name IN ( 'id', 'subid' );

Naming of Constraints

We saw earlier that the name of constraint is created automatically by concatenating a) table name, b) column name, c) constraint type ( tblturnover_id_pkey ). If the constraint encompasses several columns, for example id and subid, then the name will be like tblturnover_id_subid_ pkey. If we want to give our constraint a name then we have to use this syntax in CREATE TABLE statement:

CREATE TABLE Tab2 ( Col1 VARCHAR(10), Col2 VARCHAR(10), Col3 VARCHAR(10), CONSTRAINT PKconstraint PRIMARY KEY ( Col1, Col2 )
                                                                        , CONSTRAINT Uconstraint UNIQUE ( Col2, Col3 ));

We will now give names to our constraints by using ALTER TABLE statement. We can not create another primary key constraint because table can only have one primary key constraint.

ALTER TABLE tblIncome
ADD CONSTRAINT Primarius PRIMARY KEY ( ID, SubID );

We will first delete old primary key constraint, and then we will create a new one, that will have a custom name:

ALTER TABLE
tblIncome DROP CONSTRAINT tblincome_id_subid_pkey;
ALTER TABLE tblIncome
ADD CONSTRAINT Primarius PRIMARY KEY ( ID, SubID );

On the other side, it is possible to have many UNIQUE constraints on one table.
ALTER TABLE tblCost ADD CONSTRAINT Uniquous UNIQUE ( ID, SubID );

Now we have three constraints. We still have old UNIQUE constraint, and we have two new constraints with custom names.
SELECT T.id, T.name, C.id, C.name, K.key_id, K.key_name, K.key_type
FROM sys.tables  T INNER JOIN sys.columns C
               ON T.id  = C.table_id
INNER JOIN dependency_columns_on_keys K
               ON C.id = K.column_id AND C.table_id = K.table_id
WHERE T.name IN ( 'tblincome', 'tblcost' ) AND C.name IN ( 'id', 'subid' );
On the image above, we have a new column "key_type". Zero in this column means that we are dealing with primary key constraint, and 1 is unique constraint.

In the table sys.keytypes we can see that there are 4 types of constraints. CHECK constraint is not supported in MonetDB.

SELECT * FROM sys.key_types;

Foreign Key Constraint

We know that in Excel, "data validation" allows us to restrict that the user can only enter values (2)  ​​from a predefined list (1). Similarly, on the image below, the "key_type" column can only accept values ​​0,1,2,3,4 because those are the only values ​​listed in the main table sis.key_types (3). In relational databases, a list (1) is located within the Primary Key column (4) of the table that performs the function of a "Parent Table". The list (2), where we are limiting what to enter, is in the "Child Table", in the column declared as a "Foreign Key" (5).

This "primary key" > "foreign key" relation is called foreign key constraint, because "primary key" column defines what can be entered into "foreign key" column.
When creating a table, we should use a syntax like this one to create foreign key constraint:

CREATE TABLE Tab3 ( Col1 VARCHAR(10), Col2 VARCHAR(10), Col3 VARCHAR(10), CONSTRAINT PkFk FOREIGN KEY ( Col1, Col2 )
                                                                          REFERENCES Tab2 ( Col1, Col2 ) ); 

We can omit the part ", CONSTRAINT PkFk", but in that case our constraint will get the default name " tab3_col1_col2_fkey".

We can find data about foreign key constraints in this system table. 14009 is Tab2 which has primary key 14277 "pkconstraint". On the other sid is Tab3 with ID 14294, which has foreign key 14292 pkfk.
SELECT * FROM sys.dependency_keys_on_foreignkeys;
In order to find what columns are included in some foreign key constraint, we should read from this system table.
SELECT * FROM describe_foreign_keys;
We can find everything about FK constraints in the table Tab3 with this SELECT query:
SELECT pk_t PkTable, pk_c PkColumn, key_name PkName
     , fk_t FkTable, fk_c FkColumn, fk_name FkName FROM
sys.dependency_keys_on_foreignkeys Fk INNER JOIN describe_foreign_keys DFk
       ON Fk.fk_name = DFk.fk
WHERE fk_t = 'tab3';

We will now add foreign key constraints with ALTER TABLE statement. We will try to link "id" columns. This will fail. Foreign key constraint can only relate to column ( or columns ) that are declared as primary key or as unique columns.

ALTER TABLE tblCost ADD CONSTRAINT FromTblIncomeConstraint FOREIGN KEY ( id ) REFERENCES tblIncome ( id );

We will alter our statement to relate to primary key in the table tblIncome. Current primary key in the table tblIncome is composite key ( id, subid ). We will use that.

ALTER TABLE tblCost ADD CONSTRAINT FromTblIncomeConstraint FOREIGN KEY ( id, subid ) REFERENCES tblIncome ( id, subid );
Foreign key constraint now limits what can be placed into tblCost table.

We can not add "4" into ID column in the table tblCost. That would violate the foreign key constraint.

We can not delete any row from the table tblIncome because each row is referenced in the tblCost table, and that would, again, violate the foreign key constraint.
INSERT INTO tblCost ( ID, SubID, IDwithNull, Cost ) VALUES ( 4, 4, null, 50 );
DELETE FROM tblIncome WHERE ID = 3;

RESTRICT / CASCADE when Deleting Constraints and Columns

We will try to delete ID column in tblCost table.
ALTER TABLE tblCost DROP COLUMN ID;
We will try to delete primary key constraint in the tblIncome table.
ALTER TABLE tblIncome DROP CONSTRAINT Primarius;

We will fail big. This is because the default mode is RESTRICT. By using CASCADE, we would be able to delete this column and this constraint.

START TRANSACTION;
ALTER TABLE tblIncome DROP CONSTRAINT Primarius CASCADE;
ROLLBACK;
If we rollback our transaction, we want really delete the constraint. I don't want to delete it.

NULL Handling         

We can not insert NULL into primary key column. Primary key column doesn't allow nulls.INSERT INTO tblIncome ( ID, SubID, IDwithNULL, Income ) VALUES ( null, 4, null, 104 );
tblIncome   tblCost
Column IDwithNULL in tblIncome table has one field with null. Constraint UNIQUE NULLS NOT DISTINCT can be added to this column, because there will be no duplicate nulls (we only have single null).
ALTER TABLE tblIncome
ADD CONSTRAINT IncomeUniqueNull UNIQUE NULLS NOT DISTINCT ( IDwithNULL );
Column IDwithNULL in tblCost table has two fields with nulls. Suprise is that UNIQUE NULLS NOT DISTINCT constraint can be added to this column.
ALTER TABLE tblCost
ADD CONSTRAINT CostUniqueNull UNIQUE NULLS NOT DISTINCT ( IDwithNULL );
This constraint will only have effect when we try to add new null:
INSERT INTO tblIncome ( id, subid, idwithnull, income )
VALUES ( 4, 4, null, 50 );

The default value for this subclause is NULLS DISTINCT. By default, we will consider all nulls to be distinct.

What is Not Supported in Foreign Key Constraint

For foreign key constraint, match can be only SIMPLE, on update can only be RESTRICT, on delete can only be RESTRICT. Other options are not supported so we will ignore them.

ALTER TABLE Tab3 ADD Constraint PkFk2 FOREIGN KEY ( Col1, Col2 ) REFERENCES Tab2 ( Col1, Col2 )
MATCH SIMPLE ON UPDATE RESTRICT ON DELETE RESTRICT;
The ON UPDATE RESTRICT and ON DELETE RESTRICT subclauses mean that it is not possible to update or delete a key in the parent table if there is a row in the child table that referenced that key.

MATCH SIMPLE means that if a foreign key is a composite key ( it is made of several columns ), and if one of the columns in the foreign key has a null, then the referential integrity will not be checked for that foreign key.  

Error Avoidance

As always, if we use IF NOT EXISTS or IF EXISTS, we will avoid errors. We will always get "operation successful".

CREATE TABLE IF NOT EXISTS employees ( id INT PRIMARY KEY, name VARCHAR(100), department VARCHAR(50), salary DECIMAL(10, 2) );
ALTER TABLE IF EXISTS employees ADD COLUMN hire_date DATE;