0500 Proto_loaders, ODBC and COPY in MonetDB

We will continue using the green and blue databases that we created in this post => link (or you can watch the video on YouTube => link). That was the post about distributed query processing.

monetdbd start /home/sima/monetdb/DBfarmG
mclient -u monetdb -d DatabaseG
   --password monetdb

monetdbd start /home/sima/monetdb/DBfarmB
mclient -u monetdb -d DatabaseB
   --password monetdb

Read From CSV File With a file_loader Function

We have three files and two ".gz" archives.

  • The "CSV" file uses commas as delimiters. It also has surplus commas at the end of its data rows. This is the only file without a file format extension.
  • The "TSV.tsv" file uses tabs as delimiters, but it also wraps strings in double quotes.
  • The "PSV.psv" file uses pipes and has a null in the "Letter" column.
  • The files with the ".gz" extension are just the PSV file, compressed.

Files must be placed on the server. The value returned by the "file_loader" function is a virtual table. We don't specify delimiters, wrappers, or data types for the files; they are deduced automatically. We can read from CSV, TSV, and PSV files, and also from ".gz", ".lz4", ".bz2", and ".xz" compressed files.

file_loader Function Syntax

SELECT * FROM file_loader( '/home/abhishek/sample.csv' );
The only argument of the function is the full path to the file.

SELECT * FROM '/home/abhishek/sample.csv';
The shorter syntax is more convenient: we don't have to type the function name.

Experiments With the CSV File

We cannot read from files that don't have a file format extension:
SELECT * FROM '/home/sima/Desktop/CSV';

I will add a file format extension and then try again.

SELECT * FROM '/home/sima/Desktop/CSV.csv';
Our CSV file is now reduced to a single column: the whole row of data is imported into one column. This is a consequence of the commas at the end of each row.

I will remove the surplus commas. After that, our query works as expected.
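For illustration, a cleaned-up "CSV.csv" might look like this (the actual column names and values are assumptions, based on the Number and Letter columns used later in the post); note there are no trailing commas:

```text
Number,Letter
1,a
2,b
3,c
```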

Experiments With the TSV File

Import from the TSV file will work, but the double-quote wrappers around the strings will be treated as part of the strings.
SELECT * FROM '/home/sima/Desktop/TSV.tsv';

We can try to trim the double quotes, but our column will not be recognized:
SELECT TRIM( Letter, '"' ) FROM '/home/sima/Desktop/TSV.tsv';

Column names are case sensitive, so we have to put the column name inside double quotes. Only then will our query work:
SELECT TRIM( "Letter", '"' ) FROM '/home/sima/Desktop/TSV.tsv';

Experiments With the PSV File

In the PSV file we have a null. This null will be imported, but we don't know whether it is a real null or the string "null".
SELECT * FROM '/home/sima/Desktop/PSV.psv';

We will test the null value. We can see that "null" was imported as a string:
SELECT ISNULL( "Letter" ) FROM '/home/sima/Desktop/PSV.psv';
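If we wanted a real NULL instead of the string, one possible sketch (assuming the literal text in the file is exactly "null") is to map it away with the standard NULLIF function:

```sql
-- Turn the string 'null' into a real NULL (illustrative mapping)
SELECT NULLIF( "Letter", 'null' ) AS "Letter"
FROM '/home/sima/Desktop/PSV.psv';
```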

Experiments With the GZ Files

We can easily read from the compressed PSV.psv file.
SELECT * FROM '/home/sima/Desktop/PSV.psv.gz';

If we compress our file as a "tar" tape archive, the file loader will not work:
SELECT * FROM '/home/sima/Desktop/PSV.psv.tar.gz';

Conclusion

We can conclude that the file_loader function is not as versatile as the COPY INTO statement, which is described in this blog post => link.

Read From Remote Database With a proto_loader Function

We have seen that we can log in to a MonetDB server on another computer. We also saw how to create remote tables. This time we will see how to read tables on another computer/server ad hoc.

Testing Local MonetDB Server

This syntax also works on the local MonetDB server; we can use localhost to read from the local computer. I will run this statement on the green server. Recall that the local database, schema, and table names are "DatabaseG", "schemagb", and "dimg".
SELECT ProdID FROM proto_loader( 'monetdb://127.0.0.1:50000/DatabaseG/schemagb/dimg' );

This IP address and port number are the defaults, so we can omit them from the URL.
Notice that the database name must keep its uppercase letters, but the schema and table names must be lowercase.
SELECT * FROM proto_loader( 'monetdb:///DatabaseG/schemagb/dimg' );

We can also use the URL with "mapi:" in front of it. In this case, we have to provide the IP address and port number.

SELECT * FROM proto_loader( 'mapi:monetdb://192.168.100.145:50000/DatabaseG/schemagb/dimg' );

Testing Remote MonetDB Server

More interesting is the ability to read tables from a remote server. I will read a table from the blue server (before that, please start the blue server).

This statement, executed on the green server, will return the whole table from the blue server.
SELECT * FROM proto_loader( 'monetdb://192.168.100.146:50000/DatabaseB/schemagb/factb' );

Syntax with "mapi:" will also work on the remote server.

SELECT * FROM proto_loader( 'mapi:monetdb://192.168.100.146:50000/DatabaseB/schemagb/dimb' );

Creation Of a Remote Table

We can test whether we can create a remote table using the syntax that starts with "monetdb://". On the blue server I will change the current schema and then create a table.

monetdbd start /home/sima/monetdb/DBfarmB
mclient -u monetdb -d DatabaseB
 --password monetdb
SET SCHEMA SchemaGB;
CREATE TABLE Test( Number INT );

On the green server, I will set the current schema to SchemaGB and try to connect to the Test table.
SET SCHEMA SchemaGB;
CREATE REMOTE TABLE Test( Number INT ) ON 'monetdb://192.168.100.146:50000/DatabaseB';

This will successfully create a remote table.
SELECT * FROM Test;

Connect to Any ODBC Database From MonetDB

ODBC Driver Manager

An ODBC driver manager is a middleware component that acts as an intermediary between applications and ODBC drivers, translating standardized ODBC API calls into driver-specific instructions. It allows applications to connect to different databases without needing to be rewritten for each one.

On the green server, we will install an ODBC driver manager. First, check whether you already have one installed: just type "odbcinst" in the shell.

I don't have it, so I will install the "unixodbc" driver manager. After that, the "odbcinst" command will show that the program is working.
sudo apt install unixodbc

This program will add an empty "odbc.ini" file in the "/etc" folder.

/etc/odbc.ini

MonetDB ODBC Driver

We can connect to any ODBC-capable server, but we will use this opportunity to see how to connect to a MonetDB server. We will use the MonetDB ODBC driver to connect to the blue server. This is the ODBC driver we need; we install it on the green server:

sudo apt install libmonetdb-client-odbc

This driver will add an "odbcinst.ini" file in the "/etc" folder.

/etc/odbcinst.ini
The driver itself will be at this location:

/usr/lib/x86_64-linux-gnu/libMonetODBC.so

We can see a problem: our driver file is in the "x86_64-linux-gnu" folder, but our "odbcinst.ini" file points to the "/lib" folder. We must fix that. In the shell, we run this command to start the text editor as sudo, and we fix the path:
sudo gedit /etc/odbcinst.ini
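After the fix, "/etc/odbcinst.ini" might look roughly like this (the section name and description are assumptions and may differ on your system; the driver path is the one we found above):

```ini
[MonetDB]
Description = MonetDB ODBC driver
Driver = /usr/lib/x86_64-linux-gnu/libMonetODBC.so
```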

Testing ODBC Driver

First, we will add our credentials to "/etc/odbc.ini" file.

sudo nano /etc/odbc.ini

Don't use the "gedit" graphical text editor here; use nano. Only then will it work.
[DatabaseB]
Description = Connect to Blue Server
Driver = MonetDB
Server = 192.168.100.146
Port = 50000
Database = DatabaseB
User = monetdb
Password = monetdb

"isql" is a console program that comes with "unixODBC". We can use it to test the ODBC connection to the blue server. We can connect to the blue server by typing this into the green server's shell:

isql -v DatabaseB

After connecting, we can set our schema and query our table:

SET SCHEMA SchemaGB;
SELECT * FROM factb;

proto_loader Function For ODBC

But our goal is to use the "proto_loader" function to fetch data over ODBC from the blue server directly into the MonetDB server on the green computer. For that, we will install one more package:

sudo apt install monetdb-odbc-loader

The ODBC loader is still experimental, so this functionality is NOT turned on by default. We will now turn it on. First, we exit and stop "DatabaseG":

quit
monetdb stop DatabaseG

monetdb get all DatabaseG

We can now read the properties of DatabaseG. The property of interest is the last one: "loadmodules".
We will change this property like this:
monetdb set loadmodules=odbc_loader DatabaseG

We can always turn it back off like this:
monetdb set loadmodules= DatabaseG

We will now log in to our database. This will automatically start the server and load the "odbc_loader" module.

mclient -u monetdb -d DatabaseG
   --password monetdb

Using proto_loader Function For ODBC

This is how we run an ODBC query directly from mclient. In this way we can connect to any ODBC-capable server, not just another MonetDB server. We just have to prepare an ODBC driver for that other server.
SELECT * FROM
proto_loader('odbc:DSN=DatabaseB;QUERY=SELECT * FROM schemagb.factb');
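For example, assuming we had defined a DSN named "SomeOtherDB" in "/etc/odbc.ini" for a different ODBC-capable database (the DSN, schema, and table names here are hypothetical, and that database's own ODBC driver must be installed), the same pattern would apply:

```sql
-- Hypothetical DSN for some other ODBC-capable database
SELECT * FROM
proto_loader('odbc:DSN=SomeOtherDB;QUERY=SELECT * FROM some_schema.some_table');
```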

All the magic happens on the green server. The "proto_loader" function uses the "odbc_loader" module (1) to send a standardized ODBC message to the unixODBC driver manager (2). The manager uses the MonetDB ODBC driver (3) to call the blue server (4).

It is also possible to provide all of the necessary parameters directly inside of the ODBC connection string:
SELECT * FROM proto_loader('odbc:DRIVER=/usr/lib/x86_64-linux-gnu/libMonetODBC.so;SERVER=192.168.100.146;PORT=50000;DATABASE=DatabaseB;UID=monetdb;PWD=monetdb;QUERY=SELECT * FROM schemagb.factb');
There is also a version that uses a DSN file. This version is for Windows only:
odbc:FILEDSN=<data source name>;[<ODBC connection parameters>;]QUERY=<SQL query>

Virtual Tables

Virtual tables are tables whose data is not physically stored in a MonetDB table. Views, merge tables, and remote tables are virtual tables. Tables that we receive through the file_loader and proto_loader functions are also virtual tables. We will now see how to transform file_loader and proto_loader virtual tables into more permanent structures.

CREATE TABLE Based on the Loader Function

We can use CREATE TABLE AS to store the CSV file in a new table:
CREATE TABLE permanentCSV ( Number, Letter ) AS ( SELECT * FROM '/home/sima/Desktop/CSV.csv' );

Next, we will read from the permanentCSV table:
SELECT * FROM permanentCSV;

CREATE TEMPORARY TABLE Based on the Loader Function

We can also make a temporary table:

CREATE LOCAL TEMPORARY TABLE temporaryFactB ( YearNum, Dates, ProdID, Qty ) AS
(  SELECT * FROM proto_loader('odbc:DSN=DatabaseB;QUERY=SELECT * FROM schemagb.factb') )
WITH DATA ON COMMIT PRESERVE ROWS;

We can read from this temporary table.

SELECT * FROM temporaryFactB;

Bulk INSERT Based on the Loader Function

We can pull data from any other ODBC-capable server into our temporaryFactB table, which we first empty:

TRUNCATE temporaryFactB;
INSERT INTO temporaryFactB ( YearNum, Dates, ProdID, Qty )
SELECT * FROM proto_loader('odbc:DRIVER=/usr/lib/x86_64-linux-gnu/libMonetODBC.so;SERVER=192.168.100.146;PORT=50000;DATABASE=DatabaseB;UID=monetdb;PWD=monetdb;QUERY=SELECT * FROM schemagb.factb');

SELECT * FROM temporaryFactB; 

COPY command

We have already talked about the COPY INTO and COPY FROM statements ( blog1 and blog2; youtube1 and youtube2 ). We will now look at some special forms of these commands.

COPY FROM stdin

We will first create one empty table:

CREATE TABLE tabStdin( Number INT, Letter CHAR );

We will then run a statement that copies data directly from mclient. After this statement, the prompt becomes "more>", which means that MonetDB is expecting more data. We type the first row "1|a", then the second row "2|b", and for the third row we enter nothing. After we press Enter on the empty line, we exit and the prompt becomes "sql>" again.

sql>COPY INTO tabStdin FROM STDIN;
more>1|a
more>2|b
more>

2 affected rows
sql>

We can now read from our table. The values that we typed after the "more>" prompt are inside the table. The pipe was used as a delimiter between columns:
SELECT * FROM tabStdin;

If we provide the number of rows that we will COPY, we just type the two rows; no empty line is needed to finish:

sql>COPY 2 RECORDS INTO tabStdin FROM STDIN;
more>3|c
more>4|d
2 affected rows
sql>

COPY INTO stdout

The statement below doesn't work correctly:
COPY SELECT * FROM tabStdin INTO STDOUT;
According to GitHub, this is a known bug:
https://github.com/MonetDB/MonetDB/issues/7028
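As a possible workaround, sketched here under the assumption that the server can write to the target path, we can export the query result to a server-side file instead of STDOUT:

```sql
-- Export to a file on the server; the path and delimiter are illustrative
COPY SELECT * FROM tabStdin INTO '/tmp/tabStdin.csv' USING DELIMITERS '|';
```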

COPY FROM CSV, With the DECIMAL Clause

When importing a CSV file, we can specify which symbols are used as the decimal and thousands separators. Let's say that we have a CSV file where the decimal point is a star "*" and the thousands separator is an underscore "_". I will create a table that accepts DECIMAL numbers:

CREATE TABLE tabDecimal ( Number DECIMAL, Letter CHAR );

COPY OFFSET 2 INTO tabDecimal
FROM '/home/sima/Desktop/CSV_file'( Number, Letter ) DECIMAL AS '*','_';
With the DECIMAL clause we specify which decimal point and thousands separator our CSV file uses.
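For illustration, the "CSV_file" used above might look like this (the values are assumptions; since no DELIMITERS clause is given, the default pipe delimiter is assumed, and the OFFSET 2 clause skips the header row):

```text
Number|Letter
1_234*56|a
7*89|b
```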

If we now read from our table, we will find the numbers correctly recognized. In MonetDB, the default decimal point is a dot; there is no default thousands separator.
