0360 Loader Functions in MonetDB

Loader functions are UDFs (user defined functions), that are used to insert data from various data sources. Currently, we can only make these functions with python programming language.

Idea is, to be able to read from the different data sources by using the power of python.  

Monetdb-Python3 Integration Package

Previously, we have installed MonetDB with two packages. We have installed monetdb5-sql and monetdb-client.

For python, we will need one more package. Monetdb-Python3 is integration package that allows MonetDB to interact with python.
sudo apt install monetdb5-sql monetdb-client


sudo apt install monetdb-python3

Command, sudo apt -a list monetdb-python3, will show us that we have 8 different versions of this package in the repository.
Command sudo apt -a show monetdb-python3, will claim that for MonetDB version 11.51.7, we should install version 11.51.7 of monetdb-python3. We should always match versions if we can.

I have the version 11.51.7 of MonetDB server.
monetdb --version
I can install the last version of monetdb-python3:
sudo apt install monetdb-python3
Or, I can install specific version of monetdb-python3:
sudo apt install monetdb-python3=11.51.7

Enabling Embedded Python

I will first start monetdb daemon:

monetdbd start /home/fffovde/DBfarm1
We have to enable python integration package for each database. By typing monetdb get all voc, we can list properties of the voc database. We can see that for "embedpy3" setting is NO. We will change that.
name   prop               source     value
voc    name               -          voc
voc    type               default    database
voc    shared             default    yes
voc    nthreads           default    4
voc    ncopyintothreads   default    4
voc    optpipe            default    default_pipe
voc    readonly           default    no
voc    embedr             default    no
voc    embedpy3           local      no
voc    embedc             default    no
voc    listenaddr         default    <unset>
voc    nclients           default    64
voc    dbextra            default    <unset>
voc    memmaxsize         default    <unset>
voc    vmmaxsize          default    <unset>
voc    raw_strings        default    <unset>
voc    loadmodules        default    <unset>

We will stop the database if it is running. Then we will change the setting, and after that we will run our database again.
monetdb stop voc
monetdb set embedpy3=true
monetdb start voc

We have changed embedpy3 property to yes.

Now we can login to our database. I will login as an administrator, although that is not needed, any user has ability to create LOADER function.mclient -u monetdb -d voc
Password: monetdb

Python LOADER Function

CREATE LOADER myloader() LANGUAGE PYTHON {
     _emit.emit( { 'Col1': [ "A", "B" ], 'Col2': [ 1, 2 ] } )      _emit.emit( { 'Col1': [ "C", "D" ], 'Col2': [ 3, 4 ] } ) };
This statement will create LOADER function. Columns are defined
as python lists. Each list, together with the name of a column, is
placed inside of the python dictionary.
We are using function "_emit.emit" do divide our inserts into chunks.
In this way we can preserve memory. After inserting the first chunk, (A1,B2), we can delete it from the memory, and we can continue inserting the second chunk (C3,D4).

Instead of the python lists, we can also use NumPy arrays. Instead of [1, 2, 3, 4, 5], we can use np.array( [1, 2, 3, 4, 5] ). NumPy arrays are faster.

LOADER functions are of the type 7, so we can read them as "SELECT * FROM sys.functions WHERE type = 7;". We can also notice that our function belongs to schema 2000 (schema sys), because that is the default schema for administrators (I am logged in as an administrator). Creation of LOADER functions is not limited to administrators, every user can create LOADER function.

Using LOADER Function

We can create a table from our LOADER function. Columns and data types will be deduced automatically.
CREATE TABLE myLoaderTable FROM LOADER myloader();
SELECT * FROM myLoaderTable;

It is also possible to add data to an existing table. I will first truncate myLoaderTable and then I will append new data to an existing table.
TRUNCATE myLoaderTable;
COPY LOADER INTO myLoaderTable FROM myloader();

Using a Parameter in a LOADER function

With python we can pull data from anywhere, from any database or file. Here is an example where we will read data from a JSON file.

{ "Col1": ["A","B","C","D"] , "Col2": [1,2,3,4] } We have a JSON file, with the name "File.json".

CREATE LOADER json_loader(filename STRING) LANGUAGE PYTHON {     
import json
    f = open(filename)
   _emit.emit(json.load(f))
   f.close()
};
This is how we can create LOADER function, that will read from our JSON file. This time we are using an argument for our function. This argument is of the STRING data type. STRING is an alias for the CLOB data type in MonetDB.

json module is builtin Python3 module.

We can truncate previous results and we can import from the json file.  

TRUNCATE myLoaderTable;

COPY LOADER INTO myLoaderTable FROM json_loader('/home/fffovde/Desktop/File.json');
SELECT * FROM myLoaderTable;

Missing Columns

TRUNCATE myLoaderTable;

During the import missing columns will be filled with nulls.
CREATE LOADER myloader2() LANGUAGE PYTHON {      
 
         _emit.emit( { 'Col1': [ "A", "B", "C", "D" ] } )
};
COPY LOADER INTO myLoaderTable FROM myloader2();
SELECT * FROM myLoaderTable;

Delete LOADER function

DROP LOADER FUNCTION sys.myloader2;We can always delete this function with DROP LOADER FUNCTION statement.

Leave a Comment

Your email address will not be published. Required fields are marked *