Loader functions are UDFs (user defined functions), that are used to insert data from various data sources. Currently, we can only make these functions with python programming language.
Idea is, to be able to read from the different data sources by using the power of python.
Monetdb-Python3 Integration Package
Previously, we have installed MonetDB with two packages. We have installed monetdb5-sql and monetdb-client.
For python, we will need one more package. Monetdb-Python3 is integration package that allows MonetDB to interact with python.
sudo apt install monetdb5-sql monetdb-client
sudo apt install monetdb-python3
Command, sudo apt -a list monetdb-python3, will show us that we have 8 different versions of this package in the repository.
Command sudo apt -a show monetdb-python3, will claim that for MonetDB version 11.51.7, we should install version 11.51.7 of monetdb-python3. We should always match versions if we can.
I have the version 11.51.7 of MonetDB server. monetdb --version
I can install the last version of monetdb-python3: sudo apt install monetdb-python3
Or, I can install specific version of monetdb-python3: sudo apt install monetdb-python3=11.51.7
Enabling Embedded Python
I will first start monetdb daemon:
monetdbd start /home/fffovde/DBfarm1
We have to enable python integration package for each database. By typing monetdb get all voc, we can list properties of the voc database.
We can see that for "embedpy3" setting is NO. We will change that. name prop source value voc name - voc voc type default database voc shared default yes voc nthreads default 4 voc ncopyintothreads default 4 voc optpipe default default_pipe voc readonly default no voc embedr default no voc embedpy3 local no voc embedc default no voc listenaddr default<unset> voc nclients default 64 voc dbextra default<unset> voc memmaxsize default<unset> voc vmmaxsize default<unset> voc raw_strings default<unset> voc loadmodules default <unset>
We will stop the database if it is running. Then we will change the setting, and after that we will run our database again. monetdb stop voc monetdb set embedpy3=true monetdb start voc We have changed embedpy3 property to yes.
Now we can login to our database. I will login as an administrator, although that is not needed, any user has ability to create LOADER function.
This statement will create LOADER function. Columns are defined as python lists. Each list, together with the name of a column, is placed inside of the python dictionary.
We are using function "_emit.emit" do divide our inserts into chunks. In this way we can preserve memory. After inserting the first chunk, (A1,B2), we can delete it from the memory, and we can continue inserting the second chunk (C3,D4).
Instead of the python lists, we can also use NumPy arrays. Instead of [1, 2, 3, 4, 5], we can use np.array( [1, 2, 3, 4, 5] ). NumPy arrays are faster.
LOADER functions are of the type 7, so we can read them as "SELECT * FROM sys.functions WHERE type = 7;". We can also notice that our function belongs to schema 2000 (schema sys), because that is the default schema for administrators (I am logged in as an administrator). Creation of LOADER functions is not limited to administrators, every user can create LOADER function.
Using LOADER Function
We can create a table from our LOADER function. Columns and data types will be deduced automatically. CREATE TABLE myLoaderTable FROM LOADER myloader(); SELECT * FROM myLoaderTable;
It is also possible to add data to an existing table. I will first truncate myLoaderTable and then I will append new data to an existing table. TRUNCATE myLoaderTable; COPY LOADER INTO myLoaderTable FROM myloader();
Using a Parameter in a LOADER function
With python we can pull data from anywhere, from any database or file. Here is an example where we will read data from a JSON file.
{ "Col1": ["A","B","C","D"] , "Col2": [1,2,3,4] }
We have a JSON file, with the name "File.json".
CREATE LOADER json_loader(filenameSTRING) LANGUAGE PYTHON { import json f = open(filename) _emit.emit(json.load(f)) f.close() };
This is how we can create LOADER function, that will read from our JSON file. This time we are using an argument for our function. This argument is of the STRING data type. STRING is an alias for the CLOB data type in MonetDB.
json module is builtin Python3 module.
We can truncate previous results and we can import from the json file.
TRUNCATE myLoaderTable;
COPY LOADER INTO myLoaderTable FROM json_loader('/home/fffovde/Desktop/File.json'); SELECT * FROM myLoaderTable;
Missing Columns
TRUNCATE myLoaderTable;
During the import missing columns will be filled with nulls. CREATE LOADER myloader2() LANGUAGE PYTHON { _emit.emit( { 'Col1': [ "A", "B", "C", "D" ] } ) }; COPY LOADER INTO myLoaderTable FROM myloader2(); SELECT * FROM myLoaderTable;
Delete LOADER function
DROP LOADER FUNCTION sys.myloader2;
We can always delete this function with DROP LOADER FUNCTION statement.