MonetDB

0440 Procedures in MonetDB

Procedures

A SQL Stored Procedure is a named set of SQL statements that are frequently executed together. When we want to execute those statements, we only need to specify the procedure name and the argument values.

Procedures are very similar to functions, but they differ in purpose and usage.

1) The purpose of functions is to perform calculations and return values. The purpose of procedures is to change database objects. This means that DML statements should be used within procedures, not within functions.

2) Functions are used in statements. They can be used in SELECT, WHERE, GROUP BY, and HAVING. Procedures are self-contained and independent from other statements. They are executed with the "CALL" statement.

DDL Statements in Procedures

CREATE OR REPLACE PROCEDURE procDDL()
BEGIN
    CREATE TABLE Tab1 ( Number INT );
    ALTER TABLE Tab1 ADD COLUMN Letter CHAR;
END;
We can not use DDL statements within a procedure.

CREATE OR REPLACE PROCEDURE procDDL() BEGIN
    CREATE TABLE Tab1 ( Number INT );
END;
It is possible to use "CREATE TABLE" statement
inside of procedure.

But when we try to read from our new table (after we called a procedure), we get nothing:
SELECT * FROM Tab1;

I will now try to create a table inside of procedure, but this time I will specify schema. We will get an interesting message.

CREATE OR REPLACE PROCEDURE procDDL()
BEGIN
    CREATE TABLE voc.Tab1 ( Number INT );
END;
It seems that CREATE TABLE is the same as DECLARE TABLE. We were not using DDL statement, we were just definining a table variable.

Now we know that DDL statements can not be used in Functions and Procedures at all.

Sample Table

CREATE TABLE procTab ( Number INT );
SELECT * FROM procTab;

DML Statements in Procedures

CREATE OR REPLACE PROCEDURE procDML()
BEGIN
    INSERT INTO procTab ( Number ) VALUES ( 1 );
END;
CALL procDML();
  --we call the procedure
SELECT * FROM procTab;

DML statements will work normally inside of procedures.

CREATE OR REPLACE PROCEDURE procDML()
BEGIN
    SELECT * FROM Tab1;
END;
SELECT statement is not allowed in procedures. Procedures don't return values.

Variables and Arguments in Procedures

CREATE OR REPLACE PROCEDURE procVar()
BEGIN
    DECLARE Var1 INTEGER;
    SET Var1 = 2;
    UPDATE procTab SET Number = Var1 WHERE Number = 1;

END;
We can use variables in procedures.  
CALL procVar();
SELECT * FROM procTab;

CREATE PROCEDURE procArg(Arg1 INT)
BEGIN
    UPDATE procTab SET Number = Arg1 WHERE Number = 2;
END;
Procedures can accept arguments.

CALL procArg(3); SELECT * FROM procTab;

CREATE PROCEDURE procArg(Arg1 INT, Arg2 INT)
BEGIN
    UPDATE procTab SET Number = Arg1 + Arg2 WHERE Number = 3;
END;
We can overload procedures, just like functions.
CALL procArg( 2, 2 );
SELECT * FROM procTab;

When deleting overloaded procedure we have to use the full signature.
DROP PROCEDURE procArg( INT );
DROP PROCEDURE procArg( INT, INT );
DROP ALL PROCEDURE procARG;If we use ALL keyword, then we can delete all of the overloaded functions that share the same name, at once.

Transactions in Procedures

CREATE OR REPLACE PROCEDURE procTrans()
BEGIN
    START TRANSACTION;
    UPDATE procTab SET Number = 5 WHERE Number = 4;         COMMIT;
END;
Because we can not use DDL statemens inside of procedures, then it is not possible to use transaction statements.

CREATE OR REPLACE PROCEDURE procTrans()
BEGIN ATOMIC
    UPDATE procTab SET Number = 5 WHERE Number = 4;
END;
Just like with functions, we have to use ATOMIC keyword
after BEGIN to make procedure one big transaction.

Flow Controls in Procedures

Inside procedures we can use
flow control statements like
CASE, WHILE and IF.  
CREATE OR REPLACE PROCEDURE procCASE(Arg1 CHAR)
BEGIN ATOMIC
     DECLARE Res INT;
     CASE
          WHEN Arg1 = 'A' THEN SET Res = 6;
          ELSE SET Res = 7;
     END CASE;
     UPDATE procTab SET Number = Res;
END;
CALL procCASE( 'A' );
SELECT * FROM procTab;


System Tables

We can read about procedures from sys.functions system table. Procedures are of the type "2". User created procedures have "false" in the system column.
SELECT * FROM sys.functions WHERE name = 'proctrans';

Data about procedures argument can be found in the system table sys.args.
SELECT * FROM sys.args WHERE func_id = 15148;

Nesting Functions and Procedures

We will try to call this function from one other function:
CREATE OR REPLACE FUNCTION funcReturnOne()
RETURNS INTEGER BEGIN
     RETURN 1;
END;
This is a function calling a function.
CREATE OR REPLACE FUNCTION funcReturnSum()
RETURNS INTEGER BEGIN
     RETURN funcReturnOne() + 2;
END;
This will work, we can chain functions.

SELECT funcReturnSum();

CREATE OR REPLACE FUNCTION funcProc()
RETURNS INTEGER
BEGIN
     CALL procCASE( 'B' );
     RETURN 1;
END;
This time, we will call procedure from a function. This function will work as expected.
SELECT funcProc();
Procedure "procCASE" will also do its job. It will change the value in "procTab" table.
SELECT * FROM procTab;

It is also possible to nest procedures. This will be inner procedure:
CREATE PROCEDURE procInner()
BEGIN
    UPDATE procTab SET Number = 8 WHERE Number = 7;
END;
This outer procedure will just call inner procedure:
CREATE PROCEDURE procOuter()
BEGIN
    CALL procInner();
END;
Outer procedure will call inner procedure, and inner procedure will update the table.
SELECT * FROM procTab;

Deleting a Procedure

DROP PROCEDURE IF EXISTS procDDL();If we use "IF EXISTS", then deleting of a procedure will never return an error.
If we try to delete not existing procedure, there will be no error, if we use "IF EXISTS".
DROP PROCEDURE IF EXISTS procZZZ();

Function funcProc() is depentand on the procedure "procCASE". That means that we can not delete "procCASE" procedure.
DROP PROCEDURE procCASE CASCADE;We can use CASCADE keyword to delete procedure and all of the depending objects. Now our procedure is deleted.

0430 Python UDFs in MonetDB

Prerequisites for Python UDFs

In order to create Python UDFs, we need to fulfill two prerequisites:
– We have to install the Monetdb-Python3 package. This package is used by MonetDB to communicate with python.
– We have to enable python in each database where we want to create python UDFs (monetdb set embedpy3=true).

How to provide these prerequisites is already explained in one of my previous posts ( 0360 Loader Functions In Monetdb ). Please see this blog post to learn how to enable Python support. Alternatively, you can watch video about loader functions on youtube ( https://youtu.be/2WHb41dzh_A ).

Presence of NumPy and Pandas Packages

In examples we will use NumPy and Pandas packages. You can type "pip show numpy pandas" command to check their presence.

We can install/uninstall/upgrade these packages with commands:

pip install numpy pandaspip uninstall numpy pandaspip install --upgrade numpy pandas

Be careful when doing this. Linux distributions use python internally, so we should avoid tampering with these modules unless necessary. Note that Pandas depends on NumPy.

Sample Table

CREATE TABLE pyTab ( Letter CHAR, Sign CHAR, Number INTEGER );

INSERT INTO pyTab VALUES ( 'A', '+', 1 ), ( 'A', '-', 2 ), ( 'A', '+', 3 ), ( 'B', null, 4 );


SELECT * FROM pyTab;

Scalar Python UDF

CREATE OR REPLACE FUNCTION pyConcat(val1 CHAR, val2 CHAR)
RETURNS CHAR(2)
LANGUAGE PYTHON {
   return val1 + val2
};
This is simple function that will accept two arguments and will return their concatenation. We can use this function like this:

SELECT letter, sign, pyConcat( letter, sign ) FROM pyTab;

In fourth row, if "sign" is null,
then the result will be null.

We can improve our function by transforming
null values of the arguments into empty strings.  =>
CREATE OR REPLACE FUNCTION pyConcat(val1 CHAR, val2 CHAR)
RETURNS VARCHAR(2)
LANGUAGE PYTHON {
   if isinstance(val1, numpy.ma.MaskedArray):
      val1 = val1.filled('')
   if isinstance(val2, numpy.ma.MaskedArray):
      val2 = val2.filled('')
   return val1 + val2
};

Now, we have "B" in the fourth row. This last example showed us what is the nature of arguments in python UDFs. Arguments are NumPy arrays.

import numpy as np
a = np.array([1, 2, 3, 4])
print("Regular array:", a)           # Regular array: [1 2 3 4]

If we have null in some column, then instead of NumPy array, we will get masked array.  
import numpy.ma as ma
b = ma.array([1, -9999, 3, 4], mask=[False, True, False, False])
print("Sum of Masked array:", b.sum())   # Sum of Masked array: 8
A masked array is a combination of a standard NumPy array and a mask. A mask is used to hide invalid or missing values. After we hide the bad values, we can calculate the sum or average of the masked array without the influence of the bad values.

In our example, we have used "isinstance"  function to examine if we have NumPy array or masked array. For masked array we have replaced bad values with empty string.   if isinstance(val2, numpy.ma.MaskedArray):
      val2 = val2.filled('')

If the value in the fourth row is replaced with an empty string, then the null will now affect final result.

The data type of arguments in Python is directly inferred from the SQL data types, according to this mapping.

BOOLEANnumpy.int8||INTEGERnumpy.int32||FLOATnumpy.float64
TINYINTnumpy.int8||BIGINTnumpy.int64||HUGEINT numpy.float64
SMALLINTnumpy.int16||REALnumpy.float32||STRINGnumpy.object

Returned Value

CREATE OR REPLACE FUNCTION pyConcat(val1 CHAR, val2 CHAR)
RETURNS VARCHAR(2)
LANGUAGE PYTHON {
   return(numpy.array(["A+","A-","A+","ZZ"]))
};
Returned value is also NumPy array. I will create one function that will return the result as a constant.  Notice that I have place "ZZ" as the last element.


This function will work just fine. We'll get our result.

SELECT letter, sign, pyConcat( letter, sign ) FROM pyTab;


CREATE OR REPLACE FUNCTION pyConcat(val1 CHAR, val2 CHAR)
RETURNS VARCHAR(2)
LANGUAGE PYTHON {
   return( 3 )
};
This is what will happen if we return scaler.

CREATE OR REPLACE FUNCTION pyConcat(val1 CHAR, val2 CHAR)
RETURNS VARCHAR(2)
LANGUAGE PYTHON {
   return( ( 1, 2 ) )
};
If we return a tuple, we will get this message.

Let's try to return list, dictionary or pandas data frame. We'll call our functions with "SELECT letter, sign, Ret() FROM pyTab;".

CREATE OR REPLACE FUNCTION Ret() RETURNS CHAR
LANGUAGE PYTHON {

   return( ['A','B','C','D'] )
};
CREATE OR REPLACE FUNCTION Ret() RETURNS CHAR
LANGUAGE PYTHON {

   return( {"result":['A','B'] } )
};
CREATE OR REPLACE FUNCTION Ret()
RETURNS INTEGER
LANGUAGE PYTHON {
     import pandas as pd # must import pandas      return(pd.DataFrame({'result':[1,2,3,4]}))
};

Python Traps

CREATE OR REPLACE FUNCTION funcCase(Letter CHAR)
RETURNS CHAR
LANGUAGE PYTHON {
    return( Letter )
};
If we try to call this function, we'll get an error.
SELECT funcCase('A');

Because python is case sensitive and SQL is not, names of arguments will be turned into lower letters in python script.
Instead of "return( Letter )", we have to type "return( letter )". Arguments inside of python script have to be in lower letters.After the change, this function will work.
SELECT funcCase('A');

Python is, of course, sensitive to indentation.
Indentation must be consistent.
We'll get an error if it is not.
CREATE OR REPLACE FUNCTION funcIndent()
RETURNS CHAR
LANGUAGE PYTHON {
      a = 3
   return a
};
SELECT funcIndent();  

Creating a Table with Python UDF

CREATE FUNCTION funcTab()
RETURNS TABLE (Letter CHAR, Number INT)
LANGUAGE PYTHON {
   result = dict()
   result['letter'] = ['A', 'B']
   result['number'] = [1, 33]
   return result
};
Python dictionary can be used to create table with python.
SELECT * FROM funcTab();

Use Python Function to Filter Data

CREATE OR REPLACE FUNCTION funcBoolean(Number INT)
RETURNS BOOLEAN
LANGUAGE PYTHON
{ return number % 2 };
If python function is returning TRUE and FALSE, we can use it as a filter.
SELECT Number, funcBoolean( Number )
FROM pyTab
WHERE funcBoolean( Number );  

Aggregate UDFs

We can group table values with this SQL statement:
SELECT Letter, SUM( Number )
FROM pyTab

GROUP BY Letter;
SQL statement is using built-in SQL function to aggregate values. We can do the same with python UDF. Let's make UDF function pySUM( Val ).
SELECT Letter, pySUM( Number )
FROM pyTab

GROUP BY Letter;

Aggregate function argument is a column to aggregate. We will use Number column. This column will be NumPy array.
np.array([1, 2, 3, 4])

Beside this, aggregate function needs to know what row belongs to what group.
Groups will also be presented with a NumPy array. If we are grouping only by Letter column, our array will show that we only have 2 groups (0,1). If we group by Letter and Sign columns, then array will have 3 groups (0,1,2). This array is always one dimensional and always have the name "aggr_group".
np.array([0, 0, 0, 1])

np.array([0, 1, 0, 2])

So, inputs for our function are "val = np.array([1, 2, 3, 4])" and "aggr_group = np.array([0, 0, 0, 1])". On the left side bellow, we have our function. On the right side we can see interim results and pseudo code.

  
CREATE OR REPLACE AGGREGATE pySUM(val INTEGER)
RETURNS INTEGER LANGUAGE PYTHON {

   unique = numpy.unique(aggr_group)
   x = numpy.zeros(shape=(unique.size))
   for i in range(0, unique.size):
      x[i] = numpy.sum(val[aggr_group==unique[i]])
   return(x)
};
val = np.array([1, 2, 3, 4]) #we start with val and aggr_group
aggr_group = np.array([0, 0, 0, 1])

 
unique = np.array([0, 1]) #we remove duplicates
x = np.array([0, 0]) #result array, but filled with zeros
for i from 0 to 1 #for each group

   x[0] = sum([1, 2, 3, 4] where [0, 0, 0, 1] = 0 ) = 6 #for A
   x[1] = sum([1, 2, 3, 4] where [0, 0, 0, 1] = 1 ) = 4 #for B
return np.array( 6, 4 )

We will run our statement:
SELECT Letter, pySUM( Number )
FROM pyTab
GROUP BY Letter;

If we try to apply our function without grouping, we'll get an error.
SELECT pySUM( Number )
FROM pyTab;
If there are no groups, then "aggr_group" is not defined.

CREATE OR REPLACE AGGREGATE pySUM(val INTEGER)
RETURNS INTEGER LANGUAGE PYTHON {
    try:
        unique = numpy.unique(aggr_group)
        x = numpy.zeros(shape=(unique.size))
        for i in range(0, unique.size):
            x[i] = numpy.sum(val[aggr_group==unique[i]])
    except NameError:
        x = numpy.sum(val) # aggregate on all data
    return(x) };
We have to catch the error above with "try".

We can provide alternative result with "except NameError", that will not use "aggr_group". We will now aggregate the whole column.

SELECT pySUM( Number ) FROM pyTab;

The result without grouping is 10.

System Tables

In system table "sys.function_types", we can see that aggregate functions are type 3, and scalar functions are type 1. Functions that return table are type 5.

SELECT * FROM sys.function_types;

In system table "sys.function_languages", we can see what programming languages can we use to create functions.

SELECT * FROM sys.function_languages;

We can use language and function type to search for our Python functions in system table "sys.functions".

SELECT * FROM sys.functions WHERE Name = 'ret';

0420 Custom Functions in MonetDB part 3

WHILE statement

In procedural SQL, WHILE statement is used for iterative statement where we repeat a block of SQL code as long as a specified condition is true. In the example bellow, we will repeat the loop 5 times, because our argument has 5 letters = length ( 'Surat' ).

CREATE OR REPLACE FUNCTION funcWhile( argWord VARCHAR(20) )
RETURNS STRING
BEGIN
    DECLARE i INT; SET i = 1;
    DECLARE Res VARCHAR(40); SET Res = '';
    WHILE i <= length( argWord ) DO
        SET Res = Res || '_' || SUBSTRING( argWord, i, 1 );
        SET i = i + 1;
    END WHILE;
    RETURN TRIM( Res, '_' );
END;
(1)     '' + '_' + Surat = _S
(2)     _S + '_' + Surat = _S_u
(3)     _S_u +
'_' + Surat = _S_u_r
(4)     _S_u_r +
'_' + Surat = _S_u_r_a
(5)     _S_u_r_a + '_' + Surat = _S_u_r_a_t


TRIM( _S_u_r_a_t, '_' ) = S_u_r_a_t

SELECT funcWhile('Surat');  

This example will return the smallest number X for which 2X >= argument.

SELECT funcExponential(7);


CREATE OR REPLACE FUNCTION funcExponential( argNumber INTEGER )
RETURNS INTEGER
BEGIN
    DECLARE X INT; SET X = 1;
    WHILE POWER( 2, X ) < argNumber DO
        SET X = X + 1;
    END WHILE;
    RETURN X;
END;

CREATE OR REPLACE FUNCTION funcFactorial( argNumber INTEGER )
RETURNS INTEGER
BEGIN
    DECLARE Res INTEGER; SET Res = 1;
    WHILE 1 < argNumber DO
        SET Res = Res * argNumber;
        SET argNumber = argNumber - 1;
    END WHILE;
    RETURN Res;
END;
This is an example where we are calculating factorial of an argument.

SELECT funcWhile(4);


Infinite Loop

If we made a mistake, our WHILE statement can create an infinite loop. In that case, solution is to violentely break the process. In mclient we can exit by typing "Ctrl + C" to exit execution.

CREATE OR REPLACE FUNCTION funcNeverEnd( argNumber INTEGER )
RETURNS INTEGER
BEGIN
    WHILE 1 = 1 DO
        SET argNumber = argNumber + 1;
    END WHILE;
    RETURN argNumber;
END;
SELECT funcNeverEnd(1);

Nesting WHILE

It is possible to nest WHILE statement inside of another WHILE statement.

SELECT * FROM funcNested();

CREATE OR REPLACE FUNCTION funcNested()
RETURNS TABLE( Res INT )
BEGIN
    DECLARE i INT; SET i = 1;
    DECLARE j INT;
    DECLARE TABLE tabRes ( Res INT );
    WHILE i <= 2 DO
        SET j = 1;
        WHILE j <= 2 DO
            INSERT INTO tabRes VALUES ( i * 10 + j );
            SET j = j + 1;
        END WHILE;

        SET i = i + 1;
    END WHILE;

    RETURN tabRes;
END;

IF Statement in Procedural SQL

For conditional logic we can use IF statement.
CREATE OR REPLACE FUNCTION funcIF()
RETURNS INTEGER
BEGIN
    IF ( 1 = 2 )
    THEN RETURN 1;
    ELSE RETURN 2;
    END IF;
END;
Because 1 <> 2, the returned value is 2.

SELECT funcIF();

It is also possible to have several conditions inside of the IF statement.
CREATE OR REPLACE FUNCTION funcIF()
RETURNS INTEGER
BEGIN
    IF ( 1 = 2 ) THEN RETURN 1;
    ELSEIF ( 1 = 2 ) THEN RETURN 2;
    ELSEIF ( 1 = 1 ) THEN RETURN 3;
    ELSE RETURN 3;
    END IF;
END;
One of these conditions is TRUE, so the final result will be 3.

SELECT funcIF();

Breaking WHILE Loop

MonetDB doesn't support BREAK keyword in a WHILE loop. We have to use flag variable to mark when we want to exit the loop.

We will set our flag variable to one value, and we will examine whether it has that value at the start of each cycle. Inside of the loop we can change the value of a flag variable when conditions are met. That change will make condition of a loop to fail.
SELECT funcBREAK(); 
CREATE OR REPLACE FUNCTION funcBREAK()
RETURNS INTEGER
BEGIN
    DECLARE i INTEGER; SET i = 1;
    DECLARE Flag INTEGER; SET Flag = 0;
    WHILE i <= 10 AND Flag = 0 DO
        SET i = i + 1;
        IF i = 5 THEN
            SET Flag = 1;
        END IF;
    END WHILE;
    RETURN i;
END;

Skip Some Cycles in a WHILE Loop

MonetDB doesn't support CONTINUE keyword in a WHILE loop. If we want to conditionally skip some cycles, we should wrap our statements in the IF function.

In the example to the right, we are calculating total of the numbers between 1 and 10, but we are skipping all the even numbers. Result should be 1+3+5+7+9=25.  
SELECT funcCONTINUE();
CREATE OR REPLACE FUNCTION funcCONTINUE()
RETURNS INT
BEGIN
    DECLARE i INT; SET i = 1;
    DECLARE Res INT; SET Res = 0;
    WHILE i < 10 DO
        IF MOD( i, 2 ) <> 0  THEN
            SET Res = Res + i;
        END IF;
        SET i = i + 1;
    END WHILE;
    RETURN Res;
END;

Functions Overloading

In the same schema it is possible to create several functions with the same name. Trick is that these functions must have different arguments. I will create three simple functions.

CREATE FUNCTION funcOverload()
RETURNS INTEGER BEGIN
    RETURN 1;
END;
CREATE FUNCTION funcOverload(A INT)
RETURNS INTEGER
BEGIN
    RETURN A;
END;
CREATE FUNCTION funcOverload(A INT, B INT) RETURNS INTEGER
BEGIN
    RETURN A + B;
END;

Creation was successful so we can now test our functions:

SELECT funcOverload();
SELECT funcOverload(5);

SELECT funcOverload(20,5);

Deleting of Overloaded Functions

When we try to delete overloaded function, MonetDB will complain that we have to provide the full signature of a function.
DROP FUNCTION funcOverload;

I will delete the version of the function, with one argument, by using full singature.

DROP FUNCTION funcOverload(INT);
Now this version doesn't work.
Other functions still work.

System Tables

System table sys.functions contains all of our functions.  

SELECT * FROM sys.functions WHERE name = 'funcneverend';

If we want to list all of the functions that are created by us, we have to filter out system functions.

SELECT * FROM sys.functions WHERE system = false;
Function "funcNeverEnd" has ID 14331. We can use that fact to read about its arguments.
SELECT * FROM sys.args WHERE func_id = 14331;

There will always be one argument named "result". This is the argument that represents the value that the function will return.

0410 Custom Functions in MonetDB part 2

Table Variables

Table variables are temporary in-memory structures. They exist only during the execution of a function. They are mostly used to build up a result set that will be returned from the function. They can be also used as intermediate result storage for procedural logic.

CREATE FUNCTION funcTableDeclaration()
RETURNS TABLE ( Id INTEGER )
BEGIN
   DECLARE TABLE TableDeclaration ( Id INTEGER );
   INSERT INTO TableDeclaration VALUES ( 1 );        RETURN SELECT * FROM TableDeclaration;
END;
SELECT * FROM funcTableDeclaration();
Tables created with DECLARE TABLE statement are mutable and can be modified by using any of the DML statements like INSERT INTO, UPDATE, DELETE…

It is also possible to create a table variable based on an existing table, but without inheriting the data. I will create a sample table:

CREATE TABLE sys.tabNum ( Number INTEGER );
INSERT INTO tabNum ( Number ) VALUES ( 32 );
SELECT * FROM tabNum;
    
   

CREATE OR REPLACE FUNCTION funcTableDeclaration()
RETURNS TABLE( N INTEGER )
BEGIN
   DECLARE TABLE TableDeclaration AS ( SELECT * FROM tabNum ) WITH DATA;
   RETURN SELECT * FROM TableDeclaration;
END;
We have created table variable based on the sample table.

SELECT * FROM funcTableDeclaration();
No data inherited.

Limitations of the Table Variables

CREATE OR REPLACE FUNCTION funcTableDeclaration()
RETURNS TABLE ( Id INTEGER )
BEGIN
   DECLARE TABLE TableDeclaration ( Id INTEGER PRIMARY KEY );
   INSERT INTO TableDeclaration VALUES ( 1 );
   RETURN SELECT * FROM TableDeclaration;
END;
We can not have constraints on the table variables.  

CREATE OR REPLACE FUNCTION funcTableDeclaration()
RETURNS TABLE( Id INTEGER )
BEGIN    
   DECLARE TABLE TableDeclaration ( Id INTEGER );
   INSERT INTO TableDeclaration VALUES ( 1 );
   ALTER TABLE TableDeclaration RENAME COLUMN Id to Code;
   RETURN SELECT * FROM TableDeclaration;
END;
It is not possible to apply ALTER TABLE on the table variable.  

Declare Scaler Variable

CREATE FUNCTION funcVariableDeclaration()
RETURNS INTEGER
BEGIN
   DECLARE Var1 INTEGER;
   SET Var1 = 77;
   SET Var1 = 88;
   RETURN SELECT Var1;
END;
It is possible to declare and set scaler variable.
Scaler variables are mutable, and we can change their values several times.

SELECT funcVariableDeclaration();

CREATE OR REPLACE FUNCTION funcVariableDeclaration()
RETURNS TABLE ( Var1 INTEGER, VAR2 CHAR )
BEGIN
   DECLARE Var1 INTEGER, Var2 CHAR;
   SET Var1 = 77;
   SET Var2 = 'A';
   RETURN SELECT Var1, Var2;
END;
We can declare two variables in one statement.

SELECT * FROM funcVariableDeclaration();

CREATE OR REPLACE FUNCTION funcVariableDeclaration()
RETURNS TABLE ( Var1 INTEGER, VAR2 CHAR )
   BEGIN
   DECLARE Var1 INTEGER, Var2 CHAR;
   SELECT 1, 'A' INTO Var1, Var2;
   RETURN SELECT Var1, Var2;
END;
Beside declaring two variables at once, we can also set the
values of the both variables by the "SELECT INTO" statement.


Expressions for Values

CREATE OR REPLACE FUNCTION funcVariableDeclaration()
RETURNS INTEGER
BEGIN
   DECLARE Var1 INTEGER;
   SET Var1 = 32 + 5;
   RETURN SELECT Var1;
END;
Result of an expression can be assigned as a variable value.

SELECT funcVariableDeclaration();


CREATE OR REPLACE FUNCTION funcVariableDeclaration()
RETURNS INTEGER
BEGIN    
   DECLARE Var1 INTEGER;
   SET Var1 = ( SELECT * FROM tabNum );
   DECLARE Var2 INTEGER;
   SET Var2 = Var1 + 5;
   RETURN SELECT Var2;
END;
Result of a query can be assigned to a variable.  

SELECT funcVariableDeclaration();

CREATE OR REPLACE FUNCTION funcVariableDeclaration()
RETURNS INTEGER
BEGIN
   DECLARE Var1 INTEGER;
   SET Var1 = ( SELECT * FROM tabNum ) + 5;
   RETURN SELECT Var1;
END;
We can also use this shorter syntax.

CREATE OR REPLACE FUNCTION funcVariableDeclaration()
RETURNS INTEGER
BEGIN    
   DECLARE Var1 INTEGER, Var2 INTEGER;
   SET ( Var1, Var2 ) = ( SELECT Number, Number FROM tabNum );     RETURN Var1 / Var2; END;
If the query is returning one row with several columns, we can assign all of those values to several variables in one statement.

Usage of the Scaler Variables

Scaler variables are, also, temporary in-memory structures. We can use them anywhere where we can use constants:

CREATE OR REPLACE FUNCTION funcVariableDeclaration()
RETURNS INTEGER
BEGIN
   DECLARE Var1 INTEGER;
   SET Var1 = 32;
   RETURN SELECT * FROM tabNum WHERE Number = Var1;
END;
SELECT funcVariableDeclaration();

But we can not use scaler variable to give a name to a column of a table variable. We can not use variables instead of the identifiers:

CREATE OR REPLACE FUNCTION funcTableDeclaration()
RETURNS TABLE ( Id INTEGER )
BEGIN
   DECLARE Var1 CHAR;
   SET Var1 = 'N';
   DECLARE TABLE TableDeclaration ( Var1 INTEGER );    
   RETURN SELECT N FROM TableDeclaration;
END;
SELECT * FROM funcTableDeclaration();  

CREATE OR REPLACE FUNCTION funcVariableDeclaration()
RETURNS INTEGER
BEGIN
   DECLARE Var1 INTEGER;
   RETURN SELECT Var1;
END;
Variable without assigned value will be NULL.

SELECT funcVariableDeclaration();

Control Flow With CASE

In a police lineup, a witness goes from left to right, scanning each person. When the witness recognizes the culprit, he stops scanning.

This is how the CASE control flow structure works. We examine the values ​​one by one until we find a value that matches the description.

Search CASE

CREATE OR REPLACE FUNCTION funcCASE( argNumber INTEGER )
RETURNS CHAR
BEGIN    
   DECLARE Res CHAR;    
   CASE          
      WHEN argNumber = 1 THEN SET Res = 'A';
      WHEN argNumber = 2 THEN SET Res = 'B';
      ELSE SET Res = 'C';    
   END CASE;
   RETURN Res;
END;
CASE statement is made of conditions checks ( WHEN subclause ) and
the results ( THEN subclause ) when the condition is met.

The results must be complete statements. In this case, it is a SET statement.
If none of the conditions is met, we will return the statement after the "ELSE" keyword.


SELECT funcCASE( 1 );

SELECT funcCASE( 2 );

SELECT funcCASE( 99 );

CREATE OR REPLACE FUNCTION funcCASE( argNumber INTEGER )
RETURNS CHAR

BEGIN
   DECLARE Res CHAR;
   CASE
      WHEN argNumber < 3 THEN SET Res = 'A';
      WHEN argNumber < 5 THEN SET Res = 'B';
      ELSE SET Res = 'C';
   END CASE;
   RETURN Res;
END;
If the conditions overlap, the condition closer to the top of the list will win.
For example, if argNumber is "2", then both the conditions "2 < 3" and "2 < 5" will be true.
The final result will be the letter "A", because it is related to the top condition in the list.

SELECT funcCASE( 2 );

 

Simple CASE

CREATE OR REPLACE FUNCTION funcCASE( argNumber INTEGER )
RETURNS CHAR
BEGIN     
   DECLARE Res CHAR;     
   CASE argNumber
      WHEN 1 THEN SET Res = 'A';
      WHEN 2 THEN SET Res = 'B';
      ELSE SET Res = 'D';     
   END CASE;     
   RETURN Res;
END;
This here is a shorthand for writing the conditions "ArgNumber = 1" and "ArgNumber = 2".
This syntax is only useful when we are sure that we will only use equality-based conditions.  

SELECT funcCASE( 2 );
 

On the other hand, when we use "Search CASE" then we can use whatever condition we want.

Nested CASE and Default Result of the CASE statement

CREATE OR REPLACE FUNCTION funcCASE( argNumber INTEGER )
RETURNS CHAR
BEGIN
   DECLARE Res CHAR;
   CASE
      WHEN argNumber = 1 THEN SET Res = 'A';

      CASE
   
      WHEN argNumber = 2 THEN SET Res = 'B';
   
   END CASE;
   END CASE;
   RETURN Res;
END;
We can nest CASE structures, one inside of another.

Notice that this time we don't have ELSE substatement. In that case
every value of the argument, except 1 and 2, will cause the function to return NULL.

SELECT funcCASE( 99 );

0400 Custom Functions in MonetDB part 1

Procedural Approach

Procedural programming languages ​​are based on a series of commands that are executed step by step. Programmer is responsible to explain, not only what should be done, but also how it should be done. Procedural programming is using, variables to store interim results, control flow constructs (IF, CASE, LOOP, WHILE…) to change the value of variables, error handling to catch errors, and everything is organized into blocks of code.

This is the opposite from SQL, which is abstract and declarative language. SQL is great for data querying, filtering, joining, grouping, but it can not use loops, variables and conditions.

Major blocks of code, in procedural languages, are functions and procedures. We can hide complex logic behind each function or procedure and then call that complex logic with simple calls. Functions and procedures can pass arguments between them. All of this allow for modular structure, which provides reusability and better organization of a code.

These blocks of code can be parameterized which gives them much greater reusability than SQL.

Procedural SQL

Procedural SQL is a procedural programming language used in databases. It is closely related to SQL. The purpose of procedural SQL is to extend SQL with procedural capabilities. Now we can place the application logic in the database, where it will be close to the data.

SQL is standardized and highly portable between databases. This allows programmers to learn it once, and use it everywhere. On the other hand, procedural SQL, on different servers, varies in scope and detail. The same procedural SQL commands may look the same but behave differently. This is why we have many dialects of procedural SQL, such as PL/SQL in Oracle or T-SQL in SQL server.

Procedural SQL is used when:

  1. We want to perform complex transformations.
  2. We need logic flow control.
  3. We need more flexibility, so our logic must be parameterized.
  4. We are performing validation and error checking.
  5. We want to embed application logic into database to avoid expensive network round trips.

SQL is better than Procedural SQL because:

  1. It is portable and standardized.
  2. It is faster. It uses less memory and computational power.
  3. It is less complex; user doesn't have to be closely familiar with the data in database and algorithms.
  4. It is easier to read. For procedural loops we can use many lines, while SQL can do the same in one simple statement.

The reasons, why SQL is faster than Procedural SQL are:

  1. SQL gives the optimizer more information on which to base optimization.
  2. SQL needs less logging for rollbacks and transactions.
  3. Less locks are taken, when we use SQL.
  4. Set based logic is focus of RDBMS, so they are heavily optimized for it.

User Defined Functions (UDF) in MonetDB

MonetDB has a lot of built-in functions. We can solve most of the problems using string, mathematical, date time functions. For other problems we can create our functions using procedural SQL.

CREATE OR REPLACE FUNCTION funcReturnTwo()
RETURNS INTEGER
BEGIN
     RETURN 2;
END;
On the left side, we can see an example of the simple UDF function in MonetDB.
This function will always return number 2. We can use this function in SELECT statement:
SELECT funcReturnTwo();

Our function has name, returned data type and returned value. Parentheses are for the arguments of a function.

We can delete function with the statement:
DROP FUNCTION funcReturnTwo();

We can use SELECT statement to acquire a value that will function return:

CREATE OR REPLACE FUNCTION funcReturnTwo() RETURNS INTEGER
BEGIN
     RETURN SELECT 2;
END;
SELECT funcReturnTwo();

But, SELECT statement must return scalar in order to be used in the function. This SELECT statement will return a column:  

SELECT * FROM (VALUES (2), (2)) AS t( Col );

CREATE OR REPLACE FUNCTION funcReturnTwo()
RETURNS INTEGER
BEGIN
     RETURN SELECT * FROM (VALUES (2), (2)) AS t( Col );
END;
MonetDB will accept this SELECT statement in the UDF function, but that function will not be usable:
SELECT funcReturnTwo();

Sample Table and Usage of Arguments

CREATE TABLE NumText ( Number INTEGER, Text VARCHAR(50) );
INSERT INTO NumText VALUES ( 120, 'Mississippi' );
SELECT * FROM NumText;

This time, our function will accept an argument of INTEGER data type.
CREATE OR REPLACE FUNCTION funcAddTwo( argNumber INTEGER ) RETURNS INTEGER
BEGIN
     RETURN argNumber + 2;
END;
SELECT funcAddTwo( Number ) FROM NumText;
   '120 + 2 = 122

Data Manipulation Language (DML) and Data Definition Language (DDL) Statements in UDF

DML statements, like SELECT, INSERT, UPDATE, are working normally inside of the UDF functions.

CREATE OR REPLACE FUNCTION funcUpdate( )
RETURNS INTEGER
BEGIN
     UPDATE NumText SET Number = 130 WHERE Number = 120;
     RETURN SELECT Number FROM NumText;
END;
<= We will first update a table, and then we will read from that table.
SELECT funcUpdate( );

DDL statements are statements that define and manage the structure of the database ( CREATE, ALTER, DROP… ). If the purpose of the function is to return a value that can be used in a query, is it possible to use DDL statements in the function? Let's try that.

CREATE OR REPLACE FUNCTION funcCreate()
RETURNS INTEGER
BEGIN
     CREATE TABLE Test ( Number INTEGER );
     INSERT INTO Test ( Number ) VALUES ( 7 );
     RETURN SELECT Number FROM Test;
END;
Our function created a table and then read from that table:

SELECT funcCreate();

We can see that CREATE statement is working correctly inside of a function. We will now read from the tables "NumText" and "Test":

SELECT * FROM NumText;
  '120 permanently changed to 130
SELECT * FROM Test;

There is no table "Test".

We can see that there is no table "Test", although function "funcCreate" works normally. This means that not everything will work inside of the function as we expect to. Because function can be called zillion times, it is smart to avoid executing DDL statements inside of the UDF function.

I will try to rename column in the NumText table, inside of a function.  
CREATE OR REPLACE FUNCTION funcAlter()
RETURNS INTEGER
BEGIN
      ALTER TABLE NumText RENAME COLUMN Text to Tekst;
      RETURN SELECT Number FROM NumText;
END;
This will fail. So, some DDL statements will not work at all.  

We can conclude that the main purpose of the UDF function is to calculate and return values. It is not their purpose to significantly change state of a database. That is why, DDL statement will not work inside of the functions, or will have a limited effect.

Environment Variables

The question is, whether it is possible to change environment variables, like CURRENT_SCHEMA, inside of the function? Will that change be permanent?

CREATE OR REPLACE FUNCTION funcSchema()
RETURNS VARCHAR(10)
BEGIN
     SET SCHEMA voc;
     RETURN CURRENT_SCHEMA;
END;
SELECT funcSchema();
SELECT CURRENT_SCHEMA;

We can see above, that we can permanently change environment variables inside of the functions.

UDF Can Return Table

UDF functions can return tables, not just scalers. Tables returned by UDF can be used as subqueries.

CREATE OR REPLACE FUNCTION funcTable()
RETURNS TABLE ( N INTEGER, T VARCHAR(50) )

BEGIN
     RETURN SELECT * FROM NumText;
END;
SELECT * FROM funcTable();

We can combine returned table, with normal tables.

SELECT * FROM funcTable() UNION ALL SELECT * FROM NumText;

Temporary Tables

We will create two temporary tables, and then we will check, can we used those tables in the UDF function?
CREATE LOCAL TEMPORARY TABLE locTable ( Number INTEGER ) ON COMMIT PRESERVE ROWS;
INSERT INTO locTable ( Number ) VALUES ( 1 );
SELECT * FROM locTable;
CREATE GLOBAL TEMPORARY TABLE globTable ( Letter CHAR ) ON COMMIT PRESERVE ROWS;
INSERT INTO globTable ( Letter ) VALUES ( 'A' );
SELECT * FROM globTable;

We will successfully create these two UDF functions.

CREATE OR REPLACE FUNCTION funcLocTable()
RETURNS TABLE ( Number INTEGER )
BEGIN
     RETURN SELECT * FROM locTable;

END;
CREATE OR REPLACE FUNCTION funcGlobTable()
RETURNS TABLE ( Letter CHAR )
BEGIN
     RETURN SELECT * FROM globTable;
END;

We can see bellow, that it is possible to use temporary tables inside of the custom functions.

SELECT * FROM funcLocTable();
SELECT * FROM funcGlobTable();

BEGIN ATOMIC

CREATE OR REPLACE FUNCTION funcAtomic() RETURNS INTEGER
BEGIN      
     START TRANSACTION;       

     RETURN 2;      
     COMMIT;
END;
We can not use transaction statements inside of the UDF function. This is a problem because,
inside of the one UDF function we can have several statements. We want all of them to succeed or all of them to fail.

CREATE OR REPLACE FUNCTION funcAtomic()
RETURNS INTEGER
BEGIN ATOMIC      
     RETURN 2;
END;
If we add the word "ATOMIC" after "BEGIN", we will turn our UDF into one transaction.
"Atomic" means undividable.

Optimization of the UDF

SELECT funcAddTwo( Number ) FROM NumText;  We have already seen that a UDF function can be executed for each row of a table.
If our UDF is complicated and the server cannot optimize it, executing the UDF on
a large table may be inefficient. Then we should look for another way to accomplish our task.