0400 Custom Functions in MonetDB part 1
Procedural Approach
Procedural programming languages are based on a series of commands that are executed step by step. Programmer is responsible to explain, not only what should be done, but also how it should be done. Procedural programming is using, variables to store interim results, control flow constructs (IF, CASE, LOOP, WHILE…) to change the value of variables, error handling to catch errors, and everything is organized into blocks of code. This is the opposite from SQL, which is abstract and declarative language. SQL is great for data querying, filtering, joining, grouping, but it can not use loops, variables and conditions. | ![]() |
Major blocks of code, in procedural languages, are functions and procedures. We can hide complex logic behind each function or procedure and then call that complex logic with simple calls. Functions and procedures can pass arguments between them. All of this allow for modular structure, which provides reusability and better organization of a code. These blocks of code can be parameterized which gives them much greater reusability than SQL. | ![]() |
Procedural SQL
Procedural SQL is a procedural programming language used in databases. It is closely related to SQL. The purpose of procedural SQL is to extend SQL with procedural capabilities. Now we can place the application logic in the database, where it will be close to the data.
SQL is standardized and highly portable between databases. This allows programmers to learn it once, and use it everywhere. On the other hand, procedural SQL, on different servers, varies in scope and detail. The same procedural SQL commands may look the same but behave differently. This is why we have many dialects of procedural SQL, such as PL/SQL in Oracle or T-SQL in SQL server.
Procedural SQL is used when:
- We want to perform complex transformations.
- We need logic flow control.
- We need more flexibility, so our logic must be parameterized.
- We are performing validation and error checking.
- We want to embed application logic into database to avoid expensive network round trips.
SQL is better than Procedural SQL because:
- It is portable and standardized.
- It is faster. It uses less memory and computational power.
- It is less complex; user doesn't have to be closely familiar with the data in database and algorithms.
- It is easier to read. For procedural loops we can use many lines, while SQL can do the same in one simple statement.
The reasons, why SQL is faster than Procedural SQL are:
- SQL gives the optimizer more information on which to base optimization.
- SQL needs less logging for rollbacks and transactions.
- Less locks are taken, when we use SQL.
- Set based logic is focus of RDBMS, so they are heavily optimized for it.
User Defined Functions (UDF) in MonetDB
MonetDB has a lot of built-in functions. We can solve most of the problems using string, mathematical, date time functions. For other problems we can create our functions using procedural SQL.
CREATE OR REPLACE FUNCTION funcReturnTwo() RETURNS INTEGER | On the left side, we can see an example of the simple UDF function in MonetDB. This function will always return number 2. We can use this function in SELECT statement: SELECT funcReturnTwo | ![]() |
Our function has name, returned data type and returned value. Parentheses are for the arguments of a function.
We can delete function with the statement:DROP FUNCTION funcReturnTwo(); | ![]() |
We can use SELECT statement to acquire a value that will function return:
CREATE OR REPLACE FUNCTION funcReturnTwo() RETURNS INTEGER | SELECT funcReturnTwo(); ![]() |
But, SELECT statement must return scalar in order to be used in the function. This SELECT statement will return a column: SELECT * FROM (VALUES (2), (2)) AS t( Col ); | ![]() |
CREATE OR REPLACE FUNCTION funcReturnTwo() | MonetDB will accept this SELECT statement in the UDF function, but that function will not be usable:SELECT funcReturnTwo(); ![]() |
Sample Table and Usage of Arguments
CREATE TABLE NumText ( Number INTEGER, Text VARCHAR(50) ); | ![]() |
This time, our function will accept an argument of INTEGER data type.CREATE OR REPLACE FUNCTION funcAddTwo( argNumber INTEGER ) RETURNS INTEGER | SELECT funcAddTwo( Number ) FROM NumText; ![]() |
Data Manipulation Language (DML) and Data Definition Language (DDL) Statements in UDF
DML statements, like SELECT, INSERT, UPDATE, are working normally inside of the UDF functions.
CREATE OR REPLACE FUNCTION funcUpdate( ) | <= We will first update a table, and then we will read from that table.SELECT funcUpdate( ); ![]() |
DDL statements are statements that define and manage the structure of the database ( CREATE, ALTER, DROP… ). If the purpose of the function is to return a value that can be used in a query, is it possible to use DDL statements in the function? Let's try that.
CREATE OR REPLACE FUNCTION funcCreate() RETURNS INTEGER BEGIN CREATE TABLE Test ( Number INTEGER ); INSERT INTO Test ( Number ) VALUES ( 7 ); RETURN SELECT Number FROM Test; END; | Our function created a table and then read from that table:SELECT funcCreate(); ![]() |
We can see that CREATE statement is working correctly inside of a function. We will now read from the tables "NumText" and "Test":
SELECT * FROM NumText; ![]() | SELECT * FROM Test; ![]() There is no table "Test". |
We can see that there is no table "Test", although function "funcCreate" works normally. This means that not everything will work inside of the function as we expect to. Because function can be called zillion times, it is smart to avoid executing DDL statements inside of the UDF function.
I will try to rename column in the NumText table, inside of a function. CREATE OR REPLACE FUNCTION funcAlter() | This will fail. So, some DDL statements will not work at all. ![]() |
We can conclude that the main purpose of the UDF function is to calculate and return values. It is not their purpose to significantly change state of a database. That is why, DDL statement will not work inside of the functions, or will have a limited effect.
Environment Variables
The question is, whether it is possible to change environment variables, like CURRENT_SCHEMA
, inside of the function? Will that change be permanent?
CREATE OR REPLACE FUNCTION funcSchema() RETURNS VARCHAR(10) BEGIN SET SCHEMA voc; RETURN CURRENT_SCHEMA; END; | SELECT funcSchema(); ![]() | SELECT CURRENT_SCHEMA; ![]() |
We can see above, that we can permanently change environment variables inside of the functions.
UDF Can Return Table
UDF functions can return tables, not just scalers. Tables returned by UDF can be used as subqueries.
CREATE OR REPLACE FUNCTION funcTable() BEGIN | SELECT * FROM funcTable(); ![]() |
We can combine returned table, with normal tables.SELECT * FROM funcTable() UNION ALL SELECT * FROM NumText; | ![]() |
Temporary Tables
We will create two temporary tables, and then we will check, can we used those tables in the UDF function?
| ![]() |
CREATE GLOBAL TEMPORARY TABLE globTable ( Letter CHAR ) ON COMMIT PRESERVE ROWS; | ![]() |
We will successfully create these two UDF functions.
CREATE OR REPLACE FUNCTION funcLocTable() END; | CREATE OR REPLACE FUNCTION funcGlobTable() |
We can see bellow, that it is possible to use temporary tables inside of the custom functions.
SELECT * FROM funcLocTable(); ![]() | SELECT * FROM funcGlobTable(); ![]() |
BEGIN ATOMIC
CREATE OR REPLACE FUNCTION funcAtomic() RETURNS INTEGER
| We can not use transaction statements inside of the UDF function. This is a problem because, inside of the one UDF function we can have several statements. We want all of them to succeed or all of them to fail. ![]() |
CREATE OR REPLACE FUNCTION funcAtomic() | If we add the word "ATOMIC" after "BEGIN", we will turn our UDF into one transaction. "Atomic" means undividable. |
Optimization of the UDF
SELECT funcAddTwo( Number ) FROM NumText; | We have already seen that a UDF function can be executed for each row of a table. If our UDF is complicated and the server cannot optimize it, executing the UDF on a large table may be inefficient. Then we should look for another way to accomplish our task. |