0420 Custom Functions in MonetDB part 3

MonetDB / By Bizkapish / April 24, 2025 April 26, 2025

WHILE statement

In procedural SQL, the WHILE statement is used for iteration: it repeats a block of SQL code as long as a specified condition is true. In the example below, the loop repeats 5 times, because our argument has 5 letters: length( 'Surat' ) = 5.
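A minimal sketch of such a loop follows. The function name "funcWhile", the argument name and the counter variable are my own assumptions, not names taken from the original example. The loop body should run once per letter of the argument, so for 'Surat' it should run 5 times.

```sql
-- Hypothetical function: counts the letters of the argument one by one.
CREATE OR REPLACE FUNCTION funcWhile( argWord VARCHAR(50) )
RETURNS INTEGER
BEGIN
    DECLARE Counter INTEGER;
    SET Counter = 0;
    -- The body is repeated while the condition is true.
    WHILE Counter < LENGTH( argWord ) DO
        SET Counter = Counter + 1;
    END WHILE;
    RETURN Counter;
END;

SELECT funcWhile( 'Surat' );
```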

Infinite Loop

If we make a mistake, our WHILE statement can turn into an infinite loop. In that case, the solution is to forcibly interrupt the process. In mclient, we can stop the execution by pressing "Ctrl + C".

Nesting WHILE

IF Statement in Procedural SQL

Breaking WHILE Loop

Skip Some Cycles in a WHILE Loop

Functions Overloading

In the same schema, it is possible to create several functions with the same name. The trick is that these functions must have different argument lists. I will create three simple functions.

CREATE FUNCTION funcOverload()
RETURNS INTEGER
BEGIN
    RETURN 1;
END;

CREATE FUNCTION funcOverload(A INT)
RETURNS INTEGER
BEGIN
    RETURN A;
END;

CREATE FUNCTION funcOverload(A INT, B INT)
RETURNS INTEGER
BEGIN
    RETURN A + B;
END;

Creation was successful so we can now test our functions:
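A possible test is sketched here. The argument values 5 and 6 are arbitrary; MonetDB should pick the version of the function that matches the number of arguments.

```sql
-- No arguments: the first version, which returns the constant 1.
SELECT funcOverload();

-- One argument: the second version, which returns the argument itself.
SELECT funcOverload( 5 );

-- Two arguments: the third version, which returns their sum.
SELECT funcOverload( 5, 6 );
```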

Deleting of Overloaded Functions

When we try to delete an overloaded function by its name only, MonetDB will complain that we have to provide the full signature of the function.
DROP FUNCTION funcOverload;
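As a sketch of the alternative, the statements below name each version by the data types of its arguments. They assume the three "funcOverload" functions created above still exist.

```sql
-- Each statement identifies exactly one version by its signature.
DROP FUNCTION funcOverload();
DROP FUNCTION funcOverload( INT );
DROP FUNCTION funcOverload( INT, INT );
```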

System Tables

If we want to list all of the functions that we have created ourselves, we have to filter out the system functions.

SELECT * FROM sys.functions WHERE system = false;

There will always be one argument named "result". This is the argument that represents the value that the function will return.
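One way to look at the arguments is to join sys.functions with sys.args. This is only a sketch: it assumes the column layout of sys.args (func_id, name, type, inout, number) as found in recent MonetDB releases, and the column aliases are my own.

```sql
-- List the arguments of user-defined functions, including "result".
-- Column names are quoted in case they clash with keywords.
SELECT f.name AS function_name,
       a.name AS argument_name,
       a."type",
       a."inout",
       a."number"
FROM sys.functions f
JOIN sys.args a ON a.func_id = f.id
WHERE f.system = false
ORDER BY f.name, f.id, a."number";
```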

0410 Custom Functions in MonetDB part 2

MonetDB / By Bizkapish / April 5, 2025 April 6, 2025

Table Variables

Table variables are temporary in-memory structures. They exist only during the execution of a function. They are mostly used to build up a result set that will be returned from the function. They can also be used as intermediate storage for procedural logic.

It is also possible to create a table variable based on an existing table, but without inheriting the data. I will create a sample table:
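The sample table could look like the sketch below. The column name "Number" and the single-row shape are inferred from the later queries on "tabNum"; the inserted value 10 is an assumption.

```sql
-- Hypothetical sample table: one column, one row.
CREATE TABLE tabNum ( Number INTEGER );
INSERT INTO tabNum ( Number ) VALUES ( 10 );
```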

CREATE OR REPLACE FUNCTION funcTableDeclaration()
RETURNS TABLE( N INTEGER )
BEGIN
    DECLARE TABLE TableDeclaration AS ( SELECT * FROM tabNum ) WITH DATA;
    RETURN SELECT * FROM TableDeclaration;
END;

We have created a table variable based on the sample table.

SELECT * FROM funcTableDeclaration();

No data is inherited, even though the declaration ends with WITH DATA.

Limitations of the Table Variables

Declare Scalar Variable

CREATE OR REPLACE FUNCTION funcVariableDeclaration()
RETURNS TABLE ( Var1 INTEGER, Var2 CHAR )
BEGIN
    DECLARE Var1 INTEGER, Var2 CHAR;
    SELECT 1, 'A' INTO Var1, Var2;
    RETURN SELECT Var1, Var2;
END;

Besides declaring two variables at once, we can also set the values of both variables with the "SELECT INTO" statement.

Expressions for Values

CREATE OR REPLACE FUNCTION funcVariableDeclaration()
RETURNS INTEGER
BEGIN
    DECLARE Var1 INTEGER;
    SET Var1 = ( SELECT * FROM tabNum ) + 5;
    RETURN SELECT Var1;
END;

We can also use this shorter syntax.

CREATE OR REPLACE FUNCTION funcVariableDeclaration()
RETURNS INTEGER
BEGIN
    DECLARE Var1 INTEGER, Var2 INTEGER;
    SET ( Var1, Var2 ) = ( SELECT Number, Number FROM tabNum );
    RETURN Var1 / Var2;
END;

If the query returns one row with several columns, we can assign all of those values to several variables in one statement.

Usage of the Scalar Variables

Scalar variables are also temporary in-memory structures. We can use them anywhere we can use constants:
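As an illustration, the sketch below uses a variable both in the SELECT list and in the WHERE clause, in the places where a constant could stand. The function name "funcVariableUsage" and the value 5 are assumptions; "tabNum" is the sample table with the "Number" column.

```sql
-- Hypothetical function: Var1 is used wherever a constant could be used.
CREATE OR REPLACE FUNCTION funcVariableUsage()
RETURNS TABLE ( Res INTEGER )
BEGIN
    DECLARE Var1 INTEGER;
    SET Var1 = 5;
    RETURN SELECT Number + Var1 FROM tabNum WHERE Number > Var1;
END;

SELECT * FROM funcVariableUsage();
```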

But we cannot use a scalar variable to give a name to a column of a table variable. Variables cannot be used in place of identifiers:

Control Flow With CASE

Search CASE

CREATE OR REPLACE FUNCTION funcCASE( argNumber INTEGER )
RETURNS CHAR
BEGIN
    DECLARE Res CHAR;
    CASE
        WHEN argNumber = 1 THEN SET Res = 'A';
        WHEN argNumber = 2 THEN SET Res = 'B';
        ELSE SET Res = 'C';
    END CASE;
    RETURN Res;
END;

The CASE statement is made of condition checks ( WHEN subclause ) and the results ( THEN subclause ) that apply when a condition is met.

The results must be complete statements. In this case, each one is a SET statement.
If none of the conditions is met, the statement after the "ELSE" keyword is executed.
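A short usage sketch for "funcCASE" follows; the argument values are arbitrary. Judging by the function body, the three calls should exercise the first branch, the second branch and the ELSE branch respectively.

```sql
SELECT funcCASE( 1 );  -- matches the first WHEN
SELECT funcCASE( 2 );  -- matches the second WHEN
SELECT funcCASE( 3 );  -- matches no WHEN, so the ELSE branch is used
```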

Simple CASE

Nested CASE and Default Result of the CASE statement

0400 Custom Functions in MonetDB part 1

MonetDB / By Bizkapish / March 30, 2025 April 1, 2025

Procedural Approach

Procedural SQL

Procedural SQL is a procedural programming language used in databases. It is closely related to SQL. The purpose of procedural SQL is to extend SQL with procedural capabilities. Now we can place the application logic in the database, where it will be close to the data.

SQL is standardized and highly portable between databases. This allows programmers to learn it once and use it everywhere. Procedural SQL, on the other hand, varies in scope and detail from server to server. Commands may look the same but behave differently. This is why we have many dialects of procedural SQL, such as PL/SQL in Oracle or T-SQL in SQL Server.

Procedural SQL is used when:

We want to perform complex transformations.
We need logic flow control.
We need more flexibility, so our logic must be parameterized.
We are performing validation and error checking.
We want to embed application logic into the database to avoid expensive network round trips.

SQL is better than Procedural SQL because:

It is portable and standardized.
It is faster. It uses less memory and computational power.
It is less complex; the user doesn't have to be closely familiar with the data in the database or with the algorithms.
It is easier to read. For procedural loops we can use many lines, while SQL can do the same in one simple statement.

The reasons why SQL is faster than Procedural SQL are:

SQL gives the optimizer more information on which to base optimization.
SQL needs less logging for rollbacks and transactions.
Fewer locks are taken when we use SQL.
Set-based logic is the focus of an RDBMS, so these systems are heavily optimized for it.

User Defined Functions (UDF) in MonetDB

MonetDB has a lot of built-in functions. We can solve most problems using string, mathematical, and date/time functions. For other problems, we can create our own functions using procedural SQL.

Our function has a name, a return data type and a returned value. The parentheses hold the arguments of the function.
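A minimal function of this kind might look like the sketch below. The name "funcReturnTwo" is taken from the statements that follow in this article; the body is my assumption of the simplest possible version.

```sql
-- Name: funcReturnTwo, return type: INTEGER, returned value: 2.
-- The empty parentheses mean that the function has no arguments.
CREATE OR REPLACE FUNCTION funcReturnTwo()
RETURNS INTEGER
BEGIN
    RETURN 2;
END;

SELECT funcReturnTwo();
```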

We can delete a function with the statement:
DROP FUNCTION funcReturnTwo();

We can use a SELECT statement to acquire the value that the function will return:
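A sketch of this form is given here, assuming the simplest query that returns exactly one row with one column.

```sql
-- The returned value comes from a query instead of a literal.
CREATE OR REPLACE FUNCTION funcReturnTwo()
RETURNS INTEGER
BEGIN
    RETURN SELECT 2;
END;

SELECT funcReturnTwo();
```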

CREATE OR REPLACE FUNCTION funcReturnTwo()
RETURNS INTEGER
BEGIN
    RETURN SELECT * FROM (VALUES (2), (2)) AS t( Col );
END;

MonetDB will accept this SELECT statement, which returns two rows, in the UDF, but that function will not be usable:
SELECT funcReturnTwo();

Sample Table and Usage of Arguments

Data Manipulation Language (DML) and Data Definition Language (DDL) Statements in UDF

DML statements, like SELECT, INSERT and UPDATE, work normally inside UDFs.

DDL statements are statements that define and manage the structure of the database ( CREATE, ALTER, DROP… ). If the purpose of the function is to return a value that can be used in a query, is it possible to use DDL statements in the function? Let's try that.

We can see that CREATE statement is working correctly inside of a function. We will now read from the tables "NumText" and "Test":

We can see that there is no table "Test", although the function "funcCreate" works normally. This means that not everything will work inside a function the way we expect. Because a function can be called a huge number of times, it is smart to avoid executing DDL statements inside a UDF.

We can conclude that the main purpose of UDFs is to calculate and return values. It is not their purpose to significantly change the state of a database. That is why DDL statements will not work inside functions, or will have a limited effect.

Environment Variables

The question is whether it is possible to change environment variables, like CURRENT_SCHEMA, inside a function. Will that change be permanent?

We can see above that we can permanently change environment variables inside functions.

UDF Can Return Table

UDFs can return tables, not just scalars. Tables returned by a UDF can be used as subqueries.

CREATE OR REPLACE FUNCTION funcTable()
RETURNS TABLE ( N INTEGER, T VARCHAR(50) )
BEGIN
    RETURN SELECT * FROM NumText;
END;

SELECT * FROM funcTable();
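The sketch below shows the function used as a subquery. The alias "sub" and the filter value are assumptions; the column names N and T come from the RETURNS TABLE clause of "funcTable".

```sql
-- The table returned by the function is treated like any other derived table.
SELECT sub.T
FROM ( SELECT * FROM funcTable() ) AS sub
WHERE sub.N > 1;
```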

Temporary Tables

We will successfully create these two UDF functions.

CREATE OR REPLACE FUNCTION funcLocTable()
RETURNS TABLE ( Number INTEGER )
BEGIN
    RETURN SELECT * FROM locTable;
END;

CREATE OR REPLACE FUNCTION funcGlobTable()
RETURNS TABLE ( Letter CHAR )
BEGIN
    RETURN SELECT * FROM globTable;
END;

We can see below that it is possible to use temporary tables inside custom functions.

BEGIN ATOMIC

CREATE OR REPLACE FUNCTION funcAtomic()
RETURNS INTEGER
BEGIN ATOMIC
    RETURN 2;
END;

If we add the word "ATOMIC" after "BEGIN", we will turn our UDF into one transaction. "Atomic" means indivisible.

Optimization of the UDF

SELECT funcAddTwo( Number ) FROM NumText;

We have already seen that a UDF can be executed for each row of a table. If our UDF is complicated and the server cannot optimize it, executing the UDF on a large table may be inefficient. Then we should look for another way to accomplish our task.

0390 Unlogged tables in MonetDB

MonetDB / By Bizkapish / March 24, 2025 March 28, 2025

Write Ahead Log

Unlogged Tables

Unlogged Tables are just like normal database tables, except they are not using WAL.

Normal tables are written like:

Unlogged tables are skipping WAL:

Unlogged tables are almost like normal tables:
1) They are written to disk.
2) After normal shutdown of a system, content of unlogged tables will be preserved.
3) They have transactions, but their transactions are not using WAL. Their transactions exist only in RAM.
4) Content of these tables is available to all of the users, just like for normal tables.

The only difference appears in the case of a system crash. After the crash, the content of normal tables will be restored to a consistent state by using WAL. An unlogged table will be truncated during the recovery. The server cannot guarantee the consistency of unlogged tables without WAL, so it will delete their content. This is why unlogged tables should be used only for temporary and re-creatable data.

Writing to unlogged tables can be several times faster than writing to normal tables, because the WAL step is skipped. We are sacrificing the reliability of unlogged tables for better performance.

Unlogged Tables in MonetDB

I will log in as administrator (password "monetdb").
I will change the current schema to "voc".

monetdbd start /home/fffovde/DBfarm1
mclient -u monetdb -d voc
SET SCHEMA voc;
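An unlogged table can be created with the UNLOGGED keyword, as sketched below. The table name "UnLogTab" is the one used later in this article; the column and the inserted values are assumptions made for illustration.

```sql
-- Same as CREATE TABLE, with the UNLOGGED keyword added.
CREATE UNLOGGED TABLE UnLogTab ( Number INTEGER );

-- Hypothetical sample rows.
INSERT INTO UnLogTab ( Number ) VALUES ( 1 ), ( 2 );

SELECT * FROM UnLogTab;
```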

If we quit our session, and we log in again, we will still be able to use our unlogged table.
quit
mclient -u monetdb -d voc
SELECT * FROM voc.UnLogTab;

Both the data and the table structure will be preserved. An unlogged table can last through several sessions.

Sample Table

What Happens After the Crash?

Difference Between MonetDB and Some Other Servers

ALTER TABLE Tab1 SET UNLOGGED;
ALTER TABLE UnLogTab SET LOGGED;

Unlike some other databases, MonetDB doesn't have the ability to transform unlogged tables into logged tables and vice versa. These statements will not work.

System Tables

We can find information about unlogged tables in the sys.tables and sys.statistics.
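One way to read this information from sys.tables is sketched below. It assumes that the numeric "type" column of sys.tables can be decoded through the sys.table_types lookup table (columns table_type_id and table_type_name), as in recent MonetDB releases.

```sql
-- Show every user table together with the readable name of its type.
-- An unlogged table should be listed with its own table type.
SELECT t.name, tt.table_type_name
FROM sys.tables t
JOIN sys.table_types tt ON tt.table_type_id = t."type"
WHERE t.system = false;
```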

0380 Merge Tables in MonetDB

MonetDB / By Bizkapish / March 16, 2025 March 19, 2025

Why Partitioning?

Benefits and Drawbacks of Partitioning

Queries are faster. Instead of scanning the entire table, we will scan only the necessary partitions. The database is smart enough to discard partitions that do not have a relevant date. This is called "partition pruning". For example, to see sales only for the year 2024, we can query only the 2024 partition and ignore all the others.
Rebuilding indexes, updating statistics and vacuuming are easier for partitions.
Dropping, archiving, backing up, partition swapping, can be done on one part of the table. We can treat the parts of the table separately.
Partitions can be processed in parallel, on different CPU cores. Partitions can be on different storage disks.
Partitions with older/stable data can be compressed and can have multiple indexes. It is the opposite for the most recent data.

Partitioning is only really useful when we have really large tables. Large tables are those with over 100 million rows. The biggest benefit is in maintaining such large tables. It is questionable whether partitioning will improve query speeds. This will only happen if queries touch some of the partitions and not others. If there is a mismatch between how users filter the data and how we have defined our partitions, we could reduce performance rather than improve it.

Simple Start

First, we will create a merge table. It is not possible to query this table until we add some partitions to it.

CREATE MERGE TABLE Merg ( Letter VARCHAR(10), Number INT );
SELECT * FROM Merg;

CREATE TABLE Tab1 ( Letter VARCHAR(10), Number INT );
INSERT INTO Tab1 (Letter, Number) VALUES ('A', 50), ('A', 60);

CREATE TABLE Tab2 ( Letter VARCHAR(10), Number INT );
INSERT INTO Tab2 (Letter, Number) VALUES ('B', 150), ('B', 160);

We can see that merge tables are similar to union queries. UNION queries are verbose, while merge table queries are short and simple. UNION queries are more computationally intensive and use more memory. A merge table can effectively use indexes that are set up over individual partition tables.

On the other hand, UNION queries are necessary when the base tables have different structures that require transformation.
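The comparison above can be sketched as follows. It assumes that "Merg" was created without a PARTITION BY clause, so the tables can be attached without declaring any rule.

```sql
-- Attach both base tables to the merge table.
ALTER TABLE Merg ADD TABLE Tab1;
ALTER TABLE Merg ADD TABLE Tab2;

-- Short form: the merge table returns the rows of all its partitions.
SELECT * FROM Merg;

-- Verbose form: the same rows written as a UNION query.
SELECT Letter, Number FROM Tab1
UNION ALL
SELECT Letter, Number FROM Tab2;
```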

System Tables and Removing Partitions

This system table will show us partitions of our merge table. ID of the merge table is 11077.

Problem With Simple Approach

If we try INSERT, UPDATE, DELETE or TRUNCATE on the merge table, the statement will fail.
UPDATE Merg SET Number = 170 WHERE Letter = 'B';

We will delete our merge table, because we want to create it in a way that will allow INSERT, UPDATE, DELETE, TRUNCATE.
DROP TABLE Merg;

This time we will provide the merge table with a rule by which it will differentiate between partitions.

CREATE MERGE TABLE Merg ( Letter VARCHAR(10), Number INT ) PARTITION BY VALUES ON ( Letter );

Only when I truthfully declare my partition as defined by the "A" in the "Letter" column, will my partition be accepted.
ALTER TABLE Merg ADD TABLE Tab1 AS PARTITION IN ( 'A' );

But what if I have another table that only has "A" in the "Letter" column? I will create such a table, and I will try to add it to the merge table.

CREATE TABLE Tab3 AS ( SELECT * FROM Tab1 ) WITH DATA;
ALTER TABLE Merg ADD TABLE Tab3 AS PARTITION IN ( 'A' );

Now we have a conflict. Definitions of partitions have to be unique. "Tab3" will be rejected.

Partition With Multiple Values in the Letter Column

I will add one more row in the "Tab2" table. After that I will add "Tab2" to the merge table.
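These two steps might look like the sketch below. The new row ( 'C', 170 ) is inferred from the partition values 'B150', 'B160' and 'C170' that appear later in this article; treat it as an assumption.

```sql
-- "Tab2" now holds two different letters: 'B' and 'C'.
INSERT INTO Tab2 ( Letter, Number ) VALUES ( 'C', 170 );

-- The partition definition has to list every letter that "Tab2" contains.
ALTER TABLE Merg ADD TABLE Tab2 AS PARTITION IN ( 'B', 'C' );
```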

Let's Try Modifying "Merg" Table Directly

Let us now try to INSERT a row directly into "Merg" table.

`INSERT INTO Merg ( Letter, Number ) VALUES ( 'Z', 999 );` There is no "Z" partition, so this INSERT will be rejected.
`INSERT INTO Merg ( Letter, Number ) VALUES ( 'A', 70 );` Success! "Merg" now knows where to insert a new row (into "Tab1").

Let's update this new row.
UPDATE Merg SET Number = 71 WHERE Letter = 'A' AND Number = 70;

Let's delete this new row.
DELETE FROM Merg WHERE Letter = 'A' AND Number = 71;

But what if I modify "Tab1" directly? Will that confuse the "Merg" table?
UPDATE Tab1 SET Letter = 'Z';
As we can see, the merge table is protected from violations of the rule.

Redefining A Partition

We have new records with the letter "Z", but we have only a few of them. I want to add them to "Tab1" partition. We know that "Tab1" will reject them.

In order to avoid that, I will redefine "Tab1" to accept "Z" records.
ALTER TABLE Merg SET TABLE Tab1 AS PARTITION IN ( 'A', 'Z' );

Let's now insert a "Z" record into "Tab1".
INSERT INTO Merg ( Letter, Number ) VALUES ( 'Z', 70 );
The "Merg" table will now accept the "Z" record.

Other Ways to Define a Partitioning Rule

Partition By Range

CREATE MERGE TABLE MergRange ( Letter VARCHAR(10), Number INT ) PARTITION BY RANGE ON ( Number );

We'll add "Tab1" and "Tab2" to this new "MergRange". The problem is that one table cannot be part of several merge tables, so this statement will be rejected:

ALTER TABLE MergRange ADD TABLE Tab1 AS PARTITION FROM 1 TO 100;

We will first remove "Tab1" and "Tab2" from the "Merg" table, and then we will add them to the "MergRange" table.

ALTER TABLE Merg DROP TABLE Tab1;
ALTER TABLE Merg DROP TABLE Tab2;
ALTER TABLE MergRange ADD TABLE Tab1 AS PARTITION FROM 1 TO 100;
ALTER TABLE MergRange ADD TABLE Tab2 AS PARTITION FROM 101 TO 200;

Partition By Value Expression

So far, we have only defined partitions using a single column. Now we will use an expression to define partitions. The expression "Letter || CAST( Number AS VARCHAR(10))" says that the columns "Letter" and "Number", together, define the partition.

CREATE MERGE TABLE MergExpression ( Letter VARCHAR(10), Number INT ) PARTITION BY VALUES USING ( Letter || CAST( Number AS VARCHAR(10) ) );

We will remove "Tab1" and "Tab2" from the merge table "MergRange". Then, we will add them to the "MergExpression" table.

ALTER TABLE MergRange DROP TABLE Tab1;
ALTER TABLE MergRange DROP TABLE Tab2;
ALTER TABLE MergExpression ADD TABLE Tab1 AS PARTITION IN ( 'A50', 'A60', 'Z70' );
ALTER TABLE MergExpression ADD TABLE Tab2 AS PARTITION IN ( 'B150', 'B160', 'C170' );

Partition by Range Expression

It is also possible to use an expression to calculate the value that will be used to determine range membership.

CREATE MERGE TABLE MergRangeExpression ( Letter VARCHAR(10), Number INT ) PARTITION BY RANGE USING ( Number + char_length( Letter ) );

Again, we will untie our tables from the previous merge table, and then we will add them to the MergRangeExpression table.

ALTER TABLE MergExpression DROP TABLE Tab1;
ALTER TABLE MergExpression DROP TABLE Tab2;
ALTER TABLE MergRangeExpression ADD TABLE Tab1 AS PARTITION FROM 1 TO 100;
ALTER TABLE MergRangeExpression ADD TABLE Tab2 AS PARTITION FROM 101 TO 200;

Partition By NULLS

We have to declare which partition will hold the nulls. Obviously, all the NULLs have to go into only one partition.

ALTER TABLE Merg ADD TABLE Tab1 AS PARTITION IN ( 'A' ) WITH NULL VALUES;
In this case, all nulls would belong to the partition "Tab1".

ALTER TABLE Merg ADD TABLE Tab2 AS PARTITION FROM 1 TO 9 WITH NULL VALUES;
In this case, all nulls would belong to the partition "Tab2".

ALTER TABLE Merg ADD TABLE Tab3 AS PARTITION FOR NULL VALUES;
In this case, the partition "Tab3" would hold only the nulls.

PARTITION System Tables

When we define a partitioning rule (when we use the PARTITION clause), that rule will be registered in these system tables.

System table "sys.range_partitions" is used when partitioning is made by the ranges ( 1-100, 101-200 ).
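The partitioning metadata can be inspected with plain queries, as sketched below. Besides "sys.range_partitions", which is named above, the sketch assumes the catalog tables "sys.table_partitions" and "sys.value_partitions" from recent MonetDB releases.

```sql
-- One row per merge table that has a partitioning rule.
SELECT * FROM sys.table_partitions;

-- One row per value listed in a PARTITION IN ( ... ) clause.
SELECT * FROM sys.value_partitions;

-- One row per partition defined with FROM ... TO ... .
SELECT * FROM sys.range_partitions;
```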

Merge Table Based on Another Table

It is possible to derive the definition of a merge table from some other table. WITH NO DATA is mandatory.
CREATE MERGE TABLE MergAS ( Letter, Number ) AS ( SELECT * FROM Tab1 ) WITH NO DATA;