0290 Transactions in MonetDB

What is Transaction?

We insert 500 rows into the table. Due to hardware/software/power issues, our rows are only partially inserted. We do not know which rows are written and which are not. We are not sure whether rows written are correct or not.

For this problem we use a transaction. A transaction is a guarantee that all rows will be inserted correctly or none of them.        
Let's assume another example. The employee was promoted.
We need to change her job title in our database and increase her salary.
For this we need two SQL statements. We want both statements
to succeed or both to fail. Again, we can use a transaction.
START TRANSACTION;
UPDATE JobsTable SET JobTitle = 'New job' WHERE Employee = 'Her';
UPDATE SalariesTable SET Salary = '+1000' WHERE Employee = 'Her';
COMMIT;

So, transaction is a set of statements that will be completed fully or will have no effect at all. This is how we preserve integrity and consistency of our database.

How Transactions Work in MonetDB?

MonetDB is using Optimistic concurrency control. That means that we do not use locks. Let's assume that we want to change the value 30 to value 60 in our database.

First, we will take a note that current value in the table is 30. →Parallel to that, we will prepare our number 60 for the insertion. We want to prepare all in advanced, so that we can be really fast during insertion.

After that two things can happen:

 If the current value is still 30, then we will lock the table. This lock will be really short and will be fine grained, meaning we will only lock 1 record, in our example. It's almost as if we're not locking anything.

After that, we will commit our change. Because we are really fast during commit (everything is already prepared), we reduce the chance that something bad will happen during that time. This guarantees us that the whole transaction will be completed successfully.
If the value in the table is changed by some other transaction, while we were preparing our insert, then we abort our mission. If this is the case, server will send an error message. After that, the user or application, can decide whether they want to retry their transaction or to do something else.

Error: Transaction failed: A conditional update failed

The purpose of this fail is to avoid conflicts between transactions.

Optimistic concurrency control is great for databases where we have high read, low write workloads. That means our database should not have a lot of
conflicts where two transactions are changing the data. It is great for analytic and web databases, because of speed and scalability.

Autocommit

START TRANSACTION; --no need for this
SELECT 2;
COMMIT;            --no need for this
By default, MonetDB uses a transaction around each individual statement.
No need to manually start and end a transaction.
SELECT 2;   —   <= this is already an individual transaction

Several Statements in One Transaction

To wrap several statements into one transaction, we have to use "START TRANSACTION" and "COMMIT" statements. Now, either both transactions will succeed or both will fail.START TRANSACTION;
SELECT 2;
SELECT 3;
COMMIT;

Why my Transaction Doesn't Work?

We will create one sample table.
CREATE TABLE LetterNumber( Letter CHAR, Number INTEGER );
INSERT INTO LetterNumber( Letter, Number ) VALUES ( 'A', 1 ), ( 'B', 2 ), ( 'C', 3 );

If we start transaction, execute some statement and don't Commit, our change will still be visible. How is this possible without Commit?

START TRANSACTION;
UPDATE LetterNumber
SET Number = 4
WHERE Letter = 'A';
-- no Commit statement
SELECT * FROM
LetterNumber;

If we start a new session,
only then we will see that
our change is not really
committed.

The change is only visible
inside of our original
session, so we can
check the result of our
statement without really
committing it.

READ ONLY and READ WRITE

MonetDB documentation claims that we can make our transaction READ ONLY. This is not true, MonetDB doesn't support read only mode. It does support READ WRITE mode, but that mode is the default, so this subclause is useless.

START TRANSACTION READ ONLY;
START TRANSACTION READ WRITE; is the default, so it is unnecessary.

ISOLATION LEVEL

During the read of one column, someone else can change the data.

If we need 5 seconds to read that column, during that time someone can:
– Insert a new row.
– Update a row.
– Delete a row.  

The question is: Do we want to isolate our query from those changes?
We can isolate our query completely or partially. Possible isolation levels are:
1) READ UNCOMMITTED
2) READ COMMITTED
3) REPEATABLE READ
4) SERIALIZABLE

Default Isolation Level

Default isolation level in MonetDB is READ COMMITTED (not sure 100%). For the current session isolation level can be changed with SET TRANSACTION. This statement will change isolation level, but it will also automatically start a new transaction.

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;


SELECT 2;


COMMIT;
<= we change isolation level,
but we also start a new
transaction.
<= this statement is
inside the new transaction.
<= we end this transaction.

SELECT 3;<= this is another transaction (because of autocommit).
This transaction is in the same session so it is using
READ UNCOMMITTED isolation level. So, this isolation
level is now valid for all the transactions in the rest of the session.
Because SET TRANSACTION automatically starts a new transaction, we can not use START TRANSACTION immediately after SET TRANSACTION.  

ISOLATION anomalies

ISOLATION levels are used to save us from different anomalies ( dirty reads, lost updates … ) that can occur in a concurrent environment. A concurrent environment is an environment in which different users execute their queries at the same time.

Database will always fight to eliminate these anomalies. Even if we use the READ UNCOMMITTED isolation level, anomalies will probably not occur.  

READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE isolation levels

READ UNCOMMITTED will not isolate us at all. We will see everything other people are doing.

If we start a READ UNCOMMITTED transaction (1), we will see that that in the third row we have C3 (2).  

In that moment, some other user will start a transaction (3), and he will change the third row to C5 (4).

If we read our table again, inside our transaction, we will see that change (5). We are not isolated at all. Even uncommitted changes by other users will be visible.

This anomaly is called "dirty reads". If we want to avoid this anomaly, we should use READ COMMITED isolation level.      

READ COMMITTED will save us from dirty reads, but it won't save us from "lost updates".

Transaction A starts before Transaction B. Transaction A ends after Transaction B. Both transactions try to update the value 100 to something else. Since transaction A is committed last, the value written to the table will be 115. The value 75 is lost.

The solution is the REPEATABAL READ level. This level, if used by Transaction A, would lock the value 100 after reading it. This would force transaction B to fail, after step X = X – 25.

We don't know in advance which transaction would be the last, so both transactions should use REPEATABLE READ.

There is a similar anomaly as "dirty reads". That anomaly is "Non-Repeatable Reads". The difference is that the other person will not only make the change, they will also commit the change. If we can see the change before a commit, it's a "dirty reads" anomaly, if we only see it after a commit, it's a " Non-Repeatable Reads " anomaly. REPEATABLE READ will save us from this anomaly, too.

The strictest isolation level is "SERIALIZABLE". This level will save us from "phantom reads".

Phantom reads occur when one user start the transaction (1), and then he reads all the rows where gender is female (2). In the meantime, another user will start the transaction (3), he will delete one row where gender is female (4), and he will commit (5).

If the first user tries to read the same rows again (inside of the same transaction), he will see the deletion made by another user (6). The same thing would happen if the rows are inserted, in that case we would see "phantom rows".  

How to assign isolation level in the individual transaction?

We saw how to assign default isolation level with SET TRANSACTION. It is similar for individual transaction:

START TRANSACTION REPEATABLE READ;
SELECT 2;
COMMIT;

Transaction control commands

We can control how our transaction will end. COMMIT keyword means that we want to execute our statement. ROLLBACK keyword means that we will fail our statement.

START TRANSACTION ISOLATION LEVEL READ COMMITTED;
UPDATE LetterNumber SET Number = 8 WHERE Letter = 'C';
ROLLBACK;  --will fail
START TRANSACTION ISOLATION LEVEL READ COMMITTED;
UPDATE LetterNumber SET Number = 5 WHERE Letter = 'B';
COMMIT;          --will execute

We can automatically open a new transaction after COMMIT or ROLLBACK. For that we use AND CHAIN subclause:

START TRANSACTION;
SELECT 2;
ROLLBACK AND CHAIN;  --rollback transaction and start a new one

SELECT 3;
ROLLBACK;            --rollback this second transaction             
START TRANSACTION;
SELECT 2;
COMMIT AND CHAIN;     --commit transaction and start a new one

SELECT 3;
ROLLBACK;            
--rollback this second transaction             

What is a SAVEPOINT?

Savepoint is the same as the checkpoint in video games?

If your player dies before the checkpoint:
He will have to start again from the start:
If your player dies after the checkpoint:
Then he can continue the game from the checkpoint:

Idea of checkpoint is to save your progress, so you don't have to replay everything. In Transactions control, idea of Savepoint is to avoid repeating the whole transaction, if we can repeat only part of it.

How to use SAVEPOINT?

We'll make an example with two savepoints, then we will see what will happen if we apply ROLLBACK, ROLLBACK to savepoint1, or ROLLBACK to savepoint2.

START TRANSACTION;
UPDATE LetterNumber SET Number = 2 WHERE Letter = 'A';
SAVEPOINT SP1;     –the name of a savepoint
UPDATE LetterNumber SET Number = 8 WHERE Letter = 'B';
SAVEPOINT SP2;     –the name of a second savepoint
UPDATE LetterNumber SET Number = 7 WHERE Letter = 'C';
Then, we can use one of these three commands:
ROLLBACK;
– Would rollback the whole transaction.
ROLLBACK TO SAVEPOINT SP2;
– Would rollback to the second savepoint.
ROLLBACK TO SAVEPOINT SP1;
– Would rollback to the first savepoint.

Starting status of our table:
Status before ROLLBACK:
If we use ROLLBACK:
If we go to SAVEPOINT SP2:
If we go to SAVEPOINT SP1:

We can delete savepoints with RELEASE command. After we release a savepoint, we can not go back to that savepoint any more.

Before release:

RELEASE SAVEPOINT SP2;
 
RELEASE SAVEPOINT SP1;
RELEASE SAVEPOINT SP1;
RELEASE SAVEPOINT SP2;

Savepoints are independent. If we release savepoint1, savepoint2 will still be alive and valid.

Synonyms

These commands are synonymous, we can use them interchangeably.

SET TRANSACTION;START TRANSACTION;COMMIT;ROLLBACK;
SET LOCAL TRANSACTION;BEGIN TRANSACTION;COMMIT WORK;ROLLBACK WORK;

0280 Comparison Functions in MonetDB

Comparison Operators

MonetDB supports all of the comparison operators:

SELECT 2 < 5  AS "2<5" , 2 > 5  AS "2>5",        
       2 <= 5 AS "2<=5", 2 >= 5 AS "2>=5",        
       2 = 5  AS "2=5" , 2 <> 5 AS "2<>5";

Comparison operators can also be used to alphabetically compare strings:

SELECT 'aa' < 'bb'   AS "aa<bb" , 'aa' > 'bb'   AS "aa>bb",        
       'aa' <= 'bb'  AS "aa<=bb", 'aa' >= 'bb'  AS "aa>=bb",        
       'aa' = 'bb'   AS "aa=bb" , 'aa' <> 'bb'  AS "aa<>bb";  

Beside comparison operators, we can get the same functionality with comparison functions:

SELECT "<"( 'aa', 'bb' )  AS "aa<bb" , ">"( 2, 5 ) AS "2>5",        
       "<="( 'aa', 'bb' ) AS "aa<=bb", ">="( 2, 5 ) AS "2>=5",        
       "="( 'aa', 'bb' )  AS "aa=bb" , "<>"( 2, 5 ) AS "2<>5";

Null Operators and Functions

Null Operators

IS NULL operator is useful for testing NULL values.  

SELECT null IS NULL, 3 IS NULL;  
We can also test if something is not NULL.

SELECT null IS NOT NULL, 3 IS NOT NULL;  

Null Functions

COALESCE function will return the first argument that is not a NULL. If all of the arguments are null, then the null will be returned.
 
SELECT COALESCE( null, 'bb', 'aa'), COALESCE( null, null, null);
IFNULL function is the same as COALESCE function, but it accepts only 2 arguments. Notice that this function has to be enclosed into curly brackets.  
SELECT { fn IFNULL( 77, 82 ) }, { fn IFNULL( null, 82 ) }, { fn IFNULL( null, null ) };
NULLIF function accepts two arguments. If those two arguments are the same, NULL will be returned. If they are different, the first argument will be returned.  
SELECT NULLIF( 77, 77 ), NULLIF( 'xx', 'yy' );
With ISNULL function, we can test whether some value is NULL or not.  

SELECT ISNULL( null ), ISNULL( 77 );

BEETWEN Operator and Function

Between Operator

With BETWEEN operator, we can test whether some value belongs to some range.  
SELECT 2 BETWEEN 1 AND 3, 2 NOT BETWEEN 1 AND 3;
BETWEEN will work with alphabetical ranges, too.  
SELECT 'bb' BETWEEN 'aa' AND 'cc', 'bb' NOT BETWEEN 'aa' AND 'cc';
The BETWEEN operator is inclusive. The logic behind BETWEEN is 1 >= 1 AND 1 <= 2. Therefore, the first expression below will return TRUE:  
SELECT 1 BETWEEN 1 AND 2, 'bb' BETWEEN 'cc' AND 'aa';  
In the second expression, we should have written " aa AND cc", so we reversed our boundaries. Therefore, we will get FALSE as a result. The smaller value should go first, then the larger one.
If we don't know in advanced which boundary is smaller and which is larger, then we should sort our boundaries before creating our range. BETWEEN operator can help us with that. We just have to enhance BETWEEN operator with the SYMMETRIC keyword.  

SELECT 'bb' BETWEEN SYMMETRIC 'cc' AND 'aa', 'bb' NOT BETWEEN SYMMETRIC 'cc' AND 'aa';  

Between Function

Between is a function, which gives us more control than the BETWEEN operator. This function has 8 arguments. On the MonetDB website, this function is poorly explained. I will explain all of its arguments except the boolean1 argument. I don't know what this argument does.  
"between"(arg_1 any, arg_2 any, arg_3 any, boolean1, boolean2, boolean3, boolean4, boolean5)
In its basic form, this function will check if argument1 is between arguments 2 and 3. Boundaries are not inclusive.  
SELECT "between"( 1, 1, 3, false, false, false, false, false );
We can make this function inclusive if we change the last argument (boolean5) to TRUE.  
SELECT "between"(  1, 1, 3, false, false, false, false, true );
We can also make this function partially inclusive. Argument "boolean2" can turn on inclusivity, but only for the lower boundary.  

SELECT "between"( 1, 1, 3, false, true, false, false, false );
Argument "boolean3" will do the same, but only for upper boundary.  

SELECT "between"( 3, 1, 3, false, false, true, false, false );
If the first argument is "null", the result of the BETWEEN function will be "null". If we change the fourth argument to TRUE, then the result will be FALSE.  
SELECT "between"( null, 1, 3, false, false, false, false, false ),        
       "between"( null, 1, 3, false, false, false, true, false );

Functions LIKE and NOT_LIKE

LIKE function is matching a string with a pattern. If they do match, TRUE will be returned. Wildcard % will replace 0 or many characters. "_" will replace only one character.  
SELECT "like"( 'abc', 'ab%', '#', false),        
       "like"( 'abc', 'a__', '#', false)
If we need a pattern that contains % and _ characters, then we have to escape them. Third argument is used as an escape sign.  
SELECT "like"( 'ab%', 'ab#%', '#', false),        
       "like"( 'abc', 'ABC', '#', true);
 
If we want to make our pattern case insensitive, then we have to change the fourth argument to true.
The opposite function of "like" function is not_like. This example would return FALSE.
SELECT not_like('abc', 'ab%', '#', false);

Function IfThenElse

IfThenElse function will give us ability to conduct ternary logic. If the first argument expression is TRUE, we will return the second argument. Otherwise, we will return the third argument.  

SELECT IfThenElse( 2 > 1, 'Two is larger than 1.', 'Two is smaller than 1.');

0270 String Functions in MonetDB

By placing letter e before string, we can write control characters ( Tab, New line, Carriage return ) with their specifiers. This will work even without e.

select e'A\nB' AS NewLine
     , e'A\u2764' AS Unicode
     , e'A\tB'AS TAB
     , 'A\tB'AS TAB2;  
Control CharacterNameUsage
\nNewlineMoves the cursor to the next line.
\rCarriage returnMoves the cursor to the beginning of the current line.
\tTab spaceInserts a tab space.
\bBackspaceDeletes the character just before the control character.
\fForm feedMoves the cursor to the next page.
\uUnicode characterRepresents a Unicode character using a code number.

Without e, the result is the same ( e'A\tB' = 'A\tB' ). If we want to write literal specifiers, then we have to use prefix R.

select r'A\nB' AS NewLine, r'A\u2764' AS Unicode, R'A\tB'AS TAB;

We can write blob with pairs of hexadecimal characters. Before such blob we should type X or blob.SELECT x'12AB', blob '12AB';

Concatenation

Concatenation can be done with operator || or with concat function. Concat function accepts only 2 arguments.

SELECT 'p' || 'Q' || 'R', concat( 'p', 'Q' );

Number of Characters

Length function returns number of characters. Synonims for length function are char_length and character_length.   MonetDB encode all of the strings with UTF-8. Some of UTF-8 characters use several bytes. If we want to know length of a string in bytes then we should use octet_length function.

SELECT length(R'2€') AS CharNo, octet_length(R'2€') AS ByteNo;

Searching for One String Within Another

In our example, LOCATE looks for the substring 'es' within the second argument. If it can find it, LOCATE will return in what position.
SELECT LOCATE( 'es', 'successlessness' ), LOCATE( 'es', 'successlessness', 8 );
We can give a third argument. In our example, we will not search for 'es' in the first 8 characters.
Synonym for this function is CHARINDEX.

POSITION function will also return position of the substring 'es'.
SELECT POSITION( 'es' IN 'successlessness' );

PATINDEX function is similar. Difference is that this function can use wildcards. It can use '%' for zero or more characters or '_' for one character. It will return position, not of the first letter, but of the last letter.
SELECT PATINDEX( '%eS', 'succeSslessness' ),PATINDEX( '____eS', 'succeSslessness' );
SELECT PATINDEX( 's%c', 'succeSslessness' ),               --0
       PATINDEX( '%s%c', 'succeSslessness' ),              --4
       PATINDEX( '%s%c%', 'succeSslessness' ),             --4
       PATINDEX( 'c_S', 'succeSslessness' ),               --6
       PATINDEX( 'succeSslessness%', 'succeSslessness' ),  --15
       PATINDEX( 'succeSslessness_', 'succeSslessness' )
;   --0
These examples
show the logic behind
PATINDEX.
All of these functions (LOCATE, CHARINDEX, POSITION, PATINDEX) are case sensitive.
SELECT CHARINDEX( 'ES', 'successlessness' );
If they can not find substring, all of them will return zero.

Get the Substring by its Position

SUBSTRING function will return all the letters of the string after some position.
SELECT SUBSTRING( '123456', 3 ), SUBSTRING( '123456', 3, 2 ) ;
With the third argument, we can limit number of characters returned.
If the second argument is too large, we will get an empty string.
SELECT SUBSTRING( '123', 5 ), SUBSTRING( '123', 2, 7 ) ;
If the third argument is too large, it will have no effect.

Instead of SUBSTRING, we can use SUBSTR.

Convert Case

LCASE will transform the string to lower case. UCASE to upper case. Synonyms are LOWER and UPPER.
SELECT LCASE( 'NEW' ), UCASE( 'new' ), QGRAMNORMALIZE ('Prize 50!');
Special function QGRAMNORMALIZE will transform the string to upper case while removing all the
strange characters, only numbers and letters will remain.

Max and Min

We can order words alphabetically ( Apple, Zebra ). Minimum function will return the word closer to "A",
and maximum function will return the word closer to "Z".
SELECT sql_min( 'Apple', 'Zebra' ) AS "MIN", sql_max( 'Apple', 'Zebra' ) AS "MAX";
Synonyms for these functions are LEAST and GREATEST.

Unicode and ASCII

ASCII function will return Unicode number of the first character of the string (ß=>223).
CODE function will return the opposite, for the Unicode number, it will return the character (223=>ß).
ASCIIFY function will transform all of the non-ascii characters to ascii equivalents (=>EUR).
SELECT ASCII( r'ßzebra' ) AS FirstLetterUnicode     
         , CODE( 223 ) AS UnicodeCharacter     
         , ASCIIFY( R'a sign' ) AS AsciiOnly;

Padding of The String

We can pad some string with spaces from the left, or from the right side. Second argument
decides what should be the length of the whole string.
SELECT '=>' || LPAD( 'zz', 5 ) AS LeftPadding             
             , RPAD( 'zz', 5 ) || '<=' AS RightPadding;
If used, third argument will be used instead of the space sign.
SELECT LPAD( 'zz', 7, 'OE' ) AS LeftPadding, RPAD( 'zz', 7, 'OE' ) AS RightPadding;  
If the string is already too long, then it will be truncated from the right side.
SELECT LPAD( '12345', 3 ) AS LeftPadding     
       , RPAD( '12345', 3 ) AS RightPadding;

Trimming The String

We can trim spaces from the start, from the end, or from the both ends of a string.
SELECT 'leftTrim' || LTRIM( '   <=' )
                   , RTRIM( '=>   ' ) || 'righTrim'
                   ,  TRIM( '   A   ' ) AS BothSides;
The third argument can be used to specify the string to trim.
SELECT 'leftTrim' || LTRIM( 'UUU<=', 'UU' )
                         , RTRIM( '=>UUU', 'UU' ) || 'righTrim'
                         ,  TRIM( 'zzAzz', 'z' ) AS BothSides;

String Repetition

With SPACE function we can repeat space character many times. With REPEAT function we can repeat any string many times.  
SELECT '|' || SPACE(5) || '|', REPEAT( 'OE', 5);

Extracting the Start or End of the String

This is how we can get the first or last two characters.  
SELECT LEFT( '1234', 2 ), RIGHT( '1234', 2 );

Guess the Start or the End of the String

This is how we can check if the string starts or ends with some other string.
SELECT startswith( 'ABC', 'AB' ), endswith( 'ABC', 'bC' );
If we want to make comparison case insensitive, then we have to provide TRUE for the third argument.
SELECT startswith( 'ABC', 'ab', true ), endswith( 'ABC', 'bC', true );

Does One String Contains Another

CONTAINS function will check whether second string is contained inside of the first string. The function is case sensitive. We can make it insensitive by adding third argument TRUE.

SELECT CONTAINS( 'remarkable', 'MARK' ), CONTAINS( 'remarkable', 'MARK', true );

Insert and Replace

REPLACE function will search for the second string inside of the first string. It will replace it with the third string.
This function is case sensitive.
SELECT REPLACE( 'repeat', 'peat', 'try' ), REPLACE( 'repeat', 'PEAT', 'try' );

INSERT function will remove characters between positions 3 (2 + 1) and 6 (2 + 4). It will replace them with the last argument. In the second column, it will search between positions 3 (10 – 8 + 1) and 6 again, but this time it is counting from the end.
SELECT INSERT( 'remarkable', 2, 4, 'MARK' ), INSERT( 'remarkable', -8, 4, 'MARK' );
If the second argument, in INSERT function, is too big, then the string will get a suffix.  
SELECT INSERT( 'remarkable', 50, 4, 'MARK' );

SYS.MS_STUFF function is similar to INSERT function. Main difference is that for the second argument we have to provide the exact position. Instead of 2 we have to provide exact 3.  
SELECT sys.ms_stuff( 'remarkable', 3, 4, 'MARK' );
If we make a mistake with the second argument, while using SYS.MS_STUFF function, we will get an empty string as a result.
SELECT sys.ms_stuff( 'remarkable', -3, 1, 'MARK' ), sys.ms_stuff( 'remarkable', 50, 1, 'MARK' );
Interesting thing with the SYS.MS_STUFF function is that the third argument can be negative.
This argument will then duplicate several last characters from the start of the string.
SELECT sys.ms_stuff( 'remarkable', 3, 0, 'MARK' ) AS "0"    --reMARKmarkable     
          , sys.ms_stuff( 'remarkable', 3, -1, 'MARK' ) AS "-1"  --reMARKemarkable     
          , sys.ms_stuff( 'remarkable', 3, -2, 'MARK' ) AS "-2"; --reMARKremarkable

Find a String in the List

FIELD function can have unlimited number of arguments. First argument is the string we are searching for.
All other arguments are a list in which we are searching. The result will be the position of the first string in that list.
That position is zero based.  
SELECT FIELD( 'A', 'A', 'B', 'C' ), FIELD( 'B', 'A', 'B', 'C' );
Search is case sensitive. If we can not find the string, the result will be NULL.
SELECT FIELD( 'z', 'A', 'B', 'C' );

Other Functions

With sys.md5 function we will calculate a hash of a string. Hush will have 32 hexadecimal characters.  
SELECT sys.md5('abc');
SPLITPART function will split the string by its delimiter. That will transform the string into list. From that list we will take the third element.
SELECT SPLITPART( 'a1||b2||cd3||ef4', '||', 3);

Blob Functions

We have two functions to return the size of a BLOB in bytes. They are the same function.  
SELECT length(x'0012FF'), octet_length(x'0012FF');

String Similarity

Before calculating similarity between two strings, we should standardize those strings as much as possible. In MonetDB, we can achieve that with QGRAMNORMALIZE function.

QGRAMNORMALIZE function will remove everything accept letters and numbers. All letters will be transformed into uppercase.  
SELECT QGRAMNORMALIZE('Yahoo! Japan');

In order to transform word "kitten" into word "sitting" we need one insert, and two updates.

kitten -> ("k" to "s" update) -> sitten -> ( "e" to "i" update ) -> sittin -> ( add "g" on  end ) -> sittingIf the "price" of each step is 1, the full price
for 3 steps is 3. We can get number
of steps with Levenshtein function.
That is how we measure similarity of words.
Deletes would be also counted as 1.
SELECT LEVENSHTEIN('kitten','sitting');

The DAMERAU-LEVENSHTEIN function is similar to the LEVENSHTEIN function. The difference is that DAMERAU-LEWENSTEIN also takes transpositions into account.
In the DAMERAU-LEVENSHTEIN function, "AC" -> "CA" is only one step. In the LEVENSHTEIN function, we would have to update two letters, so that transformation would be 2 steps.  
It seems that in MonetDB, these two functions are the same, and the behavior is controlled by the number of the arguments.

If we want to achieve the behavior of the DAMERAU-LEVENSHTEIN function, we have to provide 5 arguments. Last three arguments are in order: costs for insert/delete, substitution, transposition.SELECT LEVENSHTEIN('AC','CA', 1, 1, 1 ),             DAMERAULEVENSHTEIN('AC','CA', 1, 1, 1 );  
If we provide only two arguments, then functions will act as a LEVENSHTEIN function.SELECT LEVENSHTEIN('AC','CA'),              DAMERAULEVENSHTEIN('AC','CA');  

SELECT LEVENSHTEIN('AC','CA', 1, 1, 3 );

We will make transposition to cost 3 steps.
But the result is still only 2. That is because we can achieve the same result in three ways.
AC -> (transposition) -> CA                    --cost would be 3 steps
AC -> (remove A) -> C -> (add A) -> CA         --only 2 steps
AC -> (update A>C) -> CC -> (update C>A) -> CA --again 2 steps

We will always get the smallest metric.

There is also a version of the LEVENSHTEIN function with 4 arguments. In this function, default cost for the insert is 1. Third argument is DELETE, forth is REPLACE. Transposition is not possible in this case, so it will not act as a DAMERAULEVENSHTEIN function.

SELECT LEVENSHTEIN('AC','CA', 2, 2 ) AS "22",  --insert1 + delete2 = 3
       LEVENSHTEIN('AC','CA', 3, 2 ) AS "32",  --insert1 + delete3 = 4
       LEVENSHTEIN('AC','CA', 2, 3 ) AS "23",  --insert1 + delete2 = 3
       LEVENSHTEIN('AC','CA', 3, 3 ) AS "33";  --insert1 + delete3 = 4

EDITDISTANCE is the same as DAMERAULEVENSHTEIN.
Everything cost 1 point, but transposition costs 2 points.
SELECT EDITDISTANCE( 'AC', 'CA' );
EDITDISTANCE2 is the same as DAMERAULEVENSHTEIN. Everything cost 1 point.  
SELECT EDITDISTANCE2( 'AC', 'CA' );

Jaro-Winkler function

Jaro-Winkler function will compare two strings. If they are the same, result will be 1, if they are totally different, result will be 0. These functions will give bigger result to strings which have the same prefix. The bigger the prefix, the bigger the boost will be.
SELECT JAROWINKLER('ACA','CAA') AS "ACA-CAA"     
     , JAROWINKLER('ACA','CA') AS "ACA-CA"     
     , JAROWINKLER('ZZ','ZZ') AS "ZZ-ZZ";

Soundex

SOUNDEX function works only for English language. SOUNDEX function will transform a string to number. That number depicts how that word would be pronounced in the English language. If two words have the same pronunciation, then SOUNDEX function will return the same number for those 2 words.  
SELECT SOUNDEX('Rose') AS "Rose", SOUNDEX('Rows') AS "Rows"     
     , SOUNDEX('Which') AS "Which", SOUNDEX('Witch') AS "Witch"     
     , SOUNDEX('Table') AS "Table", SOUNDEX('Fork') AS "Fork";
Soundex result is made of the starting letter of the word, and three numbers. If two words have the same SOUNDEX code, then DIFFERENCE function would return 0, because there would be 0 differences.
SELECT DIFFERENCE('Rose', 'Rows') AS "R",    --R200 vs R200  => 0        
       DIFFERENCE('Which', 'Witch') AS "W",  --W200 vs W320  => 2        
       DIFFERENCE('Table', 'Fork') AS "TF";  --T140 vs F620  => 3  


So, the result of the DIFFERENCE function can be between 0 and 4.

0260 Mathematical Functions in MonetDB

We know that there are 9 number data types ( tinyint, smallint, int, bigint, hugeint, decimal, double, float, real ). If we want to convert some value to these data types, we can use CAST and CONVERT functions.

SELECT CAST(true as smallint) AS "Smallint",
       CONVERT(42, int) AS "Int",
       CONVERT(123.45, real) AS "Real",
       CAST('123.45' as double) AS "Double",
       CAST(23.45 as decimal(5,2)) AS "Decimal"; 
 

Arithmetic Operators

We can always make our calculation by using operators + ,  * /  or modulo %.

SELECT 0 + 1 * 4 - ( 5.0 / 2 ) AS "Calculation",  -- 5.0 / 2 = 2.5 because of decimal
       7 / 2 AS "IntegerDivision",   -- 7 / 2 = 3 because of INT
       8 % 3 AS "Modulo";  

Arithmetic operators have their counterpart in arithmetic functions:

SELECT mod( 8, 3 ) AS "Modulo",
       sql_add( 3, 3 ) AS "Summation",
       sql_div( 8, 2 ) AS "Division",
       sql_mul( 4, 5 ) AS "Multiplication",
       sql_neg( 1 ) AS "Negation",
       sql_sub( 7, 2 ) AS "Subtraction";

Bitwise Operators and Functions

We already know operators AND, NOT, OR and XOR. We used them with TRUE and FALSE values. In MonetDB, instead of TRUE and FALSE we can use 1 and 0.

SELECT TRUE AND FALSE; SELECT 1 AND 0;              -- we'll get the same result

If we have two binary numbers, then we can apply AND, NOT, OR and XOR on each of their zeros and ones. That is how bitwise operators work.

Number110101
Number211100
————————————————————————
Num1 AND Num210100
Num1 OR Num211101
Num1 XOR Num201001
NOT Num101010

Instead of typing binary numbers, we will show them as regular integers. This is how we use bitwise operators and functions:

 Bitwise operatorBitwise functionResult
ANDSELECT 91 & 15;SELECT BIT_AND( 91, 15 );11 -- 01011011 AND 00001111 = 00001011
ORSELECT 32 | 3;SELECT BIT_OR( 32, 3 );35 -- 00100000 OR  00000011 = 00100011 
XORSELECT 17 ^ 5;SELECT BIT_XOR( 17, 5 );20 -- 00010001 XOR 00000101 = 00010100
NOTSELECT ~1;SELECT BIT_NOT( 1 )-2          -- NOT 00000001 = 11111110

Bitwise Shift

Bitwise left shift is when we add a zero on the right side of a binary number. Each time we add another zero, the value of a number is increased double.

0000000 <<   1   =   00000010
SELECT 1 << 1;  -- 2
00000001  <<   2   =   00000100
SELECT 1 << 2;   -- 4
00000001  <<   3   =   00001000
SELECT 1 << 3;    -- 8

It is similar, if we remove one figure from the the right side. Then we have a right shift. Each right shift will reduce number to its half.

00001000  >>  1   =   00000100   SELECT 8 >> 1;  -- 400000100  >>  1   =   00000010   SELECT 4 >> 1;   -- 200000010  >>  1   =   00000001   SELECT 2 >> 1;   -- 1

Beside using operators, we also have functions that can do bitwise shift.

SELECT left_shift(1, 2);SELECT right_shift(8, 1);

Min and Max

To calculate the larger/smaller value of two values, we can use the functions "largest" and "smallest". The functions "skl_max" and "skl_min" are just synonyms of the previous two functions.

SELECT greatest( 4, 5 ) AS "Greatest",
       sql_max( 4, 5 ) AS "Sql_max",
       least( 2, 100 ) AS "Least",
       sql_min( 2, 100 ) AS "Sql_min";

Taking a Root

For calculation root we have two functions. One is for square root, and the other one is for the cube root.

SELECT SQRT( 9 ) AS "Square root",
CBRT( 8 ) AS "Cube root",
POWER( 16.0, 1.0 / 4 ) AS "Fourth root";

If we need to calculate a root of a higher order, then we can use mathematical property . In the last column we calculated the fourth root of a 16 by using this property. For this transformation we had to use POWER function.

Logarithm and Exponential Functions

SELECT POWER( 2, 5 );    --32This function will raise a number to the power of another number.
SELECT exp(3);   -- 20,08553692We can define this function based on POWER function, because exp(3) = POWER(e,3),
where e is Euler number (approximately 2.71).

The opposite function of POWER function is a logarithm. If POWER( b, x ) = Y, then log( b, Y ) = x:

SELECT POWER( 2, 3 ) AS "Power", LOG( 2, 8 ) AS "Logarithm";

If we are working with an Euler number, then we can use LOG function without its first argument (or we can use its alias LN function):

SELECT LOG( 20.08553692 ) AS "Log", LN( 20.08553692 ) AS "Ln" ;

We also can use functions LOG10( x ) or LOG2( x ), which are just shortcuts for LOG10( x ) = LOG( 10, x ) and LOG2( x ) = LOG( 2, x ).

Ceiling and Floor Functions

For each real number N, we can say that it is between two nearest integers. Those two integers we call ceiling and floor. We have SQL functions with such names, they will return us those two nearest integers:

SELECT FLOOR( 7.55 ) AS "Floor", CEILING( 7.55 ) AS "Ceiling";

For CEILING function, we can use its alias CEIL.

Each integer is floor and ceiling for itself.

Rounding and Truncating

We can reduce the number of decimal places in two ways. We can just cut off the excess decimals using the truncate function. The second argument will decide how many decimal places to trim.

SELECT sys.ms_trunc( -5.628, 0 ) AS "No decimals",
          sys.ms_trunc( 5.628, 1 ) AS "One decimal",
       sys.ms_trunc( -5.628, 2 ) AS "Two decimals";

Another approach is to round the number. The second argument will again decide how many decimal places will survive. If the last decimal is 5, we will always round up (in absolute value).

SELECT ROUND( 5.5, 0 ) AS "No decimals",
       ROUND( -5.55, 1 ) AS "One decimal",
       ROUND( 5.555, 2 ) AS "Two decimals";

If the second argument is negative, the result will be zero.SELECT sys.ms_trunc( 9.99, -2 ) AS "Truncate",        ROUND( 9.99, -3 ) AS "ROUND";

There is also a function that does truncation first and then rounding.
SELECT sys.ms_round( 5.546, 2, 1 ),   --5.546 => 5.54 (truncate) => 5.54 (rounded)
       sys.ms_round( 5.546, 2, 0 );   --5.546 => 5.546 (no truncate) => 5.55 (rounded)

Creating Random Numbers

Each time we call RAND() function, we will get random integer between   0   and   2,147,483,647    (   231-1  ).
SELECT RAND() AS "First call", RAND() AS "Second call";

RAND() function can accept seed argument. The seed argument will allow us to get the same random value every time.
SELECT RAND(77) AS "First call", RAND(77) AS "Second call";

Other Mathematical Functions

SELECT ABS( -5 );        --to calculate an absolute value

SELECT SIGN( -5 ), SIGN( 5 );        -- to find out the sign of a number

SELECT sys.alpha(5.0, 1.2); This function is for astronomy.
compute alpha 'expansion' of theta for a given declination (used by SkyServer)

Trigonometric and Hyperbolic Functions

SELECT PI(); will return 3.14.

This is one right triangle. The smallest of its angles is 36.87°.

To transform this angle to radians we can use RADIANS function. For transforming it back we will use DEGREES function.

SELECT RADIANS( 36.87 );   --returns 0.6435, that is 36.87 * ( 3.14 / 180 )

SELECT DEGREES( 0.6435 );  --returns 36.87, that is 0.6435 * ( 180 / 3.14 )
SELECT SIN( 0.6435 );Sine function will return the ratio b/c = 3/5 = 0.6. It accepts an angle value in radians, as do all the functions below.
SELECT COS( 0.6435 );Cosine function will return the ratio a/c = 4/5 = 0.8.
SELECT TAN( 0.6435 );Tangent function will return the ratio b/a = 3/4 = 0.75.
SELECT COT( 0.6435 );Cotangent function will return the ratio a/b = 4/3 = 1.33.

Then we have inverse functions:

SELECT ASIN( 0.6 );         -- 0.6435If we know that b/c = 0.6, then arcsine function will tell us that the angle is 0.6435 radians.
SELECT ACOS( 0.8 );         -- 0.6435If we know that a/c = 0.8, then arccosine function will tell us that the angle is 0.6435 radians.
SELECT ATAN( 0.75 );        -- 0.6435If we know that b/a = 0.75, then arctangent function will tell us that the angle is 0.6435 radians.
SELECT ATAN( 3, 4 );        -- 0.6435The same ATAN function will also work with dividend and divisor ( 3/4 = 0.75 ).

We also have some hyperbolic functions:

SELECT sinh( 1.4 );     --1.9043    This is hyperbolic sine function.
SELECT cosh( 1.4 );     --2.150898    This is hyperbolic cosine function.
SELECT tanh( 1.4 );     --0.88535  This is hyperbolic tangent.

0250 Offset Window Functions in MonetDB

Offset functions are window functions that return a value from a specified row. The row is determined as an offset from the reference point. There are two major reference points. The first is the current row, the second is the frame border. Based on this, we distinguish between "row offset functions" and "frame offset functions".

Row Offset Functions

With row offset functions, we move from the current row several steps up or down. Our movement is dependent only on the current row, so we do not use frames with row offset functions. Frames are not applicable in this case.

LAG function will move us up. LEAD function will move us down.

We will create one sample table so that we can observe how these functions work.

CREATE TABLE offsetTable( "Month" INT, Value INT );  

INSERT INTO offsetTable ( "Month", Value ) VALUES  ( 1, 1 ), ( 2, 2 ) , ( 3, null ), ( 5, 5 ), ( 6, 6 ), ( 7, 7 );

LAG function will allow us to get some previous value and to place it in the current row. That would allow us to compare some old and some new value, so we can analyze the change in time.
SELECT "Month", Value, LAG( Value, 2 ) OVER ( ORDER BY "Month" )
FROM offsetTable ORDER BY "Month";

We can see three drawbacks in the image above:

  • Month April doesn't have a value. LAG function doesn't notice that, and will happily move two rows above, but not two months before. For the fifth month we are getting the value from the February which is three months before, not two. That is the consequence of the missing month April. We should avoid using LAG function when we are missing some time points.
  • For months January and February, we have nulls in the new column . Previous values don't exist so we have to be satisfied with nulls.
  • In the third row, the Value column has NULL. Sometimes we want to go over those nulls and not count them as steps we take. There is a way to achieve this in SQL standard, but that part of SQL standard is not implemented in MonetDB, so null values will always be counted as one step. Subclause IGNORE NULLS is not supported.  
--not supported
SELECT "Month", Value, LAG( Value, 2) IGNORE NULLS
OVER (ORDER BY "Month" )
FROM offsetTable ORDER BY "Month";

LAG can work with partitions.

We should always order rows inside partition with ORDER BY. Although LAG function can work without ORDER BY, the results will be unpredictable and meaningless.

LAG Function Arguments

If the second argument is missing, then the default value is 1. In the image, we can see that the rows are shifted by one row.

SELECT "Month", Value, LAG(Value) OVER (ORDER BY "Month") FROM offsetTable ORDER BY "Month";

Only if we provide second argument, we can also provide the third argument. The third argument is the default value. Default value will be used for those rows where previous values don't exist.
SELECT "Month", Value, LAG(Value,2,77) OVER (ORDER BY "Month") FROM offsetTable ORDER BY "Month";

Default value has to be of the proper data type that match other values in the column.

Third argument is optional. If we skip it, the default value would be null.

SELECT "Month", Value, LAG(Value,2) OVER (ORDER BY "Month") FROM offsetTable ORDER BY "Month";

LEAD Function

LEAD function is the same as LAG function, just the opposite. Notice that we can achieve the same result with LEAD function, as with LAG function, if we change ORDER BY from ascending order to descending order.

SELECT "Month", Value, LAG(Value,1) OVER (ORDER BY "Month" ASC)
FROM offsetTable ORDER BY "Month";
SELECT "Month", Value, LEAD(Value,1) OVER (ORDER BY "Month" DESC)
FROM offsetTable ORDER BY "Month";
                  <=== totally the same ===>

This is similar example. Both statements use ASC, but the result is the same. This is because the offset argument can be negative. This is another way we can make the LAG and LEAD functions do the same thing.

SELECT "Month", Value, LAG(Value,2) OVER (ORDER BY "Month" ASC)
FROM offsetTable ORDER BY "Month";
SELECT "Month", Value, LEAD(Value,-2) OVER (ORDER BY "Month" ASC)
FROM offsetTable ORDER BY "Month";
             <=== totally the same ===>

Frame Offset Functions

For frame offset functions everything is relative to frame borders. We have three functions of this type.
FIRST_VALUE will return the first value in the frame.
NTH_VALUE will return the NTH value in the frame, from the start of a frame.
LAST_VALUE will return the last value in the frame.

For frame offset functions, we always need to have frame. If 6 is the month in the current row, then the frame is between 4th and 8th month. Now that we know the frame size and position, it is easy to see that the first value in this frame is 5.

SELECT "Month", Value, FIRST_VALUE(Value) OVER
(ORDER BY "Month" RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING)
FROM offsetTable ORDER BY "Month";

If we omit frame definition, the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW will be used. Be aware of this and always define frame.  

SELECT "Month", Value, LAST_VALUE(Value) OVER (ORDER BY "Month")
FROM offsetTable ORDER BY "Month";

Frame offset functions can work with partitions.

We should always order rows inside frame with ORDER BY. Frame offset functions can work without ORDER BY, but the results will be unpredictable and meaningless.

NTH_VALUE Function

NTH_VALUE function has another argument which tells how many steps to move from the start of a frame. In the example bellow, our frame is the whole partition. Forth number in the "value" column is number 5. That number will be the result of this window function.
SELECT "Month", Value, NTH_VALUE(Value,4)
OVER (ORDER BY "Month" ASC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM offsetTable ORDER BY "Month";

If the second argument is omitted, default value will not be one, but the error will be raised.

The value of that second argument can not be negative.

We can use partitions with NTH_VALUE.
You should always provide frame definition, and you should have ORDER BY for your frame.

Difference From SQL Standard

IGNORE NULLS is not supported in MonetDB for the functions FIRST_VALUE, LAST_VALUE and NTH_VALUE.

With IGNORE NULLS we would be able to not count nulls, but this is not supported in MonetDB.
SELECT "Month", Value, NTH_VALUE(Value,3) IGNORE NULLS   --not supported
OVER (ORDER BY "Month" ASC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

FROM offsetTable ORDER BY "Month";

NTH_VALUE can not use subclause FROM LAST. This would allow to count steps from the end of the frame, and not from the start. But this is not supported in MonetDB.

SELECT "Month", Value, NTH_VALUE(Value,2) FROM LAST   --not supported
OVER (ORDER BY "Month" ASC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM offsetTable ORDER BY "Month";

We actually don't need "FROM LAST" subclause. It is enough to change ORDER BY to descending, and we would get the same result as with "FROM LAST".
SELECT "Month", Value, NTH_VALUE(Value,2)
OVER (ORDER BY "Month" DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM offsetTable ORDER BY "Month";