0270 String Functions in MonetDB

By placing letter e before string, we can write control characters ( Tab, New line, Carriage return ) with their specifiers. This will work even without e.

select e'A\nB' AS NewLine
     , e'A\u2764' AS Unicode
     , e'A\tB'AS TAB
     , 'A\tB'AS TAB2;  
Control CharacterNameUsage
\nNewlineMoves the cursor to the next line.
\rCarriage returnMoves the cursor to the beginning of the current line.
\tTab spaceInserts a tab space.
\bBackspaceDeletes the character just before the control character.
\fForm feedMoves the cursor to the next page.
\uUnicode characterRepresents a Unicode character using a code number.

Without e, the result is the same ( e'A\tB' = 'A\tB' ). If we want to write literal specifiers, then we have to use prefix R.

select r'A\nB' AS NewLine, r'A\u2764' AS Unicode, R'A\tB'AS TAB;

We can write blob with pairs of hexadecimal characters. Before such blob we should type X or blob.SELECT x'12AB', blob '12AB';

Concatenation

Concatenation can be done with operator || or with concat function. Concat function accepts only 2 arguments.

SELECT 'p' || 'Q' || 'R', concat( 'p', 'Q' );

Number of Characters

Length function returns number of characters. Synonims for length function are char_length and character_length.   MonetDB encode all of the strings with UTF-8. Some of UTF-8 characters use several bytes. If we want to know length of a string in bytes then we should use octet_length function.

SELECT length(R'2€') AS CharNo, octet_length(R'2€') AS ByteNo;

Searching for One String Within Another

In our example, LOCATE looks for the substring 'es' within the second argument. If it can find it, LOCATE will return in what position.
SELECT LOCATE( 'es', 'successlessness' ), LOCATE( 'es', 'successlessness', 8 );
We can give a third argument. In our example, we will not search for 'es' in the first 8 characters.
Synonym for this function is CHARINDEX.

POSITION function will also return position of the substring 'es'.
SELECT POSITION( 'es' IN 'successlessness' );

PATINDEX function is similar. Difference is that this function can use wildcards. It can use '%' for zero or more characters or '_' for one character. It will return position, not of the first letter, but of the last letter.
SELECT PATINDEX( '%eS', 'succeSslessness' ),PATINDEX( '____eS', 'succeSslessness' );
All of these functions (LOCATE, CHARINDEX, POSITION, PATINDEX) are case sensitive.
SELECT CHARINDEX( 'ES', 'successlessness' );
If they can not find substring, all of them will return zero.

Get the Substring by its Position

SUBSTRING function will return all the letters of the string after some position.
SELECT SUBSTRING( '123456', 3 ), SUBSTRING( '123456', 3, 2 ) ;
With the third argument, we can limit number of characters returned.
If the second argument is too large, we will get an empty string.
SELECT SUBSTRING( '123', 5 ), SUBSTRING( '123', 2, 7 ) ;
If the third argument is too large, it will have no effect.

Instead of SUBSTRING, we can use SUBSTR.

Convert Case

LCASE will transform the string to lower case. UCASE to upper case. Synonyms are LOWER and UPPER.
SELECT LCASE( 'NEW' ), UCASE( 'new' ), QGRAMNORMALIZE ('Prize 50!');
Special function QGRAMNORMALIZE will transform the string to upper case while removing all the
strange characters, only numbers and letters will remain.

Max and Min

We can order words alphabetically ( Apple, Zebra ). Minimum function will return the word closer to "A",
and maximum function will return the word closer to "Z".
SELECT sql_min( 'Apple', 'Zebra' ) AS "MIN", sql_max( 'Apple', 'Zebra' ) AS "MAX";
Synonyms for these functions are LEAST and GREATEST.

Unicode and ASCII

ASCII function will return Unicode number of the first character of the string (ß=>223).
CODE function will return the opposite, for the Unicode number, it will return the character (223=>ß).
ASCIIFY function will transform all of the non-ascii characters to ascii equivalents (=>EUR).
SELECT ASCII( r'ßzebra' ) AS FirstLetterUnicode     
         , CODE( 223 ) AS UnicodeCharacter     
         , ASCIIFY( R'a sign' ) AS AsciiOnly;

Padding of The String

We can pad some string with spaces from the left, or from the right side. Second argument
decides what should be the length of the whole string.
SELECT '=>' || LPAD( 'zz', 5 ) AS LeftPadding             
             , RPAD( 'zz', 5 ) || '<=' AS RightPadding;
If used, third argument will be used instead of the NULL.
SELECT LPAD( 'zz', 7, 'OE' ) AS LeftPadding, RPAD( 'zz', 7, 'OE' ) AS RightPadding;  
If the string is already too long, then it will be truncated from the right side.
SELECT LPAD( '12345', 3 ) AS LeftPadding     
       , RPAD( '12345', 3 ) AS RightPadding;

Trimming The String

We can trim NULLs from the start, from the end, or from the both ends of a string.
SELECT 'leftTrim' || LTRIM( '   <=' )
                   , RTRIM( '=>   ' ) || 'righTrim'
                   ,  TRIM( '   A   ' ) AS BothSides;
The third argument can be used to specify the string to trim.
SELECT 'leftTrim' || LTRIM( 'UUU<=', 'UU' )
                         , RTRIM( '=>UUU', 'UU' ) || 'righTrim'
                         ,  TRIM( 'zzAzz', 'z' ) AS BothSides;

String Repetition

With SPACE function we can repeat space character many times. With REPEAT function we can repeat any string many times.  
SELECT '|' || SPACE(5) || '|', REPEAT( 'OE', 5);

Extracting the Start or End of the String

This is how we can get the first or last two characters.  
SELECT LEFT( '1234', 2 ), RIGHT( '1234', 2 );

Guess the Start or the End of the String

This is how we can check if the string starts or ends with some other string.
SELECT startswith( 'ABC', 'AB' ), endswith( 'ABC', 'bC' );
If we want to make comparison case insensitive, then we have to provide TRUE for the third argument.
SELECT startswith( 'ABC', 'ab', true ), endswith( 'ABC', 'bC', true );

Does One String Contains Another

CONTAINS function will check whether second string is contained inside of the first string. The function is case sensitive. We can make it insensitive by adding third argument TRUE.

SELECT CONTAINS( 'remarkable', 'MARK' ), CONTAINS( 'remarkable', 'MARK', true );

Insert and Replace

REPLACE function will search for the second string inside of the first string. It will replace it with the third string.
This function is case sensitive.
SELECT REPLACE( 'repeat', 'peat', 'try' ), REPLACE( 'repeat', 'PEAT', 'try' );

INSERT function will remove characters between positions 3 (2 + 1) and 6 (2 + 4). It will replace them with the last argument. In the second column, it will search between positions 3 (10 – 8 + 1) and 6 again, but this time it is counting from the end.
SELECT INSERT( 'remarkable', 2, 4, 'MARK' ), INSERT( 'remarkable', -8, 4, 'MARK' );
If the second argument, in INSERT function, is too big, then the string will get a suffix.  
SELECT INSERT( 'remarkable', 50, 4, 'MARK' );

SYS.MS_STUFF function is similar to INSERT function. Main difference is that for the second argument we have to provide the exact position. Instead of 2 we have to provide exact 3.  
SELECT sys.ms_stuff( 'remarkable', 3, 4, 'MARK' );
If we make a mistake with the second argument, while using SYS.MS_STUFF function, we will get an empty string as a result.
SELECT sys.ms_stuff( 'remarkable', -3, 1, 'MARK' ), sys.ms_stuff( 'remarkable', 50, 1, 'MARK' );
Interesting thing with the SYS.MS_STUFF function is that the third argument can be negative.
This argument will then duplicate several last characters from the start of the string.
SELECT sys.ms_stuff( 'remarkable', 3, 0, 'MARK' ) AS "0"    --reMARKmarkable     
          , sys.ms_stuff( 'remarkable', 3, -1, 'MARK' ) AS "-1"  --reMARKemarkable     
          , sys.ms_stuff( 'remarkable', 3, -2, 'MARK' ) AS "-2"; --reMARKremarkable

Find a String in the List

Field function can have unlimited number of arguments. First argument is the string we are searching for.
All other arguments are a list in which we are searching. The result will be the position of the first string in that list.
That position is zero based.  
SELECT FIELD( 'A', 'A', 'B', 'C' ), FIELD( 'B', 'A', 'B', 'C' );
Search is case sensitive. If we can not find the string, the result will be NULL.
SELECT FIELD( 'z', 'A', 'B', 'C' );

Other Functions

With sys.md5 function we will calculate a hash of a string. Hush will be 32 hexadecimal characters.  
SELECT sys.md5('abc');
SPLITPART function will split the string by its delimiters. That will transform the string into list. From that list we will take the third element.
SELECT SPLITPART( 'a1||b2||cd3||ef4', '||', 3);

Blob Functions

We have two functions to return the size of a BLOB in bytes. They are the same function.  
SELECT length(x'0012FF'), octet_length(x'0012FF');

String Similarity

Before calculating similarity between two strings, we should standardize those strings as much as possible. In MonetDB, we can achieve that with QGRAMNORMALIZE function.

QGRAMNORMALIZE function will remove everything accept letters and numbers. All letters will be transformed into uppercase.  
SELECT QGRAMNORMALIZE('Yahoo! Japan');

In order to transform word "kitten" into word "sitting" we need one insert, and two updates.

kitten -> ("k" to "s" update) -> sitten -> ( "e" to "i" update ) -> sittin -> ( add "g" on  end ) -> sittingIf the "price" of each step is 1, the full price
for 3 steps is 3. We can get number
of steps with Levenshtein function.
That is how we measure similarity of words.
Deletes would be also counted as 1.
SELECT LEVENSHTEIN('kitten','sitting');

The DAMERAU-LEVENSHTEIN function is similar to the LEVENSHTEIN function. The difference is that DAMERAU-LEWENSTEIN also takes transpositions into account.
In the DAMERAU-LEVENSHTEIN function, "AC" -> "CA" is only one step. In the LEVENSHTEIN function, we would have to update two letters, so that transformation would be 2 steps.  
It seems that in MonetDB, these two functions are the same, and the behavior is controlled by the number of the arguments.

If we want to achieve the behavior of the DAMERAU-LEVENSHTEIN function, we have to provide 5 arguments. Last three arguments are in order: costs for insert/delete, substitution, transposition.SELECT LEVENSHTEIN('AC','CA', 1, 1, 1 ),             DAMERAULEVENSHTEIN('AC','CA', 1, 1, 1 );  
If we provide only two arguments, then functions will act as a LEVENSHTEIN function.SELECT LEVENSHTEIN('AC','CA'),              DAMERAULEVENSHTEIN('AC','CA');  

SELECT LEVENSHTEIN('AC','CA', 1, 1, 3 );

We will make transposition to cost 3 steps.
But the result is still only 2. That is because we can achieve the same result in three ways.
AC -> (transposition) -> CA                    --cost would be 3 steps
AC -> (remove A) -> C -> (add A) -> CA         --only 2 steps
AC -> (update A>C) -> CC -> (update C>A) -> CA --again 2 steps

We will always get the smallest metric.

There is also a version of the LEVENSHTEIN function with 4 arguments. In this function, default cost for the insert is 1. Third argument is DELETE, forth is REPLACE. Transposition is not possible in this case, so it will not act as a DAMERAULEVENSHTEIN function.

SELECT LEVENSHTEIN('AC','CA', 2, 2 ) AS "22",  --insert1 + delete2 = 3
       LEVENSHTEIN('AC','CA', 3, 2 ) AS "32",  --insert1 + delete3 = 4
       LEVENSHTEIN('AC','CA', 2, 3 ) AS "23",  --insert1 + delete2 = 3
       LEVENSHTEIN('AC','CA', 3, 3 ) AS "33";  --insert1 + delete3 = 4

EDITDISTANCE is the same as DAMERAULEVENSHTEIN.
Everything cost 1 point, but transposition costs 2 points.
SELECT EDITDISTANCE( 'AC', 'CA' );
EDITDISTANCE2 is the same as DAMERAULEVENSHTEIN. Everything cost 1 point.  
SELECT EDITDISTANCE2( 'AC', 'CA' );

Jaro-Winkler function

Jaro-Winkler function will compare two strings. If they are the same, result will be 1, if they are totally different, result will be 0. These functions will give bigger result to strings which have the same prefix. The bigger the prefix, the bigger the boost will be.
SELECT JAROWINKLER('ACA','CAA') AS "ACA-CAA"     
     , JAROWINKLER('ACA','CA') AS "ACA-CA"     
     , JAROWINKLER('ZZ','ZZ') AS "ZZ-ZZ";

Soundex

SOUNDEX function works only for English language. SOUNDEX function will transform a string to number. That number depicts how that word would be pronounced in the English language. If two words have the same pronunciation, then SOUNDEX function will return the same number for those 2 words.  
SELECT SOUNDEX('Rose') AS "Rose", SOUNDEX('Rows') AS "Rows"     
     , SOUNDEX('Which') AS "Which", SOUNDEX('Witch') AS "Witch"     
     , SOUNDEX('Table') AS "Table", SOUNDEX('Fork') AS "Fork";
Soundex result is made of the starting letter of the word, and three numbers. If two words have the same SOUNDEX code, then DIFFERENCE function would return 0, because there would be 0 differences.
SELECT DIFFERENCE('Rose', 'Rows') AS "R",    --R200 vs R200  => 0        
       DIFFERENCE('Which', 'Witch') AS "W",  --W200 vs W320  => 2        
       DIFFERENCE('Table', 'Fork') AS "TF";  --T140 vs F620  => 3  


So, the result of the DIFFERENCE function can be between 0 and 4.

0260 Mathematical Functions in MonetDB

We know that there are 9 number data types ( tinyint, smallint, int, bigint, hugeint, decimal, double, float, real ). If we want to convert some value to these data types, we can use CAST and CONVERT functions.

SELECT CAST(true as smallint) AS "Smallint",
       CONVERT(42, int) AS "Int",
       CONVERT(123.45, real) AS "Real",
       CAST('123.45' as double) AS "Double",
       CAST(23.45 as decimal(5,2)) AS "Decimal"; 
 

Arithmetic Operators

We can always make our calculation by using operators + ,  * /  or modulo %.

SELECT 0 + 1 * 4 - ( 5.0 / 2 ) AS "Calculation",  -- 5.0 / 2 = 2.5 because of decimal
       7 / 2 AS "IntegerDivision",   -- 7 / 2 = 3 because of INT
       8 % 3 AS "Modulo";  

Arithmetic operators have their counterpart in arithmetic functions:

SELECT mod( 8, 3 ) AS "Modulo",
       sql_add( 3, 3 ) AS "Summation",
       sql_div( 8, 2 ) AS "Division",
       sql_mul( 4, 5 ) AS "Multiplication",
       sql_neg( 1 ) AS "Negation",
       sql_sub( 7, 2 ) AS "Subtraction";

Bitwise Operators and Functions

We already know operators AND, NOT, OR and XOR. We used them with TRUE and FALSE values. In MonetDB, instead of TRUE and FALSE we can use 1 and 0.

SELECT TRUE AND FALSE; SELECT 1 AND 0;              -- we'll get the same result

If we have two binary numbers, then we can apply AND, NOT, OR and XOR on each of their zeros and ones. That is how bitwise operators work.

Number110101
Number211100
————————————————————————
Num1 AND Num210100
Num1 OR Num211101
Num1 XOR Num201001
NOT Num101010

Instead of typing binary numbers, we will show them as regular integers. This is how we use bitwise operators and functions:

 Bitwise operatorBitwise functionResult
ANDSELECT 91 & 15;SELECT BIT_AND( 91, 15 );11 -- 01011011 AND 00001111 = 00001011
ORSELECT 32 | 3;SELECT BIT_OR( 32, 3 );35 -- 00100000 OR  00000011 = 00100011 
XORSELECT 17 ^ 5;SELECT BIT_XOR( 17, 5 );20 -- 00010001 XOR 00000101 = 00010100
NOTSELECT ~1;SELECT BIT_NOT( 1 )-2          -- NOT 00000001 = 11111110

Bitwise Shift

Bitwise left shift is when we add a zero on the right side of a binary number. Each time we add another zero, the value of a number is increased double.

0000000 <<   1   =   00000010
SELECT 1 << 1;  -- 2
00000001  <<   2   =   00000100
SELECT 1 << 2;   -- 4
00000001  <<   3   =   00001000
SELECT 1 << 3;    -- 8

It is similar, if we remove one figure from the the right side. Then we have a right shift. Each right shift will reduce number to its half.

00001000  >>  1   =   00000100   SELECT 8 >> 1;  -- 400000100  >>  1   =   00000010   SELECT 4 >> 1;   -- 200000010  >>  1   =   00000001   SELECT 2 >> 1;   -- 1

Beside using operators, we also have functions that can do bitwise shift.

SELECT left_shift(1, 2);SELECT right_shift(8, 1);

Min and Max

To calculate the larger/smaller value of two values, we can use the functions "largest" and "smallest". The functions "skl_max" and "skl_min" are just synonyms of the previous two functions.

SELECT greatest( 4, 5 ) AS "Greatest",
       sql_max( 4, 5 ) AS "Sql_max",
       least( 2, 100 ) AS "Least",
       sql_min( 2, 100 ) AS "Sql_min";

Taking a Root

For calculation root we have two functions. One is for square root, and the other one is for the cube root.

SELECT SQRT( 9 ) AS "Square root",
CBRT( 8 ) AS "Cube root",
POWER( 16.0, 1.0 / 4 ) AS "Fourth root";

If we need to calculate a root of a higher order, then we can use mathematical property . In the last column we calculated the fourth root of a 16 by using this property. For this transformation we had to use POWER function.

Logarithm and Exponential Functions

SELECT POWER( 2, 5 );    --32This function will raise a number to the power of another number.
SELECT exp(3);   -- 20,08553692We can define this function based on POWER function, because exp(3) = POWER(e,3),
where e is Euler number (approximately 2.71).

The opposite function of POWER function is a logarithm. If POWER( b, x ) = Y, then log( b, Y ) = x:

SELECT POWER( 2, 3 ) AS "Power", LOG( 2, 8 ) AS "Logarithm";

If we are working with an Euler number, then we can use LOG function without its first argument (or we can use its alias LN function):

SELECT LOG( 20.08553692 ) AS "Log", LN( 20.08553692 ) AS "Ln" ;

We also can use functions LOG10( x ) or LOG2( x ), which are just shortcuts for LOG10( x ) = LOG( 10, x ) and LOG2( x ) = LOG( 2, x ).

Ceiling and Floor Functions

For each real number N, we can say that it is between two nearest integers. Those two integers we call ceiling and floor. We have SQL functions with such names, they will return us those two nearest integers:

SELECT FLOOR( 7.55 ) AS "Floor", CEILING( 7.55 ) AS "Ceiling";

For CEILING function, we can use its alias CEIL.

Each integer is floor and ceiling for itself.

Rounding and Truncating

We can reduce the number of decimal places in two ways. We can just cut off the excess decimals using the truncate function. The second argument will decide how many decimal places to trim.

SELECT sys.ms_trunc( -5.628, 0 ) AS "No decimals",
          sys.ms_trunc( 5.628, 1 ) AS "One decimal",
       sys.ms_trunc( -5.628, 2 ) AS "Two decimals";

Another approach is to round the number. The second argument will again decide how many decimal places will survive. If the last decimal is 5, we will always round up (in absolute value).

SELECT ROUND( 5.5, 0 ) AS "No decimals",
       ROUND( -5.55, 1 ) AS "One decimal",
       ROUND( 5.555, 2 ) AS "Two decimals";

If the second argument is negative, the result will be zero.SELECT sys.ms_trunc( 9.99, -2 ) AS "Truncate",        ROUND( 9.99, -3 ) AS "ROUND";

There is also a function that does truncation first and then rounding.
SELECT sys.ms_round( 5.546, 2, 1 ),   --5.546 => 5.54 (truncate) => 5.54 (rounded)
       sys.ms_round( 5.546, 2, 0 );   --5.546 => 5.546 (no truncate) => 5.55 (rounded)

Creating Random Numbers

Each time we call RAND() function, we will get random integer between   0   and   2,147,483,647    (   231-1  ).
SELECT RAND() AS "First call", RAND() AS "Second call";

RAND() function can accept seed argument. The seed argument will allow us to get the same random value every time.
SELECT RAND(77) AS "First call", RAND(77) AS "Second call";

Other Mathematical Functions

SELECT ABS( -5 );        --to calculate an absolute value

SELECT SIGN( -5 ), SIGN( 5 );        -- to find out the sign of a number

SELECT sys.alpha(5.0, 1.2); This function is for astronomy.
compute alpha 'expansion' of theta for a given declination (used by SkyServer)

Trigonometric and Hyperbolic Functions

SELECT PI(); will return 3.14.

This is one right triangle. The smallest of its angles is 36.87°.

To transform this angle to radians we can use RADIANS function. For transforming it back we will use DEGREES function.

SELECT RADIANS( 36.87 );   --returns 0.6435, that is 36.87 * ( 3.14 / 180 )

SELECT DEGREES( 0.6435 );  --returns 36.87, that is 0.6435 * ( 180 / 3.14 )
SELECT SIN( 0.6435 );Sine function will return the ratio b/c = 3/5 = 0.6. It accepts an angle value in radians, as do all the functions below.
SELECT COS( 0.6435 );Cosine function will return the ratio a/c = 4/5 = 0.8.
SELECT TAN( 0.6435 );Tangent function will return the ratio b/a = 3/4 = 0.75.
SELECT COT( 0.6435 );Cotangent function will return the ratio a/b = 4/3 = 1.33.

Then we have inverse functions:

SELECT ASIN( 0.6 );         -- 0.6435If we know that b/c = 0.6, then arcsine function will tell us that the angle is 0.6435 radians.
SELECT ACOS( 0.8 );         -- 0.6435If we know that a/c = 0.8, then arccosine function will tell us that the angle is 0.6435 radians.
SELECT ATAN( 0.75 );        -- 0.6435If we know that b/a = 0.75, then arctangent function will tell us that the angle is 0.6435 radians.
SELECT ATAN( 3, 4 );        -- 0.6435The same ATAN function will also work with dividend and divisor ( 3/4 = 0.75 ).

We also have some hyperbolic functions:

SELECT sinh( 1.4 );     --1.9043    This is hyperbolic sine function.
SELECT cosh( 1.4 );     --2.150898    This is hyperbolic cosine function.
SELECT tanh( 1.4 );     --0.88535  This is hyperbolic tangent.

0250 Offset Window Functions in MonetDB

Offset functions are window functions that return a value from a specified row. The row is determined as an offset from the reference point. There are two major reference points. The first is the current row, the second is the frame border. Based on this, we distinguish between "row offset functions" and "frame offset functions".

Row Offset Functions

With row offset functions, we move from the current row several steps up or down. Our movement is dependent only on the current row, so we do not use frames with row offset functions. Frames are not applicable in this case.

LAG function will move us up. LEAD function will move us down.

We will create one sample table so that we can observe how these functions work.

CREATE TABLE offsetTable( "Month" INT, Value INT );  

INSERT INTO offsetTable ( "Month", Value ) VALUES  ( 1, 1 ), ( 2, 2 ) , ( 3, null ), ( 5, 5 ), ( 6, 6 ), ( 7, 7 );

LAG function will allow us to get some previous value and to place it in the current row. That would allow us to compare some old and some new value, so we can analyze the change in time.
SELECT "Month", Value, LAG( Value, 2 ) OVER ( ORDER BY "Month" )
FROM offsetTable ORDER BY "Month";

We can see three drawbacks in the image above:

  • Month April doesn't have a value. LAG function doesn't notice that, and will happily move two rows above, but not two months before. For the fifth month we are getting the value from the February which is three months before, not two. That is the consequence of the missing month April. We should avoid using LAG function when we are missing some time points.
  • For months January and February, we have nulls in the new column . Previous values don't exist so we have to be satisfied with nulls.
  • In the third row, the Value column has NULL. Sometimes we want to go over those nulls and not count them as steps we take. There is a way to achieve this in SQL standard, but that part of SQL standard is not implemented in MonetDB, so null values will always be counted as one step. Subclause IGNORE NULLS is not supported.  
--not supported
SELECT "Month", Value, LAG( Value, 2) IGNORE NULLS
OVER (ORDER BY "Month" )
FROM offsetTable ORDER BY "Month";

LAG can work with partitions.

We should always order rows inside partition with ORDER BY. Although LAG function can work without ORDER BY, the results will be unpredictable and meaningless.

LAG Function Arguments

If the second argument is missing, then the default value is 1. In the image, we can see that the rows are shifted by one row.

SELECT "Month", Value, LAG(Value) OVER (ORDER BY "Month") FROM offsetTable ORDER BY "Month";

Only if we provide second argument, we can also provide the third argument. The third argument is the default value. Default value will be used for those rows where previous values don't exist.
SELECT "Month", Value, LAG(Value,2,77) OVER (ORDER BY "Month") FROM offsetTable ORDER BY "Month";

Default value has to be of the proper data type that match other values in the column.

Third argument is optional. If we skip it, the default value would be null.

SELECT "Month", Value, LAG(Value,2) OVER (ORDER BY "Month") FROM offsetTable ORDER BY "Month";

LEAD Function

LEAD function is the same as LAG function, just the opposite. Notice that we can achieve the same result with LEAD function, as with LAG function, if we change ORDER BY from ascending order to descending order.

SELECT "Month", Value, LAG(Value,1) OVER (ORDER BY "Month" ASC)
FROM offsetTable ORDER BY "Month";
SELECT "Month", Value, LEAD(Value,1) OVER (ORDER BY "Month" DESC)
FROM offsetTable ORDER BY "Month";
                  <=== totally the same ===>

This is similar example. Both statements use ASC, but the result is the same. This is because the offset argument can be negative. This is another way we can make the LAG and LEAD functions do the same thing.

SELECT "Month", Value, LAG(Value,2) OVER (ORDER BY "Month" ASC)
FROM offsetTable ORDER BY "Month";
SELECT "Month", Value, LEAD(Value,-2) OVER (ORDER BY "Month" ASC)
FROM offsetTable ORDER BY "Month";
             <=== totally the same ===>

Frame Offset Functions

For frame offset functions everything is relative to frame borders. We have three functions of this type.
FIRST_VALUE will return the first value in the frame.
NTH_VALUE will return the NTH value in the frame, from the start of a frame.
LAST_VALUE will return the last value in the frame.

For frame offset functions, we always need to have frame. If 6 is the month in the current row, then the frame is between 4th and 8th month. Now that we know the frame size and position, it is easy to see that the first value in this frame is 5.

SELECT "Month", Value, FIRST_VALUE(Value) OVER
(ORDER BY "Month" RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING)
FROM offsetTable ORDER BY "Month";

If we omit frame definition, the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW will be used. Be aware of this and always define frame.  

SELECT "Month", Value, LAST_VALUE(Value) OVER (ORDER BY "Month")
FROM offsetTable ORDER BY "Month";

Frame offset functions can work with partitions.

We should always order rows inside frame with ORDER BY. Frame offset functions can work without ORDER BY, but the results will be unpredictable and meaningless.

NTH_VALUE Function

NTH_VALUE function has another argument which tells how many steps to move from the start of a frame. In the example bellow, our frame is the whole partition. Forth number in the "value" column is number 5. That number will be the result of this window function.
SELECT "Month", Value, NTH_VALUE(Value,4)
OVER (ORDER BY "Month" ASC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM offsetTable ORDER BY "Month";

If the second argument is omitted, default value will not be one, but the error will be raised.

The value of that second argument can not be negative.

We can use partitions with NTH_VALUE.
You should always provide frame definition, and you should have ORDER BY for your frame.

Difference From SQL Standard

IGNORE NULLS is not supported in MonetDB for the functions FIRST_VALUE, LAST_VALUE and NTH_VALUE.

With IGNORE NULLS we would be able to not count nulls, but this is not supported in MonetDB.
SELECT "Month", Value, NTH_VALUE(Value,3) IGNORE NULLS   --not supported
OVER (ORDER BY "Month" ASC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

FROM offsetTable ORDER BY "Month";

NTH_VALUE can not use subclause FROM LAST. This would allow to count steps from the end of the frame, and not from the start. But this is not supported in MonetDB.

SELECT "Month", Value, NTH_VALUE(Value,2) FROM LAST   --not supported
OVER (ORDER BY "Month" ASC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM offsetTable ORDER BY "Month";

We actually don't need "FROM LAST" subclause. It is enough to change ORDER BY to descending, and we would get the same result as with "FROM LAST".
SELECT "Month", Value, NTH_VALUE(Value,2)
OVER (ORDER BY "Month" DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM offsetTable ORDER BY "Month";

0240 Ranking Window Functions in MonetDB

Sample Table

This will be our sample table:

CREATE TABLE rankFunctions ( Part CHAR(5), Number Integer );  

INSERT INTO rankFunctions ( Part, Number )     
    VALUES ( 'Part1', 9 ), ( 'Part1', 10 ), ( 'Part1', null ), ( 'Part1', 10 ),
                  ( 'Part1', null ), ( 'Part1', 8 ),( 'Part1', 5 ), ( 'Part2', 7 );

Ranking Functions

For each row in Excel we know its row number. On the image we can see that in Excel, in front of column "A", we have a "column" with row numbers.

We can create the same column, with row numbers, in MonetDB, by using ROW_NUMBER() window function. There are two more similar functions in MonetDB, and those are RANK() and DENSE_RANK().

Sometimes, in Olympic games, two sport players can share gold medal. In that case, there is no silver medal. Next best player will get bronze medal. This is called "olympic ranking". In MonetDB we can create such ranking with RANK() function.

For apples we wouldn't apply Olympic ranking. Instead of that we would place apples of the similar quality in consecutive classes, so we would have apples of first, second and third class. This can be achieved with DENSE_RANK() function. As it's name implies, DENSE_RANK() doesn't have gaps in ranking, so there will be always someone who will get silver medal.

With ROW_NUMBER(), RANK() and DENSE_RANK(), we can not use frames. These functions are not returning one value for each row, and because of that frames are not applicable. These functions always work with the whole partitions.



This is an example for all of the three functions:

SELECT part, number,  
   ROW_NUMBER() OVER(ORDER BY Number) AS rowNum,  
   RANK() OVER (ORDER BY Number) AS Rank,  
   DENSE_RANK() OVER(ORDER BY Number) AS dense_Rank
FROM rankFunctions

ORDER BY Number;
From this example we can conlude:
1) All null values are treated as the
same value.
2) It is important to use "ORDER BY"
inside of WINDOW function, without
it result would be meaningless.
3) These three functions are not
taking any arguments.

This is what would be the result without ORDER BY.
RANK and DENSE_RANK functions would return one's,
so they are not really working without ORDER BY.

ROW_NUMBER() function can be still be useful
without ORDER BY. It can be used as a column
that would deduplicate rows that are otherwise indistinguishable.
SELECT part, number,  
   ROW_NUMBER() OVER() AS rowNum,  
   RANK() OVER () AS Rank,      

      DENSE_RANK() OVER() AS dense_Rank
FROM rankFunctions
ORDER BY Number;

ROW_NUMBER() function is great when we want to read data step
by step, by reading N consecutive rows each time. Example on the
right would help us read our data by fetching 3 rows each time.  
WITH Tab1 AS
( SELECT Part, Number,   
   ROW_NUMBER() OVER (ORDER BY Number) AS RowNum   FROM rankFunctions )
SELECT * FROM Tab1 WHERE RowNum BETWEEN 4 AND 6;
   
Process of reading N by N rows is called pagination. We are reading one page at a time.

Partitioning would just mean that we are restarting the sequence.

SELECT part, number,  
    ROW_NUMBER() OVER(PARTITION BY Part ORDER BY Number) AS rowNum,  
    RANK() OVER (PARTITION BY Part ORDER BY Number) AS Rank,  
    DENSE_RANK() OVER(PARTITION BY Part ORDER BY Number) AS dense_Rank
FROM rankFunctions ORDER BY Part, Number;  

Relative Ranking Functions

Relative ranking functions will show us rank of a row expressed in normalized way, with numbers between 0 and 1. There are two relative ranking functions. Those are CUME_DIST() and PERCENT_RANK().

CUME_DIST() function

CUME_DIST() is short for Cumulative Distribution.
This function will can answer questions like:
What percentage of screws (image on the right) is equal or smaller than 9 cm?

The answer would be 4/6, that is 66%.

This function is called cumulative because it shows a cumulative probability.

This would be the result for our sample table.
SELECT Number,   
   CUME_DIST() OVER(ORDER BY Number) AS cumeRankNum
FROM rankFunctions
ORDER BY Number;  

PERCENT_RANK() function

For PERCENT_RANK() function, we will start with an example. We can see that RANK() of Number will return numbers 1,1,3,4,5,6,7,7. PERCENT_RANK() function is similar, just the values are normalized, so it will return numbers from 0 to 1.

SELECT Number,    
   RANK() OVER (ORDER BY Number) AS rankNum,   
   PERCENT_RANK() OVER(ORDER BY Number) AS percRankNum
FROM rankFunctions
ORDER BY Number;  

PERCENT_RANK() will calculate it's values with this formula:

  ( rank – 1 )
--------------------   
( total_rows – 1 )  


PERCENT_RANK() will exclude current row from the calculation, like it doesn't exist.  

NTILE(n) Function

We have 6 screws.
We want to divide them into 2 groups.
This is the most natural way to divide them:
This is if we want to divide them into 3 groups:  

Such separation of elements into groups can be done in SQL with NTILE(n) function. We will use NTILE(2) to divide Number column into 2 groups and NTILE(3) to divide Number columns into 3 groups.

SELECT Number,    
   NTILE(2) OVER(ORDER BY Number) AS groupNum
FROM rankFunctions
ORDER BY Number;  
SELECT Number,    
   NTILE(3) OVER(ORDER BY Number) AS groupNum
FROM rankFunctions
ORDER BY Number;  

We can notice that nulls would be considered as the smallest values. We can also notice that on the first image we have 2 groups with 4 Numbers. That is perfect. But, on the second image we have three groups with 3,3,2 Numbers. Groups are not of equal size. In that case we would start with 2,2,2 groups, of equal size, but the first few groups would get an extra Number. That would make things almost perfect.  

Final Conclusion

General rules for ranking functions are this:

1) All null values are treated as the same value.
2) It is important to use "ORDER BY" inside of WINDOW function, without it result would be meaningless.
3) None of the ranking functions is taking an argument, with the exception of NTILE(n) function.
4) None of these functions can work with frames, but they can use partitioning.
5) If values for several rows are the same, then the values returned by ranking functions will be the same for those rows. In our sample table, both of two rows with Number 10 will always have the same rank, no matter what ranking function we are using (except ROW_NUMBER() ).

0230 Aggregate Window Functions

We can get a list of all window functions with this system query:

SELECT DISTINCT name FROM sys.functions
WHERE type = 6 ORDER BY name;

This query will return 26 functions, that we can divide into 4 groups:

Aggregate functionsStatistical aggregate functionsRanking and distribution functionsValue functions
avg, count, group_concat, listagg, max, min, prod, sumcorr, covar_pop, covar_samp, stddev_pop, stddev_samp, var_pop, var_sampcume_dist, dense_rank, percent_rank, rank, row_number, ntilefirst_value, lag, last_value, lead, nth_value

Sample table

CREATE TABLE aggWindow ( Part CHAR(5), Number Integer );

INSERT INTO aggWindow ( Part, Number )
    VALUES ( 'Part1', 1 ), ( 'Part1', 3 ), ( 'Part1', 6 )
         , ( 'Part2', 8 ), ( 'Part2', 8 ), ( 'Part2', NULL );

Aggregate functions

Aggregate functions support partitioning, ordering and framing. Below we can see example of this.

SELECT Part, Number, SUM( Number) OVER
       ( PARTITION BY Part ORDER BY Part DESC, Number NULLS LAST        
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  ) As Col1
FROM aggWindow
ORDER BY Part, Number DESC;

We can use WINDOW definition to show all of the classical aggregation functions. Our window is defined as (), so we will aggregate  the whole column.

SELECT Part, Number,     
       SUM( Number ) OVER W AS SumCol,     

              MAX( Number ) OVER W AS MaxCol,   
              MIN( Number ) OVER W AS MinCol,   
              AVG( Number ) OVER W AS AvgCol,   
              PROD( Number ) OVER W AS ProdCol
FROM aggWindow WINDOW W AS ( )
ORDER BY Number;

We can use listagg and group_concat functions to concatenate text from our column.

SELECT      
      listagg( Part, ';' ) OVER W AS listAggCol,     
      group_concat( Part, '-' ) OVER W AS groupConcatCol
FROM aggWindow
WINDOW W AS ( );

Running  Aggregations

Aggregate functions are used to calculate cumulative values. In this case we create our frame relative to current row and we use ORDER BY.

SELECT Number, SUM( Number ) OVER   
      ( ORDER BY Number ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  ) As Col1 FROM aggWindow;

Moving Aggregations

We can also use aggregate functions for calculation of moving  aggregations.

SELECT Number, AVG( Number ) OVER
   ( ORDER BY Number ROWS BETWEEN 1 PRECEDING AND CURRENT ROW  ) As Col1 FROM aggWindow;

Usage of window functions in ORDER BY

Window functions can be used only after all data from our query is created. That means that Window functions can be used only in SELECT and ORDER BY clauses. Here is the usage of Window function in ORDER BY.

SELECT Part, Number
FROM aggWindow
ORDER BY COUNT(Number) OVER (PARTITION BY Part);

Combining WINDOW functions and GROUPBY clause

If we combine WINDOW functions and GROUP BY clause, there is a difference between what these functions can see. GROUP BY calculation can only see one group, but it can see all the detail rows inside of that group. WINDOW function can see all the groups, but it can not see detail rows.

SELECT Part,   
      SUM( Number ) AS Total
FROM aggWindow
GROUP BY Part
ORDER BY Part;
GROUP BY will sum detail rows for each group separately. It can not combine values from different groups.SELECT Part, SUM( SUM( Number ) ) OVER ( ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS Total
FROM aggWindow
GROUP BY Part
ORDER BY Part;
WINDOW function will see only those rows that are returned after GROUP BY finished its calculation. WINDOW function can see only totals for the groups, and not the detail rows. WINDOW function can work with those totals, from all of the groups, at the same time.
It seems that GROUP BY and WINDOW functions are of equal power. But that is not the truth. WINDOW functions are more powerful. We can preserve detail rows in the original columns, and we can create new columns with aggregated values. This allow us to compare detail and aggregated values. This is not possible with GROUP BY.  Look, we now have detail values and aggregated values in the same row. We can now easily calculate share of details in the total.  

SELECT Part, Number, SUM( Number )   
OVER ( PARTITION BY Part ) AS Total
FROM aggWindow
ORDER BY Part, Number;
SELECT Part, Number, SUM( Number )   
      OVER ( PARTITION BY Part ) AS Total,   
      100 * Number / SUM( Number )   
      OVER ( PARTITION BY Part ) AS "Share%"
FROM aggWindow
ORDER BY Part, Number;
The last column will show us share as a percentage.

Statistical Aggregate Functions

Statistical aggregate functions can be used in the same way as the classical aggregate functions.

In one of the earlier posts, we saw how functions CORR, COVAR_POP and COVAR_SAMP return wrong result. Contrary to that, if these functions are used inside of a WINDOW function, their results will be correct.


SELECT Part, Number,     
      CORR( Number, Number - 1 ) OVER W AS CorrCol,     
      COVAR_POP( Number, Number - 1 ) OVER W AS CovarPopCol,   
      COVAR_SAMP( Number, Number - 1 ) OVER W AS CovarSampCol,
      STDDEV_POP( Number ) OVER W AS StdDevPopCol,   
      STDDEV_SAMP( Number ) OVER W AS StdDevSampCol,   
      VAR_POP( Number ) OVER W AS VarPopCol,   
      VAR_SAMP( Number ) OVER W AS VarSampCol
FROM aggWindow
WINDOW W AS ( )
ORDER BY Number;