0240 Ranking Window Functions in MonetDB

Sample Table

This will be our sample table:

CREATE TABLE rankFunctions ( Part CHAR(5), Number Integer );  

INSERT INTO rankFunctions ( Part, Number )     
    VALUES ( 'Part1', 9 ), ( 'Part1', 10 ), ( 'Part1', null ), ( 'Part1', 10 ),
                  ( 'Part1', null ), ( 'Part1', 8 ),( 'Part1', 5 ), ( 'Part2', 7 );

Ranking Functions

For each row in Excel we know its row number. On the image we can see that in Excel, in front of column "A", we have a "column" with row numbers.

We can create the same column, with row numbers, in MonetDB, by using ROW_NUMBER() window function. There are two more similar functions in MonetDB, and those are RANK() and DENSE_RANK().

Sometimes, in Olympic games, two sport players can share gold medal. In that case, there is no silver medal. Next best player will get bronze medal. This is called "olympic ranking". In MonetDB we can create such ranking with RANK() function.

For apples we wouldn't apply Olympic ranking. Instead of that we would place apples of the similar quality in consecutive classes, so we would have apples of first, second and third class. This can be achieved with DENSE_RANK() function. As it's name implies, DENSE_RANK() doesn't have gaps in ranking, so there will be always someone who will get silver medal.

With ROW_NUMBER(), RANK() and DENSE_RANK(), we can not use frames. These functions are not returning one value for each row, and because of that frames are not applicable. These functions always work with the whole partitions.



This is an example for all of the three functions:

SELECT part, number,  
   ROW_NUMBER() OVER(ORDER BY Number) AS rowNum,  
   RANK() OVER (ORDER BY Number) AS Rank,  
   DENSE_RANK() OVER(ORDER BY Number) AS dense_Rank
FROM rankFunctions

ORDER BY Number;
From this example we can conlude:
1) All null values are treated as the
same value.
2) It is important to use "ORDER BY"
inside of WINDOW function, without
it result would be meaningless.
3) These three functions are not
taking any arguments.

This is what would be the result without ORDER BY.
RANK and DENSE_RANK functions would return one's,
so they are not really working without ORDER BY.

ROW_NUMBER() function can be still be useful
without ORDER BY. It can be used as a column
that would deduplicate rows that are otherwise indistinguishable.
SELECT part, number,  
   ROW_NUMBER() OVER() AS rowNum,  
   RANK() OVER () AS Rank,      

      DENSE_RANK() OVER() AS dense_Rank
FROM rankFunctions
ORDER BY Number;

ROW_NUMBER() function is great when we want to read data step
by step, by reading N consecutive rows each time. Example on the
right would help us read our data by fetching 3 rows each time.  
WITH Tab1 AS
( SELECT Part, Number,   
   ROW_NUMBER() OVER (ORDER BY Number) AS RowNum   FROM rankFunctions )
SELECT * FROM Tab1 WHERE RowNum BETWEEN 4 AND 6;
   
Process of reading N by N rows is called pagination. We are reading one page at a time.

Partitioning would just mean that we are restarting the sequence.

SELECT part, number,  
    ROW_NUMBER() OVER(PARTITION BY Part ORDER BY Number) AS rowNum,  
    RANK() OVER (PARTITION BY Part ORDER BY Number) AS Rank,  
    DENSE_RANK() OVER(PARTITION BY Part ORDER BY Number) AS dense_Rank
FROM rankFunctions ORDER BY Part, Number;  

Relative Ranking Functions

Relative ranking functions will show us rank of a row expressed in normalized way, with numbers between 0 and 1. There are two relative ranking functions. Those are CUME_DIST() and PERCENT_RANK().

CUME_DIST() function

CUME_DIST() is short for Cumulative Distribution.
This function will can answer questions like:
What percentage of screws (image on the right) is equal or smaller than 9 cm?

The answer would be 4/6, that is 66%.

This function is called cumulative because it shows a cumulative probability.

This would be the result for our sample table.
SELECT Number,   
   CUME_DIST() OVER(ORDER BY Number) AS cumeRankNum
FROM rankFunctions
ORDER BY Number;  

PERCENT_RANK() function

For PERCENT_RANK() function, we will start with an example. We can see that RANK() of Number will return numbers 1,1,3,4,5,6,7,7. PERCENT_RANK() function is similar, just the values are normalized, so it will return numbers from 0 to 1.

SELECT Number,    
   RANK() OVER (ORDER BY Number) AS rankNum,   
   PERCENT_RANK() OVER(ORDER BY Number) AS percRankNum
FROM rankFunctions
ORDER BY Number;  

PERCENT_RANK() will calculate it's values with this formula:

  ( rank – 1 )
--------------------   
( total_rows – 1 )  


PERCENT_RANK() will exclude current row from the calculation, like it doesn't exist.  

NTILE(n) Function

We have 6 screws.
We want to divide them into 2 groups.
This is the most natural way to divide them:
This is if we want to divide them into 3 groups:  

Such separation of elements into groups can be done in SQL with NTILE(n) function. We will use NTILE(2) to divide Number column into 2 groups and NTILE(3) to divide Number columns into 3 groups.

SELECT Number,    
   NTILE(2) OVER(ORDER BY Number) AS groupNum
FROM rankFunctions
ORDER BY Number;  
SELECT Number,    
   NTILE(3) OVER(ORDER BY Number) AS groupNum
FROM rankFunctions
ORDER BY Number;  

We can notice that nulls would be considered as the smallest values. We can also notice that on the first image we have 2 groups with 4 Numbers. That is perfect. But, on the second image we have three groups with 3,3,2 Numbers. Groups are not of equal size. In that case we would start with 2,2,2 groups, of equal size, but the first few groups would get an extra Number. That would make things almost perfect.  

Final Conclusion

General rules for ranking functions are this:

1) All null values are treated as the same value.
2) It is important to use "ORDER BY" inside of WINDOW function, without it result would be meaningless.
3) None of the ranking functions is taking an argument, with the exception of NTILE(n) function.
4) None of these functions can work with frames, but they can use partitioning.
5) If values for several rows are the same, then the values returned by ranking functions will be the same for those rows. In our sample table, both of two rows with Number 10 will always have the same rank, no matter what ranking function we are using (except ROW_NUMBER() ).

0230 Aggregate Window Functions

We can get a list of all window functions with this system query:

SELECT DISTINCT name FROM sys.functions
WHERE type = 6 ORDER BY name;

This query will return 26 functions, that we can divide into 4 groups:

Aggregate functionsStatistical aggregate functionsRanking and distribution functionsValue functions
avg, count, group_concat, listagg, max, min, prod, sumcorr, covar_pop, covar_samp, stddev_pop, stddev_samp, var_pop, var_sampcume_dist, dense_rank, percent_rank, rank, row_number, ntilefirst_value, lag, last_value, lead, nth_value

Sample table

CREATE TABLE aggWindow ( Part CHAR(5), Number Integer );

INSERT INTO aggWindow ( Part, Number )
    VALUES ( 'Part1', 1 ), ( 'Part1', 3 ), ( 'Part1', 6 )
         , ( 'Part2', 8 ), ( 'Part2', 8 ), ( 'Part2', NULL );

Aggregate functions

Aggregate functions support partitioning, ordering and framing. Below we can see example of this.

SELECT Part, Number, SUM( Number) OVER
       ( PARTITION BY Part ORDER BY Part DESC, Number NULLS LAST        
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  ) As Col1
FROM aggWindow
ORDER BY Part, Number DESC;

We can use WINDOW definition to show all of the classical aggregation functions. Our window is defined as (), so we will aggregate  the whole column.

SELECT Part, Number,     
       SUM( Number ) OVER W AS SumCol,     

              MAX( Number ) OVER W AS MaxCol,   
              MIN( Number ) OVER W AS MinCol,   
              AVG( Number ) OVER W AS AvgCol,   
              PROD( Number ) OVER W AS ProdCol
FROM aggWindow WINDOW W AS ( )
ORDER BY Number;

We can use listagg and group_concat functions to concatenate text from our column.

SELECT      
      listagg( Part, ';' ) OVER W AS listAggCol,     
      group_concat( Part, '-' ) OVER W AS groupConcatCol
FROM aggWindow
WINDOW W AS ( );

Running  Aggregations

Aggregate functions are used to calculate cumulative values. In this case we create our frame relative to current row and we use ORDER BY.

SELECT Number, SUM( Number ) OVER   
      ( ORDER BY Number ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  ) As Col1 FROM aggWindow;

Moving Aggregations

We can also use aggregate functions for calculation of moving  aggregations.

SELECT Number, AVG( Number ) OVER
   ( ORDER BY Number ROWS BETWEEN 1 PRECEDING AND CURRENT ROW  ) As Col1 FROM aggWindow;

Usage of window functions in ORDER BY

Window functions can be used only after all data from our query is created. That means that Window functions can be used only in SELECT and ORDER BY clauses. Here is the usage of Window function in ORDER BY.

SELECT Part, Number
FROM aggWindow
ORDER BY COUNT(Number) OVER (PARTITION BY Part);

Combining WINDOW functions and GROUPBY clause

If we combine WINDOW functions and GROUP BY clause, there is a difference between what these functions can see. GROUP BY calculation can only see one group, but it can see all the detail rows inside of that group. WINDOW function can see all the groups, but it can not see detail rows.

SELECT Part,   
      SUM( Number ) AS Total
FROM aggWindow
GROUP BY Part
ORDER BY Part;
GROUP BY will sum detail rows for each group separately. It can not combine values from different groups.SELECT Part, SUM( SUM( Number ) ) OVER ( ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS Total
FROM aggWindow
GROUP BY Part
ORDER BY Part;
WINDOW function will see only those rows that are returned after GROUP BY finished its calculation. WINDOW function can see only totals for the groups, and not the detail rows. WINDOW function can work with those totals, from all of the groups, at the same time.
It seems that GROUP BY and WINDOW functions are of equal power. But that is not the truth. WINDOW functions are more powerful. We can preserve detail rows in the original columns, and we can create new columns with aggregated values. This allow us to compare detail and aggregated values. This is not possible with GROUP BY.  Look, we now have detail values and aggregated values in the same row. We can now easily calculate share of details in the total.  

SELECT Part, Number, SUM( Number )   
OVER ( PARTITION BY Part ) AS Total
FROM aggWindow
ORDER BY Part, Number;
SELECT Part, Number, SUM( Number )   
      OVER ( PARTITION BY Part ) AS Total,   
      100 * Number / SUM( Number )   
      OVER ( PARTITION BY Part ) AS "Share%"
FROM aggWindow
ORDER BY Part, Number;
The last column will show us share as a percentage.

Statistical Aggregate Functions

Statistical aggregate functions can be used in the same way as the classical aggregate functions.

In one of the earlier posts, we saw how functions CORR, COVAR_POP and COVAR_SAMP return wrong result. Contrary to that, if these functions are used inside of a WINDOW function, their results will be correct.


SELECT Part, Number,     
      CORR( Number, Number - 1 ) OVER W AS CorrCol,     
      COVAR_POP( Number, Number - 1 ) OVER W AS CovarPopCol,   
      COVAR_SAMP( Number, Number - 1 ) OVER W AS CovarSampCol,
      STDDEV_POP( Number ) OVER W AS StdDevPopCol,   
      STDDEV_SAMP( Number ) OVER W AS StdDevSampCol,   
      VAR_POP( Number ) OVER W AS VarPopCol,   
      VAR_SAMP( Number ) OVER W AS VarSampCol
FROM aggWindow
WINDOW W AS ( )
ORDER BY Number;

0220 Window Functions Syntax

Before reading this blog post, please read the Window Functions theory.

Function(Expressions)
FILTER ( WHERE Predicates )
OVER (  [PARTITION BY Expressions]        
        [ORDER BY Expressions [NULLS FIRST|LAST]]
        [Frame Type BETWEEN Frame Start AND Frame End]    
        [EXCLUDE Frame Exclusion]
)
This is the complete syntax of a Window function. MonetDB
doesn't support FILTER clause, so we will skip that clause.

For EXCLUDE clause, only "EXCLUDE NO OTHERS" is
implemented. That means that this clause is currently useless, so we will skip it, too.

Sample Table

CREATE TABLE aggWindow ( Part CHAR(5), Number Integer );

INSERT INTO aggWindow ( Part, Number )
    VALUES ( 'Part1', 1 ), ( 'Part1', 1 ), ( 'Part1', 3 ), ( 'Part1', 3 )        
         , ( 'Part2', 6 ), ( 'Part2', 6 ), ( 'Part2', 8 ), ( 'Part2', 8 )
         , ( 'Part2', NULL );

Simple Aggregation

If the parentheses after OVER keyword are empty, then we will aggregate the whole Number column. Aggregate functions, like SUM, AVERAGE, MAX…, will ignore NULLs.

1+1+3+3+6+6+8+8 = 36  

SELECT Part, Number,
SUM( Number ) OVER ( ) AS Col1
FROM aggWindow;

Usage of Partitions

We can partition our table before aggregation. Each partition will be separately aggregated.  

1+1+3+3 = 8      6+6+8+8 = 28  

SELECT Part, Number,
SUM( Number ) OVER ( PARTITION BY Part ) AS Col1
FROM aggWindow;

ORDER BY and Frames

ORDER BY is always accompanied with the frame definition. If the frame definition is not provided, then the default frame is used. The default frame is:

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  

SELECT Part, Number,
SUM( Number ) OVER ( ORDER BY Number ) AS Col1
FROM aggWindow;

We should avoid using default values and we should always provide explicit frame definitions.


If we don't provide ORDER BY, there is no knowing how frames will be created. It is important to provide ORDER BY to avoid randomness. Notice in the image that we have a meaningless running total, because there is no ordering of the Number column.

SELECT Part, Number, SUM( Number )
OVER ( ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS Col1
FROM aggWindow;

If we sort Number column in the final data set, meaningless running total will become more obvious.

SELECT Part, Number, SUM( Number )
OVER ( ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS Col1
FROM aggWindow ORDER BY Number;
 

Notice that ORDER BY clause can exist on two places in the statement. One is used to define frame, and the classic one is used to sort final data set.
SELECT Part, Number, SUM( Number )
OVER ( ORDER BY Number ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS Col1
FROM aggWindow ORDER BY Number;

Within the Window, the NULL position can be controlled independently of the ORDER BY clause. We can use NULLS LAST or NULLS FIRST to explicitly define position of NULLs.

SELECT Part, Number, SUM( Number ) OVER
( PARTITION BY Part ORDER BY Number NULLS LAST

  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS Col1 FROM aggWindow ORDER BY Number;
In the image, our Null is appearing in the first row because of the classic ORDER BY clause, but it is paired with the biggest value in the "Col1" for Partition 2 (28). NULL is now part of the last frame in the last Partition.

This is because of the clause NULLS LAST, which changed position of the NULL row, from the first, to the last, in the window definition.

Examples for the Rows, Ranges and Groups

In this example, frame is defined by the current and two previous rows.

SELECT Part, Number,
SUM( Number ) OVER
( ORDER BY Number  
  ROWS BETWEEN 2 PRECEDING AND CURRENT ROW ) AS Col1

FROM aggWindow ORDER BY Number;

With Ranges, current row comprise all the rows with the same value. Current row is a set of records, and not only one row.

SELECT Part, Number,
SUM( Number ) OVER
( ORDER BY Number  
  RANGE BETWEEN CURRENT ROW AND 3 FOLLOWING ) AS Col1
FROM aggWindow ORDER BY Number;
This example will use all the rows with the current value X and will calculate range with limits [X,X+3].

For X = 6, limits are [6,6+3]. Numbers 6 and 8 are both inside of this range.

Each frame encompass current group, one previous group, and all the latter groups.

SELECT Part, Number,
SUM( Number ) OVER
( ORDER BY Number
  GROUPS BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING ) AS Col1
FROM aggWindow ORDER BY Number;

Window Chaining ( WINDOW clause )

Window functions are verbose. If we want to use them several times in our statement, then our statement will become really long.

SELECT Part, Number,   
   SUM( Number ) OVER ( PARTITION BY Part ORDER BY Number
      ROWS BETWEEN 1 PRECEDING AND CURRENT ROW ) AS Col1,   
   SUM( Number ) OVER ( ORDER BY Number DESC          
      GROUPS BETWEEN 1 PRECEDING AND CURRENT ROW ) AS Col2
FROM aggWindow ORDER BY Number;

The only workaround is available if we are using the same window for several functions. We can then define our window once, with WINDOW clause, and then reference it several times.

SELECT Part, Number,   
   SUM( Number ) OVER W AS SumCol,   
   MAX( Number ) OVER W AS MaxCol
FROM aggWindow
WINDOW W AS ( PARTITION BY Part ORDER BY Number          
              ROWS BETWEEN 1 PRECEDING AND CURRENT ROW )
ORDER BY Number;

Default frames

We already saw that we should avoid using default frames. There are two more abbreviations that will assume default frames.

Abbreviated syntaxFull definition of the default frame
{ UNBOUNDED | X } PRECEDINGBETWEEN { UNBOUNDED | X } PRECEDING AND CURRENT ROW
{ UNBOUNDED | X } FOLLOWINGBETWEEN CURRENT ROW AND { UNBOUNDED | X } FOLLOWING

SELECT Part, Number,
SUM( Number ) OVER
( ORDER BY Number ROWS 1 PRECEDING) AS Col1

FROM aggWindow ORDER BY Number;

–The same is for RANGE or GROUPS.
Default frame is:
SELECT Part, Number,
SUM( Number ) OVER ( ORDER BY Number ROWS BETWEEN
1 PRECEDING AND CURRENT ROW
) AS Col1
FROM aggWindow ORDER BY Number;

–The same is for RANGE or GROUPS.

Window Functions and GROUP BY

We can group our table with GROUP BY. After we do this, window function will only see those grouped rows. On the image to the left, window function will only see 2 rows. Detail rows will not be any more available to window function.

The question is, what syntax to use to create running total in this grouped table.

Bellow we can see correct syntax. Columns in the grouped table are referred as Part and COUNT( Number ). Our Window function is based on those columns. That means that our window function will be SUM( COUNT( Number ) ).

SELECT Part, COUNT( Number ) AS CountCol, SUM( COUNT( Number ) )   
   OVER ( ORDER BY Part ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)   
   AS RunningTotal

FROM aggWindow
GROUP BY Part
ORDER BY Part;
 

Limitations of Window Functions

Range and ORDER BY

Range frames can only use one column inside of ORDER BY clause.  

SELECT Part, Number, SUM( Number ) OVER ( ORDER BY Part DESC, Number  
   RANGE BETWEEN CURRENT ROW AND 3 FOLLOWING ) AS Col1
FROM aggWindow ORDER BY Number;

DISTINCT and Window functions

We can not use DISTINCT keyword inside of Window function.  

SELECT Part, Number
COUNT( DISTINCT Number ) OVER ( ) AS Col1
FROM aggWindow
;

0210 Aggregate Functions and Logical Functions

Aggregation Functions

Sample Table for Aggregation Functions

CREATE TABLE aggtable( Number INT, Word VARCHAR(8), intervalMonth INTERVAL MONTH );

INSERT INTO aggtable( Number, Word, intervalMonth )

VALUES  ( 2, 'two', INTERVAL '2' month ), ( 2, 'two', INTERVAL '2' month )
            , ( 3, 'three', INTERVAL '3' month ), ( 4,'four', INTERVAL '6' month )
            , ( NULL, NULL, NULL );

Sys.Functions System Table

We can get a list of aggregate functions from the system table sys.functions. Aggregate functions are of the type 3.  

SELECT DISTINCT name, mod FROM sys.functions WHERE type = 3 ORDER BY name;   

We can divide aggregate functions into three groups:

Arithmetic functionsConcatenation functionsStatistic functions
avg,
count, count_no_nil,
max, min,
prod,
sum
group_concat,
listagg,
tojsonarray
corr,
covar_pop, covar_samp,
median, median_avg,
quantile, quantile_avg,
stdev_pop, stdev_samp,
var_pop, var_samp

Arithmetic Functions

SQLResultCalculationComment
SELECT AVG( Number ) FROM aggTable;2.75(2+2+3+4)/4 = 2.75NULL is ignored.
SELECT COUNT( * ) FROM aggTable;5Count the rows of the table. 
SELECT COUNT( Word ) FROM aggTable;4Count words, without NULL. 
SELECT COUNT_NO_NIL( Word ) FROM aggTable;Error Doesn't work.
SELECT MAX( Word ) FROM aggTable;'two'Last value.Words are ordered alphabetically, A-Z.
SELECT MIN( Number ) FROM aggTable;2First value.Numbers are ordered numerically.
SELECT PROD( Number ) FROM aggTable;482*2*3*4=48 
SELECT SUM( Number ) FROM aggTable;112+2+3+4=11 
SELECT SUM( intervalMonth ) FROM aggTable;132+2+3+6=13Also work with seconds. Result is of interval type.

We can use DISTINCT keyword to exclude duplicates. In our sample table, only number 3 is duplicate. Calculations below will be done like that duplicate value doesn't exist

SQLResultCalculation
SELECT AVG( DISTINCT Number ) FROM aggTable;3(2+  3  +4)/3 = 3
SELECT COUNT( DISTINCT Word ) FROM aggTable;3Count word, no NULLs, no duplicates.
SELECT PROD( DISTINCT Number ) FROM aggTable;242*   3  *4 = 24
SELECT SUM( DISTINCT Number ) FROM aggTable;92+   3  +4 = 9
SELECT SUM( DISTINCT  intervalMonth ) FROM aggTable;92+   3  +4 = 9

Concatenation Functions

Concatenation is a way to aggregate text. Instead of having text occupying N rows, where on each row we have only one phrase, we can aggregate them all into one row.
Result will be comma separated list of those phrases, although we can choose what delimiter will be used instead of the comma.

If we want to remove duplicates, we have to use DISTINCT keyword before the name of a column.

It is not possible to control sort order of the phrases in the result.

SQLResultComment
SELECT LISTAGG( Word, '|' ) FROM aggTable;one|two|three|three|fourDefault delimiter is a comma. Returns VARCHAR.
SELECT SYS.GROUP_CONCAT( Word, ';' ) FROM aggTable;one;two;three;three;fourDefault delimiter is a comma. Returns CLOB.
SELECT JSON.ToJsonArray( Word ) FROM aggTable;[ "one", "two", "three", "three", "four" ]Result is JSON list.
SELECT JSON.ToJsonArray( Number ) FROM aggTable;[ "1", "2", "3", "3", "4" ]Works with numbers, too.

Statistical Functions

Numbers, from Number column, can be divided into smaller and larger numbers. Half of the numbers will be smaller and the other half will be larger numbers. The number on the border between the smaller and larger numbers is the median.

SQLResultResult if we add number 5 to our column.
SELECT SYS.MEDIAN( Number ) FROM aggTable;21,2(2),33,44         =>      221,22,3(3),44,55         =>     3
SELECT SYS.MEDIAN_AVG( Number ) FROM aggTable;21,2(2),3(3),44         =>       (2+3)/2 = 2.521,22,3(3),44,55         =>     3

Median is a special case of a quantile. Median is a 50% quantile. But we can differently divide our numbers. We can divide them 60% vs 40%, so that 60% numbers are on the smaller side, and 40% is on the bigger side. Number between them would be called 60% quantile. In our example below, "60 % quantile" is 2.8, which means that 60% of numbers is below 2.8. This would be more obvious if we had more numbers in our column.

SQLResultCalculation
SELECT SYS.QUANTILE_AVG( Number, 0.6 ) FROM aggTable;2.8( 4 – 1 ) * 0.6 + 1, where 4 is the count of our numbers.
SELECT SYS.QUANTILE( Number, 0.6 ) FROM aggTable;3This is just value from above, rounded to integer.

Variance and standard deviation are calculated differently depending whether our data represent a population or a sample.

SQLResultCalculation 
SELECT SYS.VAR_SAMP( Number ) FROM aggTable;0.917
SELECT SYS.StdDev_SAMP( Number ) FROM aggTable;0.957sqrt( variance ) = sqrt( 0.917 ) = 0.957
SELECT SYS.VAR_POP( Number ) FROM aggTable;0.687
SELECT SYS.StdDev_POP( Number ) FROM aggTable;0.829sqrt( variance ) = sqrt( 0.687 ) = 0.829

Covariance in statistics measures the extent to which two variables vary linearly.  Correlation is just covariance measured in normalized units. Unfortunately, there is a bug in MonetDB, version 11.49.09, and all of these functions will return wrong results.

SQLResult in MonetDBCalculation 
SELECT SYS.COVAR_SAMP( Number, 10 – Number * 1.2 ) FROM aggTable;-1.417 ( not correct, it is -1.1 )
SELECT SYS.COVAR_POP( Number, 10 – Number * 1.2 ) FROM aggTable;-1.0625 ( not correct, it is -0.825 )
SELECT SYS.CORR( Number, 10 – Number * 1.2 ) FROM aggTable;-0.986 ( not correct, it is -1 )

All statistic function will ignore NULLs.

Logical Operators

Binary Operators

SQLResultComment
SELECT True AND True;TRUEReturns TRUE only if both arguments are TRUE.
SELECT True OR False;TRUEReturns TRUE if at least one argument is TRUE.

Unary Operators

SQLResultComment
SELECT NOT True;FALSEWill transform TRUE to FALSE, and FALSE to TRUE. Always the opposite.
SELECT Null IS NULL;TRUEChecks whether something is NULL.
SELECT Null IS NOT NULL;FALSEChecks whether something is NOT NULL.

All other logical operators will return Null if at least one of its arguments is Null.

SQLResult
SELECT NOT Null;NULL
SELECT Null AND True;NULL

Most SQL functions will either return NULL if one of the arguments is NULL, or will ignore rows with NULL values.

Logical Functions

Operators AND, OR and NOT have alternative syntax where they work like functions.  XOR can not work like operator, only like a function.

SQLResultComment
SELECT "xor"(True, False);TRUEReturns TRUE only when the first argument is the opposite of the second argument (Arg1 = NOT Arg2).
SELECT "and"(True, False);FALSEReturns TRUE only if both arguments are TRUE.
SELECT "or"(False, False);FALSEReturns TRUE if at least one argument is TRUE.
SELECT "not"(False);TRUEWill transform TRUE to FALSE, and FALSE to TRUE. Always the opposite.

0200 MonetDB: Window Functions Theory

Imagine people scattered through some area, looking through binoculars. None of them will see the whole scenery, they will only see part of it. It is like each of them is looking through the different window.

Now, imagine that in each row, of a database table, there is a data scientist looking through binoculars. Each data scientist can only see some of the rows from that table. Each data scientist has its own window.

What would scientist do to represent nature of data he is looking at? He would aggregate them. If each of our scientist decide to calculate average of data he is looking at, we would get a table like this one:

In the real database table, with millions of rows, these average values would not be representatives of anything. Our windows are too random. If we can create a rule by which windows are created, then we would have a scientific view of our data. Let's say that each data scientist can only see its own row, and previous two rows. Then we would have a rule. Check out animated image below (left image).

By using this rule for a window creation, we can calculate "moving averages", which are often used in statistics.

We can also define these rules in SQL if we use Window Functions. Window Functions are special, because they can define windows and then apply some aggregation to data in those windows. We can apply aggregations like SUM, AVG, MAX, but we can also use some special aggregation functions.

Window functions are also called Analytic functions, because they give us abilities that are beyond traditional SQL statements. With them we can do things which were previously hard to achieve in SQL, and they are really useful for a deep analysis of our data. Windows are like overlapping samples from our tables. They can reveal us how the nature of our data is changing through time and dimensions.

Terminology

In SQL, window, as explained above, is actually called "frame". The term "window" means something else. We will now discern difference between window, partition and a frame.

Frame is group of records that will be aggregated. Frame is presented with the moving red rectangle in the animation bellow.

Tables in animation below, show how many points each country won on some sport competition.

Sport results presented on the animation above will not be held in a database like three tables, but they will be placed together into one big table.  

That big table is our Window (assuming we are using no filters on that table).
Smaller tables are called Partitions.
Red moving rectangles are Frames.  

Window functions can process partitions separately, the same as they were separate tables.  

How to Define Window?

SELECT employee_id, salary
FROM employees
WHERE department_id = 101  
AND salary > (SELECT AVG(salary)                 FROM employees);
Subquery is not under direct influence of the outer query. In the example, we have a filter
department_id = 101 on the outer query. But subquery is not under the influence of that filter.
Subquery will calculate average salary for all of the employees. Meaning is, that we are looking for employees from department 101, that have bigger salary then the global average.

This is not true for window functions. Window is under influence of the query context. Everything that is used inside FROM, WHERE, GROUP BY and HAVING clauses will define our window. Window functions can only do their magic after the final dataset is defined and unchangeable. That also means that Window functions can only be used in SELECT and ORDER BY clauses.

How to Define Partitions?

Partitions can be defined by the values of one column. All rows that have the same value will be the same partition. On our image, all the rows with letter "A" will create Partition 1. It is also possible to use combination of the values from two or more columns to define partitions. Each unique combination of values will define a partition. On the image below, combination of values A and Q will define Partition 2. We can use expression to calculate values for our column(s). In our example, all the rows, where MOD function returns 1 will belong to Partition 1. Rows that return 0 will belong to Partition 2.

How to Define Frames?

Frames are moving and so, they are always calculated relative to the current row. Two other reference points are the first and the last row in our partition. Position of the frame is always relative to those reference points.

For definition of a frame, we have to define its start row, and its end row. End row has to be after Star row. Below we can see all the ways how to define start and end row.

                    Can only be START ROW
– UNBOUNDED PRECEDING – the first row in the partition                      
                    Can be both START or END ROW
– N PRECEDING – row that is N rows before current row.
– CURRENT ROW – our major reference point.
– M FOLLOWING – row that is M rows after current row.

                     Can only be END ROW
– UNBOUNDED FOLLOWING – the last row in the partition
An example:

Notice that for all of this to make sense, records have to be sorted.

What is a Group?

For window functions, start and end of a frame doesn't have to be a row. Start and end can also be defined with ranges and groups. Ranges and Groups are not individual records, they are sets of records.

Groups are defined similar to Partitions. All rows with the same value will be one group.

On the image to the left, current row is the row 5, but the current group is the Group 3. We are no more looking at 9 records, we are looking at 5 groups.

Our frame will start with one of the groups and will end with one of the groups after.
In this example, our frame will start with the first group in Partition, and will end on the group that is just after the current group.

What is a Range?

Relative positions are important for rows and groups, but not for ranges. With ranges, we are dealing with values in our column.

Let's say that some student took a school test. She scored 85 points on a test and she got a grade "A", because if number of points is between 76-100, then the grade is "A". It is similar with Ranges in window functions. Each frame is defined with a range of values. If a field value belongs to that range, then that record belongs to a frame defined by that range.

So, how we define a range? Really simple. If our current row has a value of X, we will add or subtract some number to that X, and we will get an extreme value of our range.

current value – N         
(syntax: N preceding)
                                          
current value + M        
(syntax: M following)
If, adjacent rows of the current row, have values that are close enough to the current value, then those rows will together make a frame.

Our frame is between [20,30], so all the rows beside the first and the last one, belong to this frame.  

Demonstration

This is a simple example of a window function. This example shows how to calculate cumulative of the qty column. We are not using PARTITION BY clause, so the whole table is one big partition.