0220 Window Functions Syntax

Before reading this blog post, please read the Window Functions theory.

Function(Expressions)
FILTER ( WHERE Predicates )
OVER (  [PARTITION BY Expressions]        
        [ORDER BY Expressions [NULLS FIRST|LAST]]
        [Frame Type BETWEEN Frame Start AND Frame End]    
        [EXCLUDE Frame Exclusion]
)
This is the complete syntax of a Window function. MonetDB
doesn't support FILTER clause, so we will skip that clause.

For EXCLUDE clause, only "EXCLUDE NO OTHERS" is
implemented. That means that this clause is currently useless, so we will skip it, too.

Sample Table

CREATE TABLE aggWindow ( Part CHAR(5), Number Integer );

INSERT INTO aggWindow ( Part, Number )
    VALUES ( 'Part1', 1 ), ( 'Part1', 1 ), ( 'Part1', 3 ), ( 'Part1', 3 )        
         , ( 'Part2', 6 ), ( 'Part2', 6 ), ( 'Part2', 8 ), ( 'Part2', 8 )
         , ( 'Part2', NULL );

Simple Aggregation

If the parentheses after OVER keyword are empty, then we will aggregate the whole Number column. Aggregate functions, like SUM, AVERAGE, MAX…, will ignore NULLs.

1+1+3+3+6+6+8+8 = 36  

SELECT Part, Number,
SUM( Number ) OVER ( ) AS Col1
FROM aggWindow;

Usage of Partitions

We can partition our table before aggregation. Each partition will be separately aggregated.  

1+1+3+3 = 8      6+6+8+8 = 28  

SELECT Part, Number,
SUM( Number ) OVER ( PARTITION BY Part ) AS Col1
FROM aggWindow;

ORDER BY and Frames

ORDER BY is always accompanied with the frame definition. If the frame definition is not provided, then the default frame is used. The default frame is:

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  

SELECT Part, Number,
SUM( Number ) OVER ( ORDER BY Number ) AS Col1
FROM aggWindow;

We should avoid using default values and we should always provide explicit frame definitions.


If we don't provide ORDER BY, there is no knowing how frames will be created. It is important to provide ORDER BY to avoid randomness. Notice in the image that we have a meaningless running total, because there is no ordering of the Number column.

SELECT Part, Number, SUM( Number )
OVER ( ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS Col1
FROM aggWindow;

If we sort Number column in the final data set, meaningless running total will become more obvious.

SELECT Part, Number, SUM( Number )
OVER ( ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS Col1
FROM aggWindow ORDER BY Number;
 

Notice that ORDER BY clause can exist on two places in the statement. One is used to define frame, and the classic one is used to sort final data set.
SELECT Part, Number, SUM( Number )
OVER ( ORDER BY Number ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS Col1
FROM aggWindow ORDER BY Number;

Within the Window, the NULL position can be controlled independently of the ORDER BY clause. We can use NULLS LAST or NULLS FIRST to explicitly define position of NULLs.

SELECT Part, Number, SUM( Number ) OVER
( PARTITION BY Part ORDER BY Number NULLS LAST

  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS Col1 FROM aggWindow ORDER BY Number;
In the image, our Null is appearing in the first row because of the classic ORDER BY clause, but it is paired with the biggest value in the "Col1" for Partition 2 (28). NULL is now part of the last frame in the last Partition.

This is because of the clause NULLS LAST, which changed position of the NULL row, from the first, to the last, in the window definition.

Examples for the Rows, Ranges and Groups

In this example, frame is defined by the current and two previous rows.

SELECT Part, Number,
SUM( Number ) OVER
( ORDER BY Number  
  ROWS BETWEEN 2 PRECEDING AND CURRENT ROW ) AS Col1

FROM aggWindow ORDER BY Number;

With Ranges, current row comprise all the rows with the same value. Current row is a set of records, and not only one row.

SELECT Part, Number,
SUM( Number ) OVER
( ORDER BY Number  
  RANGE BETWEEN CURRENT ROW AND 3 FOLLOWING ) AS Col1
FROM aggWindow ORDER BY Number;
This example will use all the rows with the current value X and will calculate range with limits [X,X+3].

For X = 6, limits are [6,6+3]. Numbers 6 and 8 are both inside of this range.

Each frame encompass current group, one previous group, and all the latter groups.

SELECT Part, Number,
SUM( Number ) OVER
( ORDER BY Number
  GROUPS BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING ) AS Col1
FROM aggWindow ORDER BY Number;

Window Chaining ( WINDOW clause )

Window functions are verbose. If we want to use them several times in our statement, then our statement will become really long.

SELECT Part, Number,   
   SUM( Number ) OVER ( PARTITION BY Part ORDER BY Number
      ROWS BETWEEN 1 PRECEDING AND CURRENT ROW ) AS Col1,   
   SUM( Number ) OVER ( ORDER BY Number DESC          
      GROUPS BETWEEN 1 PRECEDING AND CURRENT ROW ) AS Col2
FROM aggWindow ORDER BY Number;

The only workaround is available if we are using the same window for several functions. We can then define our window once, with WINDOW clause, and then reference it several times.

SELECT Part, Number,   
   SUM( Number ) OVER W AS SumCol,   
   MAX( Number ) OVER W AS MaxCol
FROM aggWindow
WINDOW W AS ( PARTITION BY Part ORDER BY Number          
              ROWS BETWEEN 1 PRECEDING AND CURRENT ROW )
ORDER BY Number;

Default frames

We already saw that we should avoid using default frames. There are two more abbreviations that will assume default frames.

Abbreviated syntaxFull definition of the default frame
{ UNBOUNDED | X } PRECEDINGBETWEEN { UNBOUNDED | X } PRECEDING AND CURRENT ROW
{ UNBOUNDED | X } FOLLOWINGBETWEEN CURRENT ROW AND { UNBOUNDED | X } FOLLOWING

SELECT Part, Number,
SUM( Number ) OVER
( ORDER BY Number ROWS 1 PRECEDING) AS Col1

FROM aggWindow ORDER BY Number;

–The same is for RANGE or GROUPS.
Default frame is:
SELECT Part, Number,
SUM( Number ) OVER ( ORDER BY Number ROWS BETWEEN
1 PRECEDING AND CURRENT ROW
) AS Col1
FROM aggWindow ORDER BY Number;

–The same is for RANGE or GROUPS.

Window Functions and GROUP BY

We can group our table with GROUP BY. After we do this, window function will only see those grouped rows. On the image to the left, window function will only see 2 rows. Detail rows will not be any more available to window function.

The question is, what syntax to use to create running total in this grouped table.

Bellow we can see correct syntax. Columns in the grouped table are referred as Part and COUNT( Number ). Our Window function is based on those columns. That means that our window function will be SUM( COUNT( Number ) ).

SELECT Part, COUNT( Number ) AS CountCol, SUM( COUNT( Number ) )   
   OVER ( ORDER BY Part ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)   
   AS RunningTotal

FROM aggWindow
GROUP BY Part
ORDER BY Part;
 

Limitations of Window Functions

Range and ORDER BY

Range frames can only use one column inside of ORDER BY clause.  

SELECT Part, Number, SUM( Number ) OVER ( ORDER BY Part DESC, Number  
   RANGE BETWEEN CURRENT ROW AND 3 FOLLOWING ) AS Col1
FROM aggWindow ORDER BY Number;

DISTINCT and Window functions

We can not use DISTINCT keyword inside of Window function.  

SELECT Part, Number
COUNT( DISTINCT Number ) OVER ( ) AS Col1
FROM aggWindow
;

Leave a Comment

Your email address will not be published. Required fields are marked *