0250 Offset Window Functions in MonetDB

Offset functions are window functions that return a value from a specified row. The row is determined as an offset from the reference point. There are two major reference points. The first is the current row, the second is the frame border. Based on this, we distinguish between "row offset functions" and "frame offset functions".

Row Offset Functions

With row offset functions, we move from the current row several steps up or down. Our movement is dependent only on the current row, so we do not use frames with row offset functions. Frames are not applicable in this case.

LAG function will move us up. LEAD function will move us down.

We will create one sample table so that we can observe how these functions work.

CREATE TABLE offsetTable( "Month" INT, Value INT );  

INSERT INTO offsetTable ( "Month", Value ) VALUES  ( 1, 1 ), ( 2, 2 ) , ( 3, null ), ( 5, 5 ), ( 6, 6 ), ( 7, 7 );

LAG function will allow us to get some previous value and to place it in the current row. That would allow us to compare some old and some new value, so we can analyze the change in time.
SELECT "Month", Value, LAG( Value, 2 ) OVER ( ORDER BY "Month" )
FROM offsetTable ORDER BY "Month";

We can see three drawbacks in the image above:

  • Month April doesn't have a value. LAG function doesn't notice that, and will happily move two rows above, but not two months before. For the fifth month we are getting the value from the February which is three months before, not two. That is the consequence of the missing month April. We should avoid using LAG function when we are missing some time points.
  • For months January and February, we have nulls in the new column . Previous values don't exist so we have to be satisfied with nulls.
  • In the third row, the Value column has NULL. Sometimes we want to go over those nulls and not count them as steps we take. There is a way to achieve this in SQL standard, but that part of SQL standard is not implemented in MonetDB, so null values will always be counted as one step. Subclause IGNORE NULLS is not supported.  
--not supported
SELECT "Month", Value, LAG( Value, 2) IGNORE NULLS
OVER (ORDER BY "Month" )
FROM offsetTable ORDER BY "Month";

LAG can work with partitions.

We should always order rows inside partition with ORDER BY. Although LAG function can work without ORDER BY, the results will be unpredictable and meaningless.

LAG Function Arguments

If the second argument is missing, then the default value is 1. In the image, we can see that the rows are shifted by one row.

SELECT "Month", Value, LAG(Value) OVER (ORDER BY "Month") FROM offsetTable ORDER BY "Month";

Only if we provide second argument, we can also provide the third argument. The third argument is the default value. Default value will be used for those rows where previous values don't exist.
SELECT "Month", Value, LAG(Value,2,77) OVER (ORDER BY "Month") FROM offsetTable ORDER BY "Month";

Default value has to be of the proper data type that match other values in the column.

Third argument is optional. If we skip it, the default value would be null.

SELECT "Month", Value, LAG(Value,2) OVER (ORDER BY "Month") FROM offsetTable ORDER BY "Month";

LEAD Function

LEAD function is the same as LAG function, just the opposite. Notice that we can achieve the same result with LEAD function, as with LAG function, if we change ORDER BY from ascending order to descending order.

SELECT "Month", Value, LAG(Value,1) OVER (ORDER BY "Month" ASC)
FROM offsetTable ORDER BY "Month";
SELECT "Month", Value, LEAD(Value,1) OVER (ORDER BY "Month" DESC)
FROM offsetTable ORDER BY "Month";
                  <=== totally the same ===>

This is similar example. Both statements use ASC, but the result is the same. This is because the offset argument can be negative. This is another way we can make the LAG and LEAD functions do the same thing.

SELECT "Month", Value, LAG(Value,2) OVER (ORDER BY "Month" ASC)
FROM offsetTable ORDER BY "Month";
SELECT "Month", Value, LEAD(Value,-2) OVER (ORDER BY "Month" ASC)
FROM offsetTable ORDER BY "Month";
             <=== totally the same ===>

Frame Offset Functions

For frame offset functions everything is relative to frame borders. We have three functions of this type.
FIRST_VALUE will return the first value in the frame.
NTH_VALUE will return the NTH value in the frame, from the start of a frame.
LAST_VALUE will return the last value in the frame.

For frame offset functions, we always need to have frame. If 6 is the month in the current row, then the frame is between 4th and 8th month. Now that we know the frame size and position, it is easy to see that the first value in this frame is 5.

SELECT "Month", Value, FIRST_VALUE(Value) OVER
(ORDER BY "Month" RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING)
FROM offsetTable ORDER BY "Month";

If we omit frame definition, the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW will be used. Be aware of this and always define frame.  

SELECT "Month", Value, LAST_VALUE(Value) OVER (ORDER BY "Month")
FROM offsetTable ORDER BY "Month";

Frame offset functions can work with partitions.

We should always order rows inside frame with ORDER BY. Frame offset functions can work without ORDER BY, but the results will be unpredictable and meaningless.

NTH_VALUE Function

NTH_VALUE function has another argument which tells how many steps to move from the start of a frame. In the example bellow, our frame is the whole partition. Forth number in the "value" column is number 5. That number will be the result of this window function.
SELECT "Month", Value, NTH_VALUE(Value,4)
OVER (ORDER BY "Month" ASC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM offsetTable ORDER BY "Month";

If the second argument is omitted, default value will not be one, but the error will be raised.

The value of that second argument can not be negative.

We can use partitions with NTH_VALUE.
You should always provide frame definition, and you should have ORDER BY for your frame.

Difference From SQL Standard

IGNORE NULLS is not supported in MonetDB for the functions FIRST_VALUE, LAST_VALUE and NTH_VALUE.

With IGNORE NULLS we would be able to not count nulls, but this is not supported in MonetDB.
SELECT "Month", Value, NTH_VALUE(Value,3) IGNORE NULLS   --not supported
OVER (ORDER BY "Month" ASC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

FROM offsetTable ORDER BY "Month";

NTH_VALUE can not use subclause FROM LAST. This would allow to count steps from the end of the frame, and not from the start. But this is not supported in MonetDB.

SELECT "Month", Value, NTH_VALUE(Value,2) FROM LAST   --not supported
OVER (ORDER BY "Month" ASC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM offsetTable ORDER BY "Month";

We actually don't need "FROM LAST" subclause. It is enough to change ORDER BY to descending, and we would get the same result as with "FROM LAST".
SELECT "Month", Value, NTH_VALUE(Value,2)
OVER (ORDER BY "Month" DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM offsetTable ORDER BY "Month";

Leave a Comment

Your email address will not be published. Required fields are marked *