## ORDER BY and Frames

Notice that ORDER BY clause can exist on two places in the statement. One is used to define frame, and the classic one is used to sort final data set.
`SELECT Part, Number, SUM( Number )`
`OVER ( ORDER BY Number ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS Col1FROM aggWindow ORDER BY Number;`

## Window Chaining ( WINDOW clause )

Window functions are verbose. If we want to use them several times in our statement, then our statement will become really long.

## Default frames

We already saw that we should avoid using default frames. There are two more abbreviations that will assume default frames.

## Window Functions and GROUP BY

Bellow we can see correct syntax. Columns in the grouped table are referred as Part and COUNT( Number ). Our Window function is based on those columns. That means that our window function will be SUM( COUNT( Number ) ).

# Limitations of Window Functions

## Aggregation Functions

### Sys.Functions System Table

We can get a list of aggregate functions from the system table sys.functions. Aggregate functions are of the type 3.

``SELECT DISTINCT name, mod FROM sys.functions WHERE type = 3 ORDER BY name;   ``

We can divide aggregate functions into three groups:

### Statistical Functions

Numbers, from Number column, can be divided into smaller and larger numbers. Half of the numbers will be smaller and the other half will be larger numbers. The number on the border between the smaller and larger numbers is the median.

Median is a special case of a quantile. Median is a 50% quantile. But we can differently divide our numbers. We can divide them 60% vs 40%, so that 60% numbers are on the smaller side, and 40% is on the bigger side. Number between them would be called 60% quantile. In our example below, "60 % quantile" is 2.8, which means that 60% of numbers is below 2.8. This would be more obvious if we had more numbers in our column.

Variance and standard deviation are calculated differently depending whether our data represent a population or a sample.

Covariance in statistics measures the extent to which two variables vary linearly.  Correlation is just covariance measured in normalized units. Unfortunately, there is a bug in MonetDB, version 11.49.09, and all of these functions will return wrong results.

All statistic function will ignore NULLs.

## Logical Operators

### Unary Operators

All other logical operators will return Null if at least one of its arguments is Null.

Most SQL functions will either return NULL if one of the arguments is NULL, or will ignore rows with NULL values.

### Logical Functions

Operators AND, OR and NOT have alternative syntax where they work like functions.  XOR can not work like operator, only like a function.

## 0200 MonetDB: Window Functions Theory

Imagine people scattered through some area, looking through binoculars. None of them will see the whole scenery, they will only see part of it. It is like each of them is looking through the different window.

Now, imagine that in each row, of a database table, there is a data scientist looking through binoculars. Each data scientist can only see some of the rows from that table. Each data scientist has its own window.

What would scientist do to represent nature of data he is looking at? He would aggregate them. If each of our scientist decide to calculate average of data he is looking at, we would get a table like this one:

In the real database table, with millions of rows, these average values would not be representatives of anything. Our windows are too random. If we can create a rule by which windows are created, then we would have a scientific view of our data. Let's say that each data scientist can only see its own row, and previous two rows. Then we would have a rule. Check out animated image below (left image).

We can also define these rules in SQL if we use Window Functions. Window Functions are special, because they can define windows and then apply some aggregation to data in those windows. We can apply aggregations like SUM, AVG, MAX, but we can also use some special aggregation functions.

Window functions are also called Analytic functions, because they give us abilities that are beyond traditional SQL statements. With them we can do things which were previously hard to achieve in SQL, and they are really useful for a deep analysis of our data. Windows are like overlapping samples from our tables. They can reveal us how the nature of our data is changing through time and dimensions.

## Terminology

In SQL, window, as explained above, is actually called "frame". The term "window" means something else. We will now discern difference between window, partition and a frame.

Frame is group of records that will be aggregated. Frame is presented with the moving red rectangle in the animation bellow.

Tables in animation below, show how many points each country won on some sport competition.

## How to Define Window?

This is not true for window functions. Window is under influence of the query context. Everything that is used inside FROM, WHERE, GROUP BY and HAVING clauses will define our window. Window functions can only do their magic after the final dataset is defined and unchangeable. That also means that Window functions can only be used in SELECT and ORDER BY clauses.

## How to Define Partitions?

Partitions can be defined by the values of one column. All rows that have the same value will be the same partition. On our image, all the rows with letter "A" will create Partition 1. It is also possible to use combination of the values from two or more columns to define partitions. Each unique combination of values will define a partition. On the image below, combination of values A and Q will define Partition 2. We can use expression to calculate values for our column(s). In our example, all the rows, where MOD function returns 1 will belong to Partition 1. Rows that return 0 will belong to Partition 2.

## How to Define Frames?

Frames are moving and so, they are always calculated relative to the current row. Two other reference points are the first and the last row in our partition. Position of the frame is always relative to those reference points.

For definition of a frame, we have to define its start row, and its end row. End row has to be after Star row. Below we can see all the ways how to define start and end row.

Notice that for all of this to make sense, records have to be sorted.

## What is a Group?

For window functions, start and end of a frame doesn't have to be a row. Start and end can also be defined with ranges and groups. Ranges and Groups are not individual records, they are sets of records.

## What is a Range?

Relative positions are important for rows and groups, but not for ranges. With ranges, we are dealing with values in our column.

Let's say that some student took a school test. She scored 85 points on a test and she got a grade "A", because if number of points is between 76-100, then the grade is "A". It is similar with Ranges in window functions. Each frame is defined with a range of values. If a field value belongs to that range, then that record belongs to a frame defined by that range.

So, how we define a range? Really simple. If our current row has a value of X, we will add or subtract some number to that X, and we will get an extreme value of our range.

## Demonstration

This is a simple example of a window function. This example shows how to calculate cumulative of the qty column. We are not using PARTITION BY clause, so the whole table is one big partition.

## 0190 Common Table Expressions (CTE) in MonetDB

Table with billion rows and 100 columns, can be represented with one short name. We can use the same logic and give a name to our SELECT statements. If we want to use those SELECT statements in a subquery, join or union, we can refer to them by their names. This is how we can reduce any complex statement to its building blocks. This will break complex logic to simple steps, where each step is easy to understand and manage.

CTEs are a way to name our SELECT statement, and then to use those names in the final SELECT, INSERT, DELETE , UPDATE or MERGE statement.

## Syntax of Common Table Expression

Bellow we can see statement with two CTEs. We can have them as many as we want ( Name1, Name2, Name3, Name4 … ). Each CTE will give a custom name to some SELECT statement. Thanks to this, final statement (which can be SELECT, INSERT, DELETE, UPDATE, MERGE), will be short and simple.

Not only CTE can break our logic into manageable elements, but it can also reduce repetition. We can write "SELECT * FROM Table2" once and then use it twice in the final statement. CTE will only improve readability of our statement and it will make it more concise. It will not improve performance of a statement.

## Sample Tables

We will create two sample tables to use them in our CTEs.

## ORDER BY

In MonetDB, it is possible to use ORDER BY inside of CTEs definitions. That sorting will propagate to the final result.

## Recursive CTEs

Recursive CTEs are not supported in MonetDB database. This is done deliberately because of performance concerns.

## Introduction

Merge is used when we want to use records from one table to decide what records to modify in another table. Let's assume that we want to keep two tables synchronized. In that case, every change on the first table should be reflected on the second table.

MERGE statement cannot synchronize updates, deletes and inserts at the same time. Merge can synchronize ( INSERT and UPDATE ), or ( INSERT and DELETE ). Image bellow give us an explanation for INSERT and UPDATE. MERGE will create left outer join between tables. It will match rows based on our condition. Merge will then add and update records in the table B, so that table B is the same as the table A.

For INSERT and DELETE, we do the similar thing. New rows will be added to the table B, but matched rows will be deleted from the table B.

## Sample Tables

We will create two tables. We will try to propagate all the changes in the table A to the table B. For start, we will enter only one row in the table A.

## Initial Synchronization

All the rows from the table A that do not exist in table B will be inserted into table B.

Now, table B, has one row, the same as table A.

## Merge with INSERT

Then we will repeat the same MERGE statement as above. This new row will then appear in the table B.

## Merge with UPDATE

First, we will update the table A. We will change A1 to A4.

Then we'll push that change to the table B. Notice that we don't use 'WHEN NOT MATCHED'. Now we use 'WHEN MATCHED'.

## Merge with DELETE

We will update one record in the table A.

Now, we will delete all the rows from the table B, that do exist in the table A. Row A4 exists in both tables, so that row will be deleted.

## Merge with INSERT and UPDATE

The real reason for existence of MERGE statement is because we can do two things in one statement. This time we will do INSERT and UPDATE at the same time.

## Merge with INSERT and DELETE

Currently, tables A and B are equal. We will add a row to table A to show how to use merge with INSERT and DELETE.

## Things That Don't Work in Merge

### Multiple matches

I will add row "E6" in the table "A" with a statement "INSERT INTO A VALUES ( 'E', 6 )". Now this table has rows E5 and E6. If we now apply MERGE statement, both rows E5 and E6 will try to update row E5 in the table B. This is not allowed and will fail.