0220 Window Functions Syntax
The post 0220 Window Functions Syntax appeared first on bizkapish.
Function(Expressions)
This is the complete syntax of a window function. MonetDB doesn't support the FILTER clause, so we will skip it. For the EXCLUDE clause, only "EXCLUDE NO OTHERS" is implemented. Since that is also the default behavior, the clause currently has no practical use, so we will skip it, too.
CREATE TABLE aggWindow ( Part CHAR(5), Number Integer );
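The rows of the sample table are not shown in this copy of the post. A minimal sketch of data that is consistent with the sums quoted in this post (8 for the first partition, 28 for the second, 36 in total, plus one NULL row in the last partition) could look like this. The labels 'Part1' and 'Part2' are assumptions, not the author's values.

```sql
-- Hypothetical sample rows; the Part labels are assumed for illustration.
INSERT INTO aggWindow ( Part, Number ) VALUES
    ( 'Part1', 1 ), ( 'Part1', 1 ), ( 'Part1', 3 ), ( 'Part1', 3 ),
    ( 'Part2', 6 ), ( 'Part2', 6 ), ( 'Part2', 8 ), ( 'Part2', 8 ),
    ( 'Part2', NULL );
```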

If the parentheses after the OVER keyword are empty, then we will aggregate the whole Number column. Aggregate functions, like SUM, AVG, MAX…, will ignore NULLs.
1+1+3+3+6+6+8+8 = 36
SELECT Part, Number, 
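The statement above is cut off in this copy. A minimal sketch of an empty OVER clause, assuming the result column is named Col1 as in the other examples of this post:

```sql
-- Empty parentheses: the frame is the whole table, so every row gets the same total.
SELECT Part, Number,
       SUM( Number ) OVER ( ) AS Col1
FROM aggWindow;
```

With the numbers listed above, Col1 should be 36 on every row, including the row where Number is NULL.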
We can partition our table before aggregation. Each partition will be aggregated separately.
1+1+3+3 = 8
6+6+8+8 = 28
SELECT Part, Number, 
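The statement above is also cut off. A minimal sketch of a partitioned aggregation, assuming that the table is partitioned by the Part column and that the result column is named Col1:

```sql
-- Each Part value forms its own partition, and SUM is calculated per partition.
SELECT Part, Number,
       SUM( Number ) OVER ( PARTITION BY Part ) AS Col1
FROM aggWindow;
```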
ORDER BY is always accompanied by a frame definition. If the frame definition is not provided, then the default frame is used. The default frame is:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
SELECT Part, Number, 
We should avoid relying on default values and we should always provide explicit frame definitions.
If we don't provide ORDER BY, there is no way of knowing how the frames will be created. It is important to provide ORDER BY to avoid randomness. Notice in the image that we have a meaningless running total, because there is no ordering of the Number column.
SELECT Part, Number, SUM( Number ) OVER ( ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS Col1 FROM aggWindow;
If we sort the Number column in the final data set, the meaningless running total will become more obvious.
SELECT Part, Number, SUM( Number ) OVER ( ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS Col1 FROM aggWindow ORDER BY Number;
Notice that the ORDER BY clause can appear in two places in the statement. One is used to define the frame, and the classic one is used to sort the final data set.
SELECT Part, Number, SUM( Number )
OVER ( ORDER BY Number ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS Col1
FROM aggWindow ORDER BY Number;
Within the window, the position of NULLs can be controlled independently of the classic ORDER BY clause. We can use NULLS LAST or NULLS FIRST to explicitly define the position of NULLs.
SELECT Part, Number, SUM( Number ) OVER ( PARTITION BY Part ORDER BY Number NULLS LAST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS Col1 FROM aggWindow ORDER BY Number;
In the image, our NULL appears in the first row because of the classic ORDER BY clause, but it is paired with the biggest value in "Col1" for Partition 2 (28). The NULL is now part of the last frame in the last partition. This is because of the NULLS LAST clause, which moved the NULL row from the first to the last position in the window definition.
In this example, the frame is defined by the current row and the two previous rows.
SELECT Part, Number, FROM aggWindow ORDER BY Number; 
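The statement above has lost its window definition in this copy. A minimal sketch of a frame made of the current row and the two previous rows, assuming the same SUM( Number ) and Col1 alias that the neighboring examples use:

```sql
-- Moving sum over three rows: two preceding rows plus the current one.
SELECT Part, Number,
       SUM( Number ) OVER ( ORDER BY Number
                            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW ) AS Col1
FROM aggWindow
ORDER BY Number;
```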
With ranges, the current row comprises all the rows with the same value. The current row is a set of records, not just one row.
SELECT Part, Number, SUM( Number ) OVER ( ORDER BY Number RANGE BETWEEN CURRENT ROW AND 3 FOLLOWING ) AS Col1 FROM aggWindow ORDER BY Number;
This example will use all the rows with the current value X and will calculate a range with limits [X, X+3]. For X = 6, the limits are [6, 9]. Numbers 6 and 8 are both inside this range.
Each frame encompasses the current group, one previous group, and all the later groups.
SELECT Part, Number, 
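The statement above is cut off. A minimal sketch of such a GROUPS frame, assuming the groups are built from the Number column and the result is again named Col1:

```sql
-- Rows with the same Number form one group.
-- The frame runs from one group before the current group to the last group.
SELECT Part, Number,
       SUM( Number ) OVER ( ORDER BY Number
                            GROUPS BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING ) AS Col1
FROM aggWindow
ORDER BY Number;
```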
Window functions are verbose. If we want to use them several times in our statement, then our statement will become really long.
SELECT Part, Number, SUM( Number ) OVER ( PARTITION BY Part ORDER BY Number ROWS BETWEEN 1 PRECEDING AND CURRENT ROW ) AS Col1, SUM( Number ) OVER ( ORDER BY Number DESC GROUPS BETWEEN 1 PRECEDING AND CURRENT ROW ) AS Col2 FROM aggWindow ORDER BY Number; 
The only workaround is available when we use the same window for several functions. We can then define our window once, with the WINDOW clause, and reference it several times.
SELECT Part, Number, SUM( Number ) OVER W AS SumCol, MAX( Number ) OVER W AS MaxCol FROM aggWindow WINDOW W AS ( PARTITION BY Part ORDER BY Number ROWS BETWEEN 1 PRECEDING AND CURRENT ROW ) ORDER BY Number;
We already saw that we should avoid using default frames. There are two more abbreviations that will assume default frames.
Abbreviated syntax  Full definition of the default frame
{ UNBOUNDED | X } PRECEDING  BETWEEN { UNBOUNDED | X } PRECEDING AND CURRENT ROW
{ UNBOUNDED | X } FOLLOWING  BETWEEN CURRENT ROW AND { UNBOUNDED | X } FOLLOWING
SELECT Part, Number, FROM aggWindow ORDER BY Number; -- The same applies to RANGE or GROUPS.
Default frame is:
SELECT Part, Number, -- The same applies to RANGE or GROUPS.
We can group our table with GROUP BY. After we do this, the window function will only see the grouped rows. In the image to the left, the window function will only see 2 rows. The detail rows are no longer available to the window function. The question is what syntax to use to create a running total in this grouped table.
Below we can see the correct syntax. Columns in the grouped table are referred to as Part and COUNT( Number ). Our window function is based on those columns. That means that our window function will be SUM( COUNT( Number ) ).
SELECT FROM aggWindow 
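The statement above is incomplete in this copy. A minimal sketch of a running total over the grouped rows, where the aliases Cnt and RunningTotal and the ordering by Part are assumptions:

```sql
-- COUNT( Number ) is calculated per group first.
-- The window function then sums those counts across the grouped rows.
SELECT Part,
       COUNT( Number ) AS Cnt,
       SUM( COUNT( Number ) ) OVER ( ORDER BY Part
                                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS RunningTotal
FROM aggWindow
GROUP BY Part;
```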
RANGE frames can use only one column inside the ORDER BY clause. The statement below, which orders by two columns, is not accepted:
SELECT Part, Number, SUM( Number ) OVER ( ORDER BY Part DESC, Number RANGE BETWEEN CURRENT ROW AND 3 FOLLOWING ) AS Col1 FROM aggWindow ORDER BY Number;
We cannot use the DISTINCT keyword inside a window function.
SELECT Part, Number ; 
0210 Aggregate Functions and Logical Functions
The post 0210 Aggregate Functions and Logical Functions appeared first on bizkapish.
CREATE TABLE aggTable ( Number INT, Word VARCHAR(8), intervalMonth INTERVAL MONTH );
INSERT INTO aggTable ( Number, Word, intervalMonth ) VALUES ( 2, 'two', INTERVAL '2' MONTH ), ( 2, 'two', INTERVAL '2' MONTH ), ( 3, 'three', INTERVAL '3' MONTH ), ( 4, 'four', INTERVAL '6' MONTH ), ( NULL, NULL, NULL );
We can get a list of aggregate functions from the system table sys.functions. Aggregate functions are of the type 3.
SELECT DISTINCT name, mod FROM sys.functions WHERE type = 3 ORDER BY name;
We can divide aggregate functions into three groups:
Arithmetic functions: avg, count, count_no_nil, max, min, prod, sum
Concatenation functions: group_concat, listagg, tojsonarray
Statistical functions: corr, covar_pop, covar_samp, median, median_avg, quantile, quantile_avg, stddev_pop, stddev_samp, var_pop, var_samp
SQL  Result  Calculation  Comment 
SELECT AVG( Number ) FROM aggTable;  2.75  (2+2+3+4)/4 = 2.75  NULL is ignored. 
SELECT COUNT( * ) FROM aggTable;  5  Count the rows of the table.  
SELECT COUNT( Word ) FROM aggTable;  4  Count words, without NULL.  
SELECT COUNT_NO_NIL( Word ) FROM aggTable;  Error  Doesn't work.  
SELECT MAX( Word ) FROM aggTable;  'two'  Last value.  Words are ordered alphabetically, AZ. 
SELECT MIN( Number ) FROM aggTable;  2  First value.  Numbers are ordered numerically. 
SELECT PROD( Number ) FROM aggTable;  48  2*2*3*4=48  
SELECT SUM( Number ) FROM aggTable;  11  2+2+3+4=11  
SELECT SUM( intervalMonth ) FROM aggTable;  13  2+2+3+6=13  Also work with seconds. Result is of interval type. 
We can use the DISTINCT keyword to exclude duplicates. In our sample table, only the number 2 is duplicated. The calculations below are done as if the duplicate value didn't exist.
SQL  Result  Calculation 
SELECT AVG( DISTINCT Number ) FROM aggTable;  3  (2+ 3 +4)/3 = 3 
SELECT COUNT( DISTINCT Word ) FROM aggTable;  3  Count word, no NULLs, no duplicates. 
SELECT PROD( DISTINCT Number ) FROM aggTable;  24  2* 3 *4 = 24 
SELECT SUM( DISTINCT Number ) FROM aggTable;  9  2+ 3 +4 = 9 
SELECT SUM( DISTINCT intervalMonth ) FROM aggTable;  11  2+3+6 = 11 
Concatenation is a way to aggregate text. Instead of having text occupying N rows, where on each row we have only one phrase, we can aggregate them all into one row. Result will be comma separated list of those phrases, although we can choose what delimiter will be used instead of the comma. If we want to remove duplicates, we have to use DISTINCT keyword before the name of a column. It is not possible to control sort order of the phrases in the result. 
SQL  Result  Comment 
SELECT LISTAGG( Word, '' ) FROM aggTable;  twotwothreefour  Default delimiter is a comma. Returns VARCHAR. 
SELECT SYS.GROUP_CONCAT( Word, ';' ) FROM aggTable;  two;two;three;four  Default delimiter is a comma. Returns CLOB. 
SELECT JSON.ToJsonArray( Word ) FROM aggTable;  [ "two", "two", "three", "four" ]  Result is JSON list. 
SELECT JSON.ToJsonArray( Number ) FROM aggTable;  [ "2", "2", "3", "4" ]  Works with numbers, too. 
Numbers, from Number column, can be divided into smaller and larger numbers. Half of the numbers will be smaller and the other half will be larger numbers. The number on the border between the smaller and larger numbers is the median.
In the results below, the numbers are sorted and the middle value(s) are shown in parentheses.
SQL  Result  Result if we add number 5 to our column. 
SELECT SYS.MEDIAN( Number ) FROM aggTable;  2, (2), 3, 4 => 2  2, 2, (3), 4, 5 => 3 
SELECT SYS.MEDIAN_AVG( Number ) FROM aggTable;  2, (2), (3), 4 => (2+3)/2 = 2.5  2, 2, (3), 4, 5 => 3 
Median is a special case of a quantile. Median is the 50% quantile. But we can divide our numbers differently. We can divide them 60% vs 40%, so that 60% of the numbers are on the smaller side and 40% are on the bigger side. The number between them is called the 60% quantile. In our example below, the 60% quantile is 2.8, which means that 60% of the numbers are below 2.8. This would be more obvious if we had more numbers in our column.
SQL  Result  Calculation 
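SELECT SYS.QUANTILE_AVG( Number, 0.6 ) FROM aggTable;  2.8  Position is ( 4 - 1 ) * 0.6 + 1 = 2.8, where 4 is the count of our numbers. Interpolating between the values at positions 2 and 3 gives 2 + 0.8 * ( 3 - 2 ) = 2.8. 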
SELECT SYS.QUANTILE( Number, 0.6 ) FROM aggTable;  3  This is just value from above, rounded to integer. 
Variance and standard deviation are calculated differently depending on whether our data represents a population or a sample.
SQL  Result  Calculation 
SELECT SYS.VAR_SAMP( Number ) FROM aggTable;  0.917  
SELECT SYS.StdDev_SAMP( Number ) FROM aggTable;  0.957  sqrt( variance ) = sqrt( 0.917 ) = 0.957 
SELECT SYS.VAR_POP( Number ) FROM aggTable;  0.687  
SELECT SYS.StdDev_POP( Number ) FROM aggTable;  0.829  sqrt( variance ) = sqrt( 0.687 ) = 0.829 
Covariance in statistics measures the extent to which two variables vary linearly. Correlation is just covariance measured in normalized units. Unfortunately, there is a bug in MonetDB, version 11.49.09, and all of these functions will return wrong results.
SQL  Result in MonetDB  Calculation 
SELECT SYS.COVAR_SAMP( Number, 10 - Number * 1.2 ) FROM aggTable;  1.417 ( not correct, it is 1.1 )  
SELECT SYS.COVAR_POP( Number, 10 - Number * 1.2 ) FROM aggTable;  1.0625 ( not correct, it is 0.825 )  
SELECT SYS.CORR( Number, 10 - Number * 1.2 ) FROM aggTable;  0.986 ( not correct, it is 1 ) 
All statistical functions will ignore NULLs.
SQL  Result  Comment 
SELECT True AND True;  TRUE  Returns TRUE only if both arguments are TRUE. 
SELECT True OR False;  TRUE  Returns TRUE if at least one argument is TRUE. 
SQL  Result  Comment 
SELECT NOT True;  FALSE  Will transform TRUE to FALSE, and FALSE to TRUE. Always the opposite. 
SELECT Null IS NULL;  TRUE  Checks whether something is NULL. 
SELECT Null IS NOT NULL;  FALSE  Checks whether something is NOT NULL. 
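Other logical operators will return NULL when a NULL argument leaves the result undecided, as in the examples below. Under standard SQL three-valued logic, NULL AND FALSE is still FALSE and NULL OR TRUE is still TRUE, because the other argument already decides the result.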
SQL  Result 
SELECT NOT Null;  NULL 
SELECT Null AND True;  NULL 
Most SQL functions will either return NULL if one of the arguments is NULL, or will ignore rows with NULL values.
Operators AND, OR and NOT have an alternative syntax where they work like functions. XOR cannot be used as an operator, only as a function.
SQL  Result  Comment 
SELECT "xor"(True, False);  TRUE  Returns TRUE only when the first argument is the opposite of the second argument (Arg1 = NOT Arg2). 
SELECT "and"(True, False);  FALSE  Returns TRUE only if both arguments are TRUE. 
SELECT "or"(False, False);  FALSE  Returns TRUE if at least one argument is TRUE. 
SELECT "not"(False);  TRUE  Will transform TRUE to FALSE, and FALSE to TRUE. Always the opposite. 
0200 MonetDB: Window Functions Theory
The post 0200 MonetDB: Window Functions Theory appeared first on bizkapish.
Now, imagine that in each row of a database table there is a data scientist looking through binoculars. Each data scientist can only see some of the rows from that table. Each data scientist has their own window.
What would a scientist do to describe the nature of the data they are looking at? They would aggregate it. If each of our scientists decides to calculate the average of the data they see, we would get a table like this one:
In a real database table, with millions of rows, these average values would not be representative of anything. Our windows are too random. If we can create a rule by which windows are created, then we would have a scientific view of our data. Let's say that each data scientist can only see their own row and the previous two rows. Then we would have a rule. Check out the animated image below (left image).
By using this rule for a window creation, we can calculate "moving averages", which are often used in statistics. 
We can also define these rules in SQL if we use Window Functions. Window Functions are special, because they can define windows and then apply some aggregation to data in those windows. We can apply aggregations like SUM, AVG, MAX, but we can also use some special aggregation functions.
Window functions are also called analytic functions, because they give us abilities that are beyond traditional SQL statements. With them we can do things which were previously hard to achieve in SQL, and they are really useful for a deep analysis of our data. Windows are like overlapping samples from our tables. They can reveal to us how the nature of our data changes through time and dimensions.
In SQL, the window as explained above is actually called a "frame". The term "window" means something else. We will now look at the difference between a window, a partition and a frame.
A frame is the group of records that will be aggregated. The frame is shown as the moving red rectangle in the animation below.
The tables in the animation below show how many points each country won in some sports competition.
Sport results presented on the animation above will not be held in a database like three tables, but they will be placed together into one big table. That big table is our Window (assuming we are using no filters on that table). Smaller tables are called Partitions. Red moving rectangles are Frames. Window functions can process partitions separately, the same as they were separate tables. 
SELECT employee_id, salary FROM employees WHERE department_id = 101 AND salary > (SELECT AVG(salary) FROM employees);
A subquery is not under the direct influence of the outer query. In the example, we have the filter department_id = 101 on the outer query, but the subquery is not affected by that filter. The subquery will calculate the average salary for all of the employees. This means that we are looking for employees from department 101 who have a bigger salary than the global average.
This is not true for window functions. Window is under influence of the query context. Everything that is used inside FROM, WHERE, GROUP BY and HAVING clauses will define our window. Window functions can only do their magic after the final dataset is defined and unchangeable. That also means that Window functions can only be used in SELECT and ORDER BY clauses.
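To contrast this with the subquery example, here is a minimal sketch that combines the same filter with a window function. It reuses the employees table and its columns from that example, and the alias avg_salary is an assumption. Because the window is built only after WHERE is applied, the average covers department 101 only.

```sql
-- The window contains only the rows that passed the WHERE filter.
SELECT employee_id, salary,
       AVG( salary ) OVER ( ) AS avg_salary   -- average salary of department 101 only
FROM employees
WHERE department_id = 101;
```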
Partitions can be defined by the values of one column. All rows that have the same value will be the same partition. On our image, all the rows with letter "A" will create Partition 1. It is also possible to use combination of the values from two or more columns to define partitions. Each unique combination of values will define a partition. On the image below, combination of values A and Q will define Partition 2. We can use expression to calculate values for our column(s). In our example, all the rows, where MOD function returns 1 will belong to Partition 1. Rows that return 0 will belong to Partition 2.
Frames are moving and so, they are always calculated relative to the current row. Two other reference points are the first and the last row in our partition. Position of the frame is always relative to those reference points.
To define a frame, we have to define its start row and its end row. The end row cannot come before the start row. Below we can see all the ways to define the start and end row.
Can only be START ROW:
- UNBOUNDED PRECEDING - the first row in the partition
Can be both START or END ROW:
- N PRECEDING - the row that is N rows before the current row
- CURRENT ROW - our major reference point
- M FOLLOWING - the row that is M rows after the current row
Can only be END ROW:
- UNBOUNDED FOLLOWING - the last row in the partition
An example: 
Notice that for all of this to make sense, records have to be sorted.
For window functions, the start and end of a frame don't have to be a row. Start and end can also be defined with ranges and groups. Ranges and groups are not individual records, they are sets of records.
Groups are defined similar to Partitions. All rows with the same value will be one group. On the image to the left, current row is the row 5, but the current group is the Group 3. We are no more looking at 9 records, we are looking at 5 groups. Our frame will start with one of the groups and will end with one of the groups after.  In this example, our frame will start with the first group in Partition, and will end on the group that is just after the current group. 
Relative positions are important for rows and groups, but not for ranges. With ranges, we are dealing with values in our column.
Let's say that some student took a school test. She scored 85 points on the test and she got the grade "A", because a score between 76 and 100 earns the grade "A". It is similar with ranges in window functions. Each frame is defined with a range of values. If a field value belongs to that range, then that record belongs to the frame defined by that range.
So, how we define a range? Really simple. If our current row has a value of X, we will add or subtract some number to that X, and we will get an extreme value of our range.
current value - N (syntax: N PRECEDING)
current value + M (syntax: M FOLLOWING)
If the rows adjacent to the current row have values that are close enough to the current value, then those rows will together make a frame. Our frame is between [20,30], so all the rows besides the first and the last one belong to this frame.
This is a simple example of a window function. This example shows how to calculate cumulative of the qty column. We are not using PARTITION BY clause, so the whole table is one big partition.
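The example itself is not preserved in this copy. A minimal sketch of a cumulative total of a qty column could look like this. The table name sales and the column saleDate are hypothetical, and only qty comes from the text.

```sql
-- Hypothetical table: sales( saleDate DATE, qty INTEGER ).
-- Without PARTITION BY, the whole table is one partition.
SELECT saleDate, qty,
       SUM( qty ) OVER ( ORDER BY saleDate
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS cumulativeQty
FROM sales
ORDER BY saleDate;
```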
0190 Common Table Expressions (CTE) in MonetDB
The post 0190 Common Table Expressions (CTE) in MonetDB appeared first on bizkapish.
CTEs are a way to name our SELECT statements, and then to use those names in the final SELECT, INSERT, DELETE, UPDATE or MERGE statement.
Below we can see a statement with two CTEs. We can have as many of them as we want ( Name1, Name2, Name3, Name4 … ). Each CTE will give a custom name to some SELECT statement. Thanks to this, the final statement (which can be SELECT, INSERT, DELETE, UPDATE, MERGE) will be short and simple.
WITH JOIN Name2 ON someCondition JOIN Name2 ON someOtherCondition; 
Not only can a CTE break our logic into manageable elements, but it can also reduce repetition. We can write "SELECT * FROM Table2" once and then use it twice in the final statement. A CTE will only improve the readability of our statement and make it more concise. It will not improve the performance of the statement.
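As a minimal sketch of this pattern, the statement below defines two CTEs and uses the second one twice. Table1, Table2, the columns Id and ParentId, and the aliases N2a and N2b are placeholders chosen for illustration and are not objects from this post.

```sql
-- Table1, Table2, Id and ParentId are hypothetical names.
WITH Name1 AS ( SELECT * FROM Table1 ),
     Name2 AS ( SELECT * FROM Table2 )   -- written once, used twice below
SELECT *
FROM Name1
JOIN Name2 AS N2a ON Name1.Id = N2a.Id
JOIN Name2 AS N2b ON Name1.ParentId = N2b.Id;
```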
We will create two sample tables to use them in our CTEs.
CREATE TABLE Cte1 ( Letter CHAR, Number INTEGER ); 
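The rest of the setup is missing from this copy. A minimal sketch that is consistent with the descriptions in this post (Cte1 holds the rows A1 and B2, both tables share the numbers 1 and 2, and they have no letters in common) could be the following. The letters 'C' and 'D' in Cte2 are assumptions.

```sql
-- Assumed completion of the setup; the Cte2 rows are hypothetical.
CREATE TABLE Cte2 ( Letter CHAR, Number INTEGER );
INSERT INTO Cte1 ( Letter, Number ) VALUES ( 'A', 1 ), ( 'B', 2 );
INSERT INTO Cte2 ( Letter, Number ) VALUES ( 'C', 1 ), ( 'D', 2 );
```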
WITH Name1 AS ( SELECT * FROM Cte1 ), Name2 AS ( SELECT * FROM Cte2 ) 
 Because two tables don't have equal rows, EXCEPT will return all the rows from the Cte1 table.  We will use CTE to insert those rows into Cte2. 
WITH 
When we use DELETE, we always have to delete from the table itself, "DELETE FROM Cte2". It is not possible to create a CTE and then delete from that CTE, like "DELETE FROM ", expecting that the server will delete from the underlying table. That means that we can only use the CTE in the WHERE clause. After deleting A1 and B2 from Cte2, our Cte2 table is back to its original state.
WITH  We will update all the letters in the table Cte2 where numbers are common for the both tables. Because both tables have numbers 1 and 2, that means that all the letters in the table Cte2 will be updated to "X". Again, CTE can only be used in the WHERE clause. 
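The statement above is truncated in this copy. A minimal sketch of an UPDATE that uses a CTE only in its WHERE clause, assuming the CTE is named Name1 and reads the numbers from Cte1:

```sql
-- The CTE supplies the list of numbers; the UPDATE itself targets the real table Cte2.
WITH Name1 AS ( SELECT Number FROM Cte1 )
UPDATE Cte2
SET Letter = 'X'
WHERE Number IN ( SELECT Number FROM Name1 );
```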
WITH MERGE INTO Cte1  In MERGE statement, we can use CTE in the USING clause. Values from that CTE can be used to change values in the database table. Our tables, Cte1 and Cte2 don't have common letters, so we don't have matches. That means that all the rows from our Cte2 table will be inserted into Cte1. 
We can use CTE without providing aliases to its columns.WITH 
If we want to, we can provide aliases in the SELECT statement. It is also possible to provide aliases after the name of the CTE. If both are provided, then the outside aliases will prevail ( Column1 and Column2, and not colLetter and colNumber ). WITH Name1 ( Column1, Column2 ) AS ( SELECT Letter AS colLetter, Number AS colNumber FROM Cte1 ) SELECT * FROM Name1; 
WITH  It is possible to reference one CTE from another CTE (Name2 is calling Name1 ). 
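The statement above is truncated. A minimal sketch of one CTE referencing another, where the filter Number > 1 is only an illustrative assumption:

```sql
-- Name1 is defined first, so Name2 is allowed to read from it.
WITH Name1 AS ( SELECT Letter, Number FROM Cte1 ),
     Name2 AS ( SELECT Letter, Number FROM Name1 WHERE Number > 1 )
SELECT * FROM Name2;
```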
WITH 
With nesting, we must be sure that the referenced CTE is already defined. If we reference one of the later CTEs, that CTE will not be recognized ( Name1 cannot call Name2, because Name2 is not yet defined ).
In MonetDB, it is possible to use ORDER BY inside of CTEs definitions. That sorting will propagate to the final result.
WITH 
Recursive CTEs are not supported in MonetDB database. This is done deliberately because of performance concerns.
0180 SQL Merge in MonetDB
The post 0180 SQL Merge in MonetDB appeared first on bizkapish.
Merge is used when we want to use records from one table to decide which records to modify in another table. Let's assume that we want to keep two tables synchronized. In that case, every change in the first table should be reflected in the second table.
For full reflection, all of the updates, deletes or inserts that are done on the A table should also be done on the B table. For partial reflection, we can do just the part of the full reflection. For example, new rows from the A table will also be added to the B table. Records updated or deleted in the table A, will not be updated or deleted in the table B. 
The MERGE statement cannot synchronize updates, deletes and inserts at the same time. MERGE can synchronize ( INSERT and UPDATE ), or ( INSERT and DELETE ). The image below gives us an explanation for INSERT and UPDATE. MERGE will create a left outer join between the tables. It will match rows based on our condition. MERGE will then add and update records in table B, so that table B is the same as table A.
For INSERT and DELETE, we do the similar thing. New rows will be added to the table B, but matched rows will be deleted from the table B.
We will create two tables. We will try to propagate all the changes in the table A to the table B. To start, we will enter only one row in the table A.
CREATE TABLE A ( Letter CHAR, Number INTEGER ); 
All the rows from the table A that do not exist in table B will be inserted into table B.
MERGE INTO B  MERGE INTO targetTable USING sourceTable ON matchingCondition WHEN NOT MATCHED THEN INSERT RECORDS 
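The concrete statement is cut off above. A minimal sketch that follows the pattern, assuming rows are matched on the Letter column as in the later examples of this post:

```sql
-- Rows of A that have no match in B are inserted into B.
MERGE INTO B
USING A
   ON B.Letter = A.Letter
WHEN NOT MATCHED THEN
    INSERT ( Letter, Number ) VALUES ( A.Letter, A.Number );
```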
Now, table B, has one row, the same as table A.
First, we will insert another row into the table A. INSERT INTO A ( Letter, Number ) VALUES ( 'B', 2 ); 
Then we will repeat the same MERGE statement as above. This new row will then appear in the table B.
MERGE INTO B 
First, we will update the table A. We will change A1 to A4.
UPDATE A SET Number = 4 WHERE Letter = 'A'; 
Then we'll push that change to the table B. Notice that we don't use 'WHEN NOT MATCHED'. Now we use 'WHEN MATCHED'.
MERGE INTO B ON B.Letter = A.Letter and A.Letter = 'A' 
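The statement above has lost its USING and WHEN clauses in this copy. A minimal sketch of the update case, assuming that the Number column is the one being synchronized:

```sql
-- Only the row with Letter = 'A' is matched, so only that row is updated in B.
MERGE INTO B
USING A
   ON B.Letter = A.Letter AND A.Letter = 'A'
WHEN MATCHED THEN
    UPDATE SET Number = A.Number;
```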
We will update one record in the table A.
UPDATE A SET Number = 8 WHERE Letter = 'B'; 
Now, we will delete all the rows from the table B, that do exist in the table A. Row A4 exists in both tables, so that row will be deleted.
MERGE INTO B 
The real reason the MERGE statement exists is that we can do two things in one statement. This time we will do INSERT and UPDATE at the same time.
MERGE INTO B USING A VALUES ( A.Letter, A.Number ) 
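Only fragments of this statement survive above. A minimal sketch of a combined INSERT and UPDATE, assuming the same join on Letter and the same VALUES list as in the fragment:

```sql
-- Matched rows are updated, unmatched rows are inserted, all in one statement.
MERGE INTO B
USING A
   ON B.Letter = A.Letter
WHEN MATCHED THEN
    UPDATE SET Number = A.Number
WHEN NOT MATCHED THEN
    INSERT ( Letter, Number ) VALUES ( A.Letter, A.Number );
```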
Currently, tables A and B are equal. We will add a row to table A to show how to use merge with INSERT and DELETE.
INSERT INTO A ( Letter, Number ) VALUES ( 'E', 5 ); 
MERGE INTO B 
Subqueries are not supported in the INSERT clause inside a MERGE statement.
MERGE INTO B USING A ON B.Letter = A.Letter WHEN NOT MATCHED THEN INSERT SELECT * FROM A;
We can only use one MATCHED clause and/or one NOT MATCHED clause. MERGE INTO B 
I will add row "E6" in the table "A" with a statement "INSERT INTO A VALUES ( 'E', 6 )". Now this table has rows E5 and E6. If we now apply MERGE statement, both rows E5 and E6 will try to update row E5 in the table B. This is not allowed and will fail.
MERGE INTO B USING A ON B.Letter = A.Letter 