0490 Grouping Sets and Comments in MonetDB

Sample Table

CREATE TABLE tabSales( Continent VARCHAR(20), Subcontinent  VARCHAR(20), Country VARCHAR(20),
                       State     VARCHAR(30), Sales   INT          );

INSERT INTO tabSales VALUES ('America', 'North',   'Canada', 'Ontario',     1),
                            ('America', 'North',   'Canada', 'Quebec',      2),
                            ('America', 'North',   'US',     'California',  4),
                            ('America', 'North',   'US',     'Texas',       8),
                            ('America', 'Central', 'Mexico', 'Jalisco',     16);

SELECT * FROM tabSales;

The Problem

When we create a Pivot table, from the sample table, we will see all of the detail sales (1,2,4,8,16), but we will also see totals (3,12,16,31).

The question is, what query would return all of these numbers, both detail values and totals, if we use MonetDB.

This is one possible solution:

SELECT Country, State, Sales FROM tabSales
UNION ALL
SELECT Country, null, SUM( Sales ) FROM tabSales GROUP BY Country
UNION ALL
SELECT null, null, SUM( Sales ) FROM tabSales;
On the image, the rows are sorted so
that the table looks like the pivot table.

UNION ALL solution is bad for several reasons:
1) We have three queries to execute and then to combine multiple result sets into one.
2) It is hard to read and modify long UNION ALL query.
3) We have to be careful to properly align columns.

This is the problem that can be solved by grouping sets.

Grouping Sets

"Grouping Sets" are much better and faster syntax to achieve the same goal.  

SELECT Country, State, SUM( Sales )
FROM tabSales
GROUP BY GROUPING SETS ( Country, State, ( ) );  


Empty parentheses are for the grand total.
We'll get the same result, except the
repetition of country names is reduced. Instead of them we have nulls.

Look what we will get if we place parentheses around Country and State.  

SELECT Country, State, SUM( Sales )
FROM tabSales
GROUP BY GROUPING SETS ( ( Country, State ), () );


Parentheses are there to define each group.

We can see the effect of parentheses better on this example.  

SELECT Continent, Subcontinent, SUM( Sales )
FROM tabSales
GROUP BY GROUPING SETS ( ( Continent, Subcontinent ), Continent );

Continent can be used by itself, but it can be also used in conjunction with Subcontinent to define a group. 

It is now clear that each element inside GROUPING SETS is a separate definition of a group. Each group can be defined by one column  > Continent <, or by several columns placed inside of the parentheses ( Continent, Subcontinent ).

These two examples, that would return the same result, show the logic and brevity of the grouping sets.SELECT Col1, Col2, SUM( Sales )
FROM Table
GROUP BY GROUPING SETS
    (  ( Col1, Col2 ), Col1 );
SELECT Col1, Col2, Sales FROM Table
UNION ALL

SELECT Col1, null, SUM( Sales )
   FROM Table
   GROUP BY Col1;

Rollup

SELECT Continent, Subcontinent, Country, SUM( Sales )
FROM tabSales

GROUP BY ROLLUP( Continent, Subcontinent, Country );
ROLLUP( a, b, c ) is the same as grouping sets "( a, b, c ), ( a, b ), ( a ), ()". This is a way to get hierarchy of the columns. Rollup will give us all of the numbers that we need to create a pivot table.
For ROLLUP, the order of the columns is important.

ROLLUP( a, b, c )                ROLLUP( b, a, c )

( a, b, c )                      ( b, a, c )
( a, b )                         ( b, a )
( a )                            ( b )
( )                              ( )

Similar to GROUPING SETS, ROLLUP can also create combinations of columns by using parentheses.
 
SELECT Subcontinent, Country, State, SUM( Sales )
FROM tabSales
GROUP BY ROLLUP( ( Subcontinent, Country ), State );

CUBE

CUBE works similar to ROLLUP, but have a different logic. CUBE will give us all of the possible combinations. CUBE( a, b, c ) will give us 2^3 grouping sets "(a,b,c), (a,b), (a,c), (b,c), (a), (b), (c) and ()".
———————————————————————————-

SELECT Subcontinent, Country, SUM( Sales )
FROM tabSales GROUP BY CUBE( Subcontinent, Country );
Because we have only 2 columns inside of CUBE in our example, number of combinations is 2^2 = 4 "(a,b), (b), (a), ()".

We can also define groups by using parentheses.  
SELECT Subcontinent, Country, SUM( Sales ) FROM tabSales
GROUP BY CUBE( ( Subcontinent, Country ) );

We now have only one element. We have only  two ( 2^1 ) groups "(a), ()".

Addition and Multiplication in Grouping Sets

This is addition:
( a, b )                                   ( a, b )
( b, c )       +       ( d )       =       ( b, c )
                                           ( d )
This is multiplication. Multiplication is like crossjoin.

( a, b )                                   ( a, b ), ( d )
( b, c )       *       ( d )       =       ( b, c ), ( d )

Syntax for addition is like this. Everything inside of the GROUPING SETS parentheses will be added to each other. In this example we will add ( Subcontinent ) + ( Country ) + ( ).  

SELECT Subcontinent, Country, SUM( Sales )
FROM tabSales
GROUP BY GROUPING SETS ( Subcontinent, ROLLUP( Country ) );

So, if we create GROUPING SETS like this, this will be addition.
GROUPING SETS ( Continent, ROLLUP( Continent, Subcontinent ), CUBE( Country, State ), () )

Addition can create duplicates:
SELECT Continent FROM tabSales
GROUP BY GROUPING SETS ( CUBE( Continent ), () );
CUBE will create Grand Total, but we will also
get grand total from the "( )" element.

This is a syntax for multiplication. This time we will have commas between GROUPING SETS, ROLLUPS and CUBES, and individual elements.
GROUPING SETS ( Continent ), ROLLUP( Continent, Subcontinent ), CUBE( Country, State ), (),Country

This example will give us 2 x 2 = 4 rows. ROLLUP will give us America, and "( )". GROUPING SETS will give us "North" and "Central". Then we combine them 2 x 2.  

SELECT Continent, Subcontinent, SUM( Sales ) FROM tabSales
GROUP BY ROLLUP( Continent ), GROUPING SETS ( Subcontinent );
Multiplication can also create duplicates:
SELECT Continent, SUM( Sales ) FROM tabSales
GROUP BY GROUPING SETS ( Continent, () ), GROUPING SETS ( Continent, () );

Indicator Function – GROUPING

GROUPING function will inform us what rows are subtotals / grand total. In such rows, some columns have nulls because they are consolidated. GROUPING function has an argument which is a column, and GROUPING function will return the result only for that column.

SELECT Continent, GROUPING( Continent ) AS ContiGroup, Subcontinent, GROUPING( Subcontinent ) AS SubcontiGroup , SUM( Sales )
FROM tabSales
GROUP BY ROLLUP ( Continent, Subcontinent );

This function is important because it help us to make distinction between subtotal nulls, and missing data nulls.

Formatting with COALESCE and Sort

This is not good looking table. Let's fix it.

SELECT Subcontinent, State, SUM( Sales )
FROM tabSales
GROUP BY ROLLUP( Subcontinent, State );

COALESCE will helps us to eliminate NULLS:

SELECT COALESCE( Subcontinent, 'Grand' ) AS Subcontinent
     , COALESCE( State, 'Total' ) AS State
     , SUM( Sales ) AS Sales FROM tabSales
GROUP BY ROLLUP( Subcontinent, State );

With GROUPING function, we can create columns that will help us to sort the table.

SELECT COALESCE( Subcontinent, 'Grand' ) AS Subcontinent
     , COALESCE( State, 'Total' ) AS State
     , SUM( Sales ) AS Sales
     , GROUPING( Subcontinent ) AS SubcSort
     , GROUPING( State ) AS StateSort 
FROM tabSales GROUP BY ROLLUP( Subcontinent, State )
ORDER BY SubcSort, Subcontinent, StateSort;

These auxiliary columns ( SubcSort and StateSort ) can be easily eliminated by wrapping everything with "SELECT Subcontinent, State, Sales".

Comments

Sample Table and Function

Let's create two tables and function.

CREATE TABLE tabComment( Number INTEGER );
CREATE TEMPORARY TABLE tabTemporary( Number INTEGER );

CREATE OR REPLACE FUNCTION funcComment( Arg1 INTEGER )
RETURNS INTEGER
BEGIN
     RETURN 2;
END;

Comments on Database Objects

We can create comments that are tied for database objects. Comments convey information about that object.  COMMENT ON TABLE tabComment IS 'tabComment description';
COMMENT ON COLUMN tabComment.Number IS 'Number column description';
COMMENT ON FUNCTION funcComment IS 'funcComment description';
COMMENT ON SCHEMA sys IS 'sys schema description';

We will then find IDs of our database objects:
SELECT * FROM sys.tables WHERE name = 'tabcomment';
15876
SELECT * FROM sys.columns WHERE table_id = 15876;
15875
SELECT * FROM sys.functions WHERE name = 'funccomment';
15881
SELECT * FROM sys.schemas WHERE name = 'sys';

2000
All of these IDs can be found in the system table "sys.comments" together with their comments.
SELECT * FROM sys.comments
WHERE Id IN ( 15876, 15875, 15881, 2000 );


Deleting a Comment

If we delete an object, its comment will be deleted.
DROP TABLE tabComment;
SELECT * FROM sys.comments WHERE Id = 15876;

We can delete a comment by setting it to NULL or an empty string.
COMMENT ON SCHEMA sys IS null;
SELECT * FROM sys.comments WHERE Id = 2000;

If a function is overloaded then we have to provide the full signature.
COMMENT ON FUNCTION funcComment( INTEGER ) IS '';
SELECT * FROM sys.comments WHERE Id = 15881;

Persistent Database Objects

There are other database objects that we can place a comment on. They are all persistent database objects.COMMENT ON VIEW view_name IS 'Comment';
COMMENT ON INDEX index_name IS 'Comment';
COMMENT ON SEQUENCE sequence_name IS 'Comment';
COMMENT ON PROCEDURE procedure_name IS 'Comment';

COMMENT ON AGGREGATE aggregate_name IS 'Comment';
COMMENT ON LOADER loader_name IS 'Comment';

We can not create a comment on a temporary object.
COMMENT ON TABLE tabTemporary IS 'tabTemporary description';

Leave a Comment

Your email address will not be published. Required fields are marked *