Table variables are temporary in-memory structures. They exist only during the execution of a function. They are mostly used to build up a result set that will be returned from the function. They can also be used as intermediate result storage for procedural logic.
CREATE FUNCTION funcTableDeclaration() RETURNS TABLE ( Id INTEGER )
BEGIN
    DECLARE TABLE TableDeclaration ( Id INTEGER );
    INSERT INTO TableDeclaration VALUES ( 1 );
    RETURN SELECT * FROM TableDeclaration;
END;
SELECT * FROM funcTableDeclaration();
Tables created with the DECLARE TABLE statement are mutable and can be modified with any of the DML statements, like INSERT INTO, UPDATE, DELETE…
It is also possible to create a table variable based on an existing table, but without inheriting the data. I will create a sample table:
CREATE TABLE sys.tabNum ( Number INTEGER );
INSERT INTO tabNum ( Number ) VALUES ( 32 );
SELECT * FROM tabNum;
CREATE OR REPLACE FUNCTION funcTableDeclaration() RETURNS TABLE ( N INTEGER )
BEGIN
    DECLARE TABLE TableDeclaration AS ( SELECT * FROM tabNum ) WITH DATA;
    RETURN SELECT * FROM TableDeclaration;
END;
We have created a table variable based on the sample table.
SELECT * FROM funcTableDeclaration();
No data is inherited.
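For contrast, a regular table created the same way does inherit the rows when WITH DATA is used. This is just an illustrative sketch; the table name "tabCopy" is an assumption:

```sql
-- A regular (non-variable) table copies the rows when WITH DATA is specified.
CREATE TABLE tabCopy AS ( SELECT * FROM tabNum ) WITH DATA;
SELECT * FROM tabCopy;  -- returns the row with 32
```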
Limitations of the Table Variables
CREATE OR REPLACE FUNCTION funcTableDeclaration() RETURNS TABLE ( Id INTEGER )
BEGIN
    DECLARE TABLE TableDeclaration ( Id INTEGER PRIMARY KEY );
    INSERT INTO TableDeclaration VALUES ( 1 );
    RETURN SELECT * FROM TableDeclaration;
END;
We cannot have constraints on table variables.
CREATE OR REPLACE FUNCTION funcTableDeclaration() RETURNS TABLE ( Id INTEGER )
BEGIN
    DECLARE TABLE TableDeclaration ( Id INTEGER );
    INSERT INTO TableDeclaration VALUES ( 1 );
    ALTER TABLE TableDeclaration RENAME COLUMN Id to Code;
    RETURN SELECT * FROM TableDeclaration;
END;
It is not possible to apply ALTER TABLE to a table variable.
Declare Scalar Variable
CREATE FUNCTION funcVariableDeclaration() RETURNS INTEGER
BEGIN
    DECLARE Var1 INTEGER;
    SET Var1 = 77;
    SET Var1 = 88;
    RETURN SELECT Var1;
END;
It is possible to declare and set a scalar variable. Scalar variables are mutable, and we can change their values several times.
SELECT funcVariableDeclaration();
CREATE OR REPLACE FUNCTION funcVariableDeclaration() RETURNS TABLE ( Var1 INTEGER, Var2 CHAR )
BEGIN
    DECLARE Var1 INTEGER, Var2 CHAR;
    SET Var1 = 77;
    SET Var2 = 'A';
    RETURN SELECT Var1, Var2;
END;
We can declare two variables in one statement.
SELECT * FROM funcVariableDeclaration();
CREATE OR REPLACE FUNCTION funcVariableDeclaration() RETURNS TABLE ( Var1 INTEGER, Var2 CHAR )
BEGIN
    DECLARE Var1 INTEGER, Var2 CHAR;
    SELECT 1, 'A' INTO Var1, Var2;
    RETURN SELECT Var1, Var2;
END;
Besides declaring two variables at once, we can also set the values of both variables with the "SELECT INTO" statement.
Expressions for Values
CREATE OR REPLACE FUNCTION funcVariableDeclaration() RETURNS INTEGER
BEGIN
    DECLARE Var1 INTEGER;
    SET Var1 = 32 + 5;
    RETURN SELECT Var1;
END;
The result of an expression can be assigned as a variable value.
SELECT funcVariableDeclaration();
CREATE OR REPLACE FUNCTION funcVariableDeclaration() RETURNS INTEGER
BEGIN
    DECLARE Var1 INTEGER;
    SET Var1 = ( SELECT * FROM tabNum );
    DECLARE Var2 INTEGER;
    SET Var2 = Var1 + 5;
    RETURN SELECT Var2;
END;
The result of a query can be assigned to a variable.
SELECT funcVariableDeclaration();
CREATE OR REPLACE FUNCTION funcVariableDeclaration() RETURNS INTEGER
BEGIN
    DECLARE Var1 INTEGER;
    SET Var1 = ( SELECT * FROM tabNum ) + 5;
    RETURN SELECT Var1;
END;
We can also use this shorter syntax.
CREATE OR REPLACE FUNCTION funcVariableDeclaration() RETURNS INTEGER
BEGIN
    DECLARE Var1 INTEGER, Var2 INTEGER;
    SET ( Var1, Var2 ) = ( SELECT Number, Number FROM tabNum );
    RETURN Var1 / Var2;
END;
If the query returns one row with several columns, we can assign all of those values to several variables in one statement.
Usage of Scalar Variables
Scalar variables are also temporary in-memory structures. We can use them anywhere we can use constants:
CREATE OR REPLACE FUNCTION funcVariableDeclaration() RETURNS INTEGER
BEGIN
    DECLARE Var1 INTEGER;
    SET Var1 = 32;
    RETURN SELECT * FROM tabNum WHERE Number = Var1;
END;
SELECT funcVariableDeclaration();
But we cannot use a scalar variable to name a column of a table variable. We cannot use variables in place of identifiers:
CREATE OR REPLACE FUNCTION funcTableDeclaration() RETURNS TABLE ( Id INTEGER )
BEGIN
    DECLARE Var1 CHAR;
    SET Var1 = 'N';
    DECLARE TABLE TableDeclaration ( Var1 INTEGER );
    RETURN SELECT N FROM TableDeclaration;
END;
SELECT * FROM funcTableDeclaration();
CREATE OR REPLACE FUNCTION funcVariableDeclaration() RETURNS INTEGER
BEGIN
    DECLARE Var1 INTEGER;
    RETURN SELECT Var1;
END;
A variable without an assigned value will be NULL.
SELECT funcVariableDeclaration();
Control Flow With CASE
In a police lineup, a witness goes from left to right, scanning each person. When the witness recognizes the culprit, he stops scanning.
This is how the CASE control flow structure works. We examine the values one by one until we find a value that matches the description.
Search CASE
CREATE OR REPLACE FUNCTION funcCASE( argNumber INTEGER ) RETURNS CHAR
BEGIN
    DECLARE Res CHAR;
    CASE WHEN argNumber = 1 THEN SET Res = 'A';
         WHEN argNumber = 2 THEN SET Res = 'B';
         ELSE SET Res = 'C';
    END CASE;
    RETURN Res;
END;
The CASE statement is made of condition checks (the WHEN subclause) and the results (the THEN subclause) that apply when a condition is met.
The results must be complete statements; in this case, a SET statement. If none of the conditions is met, the statement after the ELSE keyword will be executed.
SELECT funcCASE( 1 );
SELECT funcCASE( 2 );
SELECT funcCASE( 99 );
CREATE OR REPLACE FUNCTION funcCASE( argNumber INTEGER ) RETURNS CHAR
BEGIN
    DECLARE Res CHAR;
    CASE WHEN argNumber < 3 THEN SET Res = 'A';
         WHEN argNumber < 5 THEN SET Res = 'B';
         ELSE SET Res = 'C';
    END CASE;
    RETURN Res;
END;
If the conditions overlap, the condition closer to the top of the list wins. For example, if argNumber is "2", then both the conditions "2 < 3" and "2 < 5" are true. The final result will be the letter "A", because it belongs to the topmost condition in the list.
SELECT funcCASE( 2 );
Simple CASE
CREATE OR REPLACE FUNCTION funcCASE( argNumber INTEGER ) RETURNS CHAR
BEGIN
    DECLARE Res CHAR;
    CASE argNumber
         WHEN 1 THEN SET Res = 'A';
         WHEN 2 THEN SET Res = 'B';
         ELSE SET Res = 'D';
    END CASE;
    RETURN Res;
END;
This is a shorthand for writing the conditions "argNumber = 1" and "argNumber = 2". This syntax is only useful when we are sure that we will use only equality-based conditions.
SELECT funcCASE( 2 );
On the other hand, when we use a search CASE, we can use whatever condition we want.
Nested CASE and Default Result of the CASE statement
CREATE OR REPLACE FUNCTION funcCASE( argNumber INTEGER ) RETURNS CHAR
BEGIN
    DECLARE Res CHAR;
    CASE WHEN argNumber = 1 THEN SET Res = 'A';
        CASE WHEN argNumber = 2 THEN SET Res = 'B';
        END CASE;
    END CASE;
    RETURN Res;
END;
We can nest CASE structures, one inside of another.
Notice that this time we don't have an ELSE subclause. In that case, every value of the argument except 1 and 2 will cause the function to return NULL.
Procedural programming languages are based on a series of commands that are executed step by step. The programmer is responsible for explaining not only what should be done, but also how it should be done. Procedural programming uses variables to store interim results, control-flow constructs (IF, CASE, LOOP, WHILE…) to direct execution, and error handling to catch errors, and everything is organized into blocks of code.
This is the opposite of SQL, which is an abstract and declarative language. SQL is great for querying, filtering, joining, and grouping data, but it cannot use loops, variables, or conditions.
The major blocks of code in procedural languages are functions and procedures. We can hide complex logic behind each function or procedure and then invoke that complex logic with simple calls. Functions and procedures can pass arguments between them. All of this allows for a modular structure, which provides reusability and better organization of code.
These blocks of code can be parameterized which gives them much greater reusability than SQL.
Procedural SQL
Procedural SQL is a procedural programming language used in databases. It is closely related to SQL. The purpose of procedural SQL is to extend SQL with procedural capabilities. We can then place the application logic in the database, where it will be close to the data.
SQL is standardized and highly portable between databases. This allows programmers to learn it once and use it everywhere. On the other hand, procedural SQL varies in scope and detail between servers. The same procedural SQL commands may look the same but behave differently. This is why we have many dialects of procedural SQL, such as PL/SQL in Oracle or T-SQL in SQL Server.
Procedural SQL is used when:
We want to perform complex transformations.
We need logic flow control.
We need more flexibility, so our logic must be parameterized.
We are performing validation and error checking.
We want to embed application logic into the database to avoid expensive network round trips.
SQL is better than procedural SQL because:
It is portable and standardized.
It is faster. It uses less memory and computational power.
It is less complex; the user doesn't have to be closely familiar with the data in the database and the algorithms.
It is easier to read. A procedural loop can take many lines, while SQL can do the same in one simple statement.
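As a sketch of that contrast, the hypothetical function below sums the numbers 1 to 100 with a procedural WHILE loop, while a single set-based aggregate does the same in one statement. The function name and the use of MonetDB's WHILE loop and sys.generate_series are assumptions for this illustration:

```sql
CREATE OR REPLACE FUNCTION funcSumLoop() RETURNS INTEGER
BEGIN
    DECLARE Total INTEGER, i INTEGER;
    SET Total = 0;
    SET i = 1;
    WHILE i <= 100 DO            -- many lines of explicit, step-by-step logic
        SET Total = Total + i;
        SET i = i + 1;
    END WHILE;
    RETURN Total;
END;

SELECT funcSumLoop();  -- 5050

-- The same result in one declarative statement.
SELECT SUM( value ) FROM sys.generate_series( 1, 101 );
```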
The reasons why SQL is faster than procedural SQL are:
SQL gives the optimizer more information on which to base optimization.
SQL needs less logging for rollbacks and transactions.
Fewer locks are taken when we use SQL.
Set-based logic is the focus of an RDBMS, so it is heavily optimized for it.
User Defined Functions (UDF) in MonetDB
MonetDB has a lot of built-in functions. We can solve most problems using the string, mathematical, and date-time functions. For other problems, we can create our own functions using procedural SQL.
CREATE OR REPLACE FUNCTION funcReturnTwo() RETURNS INTEGER BEGIN RETURN 2; END;
Above, we can see an example of a simple UDF in MonetDB. This function will always return the number 2. We can use this function in a SELECT statement:
SELECT funcReturnTwo();
Our function has a name, a return data type, and a return value. The parentheses are for the function's arguments.
We can delete the function with the statement:
DROP FUNCTION funcReturnTwo();
We can use a SELECT statement to obtain the value that the function will return:
CREATE OR REPLACE FUNCTION funcReturnTwo() RETURNS INTEGER BEGIN RETURN SELECT 2; END;
SELECT funcReturnTwo();
But the SELECT statement must return a scalar in order to be used in the function. This SELECT statement returns a column:
SELECT * FROM (VALUES (2), (2)) AS t( Col );
CREATE OR REPLACE FUNCTION funcReturnTwo() RETURNS INTEGER BEGIN RETURN SELECT * FROM (VALUES (2), (2)) AS t( Col ); END;
MonetDB will accept this SELECT statement in the UDF, but that function will not be usable:
SELECT funcReturnTwo();
Sample Table and Usage of Arguments
CREATE TABLE NumText ( Number INTEGER, Text VARCHAR(50) );
INSERT INTO NumText VALUES ( 120, 'Mississippi' );
SELECT * FROM NumText;
This time, our function will accept an argument of the INTEGER data type.
CREATE OR REPLACE FUNCTION funcAddTwo( argNumber INTEGER ) RETURNS INTEGER
BEGIN
    RETURN argNumber + 2;
END;
SELECT funcAddTwo( Number ) FROM NumText;  -- 120 + 2 = 122
Data Manipulation Language (DML) and Data Definition Language (DDL) Statements in UDF
DML statements, like SELECT, INSERT, and UPDATE, work normally inside UDFs.
CREATE OR REPLACE FUNCTION funcUpdate( ) RETURNS INTEGER
BEGIN
    UPDATE NumText SET Number = 130 WHERE Number = 120;
    RETURN SELECT Number FROM NumText;
END;
We will first update the table, and then we will read from it.
SELECT funcUpdate( );
DDL statements are statements that define and manage the structure of the database ( CREATE, ALTER, DROP… ). If the purpose of a function is to return a value that can be used in a query, is it possible to use DDL statements in the function? Let's try.
CREATE OR REPLACE FUNCTION funcCreate() RETURNS INTEGER
BEGIN
    CREATE TABLE Test ( Number INTEGER );
    INSERT INTO Test ( Number ) VALUES ( 7 );
    RETURN SELECT Number FROM Test;
END;
Our function created a table and then read from that table:
SELECT funcCreate();
We can see that the CREATE statement works correctly inside a function. We will now read from the tables "NumText" and "Test":
SELECT * FROM NumText;  -- 120 was permanently changed to 130
SELECT * FROM Test;  -- there is no table "Test"
We can see that there is no table "Test", although the function "funcCreate" works normally. This means that not everything inside a function will work as we expect. Because a function can be called countless times, it is smart to avoid executing DDL statements inside a UDF.
I will try to rename a column of the NumText table inside a function.
CREATE OR REPLACE FUNCTION funcAlter() RETURNS INTEGER
BEGIN
    ALTER TABLE NumText RENAME COLUMN Text to Tekst;
    RETURN SELECT Number FROM NumText;
END;
This will fail. So, some DDL statements will not work at all.
We can conclude that the main purpose of a UDF is to calculate and return values. It is not its purpose to significantly change the state of the database. That is why DDL statements either will not work inside functions or will have only a limited effect.
Environment Variables
The question is whether it is possible to change environment variables, like CURRENT_SCHEMA, inside a function. Will that change be permanent?
CREATE OR REPLACE FUNCTION funcSchema() RETURNS VARCHAR(10)
BEGIN
    SET SCHEMA voc;
    RETURN CURRENT_SCHEMA;
END;
SELECT funcSchema();
SELECT CURRENT_SCHEMA;
We can see above that we can permanently change environment variables inside functions.
UDF Can Return Table
UDFs can return tables, not just scalars. Tables returned by a UDF can be used as subqueries.
CREATE OR REPLACE FUNCTION funcTable() RETURNS TABLE ( N INTEGER, T VARCHAR(50) ) BEGIN RETURN SELECT * FROM NumText; END;
SELECT * FROM funcTable();
We can combine the returned table with normal tables.
SELECT * FROM funcTable() UNION ALL SELECT * FROM NumText;
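Since tables returned by a UDF can also appear as subqueries, here is a small sketch of using the function's result set inside a WHERE clause:

```sql
-- Use the table returned by the UDF as a subquery.
SELECT * FROM NumText
WHERE Number IN ( SELECT N FROM funcTable() );
```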
Temporary Tables
We will create two temporary tables, and then we will check whether we can use those tables in a UDF.
CREATE LOCAL TEMPORARY TABLE locTable ( Number INTEGER ) ON COMMIT PRESERVE ROWS;
INSERT INTO locTable ( Number ) VALUES ( 1 );
SELECT * FROM locTable;
CREATE GLOBAL TEMPORARY TABLE globTable ( Letter CHAR ) ON COMMIT PRESERVE ROWS;
INSERT INTO globTable ( Letter ) VALUES ( 'A' );
SELECT * FROM globTable;
We will successfully create these two UDFs.
CREATE OR REPLACE FUNCTION funcLocTable() RETURNS TABLE ( Number INTEGER ) BEGIN RETURN SELECT * FROM locTable; END;
CREATE OR REPLACE FUNCTION funcGlobTable() RETURNS TABLE ( Letter CHAR ) BEGIN RETURN SELECT * FROM globTable; END;
We can see below that it is possible to use temporary tables inside custom functions.
SELECT * FROM funcLocTable();
SELECT * FROM funcGlobTable();
BEGIN ATOMIC
CREATE OR REPLACE FUNCTION funcAtomic() RETURNS INTEGER
BEGIN
    START TRANSACTION;
    RETURN 2;
    COMMIT;
END;
We cannot use transaction statements inside a UDF. This is a problem because one UDF can contain several statements, and we want all of them to succeed or all of them to fail.
CREATE OR REPLACE FUNCTION funcAtomic() RETURNS INTEGER BEGIN ATOMIC RETURN 2; END;
If we add the word "ATOMIC" after "BEGIN", we will turn our UDF into one transaction. "Atomic" means indivisible.
Optimization of the UDF
SELECT funcAddTwo( Number ) FROM NumText;
We have already seen that a UDF can be executed for each row of a table. If our UDF is complicated and the server cannot optimize it, executing the UDF on a large table may be inefficient. Then we should look for another way to accomplish our task.
START TRANSACTION;
INSERT INTO Tab1 ( Num ) VALUES ( 5 );
UPDATE Tab1 SET Num = 6 WHERE Num = 5;
DELETE FROM Tab1 WHERE Num = 6;
COMMIT;
When a transaction occurs, we want to record all changes made by that transaction as quickly as possible.
1) We write to files called the "Write Ahead Log" (WAL). These files are just temporary notes.
2) We write down only the essential data.
3) We write sequentially to disk, which is faster.
4) We buffer these notes in memory and write them to disk in batches.
5) We do not recalculate the indexes of any table.
After we have collected all the necessary data and analyzed it, we are ready to write it to the table. We will transfer the temporary notes from the WAL to the permanent record. The changes are applied to the database in one fast move. This brings us three advantages:
a) The chances of something bad happening during the write are minimized, because we are fast and prepared.
b) We can be sure that our transaction will be completed in full or not completed at all.
c) No final decision is made until the end, so if the transaction has to be rolled back, we just declare the WAL notes invalid. We don't have to delete anything from the database.
But what if the system becomes corrupted and crashes? We don't have to worry; we still have the WAL. After we fix the system, we can read from the WAL about:
a) Transactions that are complete but not yet written to a table. We can write them from the WAL notes after the system is repaired. This is how we provide durability; these transactions will not be lost.
b) Transactions that are incomplete in the WAL. These will be discarded, because we don't want incomplete transactions.
Unlogged Tables
Unlogged tables are just like normal database tables, except that they do not use the WAL.
Normal tables are written through the WAL; unlogged tables skip the WAL.
Unlogged tables are almost like normal tables:
1) They are written to disk.
2) After a normal shutdown of the system, the content of unlogged tables will be preserved.
3) They have transactions, but their transactions do not use the WAL; they exist only in RAM.
4) The content of these tables is available to all users, just like for normal tables.
The only difference is in the case of a system crash. After a crash, the content of normal tables will be restored to a consistent state by using the WAL. An unlogged table will be truncated during recovery. The server cannot guarantee the consistency of unlogged tables without the WAL, so it will delete their content. This is why unlogged tables should be used only for temporary and re-creatable data.
Writing to unlogged tables can be several times faster than writing to normal tables. Without the WAL, we can write much faster. We are sacrificing the reliability of unlogged tables for better performance.
Unlogged Tables in MonetDB
I will log in as administrator. I will change the current schema to "voc".
This is how we create an unlogged table in MonetDB.
CREATE UNLOGGED TABLE UnLogTab ( Number INTEGER );
INSERT INTO UnLogTab ( Number ) VALUES ( 1 );
SELECT * FROM UnLogTab;
If we quit our session and log in again, we will still be able to use our unlogged table.
quit
mclient -u monetdb -d voc
SELECT * FROM voc.UnLogTab;
Both the data and the table structure will be preserved. An unlogged table can last through several sessions.
I will create another session as a "voc" user.
mclient -u voc -d voc --password voc
This other session will also be able to read and modify the UnLogTab table.
UPDATE UnLogTab SET Number = 2;
SELECT * FROM UnLogTab;
Sample Table
I will create one normal sample table in the "voc" schema, from the admin session:
CREATE TABLE voc.Tab1 ( Letter VARCHAR(10) );
INSERT INTO voc.Tab1 VALUES ( 'A' );
SELECT * FROM voc.Tab1;
What Happens After the Crash?
I will exit both sessions from above. We are now in the shell. The PID of the mserver5 process is 1335.
pgrep mserver5
I will kill this process.
kill -9 1335
We have now violently closed our server.
I will start my "voc" database again and log in as administrator.
monetdb start voc
mclient -u monetdb -d voc
If we now read from our table, the table will be empty. Because of the crash, MonetDB truncated this table.
SELECT * FROM voc.UnLogTab;
But if we read from the sample table, everything will be fine:
SELECT * FROM voc.Tab1;
Difference Between MonetDB and Some Other Servers
I will again add some data to UnLogTab:
INSERT INTO voc.UnLogTab ( Number ) VALUES ( 1 );
SELECT * FROM voc.UnLogTab;
This time I will restart my server gracefully.
quit
monetdb stop voc
monetdb start voc
mclient -u monetdb -d voc --password monetdb
Our UnLogTab will be empty. This is the opposite of some other servers, where a regular restart doesn't remove the content of unlogged tables. In MonetDB, the content will be lost.
SELECT * FROM voc.UnLogTab;
ALTER TABLE Tab1 SET UNLOGGED;
ALTER TABLE UnLogTab SET LOGGED;
Unlike some other databases, MonetDB does not have the ability to transform unlogged tables into logged tables and vice versa. These statements will not work.
System Tables
We can find information about unlogged tables in sys.tables and sys.statistics.
SELECT * FROM sys.tables WHERE name = 'unlogtab';
SELECT * FROM sys.statistics WHERE table = 'unlogtab';
In the system table sys.table_types, we can see that unlogged tables are of type 7.
In a warehouse, we don't pack all our goods into one huge box.
Instead, we organize our goods properly into sections, pallets, and boxes. This way, we can easily find, label, and manipulate our inventory.
In databases, everything is automated, so we don't have to worry about the organization ourselves. The problem arises when our table becomes too big. Rebuilding indexes becomes slow, backup/restore is slow, and fragmentation of the database becomes significant.
There is also the fact that in enterprise databases, most of the data is read-only, and only the most recent data is read and written. All of this leads us to the conclusion that parts of our table should be treated and processed separately. That is why we apply partitioning.
Partitioning is when we divide our table horizontally into several smaller tables.
Benefits and Drawbacks of Partitioning
Queries are faster. Instead of scanning the entire table, we scan only the necessary partitions. The database is smart enough to discard partitions that do not hold relevant data. This is called "partition pruning". For example, to see sales only for the year 2024, we can query only the 2024 partition and ignore all the others.
Rebuilding indexes, updating statistics, and vacuuming are easier on partitions.
Dropping, archiving, backing up, and partition swapping can be done on one part of the table. We can treat the parts of the table separately.
Partitions can be processed in parallel, on different CPU cores. Partitions can be on different storage disks.
Partitions with older/stable data can be compressed and can have multiple indexes. It is the opposite for the most recent data.
Partitioning is only really useful when we have very large tables, those with over 100 million rows. The biggest benefit is in maintaining such large tables. It is questionable whether partitioning will improve query speeds. That will happen only if queries exclusively touch some of the partitions and not others. If there is a discrepancy between how users filter the data and how we have defined our partitions, we could reduce performance rather than improve it.
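As an illustration of partition pruning, consider a hypothetical "Sales" merge table partitioned by range on a "SaleYear" column. All names and year ranges below are assumptions for the sketch:

```sql
-- Hypothetical setup: one partition per range of years.
CREATE MERGE TABLE Sales ( SaleYear INT, Amount INT ) PARTITION BY RANGE ON ( SaleYear );
CREATE TABLE SalesOld  ( SaleYear INT, Amount INT );
CREATE TABLE SalesNew  ( SaleYear INT, Amount INT );
ALTER TABLE Sales ADD TABLE SalesOld AS PARTITION FROM 2020 TO 2023;
ALTER TABLE Sales ADD TABLE SalesNew AS PARTITION FROM 2024 TO 2027;

-- This filter matches only the "SalesNew" partition, so "SalesOld" can be pruned.
SELECT SUM( Amount ) FROM Sales WHERE SaleYear = 2024;
```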
Simple Start
First, we will create a merge table. It is not possible to query this table until we add some partitions to it.
CREATE MERGE TABLE Merg ( Letter VARCHAR(10), Number INT );
SELECT * FROM Merg;
Now we will create two sample tables. We will attach these tables as partitions to the "Merg" table. The sample tables must have exactly the same definition.
CREATE TABLE Tab1 ( Letter VARCHAR(10), Number INT );
INSERT INTO Tab1 (Letter, Number) VALUES ('A', 50), ('A', 60);
CREATE TABLE Tab2 ( Letter VARCHAR(10), Number INT );
INSERT INTO Tab2 (Letter, Number) VALUES ('B', 150), ('B', 160);
"Tab1" and "Tab2" will be attached to "Merge" table. ALTER TABLE Merg ADD TABLE Tab1; ALTER TABLE Merg ADD TABLE Tab2;
SELECT * FROM Merg;
We can see that merge tables are similar to union queries. UNION queries are verbose, while merge-table queries are short and simple. UNION queries are also more computationally intensive and use more memory. A merge table can effectively use indexes that are set up over the individual partition tables.
On the other hand, UNION queries are necessary when the base tables have different structures that require transformation.
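To make the comparison concrete, querying the merge table is equivalent to writing the union of its partitions by hand:

```sql
-- These two queries return the same rows.
SELECT * FROM Merg;
SELECT * FROM Tab1 UNION ALL SELECT * FROM Tab2;
```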
System Tables and Removing Partitions
This system table will show us the partitions of our merge table. The ID of the merge table is 11077.
SELECT * FROM sys.dependencies_vw WHERE used_by_id = 11077;
This is how we can remove a partition from the merge table.
ALTER TABLE Merg DROP TABLE Tab1;
SELECT * FROM Merg;
The merge table now has only one partition, the "Tab2" partition.
"Tab1" is only removed from the merge table. We still have "Tab1" intact in our schema.
Problem With Simple Approach
If we try INSERT, UPDATE, DELETE, or TRUNCATE on the merge table, we will fail.
UPDATE Merg SET Number = 170 WHERE Letter = 'B';
We will delete our merge table, because we want to create it in a way that will allow INSERT, UPDATE, DELETE, and TRUNCATE.
DROP TABLE Merg;
This time we will provide the merge table with a rule by which it will differentiate between partitions.
CREATE MERGE TABLE Merg ( Letter VARCHAR(10), Number INT ) PARTITION BY VALUES ON ( Letter );
We now have a system that helps the merge table understand the distribution of values between partitions. The merge table now knows that partitions are defined based on the values in the "Letter" column.
The next step is to add partitions. Again, we have to use the PARTITION clause.
I will try to deceive the merge table. I will try to plant the table "Tab1" as a table that has the letter 'Z' in the "Letter" column. We know that "Tab1" has only the letter 'A' in the "Letter" column.
ALTER TABLE Merg ADD TABLE Tab1 AS PARTITION IN ( 'Z' );
The merge table will reject this partition.
Only when I truthfully declare my partition as defined by 'A' in the "Letter" column will my partition be accepted.
ALTER TABLE Merg ADD TABLE Tab1 AS PARTITION IN ( 'A' );
But what if I have another table that has only 'A' in the "Letter" column? I will create such a table, and I will try to add it to the merge table.
CREATE TABLE Tab3 AS ( SELECT * FROM Tab1 ) WITH DATA;
ALTER TABLE Merg ADD TABLE Tab3 AS PARTITION IN ( 'A' );
Now we have a conflict. The definitions of partitions have to be unique, so "Tab3" will be rejected.
Partition With Multiple Values in the Letter Column
I will add one more row to the "Tab2" table. After that, I will add "Tab2" to the merge table.
INSERT INTO Tab2 ( Letter, Number ) VALUES ( 'C', 170 );
SELECT * FROM Tab2;
I will declare that my new partition has two letters in the "Letter" column. The merge table will accept this.
ALTER TABLE Merg ADD TABLE Tab2 AS PARTITION IN ( 'B', 'C' );
This ability can help us if the distribution of the letters is not uniform, but we want partitions of the same size.
Let's Try Modifying "Merg" Table Directly
Let us now try to INSERT a row directly into the "Merg" table.
INSERT INTO Merg ( Letter, Number ) VALUES ( 'Z', 999 );
There is no 'Z' partition, so this INSERT will be rejected.
INSERT INTO Merg ( Letter, Number ) VALUES ( 'A', 70 );
Success! "Merg" now knows where to insert the new row (into "Tab1").
Let's update this new row.
UPDATE Merg SET Number = 71 WHERE Letter = 'A' AND Number = 70;
Let's delete this new row.
DELETE FROM Merg WHERE Letter = 'A' AND Number = 71;
But what if I modify "Tab1" directly? Will that confuse the merge table?
UPDATE Tab1 SET Letter = 'Z';
As we can see, the merge table is protected from rule violations.
Redefining A Partition
We have new records with the letter 'Z', but only a few of them. I want to add them to the "Tab1" partition. We know that "Tab1" will reject them.
To avoid that, I will redefine "Tab1" to accept 'Z' records.
ALTER TABLE Merg SET TABLE Tab1 AS PARTITION IN ( 'A', 'Z' );
Let's insert now "Z" record into "Tab1". INSERT INTO Merg ( Letter, Number ) VALUES ( 'Z', 70 ); "Merg" table will now accept "Z" record.
Other Ways to Define the Partitioning Rule
Partition By Range
This is the current state of tables "Tab1" and "Tab2". We can notice that "Tab1" has values in the range 1-100, and "Tab2" has values in the range 101-200. We will use that fact to create a new merge table based on these values.
CREATE MERGE TABLE MergRange ( Letter VARCHAR(10), Number INT ) PARTITION BY RANGE ON ( Number );
We'll add "Tab1" and "Tab2" to this new "MergRange". Problem is that one table can not be part of several merge tables.
ALTER TABLE MergRange ADD TABLE Tab1 AS PARTITION FROM 1 TO 100;
We will first remove "Tab1" and "Tab2" from the "Merg" table, and then we will add them to the "MergRange" table.
ALTER TABLE Merg DROP TABLE Tab1;
ALTER TABLE Merg DROP TABLE Tab2;
ALTER TABLE MergRange ADD TABLE Tab1 AS PARTITION FROM 1 TO 100;
ALTER TABLE MergRange ADD TABLE Tab2 AS PARTITION FROM 101 TO 200;
This will be our new merge table.
SELECT * FROM MergRange;
Note that we would get the same result with these definitions.
ALTER TABLE MergRange ADD TABLE Tab1 AS PARTITION FROM RANGE MINVALUE TO 100;
ALTER TABLE MergRange ADD TABLE Tab2 AS PARTITION FROM 101 TO RANGE MAXVALUE;
Partition By Value Expression
So far, we have defined partitions using only a single column. Now we will use an expression to define partitions. The expression "Letter || CAST( Number AS VARCHAR(10) )" says that the columns "Letter" and "Number" together define the partition.
CREATE MERGE TABLE MergExpression ( Letter VARCHAR(10), Number INT ) PARTITION BY VALUES USING ( Letter || CAST( Number AS VARCHAR(10) ));
We will remove "Tab1" and "Tab2" from the merge table "MergRange". Then, we will add them to the "MergExpression" table.
ALTER TABLE MergRange DROP TABLE Tab1;
ALTER TABLE MergRange DROP TABLE Tab2;
ALTER TABLE MergExpression ADD TABLE Tab1 AS PARTITION IN ( 'A50', 'A60', 'Z70' );
ALTER TABLE MergExpression ADD TABLE Tab2 AS PARTITION IN ( 'B150', 'B160', 'C170' );
We get the same result again.
SELECT * FROM MergExpression;
Partition by Range Expression
It is also possible to use an expression to calculate the value that will be used to determine range membership.
CREATE MERGE TABLE MergRangeExpression ( Letter VARCHAR(10), Number INT ) PARTITION BY RANGE USING ( Number + char_length( Letter ) );
Again, we will untie our tables from the previous merge table, and then we will add them to the MergRangeExpression table.
ALTER TABLE MergExpression DROP TABLE Tab1;
ALTER TABLE MergExpression DROP TABLE Tab2;
ALTER TABLE MergRangeExpression ADD TABLE Tab1 AS PARTITION FROM 1 TO 100;
ALTER TABLE MergRangeExpression ADD TABLE Tab2 AS PARTITION FROM 101 TO 200;
Partition By NULLS
We have to declare which partition will hold the NULLs. Obviously, we have to place all the NULLs into only one partition.
ALTER TABLE Merg ADD TABLE Tab1 AS PARTITION IN ( 'A' ) WITH NULL VALUES;
In this case all nulls would belong to partition "Tab1".
ALTER TABLE Merg ADD TABLE Tab2 AS PARTITION FROM 1 TO 9 WITH NULL VALUES;
All nulls belong to partition "Tab2".
ALTER TABLE Merg ADD TABLE Tab3 AS PARTITION FOR NULL VALUES;
In this case, all nulls belong to partition "Tab3".
PARTITION System Tables
When we define a partitioning rule (when we use the PARTITION clause), that rule will be registered in these system tables.
Here we can find our three merge tables. Two of them use individual columns, and the third one uses an expression.
SELECT * FROM sys.table_partitions;
Here we can find values used to determine partitions.
SELECT * FROM sys.value_partitions;
System table "sys.range_partitions" is used when partitioning is made by the ranges ( 1-100, 101-200 ).
Merge Table Based on Another Table
It is possible to derive a merge table's definition from another table. WITH NO DATA is mandatory. CREATE MERGE TABLE MergAS ( Letter, Number ) AS ( SELECT * FROM Tab1 ) WITH NO DATA;
We will log in as the administrator, but we will set the VOC schema as the default. The password is "monetdb".
mclient -u monetdb -d voc; SET SCHEMA voc;
CREATE TABLE permTable ( Number INTEGER ); INSERT INTO permTable ( Number ) VALUES ( 1 ), ( 2 );
Temporary Tables
After a query executes, its result is discarded unless it is saved in a table or sent to an application. Queries are transient, but tables are permanent: tables keep the data stored in them. Between queries and tables we have temporary data structures called temporary tables. These structures are used to store session-specific data that only needs to exist for a short time.
Creation of a Local Temporary Table
We will create a temporary table that will exist only during one session. Such temporary tables are called LOCAL temporary tables. The default behavior of temporary tables is to lose their content at the end of each transaction. We can prevent that with the option ON COMMIT PRESERVE ROWS. We don't want the temporary table to be emptied at the end of the transaction, because we want to observe the behavior of the table.
CREATE LOCAL TEMPORARY TABLE tempTable ( Number INTEGER PRIMARY KEY ) ON COMMIT PRESERVE ROWS; INSERT INTO tempTable ( Number ) VALUES ( 3 ), ( 4 ); SELECT * FROM tempTable;
Temporary tables are created in the "tmp" schema. We don't have to prefix their names with "tmp"; both statements below work the same. SELECT * FROM tempTable; SELECT * FROM tmp.tempTable;
We can not create a temporary table in any schema other than "tmp". CREATE LOCAL TEMPORARY TABLE voc.tempTable2 ( Number INTEGER PRIMARY KEY );
A permanent table and a temporary table can have the same name, because they are in different schemas. I will create a temporary table with the name "permTable". CREATE LOCAL TEMPORARY TABLE permTable ( Number INTEGER ); If we read from "permTable" without specifying a schema, we will get the temporary table. SELECT * FROM permTable; To read from the permanent table with the same name, we have to use the fully qualified name. SELECT * FROM voc.permTable; So, the temporary table has priority, although our current schema is "voc" (and not "tmp").
We can not create permanent objects in "tmp". SET SCHEMA tmp; CREATE TABLE ZZZ ( Number INTEGER ); SET SCHEMA voc;
Usage of a Temporary Table
It is possible to create queries that combine temporary and normal tables. SELECT * FROM voc.permTable JOIN tempTable ON permTable.Number <= tempTable.Number;
We can have constraints on the temporary table. In this case we have a PRIMARY KEY constraint, so inserting a duplicate value will fail. INSERT INTO tempTable ( Number ) VALUES ( 3 );
It is possible to export data from temporary table into CSV. COPY SELECT * FROM tempTable INTO '/home/fffovde/Desktop/tblCSV';
It is possible to import data from a CSV file into temporary table. COPY INTO tempTable ( Number ) FROM '/home/fffovde/Desktop/tblCSV'( Number );
It is possible to use UPDATE and DELETE on temporary tables.
UPDATE tempTable SET Number = 6 WHERE Number = 5; DELETE FROM tempTable WHERE Number = 6;
We can NOT alter our temporary table. ALTER TABLE tempTable ADD COLUMN Letter VARCHAR(10);
ALTER TABLE tempTable DROP CONSTRAINT temptable_number_pk;
It is possible to create a view on a temporary table. CREATE VIEW voc.View1 ( Number ) AS SELECT * FROM tmp.tempTable; SELECT * FROM View1;
It is not possible to create foreign key constraint on the permTable if it references tempTable. ALTER TABLE voc.permTable ADD CONSTRAINT FromTempTableConstraint FOREIGN KEY ( Number ) REFERENCES tmp.tempTable ( Number );
Info About Temporary Tables
We can NOT get statistics about our temporary table. SELECT * FROM sys.statistics( 'tmp','temptable');
We can find our temporary table in the system catalog. SELECT * FROM sys.tables WHERE Name = 'temptable';
Visibility of Local Temporary Table
I will log in to MonetDB from another instance of mclient, as the "voc" user (password "voc"). mclient -u voc -d voc I will try to read the local table "tempTable". SELECT * FROM tempTable; This will not work, because local temporary tables are visible only in the session where they were created. Other users can not see these tables.
As the user "voc", I will create a temporary table with the same name, "tempTable". This will succeed; each user can have their own local temporary table. CREATE LOCAL TEMPORARY TABLE tempTable ( Number INTEGER );
Even if we don't explicitly drop the table, it will disappear after we log out of the current session. I'll log out of the session (the monetdb user's session), and after I log back in, "tempTable" will no longer exist. quit mclient -u monetdb -d voc SELECT * FROM tempTable;
We can terminate our "tempTable" by dropping it explicitly, even before the end of the session.
DROP TABLE tempTable;
ON COMMIT DELETE ROWS
The "ON COMMIT DELETE ROWS" subclause means that the data will be deleted after each transaction. This is the default behavior.
CREATE LOCAL TEMPORARY TABLE tempTable ( Number INTEGER PRIMARY KEY ) ON COMMIT DELETE ROWS; INSERT INTO tempTable ( Number ) VALUES ( 3 ), ( 4 ); SELECT * FROM tempTable;
In this case, the temporary table is useful only inside of a transaction. START TRANSACTION; INSERT INTO tempTable ( Number ) VALUES ( 3 ), ( 4 ); SELECT * FROM tempTable;
COMMIT;
AS SELECT
We can create a temporary table based on some other table. The base table can be a temporary or a normal table.
CREATE LOCAL TEMPORARY TABLE tempTable2 ( Number ) AS ( SELECT Number FROM tempTable );
CREATE LOCAL TEMPORARY TABLE tempTable3 ( Number ) AS (SELECT Number FROM voc.permTable ) ON COMMIT PRESERVE ROWS;
By default, "WITH DATA" is used. If we try to read from tempTable3, we'll see the values. SELECT * FROM tempTable3;
The "WITH NO DATA" subclause means we do not inherit the data. CREATE LOCAL TEMPORARY TABLE tempTable4 ( Number ) AS ( SELECT Number FROM voc.permTable ) WITH NO DATA ON COMMIT PRESERVE ROWS; SELECT * FROM tempTable4;
Global Temporary Tables
Global temporary tables are somewhere between normal tables and local temporary tables. Their definition ( columns and data types ) is permanent. The name of a global table has to be unique in the "tmp" schema. Only users with authorization over the "tmp" schema can create global temporary tables. In our example, the administrator "monetdb" can create global temporary tables, but the "voc" user can not.
The thing that makes these tables temporary is their data. All the rows of a global temporary table will be deleted after each transaction (with ON COMMIT DELETE ROWS) or at the end of the session (with ON COMMIT PRESERVE ROWS).
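As a small sketch of this behavior, assuming we have the privileges to create the table (the name "globDemo" is made up for this example): CREATE GLOBAL TEMPORARY TABLE globDemo ( Number INTEGER ) ON COMMIT DELETE ROWS; START TRANSACTION; INSERT INTO globDemo ( Number ) VALUES ( 1 ); SELECT * FROM globDemo; COMMIT; Inside the transaction the SELECT returns the row; if we run SELECT * FROM globDemo; again after the COMMIT, the table is empty again.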
While the definition of a global temporary table is shared, the data is not. Data placed in the global table by one user can not be seen by another user. So, a global temporary table is a playground where each user can play with their own data.
Global temporary tables have characteristics similar to local temporary tables. We can use SELECT, DELETE and UPDATE. We can export them to a CSV file. We can NOT alter global tables. We can create views on them. So, everything is the same as for local temporary tables.
It is possible to get statistics about global tables. SELECT * FROM sys.statistics( 'tmp','globtable');
Creation of the Global Temporary Table
We create global temporary table with similar statement as for the local temporary tables.
CREATE GLOBAL TEMPORARY TABLE globTable ( Number INTEGER PRIMARY KEY ) ON COMMIT PRESERVE ROWS;
This will fail for the "voc" user who doesn't have enough privileges over "tmp" schema.
Privileged users can successfully create a global temporary table, but not if a table with that name already exists. It is not possible for two users to create global tables with the same name.
Visibility of Global Temporary Table
We will insert some data in our global temporary table. INSERT INTO globTable ( Number ) VALUES ( 5 ), ( 6 ); SELECT * FROM globTable;
If we try to read our table from the session of the "voc" user, we will see an empty table. This shows us that the definition of the table is shared, but the data is not. SELECT * FROM globTable;
Although the "voc" user can not create a global table, they can use tables created by others. The "voc" user can play with their own data. INSERT INTO globTable ( Number ) VALUES ( 7 ), ( 8 ); SELECT * FROM globTable;
When to Use Temporary Tables
You can create an excerpt from some big table. After that, you can run your queries on that smaller table, instead of the big one.
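For example, assuming a big table "voc.bigTable" with a "Year" column exists (both names are hypothetical), we could keep only the rows we need: CREATE LOCAL TEMPORARY TABLE tempExcerpt AS ( SELECT * FROM voc.bigTable WHERE Year = 2024 ) WITH DATA ON COMMIT PRESERVE ROWS; After that, all further queries can read from the much smaller "tempExcerpt".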
Because temporary tables are volatile and their data is isolated between users, they are great for experiments.
Temporary tables should not be used as an intermediate step in queries. In that case, it is much wiser to use a CTE.