This is how we can change the name of our table. ALTER TABLE tblTurnover RENAME TO tblIncome; SELECT * FROM tblIncome;
We can change the name of a column. ALTER TABLE tblIncome RENAME COLUMN Turnover to Income; SELECT * FROM tblIncome;
We can alter our column to accept nulls. ALTER TABLE tblIncome ALTER COLUMN Income SET NULL;
Besides READ ONLY and READ WRITE, we also have INSERT ONLY. ALTER TABLE tblCost SET INSERT ONLY;
If we try to delete a row from an INSERT ONLY table, that will be prohibited.
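For example, a delete attempt like the sketch below would be rejected (the WHERE condition is illustrative):

```sql
ALTER TABLE tblCost SET INSERT ONLY;
DELETE FROM tblCost WHERE ID = 1;     -- fails: only inserts are allowed
ALTER TABLE tblCost SET READ WRITE;   -- back to the normal mode
```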
We can change the default value for a column: ALTER TABLE tblIncome ALTER COLUMN Income SET DEFAULT 200;
We can delete the DEFAULT value for the Income column: ALTER TABLE tblIncome ALTER COLUMN Income DROP DEFAULT;
We can add a new column to the tblCost table. ALTER TABLE tblCost ADD COLUMN NewColumn BIGINT DEFAULT 88; SELECT * FROM tblCost; If we hadn't provided a default value, this new column would be filled with nulls.
We will delete our new column: ALTER TABLE tblCost DROP COLUMN NewColumn;
Both of our tables were created in the VOC schema. We can move the tblCost table to the sys schema, and we can move it back: ALTER TABLE tblCost SET SCHEMA sys; ALTER TABLE tblCost SET SCHEMA voc; We can check with the "\d" command that our table has been transferred to another schema.
Removing Constraints
If we want to remove a constraint (primary key, unique constraint, foreign key), we have to know the name of that constraint. In our example we have placed a PRIMARY KEY constraint on the tblIncome table, column ID, and a UNIQUE constraint on the tblCost table, column ID. We will first read from the system table sys.tables to find the IDs of our tables:
SELECT * FROM sys.tables WHERE name IN ('tblcost', 'tblincome');
Now that we know that our tables have IDs 13938 and 13946, we will search for our columns in the table sys.columns. Our columns have IDs 13932 and 13940.
SELECT * FROM sys.columns WHERE table_id IN ( 13938, 13946 ) AND name = 'id';
The next step is to find the names of our constraints. We will find them in the "dependency_columns_on_keys" table, filtering it by our tables and columns.
SELECT * FROM dependency_columns_on_keys WHERE table_id IN ( 13938, 13946 ) AND column_id IN ( 13940, 13932 );
We can now create a query that will return the names of the constraints on the specific columns:
SELECT T.id, T.name, C.id, C.name, K.key_id, K.key_name FROM sys.tables T INNER JOIN sys.columns C ON T.id = C.table_id INNER JOIN dependency_columns_on_keys K ON C.id = K.column_id AND C.table_id = K.table_id WHERE T.name IN ( 'tblincome', 'tblcost' ) AND C.name = 'id';
Now that we know our constraints names, we can delete them: ALTER TABLE tblIncome DROP CONSTRAINT tblturnover_id_pkey; ALTER TABLE tblCost DROP CONSTRAINT tblcost_id_unique;
Constraint on Several Columns
When we want to make a constraint that encompasses several columns, we have to make the constraint on the table itself. In that case, our CREATE TABLE statement would look like this: CREATE TABLE Tab1 ( Col1 VARCHAR(10), Col2 VARCHAR(10), Col3 VARCHAR(10), PRIMARY KEY ( Col1, Col2 ), UNIQUE ( Col2, Col3 ));
We will now create PRIMARY KEY and UNIQUE constraints on our tables by using ALTER TABLE statement: ALTER TABLE tblIncome ADD PRIMARY KEY ( ID, SubID ); ALTER TABLE tblCost ADD UNIQUE ( ID, SubID );
If we now run the statement that will return constraints names, we will get four rows as a result: SELECT T.id, T.name, C.id, C.name, K.key_id, K.key_name FROM sys.tables T INNER JOIN sys.columns C ON T.id = C.table_id INNER JOIN dependency_columns_on_keys K ON C.id = K.column_id AND C.table_id = K.table_id WHERE T.name IN ( 'tblincome', 'tblcost' ) AND C.name IN ( 'id', 'subid' );
Naming of Constraints
We saw earlier that the name of a constraint is created automatically by concatenating a) the table name, b) the column name(s), and c) the constraint type ( tblturnover_id_pkey ). If the constraint encompasses several columns, for example id and subid, then the name will look like tblturnover_id_subid_pkey. If we want to give our constraint a name, then we have to use this syntax in the CREATE TABLE statement:
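A sketch of that syntax, with hypothetical constraint names:

```sql
CREATE TABLE Tab1 (
    Col1 VARCHAR(10),
    Col2 VARCHAR(10),
    CONSTRAINT MyPrimaryKey PRIMARY KEY ( Col1 ),
    CONSTRAINT MyUniqueKey  UNIQUE ( Col2 )
);
```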
We will now give names to our constraints by using the ALTER TABLE statement. We cannot create another primary key constraint, because a table can have only one primary key constraint.
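The statements were presumably like the sketch below: we first drop the existing primary key, then re-create it under the custom name Primarius (a name used later in this post). The auto-generated name in the DROP line is an assumption based on the naming rule described above.

```sql
-- Assumed auto-generated name for the primary key on ( ID, SubID ):
ALTER TABLE tblIncome DROP CONSTRAINT tblincome_id_subid_pkey;
ALTER TABLE tblIncome ADD CONSTRAINT Primarius PRIMARY KEY ( ID, SubID );
```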
On the other hand, it is possible to have many UNIQUE constraints on one table. ALTER TABLE tblCost ADD CONSTRAINT Uniquous UNIQUE ( ID, SubID );
Now we have three constraints. We still have old UNIQUE constraint, and we have two new constraints with custom names. SELECT T.id, T.name, C.id, C.name, K.key_id, K.key_name, K.key_type FROM sys.tables T INNER JOIN sys.columns C ON T.id = C.table_id INNER JOIN dependency_columns_on_keys K ON C.id = K.column_id AND C.table_id = K.table_id WHERE T.name IN ( 'tblincome', 'tblcost' ) AND C.name IN ( 'id', 'subid' );
On the image above, we have a new column "key_type". Zero in this column means that we are dealing with a primary key constraint, and 1 means a unique constraint.
In the sys.key_types table we can see that there are 4 types of constraints. The CHECK constraint is not supported in MonetDB.
SELECT * FROM sys.key_types;
Foreign Key Constraint
We know that in Excel, "data validation" allows us to restrict the user to entering only values (2) from a predefined list (1). Similarly, on the image below, the "key_type" column can only accept the values 0, 1, 2, 3, 4, because those are the only values listed in the main table sys.key_types (3). In relational databases, the list (1) is located within the Primary Key column (4) of the table that performs the function of a "Parent Table". The list (2), where we are limiting what can be entered, is in the "Child Table", in the column declared as a "Foreign Key" (5).
This "primary key" > "foreign key" relation is called foreign key constraint, because "primary key" column defines what can be entered into "foreign key" column. When creating a table, we should use a syntax like this one to create foreign key constraint:
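A sketch of that syntax, consistent with the names mentioned below (Tab3, PkFk), assuming the parent table Tab2 has a primary key on ( Col1, Col2 ):

```sql
CREATE TABLE Tab3 (
    Col1 VARCHAR(10),
    Col2 VARCHAR(10),
    CONSTRAINT PkFk FOREIGN KEY ( Col1, Col2 ) REFERENCES Tab2 ( Col1, Col2 )
);
```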
We can omit the part ", CONSTRAINT PkFk", but in that case our constraint will get the default name "tab3_col1_col2_fkey".
We can find data about foreign key constraints in this system table. 14009 is Tab2, which has the primary key 14277 "pkconstraint". On the other side is Tab3, with ID 14294, which has the foreign key 14292 "pkfk". SELECT * FROM sys.dependency_keys_on_foreignkeys;
In order to find what columns are included in some foreign key constraint, we should read from this system table. SELECT * FROM describe_foreign_keys;
We can find everything about FK constraints in the table Tab3 with this SELECT query: SELECT pk_t PkTable, pk_c PkColumn, key_name PkName , fk_t FkTable, fk_c FkColumn, fk_name FkName FROM sys.dependency_keys_on_foreignkeys Fk INNER JOIN describe_foreign_keys DFk ON Fk.fk_name = DFk.fk WHERE fk_t = 'tab3';
We will now add foreign key constraints with the ALTER TABLE statement. We will try to link the "id" columns. This will fail. A foreign key constraint can only reference a column (or columns) declared as a primary key or as unique.
ALTER TABLE tblCost ADD CONSTRAINT FromTblIncomeConstraint FOREIGN KEY ( id ) REFERENCES tblIncome ( id );
We will alter our statement to relate to primary key in the table tblIncome. Current primary key in the table tblIncome is composite key ( id, subid ). We will use that.
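The corrected statement was presumably like:

```sql
ALTER TABLE tblCost ADD CONSTRAINT FromTblIncomeConstraint
    FOREIGN KEY ( ID, SubID ) REFERENCES tblIncome ( ID, SubID );
```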
Foreign key constraint now limits what can be placed into tblCost table.
We can not add "4" into ID column in the table tblCost. That would violate the foreign key constraint.
We can not delete any row from the table tblIncome because each row is referenced in the tblCost table, and that would, again, violate the foreign key constraint.
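Sketches of statements that would now violate the constraint (the column lists are illustrative; the actual statements depend on the tables' other columns):

```sql
-- There is no matching ( 4, ... ) key in tblIncome, so this insert is rejected:
INSERT INTO tblCost ( ID, SubID ) VALUES ( 4, 1 );

-- Every tblIncome row is referenced by tblCost, so this delete is rejected:
DELETE FROM tblIncome WHERE ID = 1;
```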
RESTRICT / CASCADE when Deleting Constraints and Columns
We will try to delete ID column in tblCost table. ALTER TABLE tblCost DROP COLUMN ID;
We will try to delete primary key constraint in the tblIncome table. ALTER TABLE tblIncome DROP CONSTRAINT Primarius;
We will fail in both cases. This is because the default mode is RESTRICT. By using CASCADE, we would be able to delete this column and this constraint.
START TRANSACTION; ALTER TABLE tblIncome DROP CONSTRAINT Primarius CASCADE; ROLLBACK;
If we roll back our transaction, we won't really delete the constraint. I don't want to delete it.
NULL Handling
We cannot insert NULL into a primary key column. A primary key column doesn't allow nulls.
INSERT INTO tblIncome ( ID, SubID, IDwithNULL, Income ) VALUES ( null, 4, null, 104 );
tblIncome
tblCost
The column IDwithNULL in the tblIncome table has one field with a null. The constraint UNIQUE NULLS NOT DISTINCT can be added to this column, because there are no duplicate nulls (we only have a single null). ALTER TABLE tblIncome ADD CONSTRAINT IncomeUniqueNull UNIQUE NULLS NOT DISTINCT ( IDwithNULL );
The column IDwithNULL in the tblCost table has two fields with nulls. The surprise is that the UNIQUE NULLS NOT DISTINCT constraint can still be added to this column. ALTER TABLE tblCost ADD CONSTRAINT CostUniqueNull UNIQUE NULLS NOT DISTINCT ( IDwithNULL );
This constraint will only have effect when we try to add new null: INSERT INTO tblIncome ( id, subid, idwithnull, income ) VALUES ( 4, 4, null, 50 );
The default value for this subclause is NULLS DISTINCT. By default, we will consider all nulls to be distinct.
What is Not Supported in Foreign Key Constraint
For a foreign key constraint, the match can only be SIMPLE, ON UPDATE can only be RESTRICT, and ON DELETE can only be RESTRICT. Other options are not supported, so we will ignore them.
The ON UPDATE RESTRICT and ON DELETE RESTRICT subclauses mean that it is not possible to update or delete a key in the parent table if there is a row in the child table that references that key.
MATCH SIMPLE means that if a foreign key is a composite key ( it is made of several columns ), and if one of the columns in the foreign key has a null, then the referential integrity will not be checked for that foreign key.
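A sketch of this, using the Tab2/Tab3 example from above (the column list is illustrative): a row with a null in one column of the composite foreign key is accepted without a referential check.

```sql
-- Tab3 has a composite foreign key ( Col1, Col2 ) referencing Tab2.
INSERT INTO Tab3 ( Col1, Col2 ) VALUES ( 'X', NULL );
-- Accepted under MATCH SIMPLE, even if no parent key matches 'X'.
```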
Error Avoidance
As always, if we use IF NOT EXISTS or IF EXISTS, we will avoid errors. We will always get "operation successful".
CREATE TABLE IF NOT EXISTS employees ( id INT PRIMARY KEY, name VARCHAR(100), department VARCHAR(50), salary DECIMAL(10, 2) );
ALTER TABLE IF EXISTS employees ADD COLUMN hire_date DATE;
Schema is logical container for objects like tables, views, functions, indices, procedures, triggers….
← On one server machine we can install several server applications. For example, on one server machine we can install SQL Server and a MonetDB server (or we can have several instances of the same server application). In SQL Server, objects are organized in this hierarchy: Server Machine > Server Application > Database > Schema > Object. In MonetDB, there are no databases, so the hierarchy is: Server Machine > Server Application > Schema > Object.
Separate server applications (like SQL Server and MonetDB on the image) are totally independent. Databases are almost like that:
This is what differentiates databases: – Each database is a separate process. – Each database has its own system tables and storage structures. – For each database we have to make a separate connection. – We cannot query tables that belong to different databases.
This is what connects databases: – Some settings can be defined on the server application level (CPU, memory, disk quotas, cache size). – You access databases through the same Hostname/IP Address.
Schemas are not like databases. Schemas are just a way to organize our objects. We benefit from schemas because:
Schemas allow us to organize our objects.
Schemas allow us to differently set security and access control.
Schemas allow us to have objects with the same name, but they have to be in separate schemas ( MySchema.TableName, YourSchema.TableName ).
We can query tables from different schemas with one SELECT statement, but only if we have access rights.
Schemas in MonetDB
System table sys.schemas has a list of all of our schemas. SELECT * FROM sys.schemas;
Each server will have system schemas, like sys, tmp, json, profiler, logging, information_schema. On the image, we can see that the only user schema is "voc" schema.
At any moment, only one schema can be the current schema. That means: – We can call objects from that schema without specifying the schema name (TableName instead of MySchema.TableName). – A table created without specifying a schema will be created in the current schema.
We can get the name of the current schema with statement: SELECT CURRENT_SCHEMA;
We can create new schema with statement: CREATE SCHEMA newSchema;
This will fail because only members of the monetdb group can create schemas.
Creation of the New Schema
We will exit the "mclient" session in order to log in as administrators. Now we can create new schemas. mclient -u monetdb -d voc REMEMBER: the administrator group's password is "monetdb".
Notice that, when logged in as administrators, our current schema will be the "sys" schema. We will change our current schema to the VOC schema by using the statement: SET SCHEMA VOC;
* mclient -u monetdb -d voc The VOC provided during logon refers to the database server. It is not a schema name. It is the server name. We log in to the server.
We will create schema VOC2. CREATE SCHEMA VOC2; After that we will create table "NewTable". This table will be created in the current schema (VOC). CREATE TABLE NewTable (Text VARCHAR(10));
This SELECT statement will fail. Our current schema is VOC. There is no table "NewTable" in VOC2 schema. SELECT * FROM VOC2.NewTable;
We have to explicitly create the table inside the VOC2 schema. Only then can we read from this table. CREATE TABLE VOC2.NewTable (Text VARCHAR(10)); SELECT * FROM VOC2.NewTable; Notice that we now have two tables with the same name. VOC.NewTable VOC2.NewTable
We will write something into our two tables so we can make a distinction between them:
INSERT INTO NewTable( Text ) VALUES ( 'VOC');
INSERT INTO VOC2.NewTable( Text ) VALUES ( 'VOC2');
If we read from the NewTable, we will read from our current schema. SELECT * FROM NewTable;
We have to be explicit to read from VOC2 schema. SELECT * FROM VOC2.NewTable;
We can change current schema with this statement: SET SCHEMA VOC2; This time we will read VOC2.NewTable, and not VOC.NewTable. SELECT * FROM NewTable;
As administrators, we can read from different schemas with one SELECT query: SELECT * FROM VOC.NewTable UNION ALL SELECT * FROM VOC2.NewTable;
If we try the same from VOC account, we would get an error: SELECT: access denied for voc to table 'voc2.newtable'
Ownership of a Schema
We can find the owner of a schema by using an information schema view. Administrators are the only ones who can create schemas. That is why we always have "monetdb" in the schema_owner column.
SELECT * FROM information_schema.schemata;
Authorization of a Schema
We know that the user "voc" has the right to use the "voc" schema. This is something we can find in the sys.schemas table. In the "authorization" column we can see the number 7109. What does that number represent?
SELECT * FROM sys.schemas;
Each user is represented by their default role. This default role is created when the user is created. The default role is bound to one user and cannot be assigned to other users. The user "voc" has a default role with ID 7109. We can see this in the system table sys.users. The schema "voc" has authorization 7109. The conclusion is that the user "voc" is the only user (besides the administrators) who can create, modify, and read objects in this schema.
SELECT name, fullname, default_role, default_schema FROM sys.users;
Notice above that the user voc has the default schema 7110*, and the role monetdb has the default schema 2000*. When we log in as such users, this will be our initial current schema.
If we go back to one of the first posts in this series, we will find that we gave authorization to the user "VOC" during creation of the VOC schema: CREATE SCHEMA "VOC" AUTHORIZATION "VOC";
During creation of the schema VOC2, we didn't authorize anyone. In that case the authority belongs to "MonetDB Admin". We can see that in the images above. Authorization for the VOC2 schema belongs to the default role 3, and that is the monetdb group of administrators.
After the keyword AUTHORIZATION, we can have either a role or a user name. Only one user name or role can be authorized. If we want several users to control a schema, then we have to give the authorization to a role to which those users belong. After creation of a schema, it is not possible to change its authorization. Because of this, it is always better to give authorization to a role than to a user. Afterwards we can just add users to, or remove them from, that role.
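A sketch of that approach, with hypothetical role and schema names:

```sql
-- A role that will own the schema:
CREATE ROLE TeamRole;

-- Give the schema's authorization to the role, not to a single user:
CREATE SCHEMA TeamSchema AUTHORIZATION TeamRole;

-- Users can later be added to (or removed from) the role:
GRANT TeamRole TO voc;
```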
Easy Creation of a Schema for Some Role/User
We don't have to provide a name for a schema, just an authorization. In that case the name of the group/user that has the authorization will also be the name of the schema.
CREATE SCHEMA AUTHORIZATION MonetDB;
We created a schema authorized to the monetdb group. In the image on the right side, we can see that we now have a schema with the name "monetdb".
Renaming of a Schema
ALTER SCHEMA voc2 RENAME TO voc3;
We can only rename a schema if there are no objects that depend on the name of that schema.
We will now create a VIEW, an object that is dependent on schema voc3. CREATE VIEW aaa AS SELECT * FROM voc3.NewTable;
Because of that view, it is no longer possible to rename this schema. ALTER SCHEMA voc3 RENAME TO voc2;
Deleting a Schema
We will first change current schema to VOC, because it is not possible to delete current schema: SET SCHEMA VOC;
DROP SCHEMA voc3; --We cannot delete the schema because there is a view that depends on it.
We will delete our View. DROP VIEW VOC3.aaa;
But it is still impossible to delete the schema, because the table "NewTable" is in it.
In MonetDB, we cannot easily delete a schema because RESTRICT is the default mode: DROP SCHEMA voc3 RESTRICT; We have to supply the keyword CASCADE to easily delete a schema. This means that the schema will be deleted together with all of its dependent objects (tables and views): DROP SCHEMA voc3 CASCADE;
Avoiding Errors
CREATE SCHEMA IF NOT EXISTS SchemaName; DROP SCHEMA IF EXISTS SchemaName;
By using "IF NOT EXISTS" and "IF EXISTS", we can avoid getting error messages.
CREATE TABLE LetterNumber( Letter CHAR, Number INTEGER );
INSERT INTO LetterNumber( Letter, Number ) VALUES ( 'A', 1 ), ( 'B', 2 ), ( 'C', 3 );
Views in MonetDB
Views are named and saved queries. First we make a SELECT query, we give it a name, and then we save it. Whenever we need the logic encapsulated in that query, we can get it by calling the name of the saved view. The name of a view can now be used in any place where we can use the name of a table. Views are often called "virtual tables".
Advantages of views: – Reuse of logic. This also improves consistency. – Logic becomes modular. – They can be used to control what users can and cannot see. – Views can be used as an intermediary between the database and the application. This will reduce interdependence.
Limitations of views: – Changes to the underlying base tables can invalidate views. – A proliferation of views that reference each other can lead to complex structures and increased interdependency.
CREATE VIEW voc.View1 ( Letter1, Number1 ) AS SELECT Letter, Number FROM LetterNumber WITH CHECK OPTION;
This is how we create a view. voc is the name of a schema, and View1 is the name of the view. After the AS keyword, we type the SELECT statement. Before the keyword AS, we type the column aliases. WITH CHECK OPTION is an allowed subclause, but it has no effect in MonetDB.
We can read from the view like from a table: SELECT * FROM View1;
CREATE OR REPLACE VIEW voc.View1 ( Number2, Letter2 ) AS SELECT Number, Letter FROM LetterNumber;
The subclause "OR REPLACE" can be used to change an already saved view. In our example, we changed the order of columns in the SELECT statement, and we also changed the column aliases.
Now, the result of our view will be different.
information_schema.views
Each database has its own system tables where it stores database metadata. These tables differ among database servers. In order to improve standardization, the SQL committee introduced a standardized set of views, called "information_schema". These views allow us to query different databases using the same queries, and to get the same metadata. The queries below will work in MonetDB, Postgres and many other databases.
SELECT schema_name, schema_owner FROM information_schema.schemata;
SELECT table_name, table_type FROM information_schema.tables;
We are now interested in the view "information_schema.views". We can get information about our view with the query:
SELECT table_name, view_definition FROM information_schema.views WHERE table_name = 'view1';
There are many more columns in this "information_schema.views" view, but we will talk about them another time.
Dropping the View
It is possible to create a view based on a view. Now we have a database object that is dependent on the View1.
CREATE VIEW View2 ( Letter3 ) AS SELECT Letter2 FROM View1;
We will try to delete View1 with this statement, but we will fail. The reason is that View2 is dependent on the View1: DROP VIEW View1;
We can solve this in two ways. We can first delete View2 and then View1. →
DROP VIEW View2; DROP VIEW View1;
The other solution is to use the CASCADE subclause. DROP VIEW View1 CASCADE;
CASCADE keyword will delete View1 and all of the objects that are dependent on the view View1 (that would be View2).
If we now try to delete View1, we will get an error. DROP VIEW View1;
We can avoid this error with the IF EXISTS subclause. With this subclause we'll always get "operation successful". DROP VIEW IF EXISTS View1;
What are Indexes?
← At the end of a book, we have an index. An index will tell us where to look for content about a term. In the index, the terms are arranged in alphabetical order, so we can easily find a word and then look up the pages where it is mentioned in the book.
↓ It's the same with databases. Indexes will help us find our rows. In the example below, it is much easier to search through 2 index entries than through 5 rows of the table. This will improve our performance.
Indexes will speed up data reading. Indexes must be updated when data is modified, which can slow down INSERT, UPDATE, and DELETE operations. We read data much more often than we write it, so indexes are a good approach to make our database more performant.
How to Create Indexes in MonetDB?
The great thing is that we do not have to. MonetDB will create optimal indexes for us. The user is still free to create indexes manually, but those indexes will only be considered as suggestions by MonetDB. MonetDB can freely ignore the user's indexes if it finds a better approach.
CREATE INDEX Ind1 ON voc.LetterNumber ( Letter );
Voc.LetterNumber is SchemaName.TableName. The index name must be unique within the schema to which the table belongs.
CREATE UNIQUE INDEX Ind1 ON voc.LetterNumber ( Letter, Number );
The keyword UNIQUE is allowed, but it has no effect.
We can set an index on several columns. A separate index entry will then be created for each unique combination of values from those columns. →
This index will fail because we already have an index with the name Ind1.
We can have several indexes on one table or column.
There is no sense in creating several indexes of the same type on the same column, so we should avoid doing it.
In sys.idxs system table we can find a list of our indexes.
SELECT * FROM sys.idxs;
This is how to remove index: DROP INDEX Ind1;
Indexes of any type are removed with this statement, including the special indexes ( Imprints and Ordered index ).
MonetDB has two special indexes: Imprints and Ordered index. These indexes are experimental and we should avoid using them.
Special Index: Ordered index
An ordered (or clustered) index is an index where the items in the index are sorted. This makes searching through the index, filtering item ranges, and sorting table data much faster.
There are some limitations on this index: – We can use only one column for each index. – After UPDATE, DELETE or INSERT, on the table, this index will become inactive.
The problem is that these limitations exist only in the MonetDB documentation. MonetDB will NOT enforce them. Below we can see an Ordered index that uses several columns, which shouldn't be allowed. This is probably a consequence of the fact that this index is experimental and not fully implemented.
CREATE ORDERED INDEX Ind1 ON voc.LetterNumber( Letter, Number );
When we create this index, it will appear in the sys.idxs table, but we cannot be sure whether it is active or not. It seems that the best bet to make this index active is to:
Make the table READ ONLY.
ALTER TABLE LetterNumber SET READ ONLY;
ALTER TABLE LetterNumber SET READ WRITE;
Create Ordered index on only one column.
Don't make any further changes to that table.
Creation of this index is expensive, so we should create it ad hoc on READ ONLY tables.
I will delete the index we have created.
DROP INDEX Ind1;
Special Index: Imprints Index
The Imprints index is similar to the Ordered index. It is experimental and not fully implemented. Some of the limitations I will talk about are not enforced by MonetDB; they are just theoretical limitations:
Imprints index can only be implemented on numerical and string columns.
After UPDATE, DELETE or INSERT, on the table, this index will become inactive.
One Imprints index can only be implemented on one column.
This index will most likely work if we apply the same steps as for the Ordered index ( READ ONLY table, one column in the index, don't change anything else ).
The idea of the Imprints index on a numerical column is to divide that column into segments, and then to store some metadata for each segment. For example, we can store the minimum and maximum value of each segment. If someone applies a filter "VALUE > 30", we will be able to skip the segments where the maximum value is 30 or less. This will speed up our filters.
CREATE IMPRINTS INDEX Ind1 ON LetterNumber( Number );
The idea of the Imprints index on a string column is to make LIKE filters faster. This index would make it possible to prefilter our column by using a fast but not totally accurate algorithm. After that, we can apply the exact algorithm on the already reduced set of data.
In our example, the Imprints index on a string column will fail; this time MonetDB enforces the READ ONLY requirement.
CREATE IMPRINTS INDEX Ind1 ON LetterNumber( Letter);
After applying "ALTER TABLE LetterNumber SET READ ONLY;", our statement will still fail, because this index doesn't work on columns that have fewer than 5,000 rows.
CREATE IMPRINTS INDEX Ind2 ON LetterNumber( Letter );
INSERT INTO LetterNumber ( Letter, Number ) SELECT 'G', 500 FROM (SELECT 1 FROM sys.tables LIMIT 100) t1 CROSS JOIN (SELECT 2 FROM sys.tables LIMIT 50 ) t2;
We can insert 5,000 rows ( 'G', 500 ) into our table with the statement on the left side. In that statement, we are using a system table as a dummy table.
NOTE: For this to happen we will temporarily make our table READ WRITE.
Now that our table is READ ONLY and has more than 5,000 rows, we can apply the Imprints index to the Letter column.
We will delete these indexes: DROP INDEX Ind1; DROP INDEX Ind2;
We will delete the 5,000 rows from the table: ALTER TABLE LetterNumber SET READ WRITE; DELETE FROM LetterNumber WHERE Letter = 'G';
We insert 500 rows into the table. Due to hardware/software/power issues, our rows are only partially inserted. We do not know which rows were written and which were not. We are not sure whether the rows that were written are correct.
For this problem we use a transaction. A transaction is a guarantee that all rows will be inserted correctly or none of them.
Let's assume another example. The employee was promoted. We need to change her job title in our database and increase her salary. For this we need two SQL statements. We want both statements to succeed or both to fail. Again, we can use a transaction.
START TRANSACTION; UPDATE JobsTable SET JobTitle = 'New job' WHERE Employee = 'Her'; UPDATE SalariesTable SET Salary = '+1000' WHERE Employee = 'Her'; COMMIT;
So, a transaction is a set of statements that will either be completed fully or have no effect at all. This is how we preserve the integrity and consistency of our database.
How Transactions Work in MonetDB?
MonetDB uses optimistic concurrency control. That means that we do not use locks. Let's assume that we want to change the value 30 to the value 60 in our database.
First, we will take a note that current value in the table is 30. →
Parallel to that, we will prepare our number 60 for the insertion. We want to prepare everything in advance, so that we can be really fast during the insertion.
After that two things can happen:
If the current value is still 30, then we will lock the table. This lock will be really short and fine-grained; in our example, we will lock only one record. It's almost as if we're not locking anything.
After that, we will commit our change. Because we are really fast during commit (everything is already prepared), we reduce the chance that something bad will happen during that time. This guarantees us that the whole transaction will be completed successfully.
If the value in the table was changed by some other transaction while we were preparing our insert, then we abort our mission. In that case, the server will send an error message. After that, the user or application can decide whether they want to retry their transaction or do something else.
Error: Transaction failed: A conditional update failed
The purpose of this fail is to avoid conflicts between transactions.
Optimistic concurrency control is great for databases with high-read, low-write workloads. That means our database should not have a lot of conflicts where two transactions are changing the same data. It is great for analytic and web databases, because of its speed and scalability.
Autocommit
START TRANSACTION; --no need for this SELECT 2; COMMIT; --no need for this
By default, MonetDB wraps each individual statement in its own transaction. There is no need to manually start and end a transaction. SELECT 2; — <= this is already an individual transaction
Several Statements in One Transaction
To wrap several statements into one transaction, we have to use the "START TRANSACTION" and "COMMIT" statements. Now, either both statements will succeed or both will fail.
START TRANSACTION; SELECT 2; SELECT 3; COMMIT;
Why Doesn't my Transaction Work?
We will create one sample table. CREATE TABLE LetterNumber( Letter CHAR, Number INTEGER ); INSERT INTO LetterNumber( Letter, Number ) VALUES ( 'A', 1 ), ( 'B', 2 ), ( 'C', 3 );
If we start a transaction, execute some statement and don't commit, our change will still be visible. How is this possible without a commit?
START TRANSACTION; UPDATE LetterNumber SET Number = 4 WHERE Letter = 'A'; -- no Commit statement SELECT * FROM LetterNumber;
If we start a new session, only then we will see that our change is not really committed.
The change is only visible inside of our original session, so we can check the result of our statement without really committing it.
READ ONLY and READ WRITE
The MonetDB documentation claims that we can make our transaction READ ONLY. This is not true; MonetDB doesn't support a read-only mode. It does support READ WRITE mode, but that mode is the default, so this subclause is useless.
START TRANSACTION READ ONLY;
START TRANSACTION READ WRITE; is the default, so it is unnecessary.
ISOLATION LEVEL
During the read of one column, someone else can change the data.
If we need 5 seconds to read that column, during that time someone can: – Insert a new row. – Update a row. – Delete a row.
The question is: Do we want to isolate our query from those changes? We can isolate our query completely or partially. Possible isolation levels are: 1) READ UNCOMMITTED 2) READ COMMITTED 3) REPEATABLE READ 4) SERIALIZABLE
Default Isolation Level
The default isolation level in MonetDB is READ COMMITTED (not 100% sure). For the current session, the isolation level can be changed with SET TRANSACTION. This statement will change the isolation level, but it will also automatically start a new transaction.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; -- changes the isolation level, but also starts a new transaction
SELECT 2; -- this statement is inside the new transaction
COMMIT; -- this ends the transaction
SELECT 3; -- this is another transaction (because of autocommit)
This second transaction is in the same session, so it also uses the READ UNCOMMITTED isolation level. The isolation level set this way remains valid for all transactions in the rest of the session.
Because SET TRANSACTION automatically starts a new transaction, we cannot use START TRANSACTION immediately after SET TRANSACTION.
ISOLATION anomalies
Isolation levels are used to protect us from different anomalies (dirty reads, lost updates, and so on) that can occur in a concurrent environment. A concurrent environment is one in which different users execute their queries at the same time.
The database will always try to eliminate these anomalies. Even if we use the READ UNCOMMITTED isolation level, anomalies will probably not occur.
READ UNCOMMITTED will not isolate us at all. We will see everything other people are doing.
If we start a READ UNCOMMITTED transaction (1), we will see that the third row contains C3 (2).
At that moment, some other user starts a transaction (3) and changes the third row to C5 (4).
If we read our table again, inside our transaction, we will see that change (5). We are not isolated at all; even uncommitted changes made by other users are visible to us.
This anomaly is called "dirty reads". If we want to avoid this anomaly, we should use the READ COMMITTED isolation level.
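A dirty read can be reproduced outside MonetDB. SQLite's shared-cache mode has a read-uncommitted setting, so a sketch with Python's sqlite3 module (a stand-in for MonetDB) can show the same anomaly: the reader sees a value that the writer never commits.

```python
import sqlite3

# Shared-cache mode lets several connections share one in-memory database;
# it is also the mode in which SQLite honors "PRAGMA read_uncommitted".
uri = "file:dirtydemo?mode=memory&cache=shared"
writer = sqlite3.connect(uri, uri=True, isolation_level=None)
reader = sqlite3.connect(uri, uri=True, isolation_level=None)

writer.execute("CREATE TABLE t(Letter TEXT, Number INTEGER)")
writer.execute("INSERT INTO t VALUES ('C', 3)")

reader.execute("PRAGMA read_uncommitted = 1")  # ask for READ UNCOMMITTED

writer.execute("BEGIN")
writer.execute("UPDATE t SET Number = 5 WHERE Letter = 'C'")  # not committed

dirty = reader.execute("SELECT Number FROM t").fetchone()[0]
print(dirty)  # 5 -- a dirty read: this change was never committed

writer.execute("ROLLBACK")
after = reader.execute("SELECT Number FROM t").fetchone()[0]
print(after)  # 3 -- the dirty value has vanished
```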
READ COMMITTED will save us from dirty reads, but it won't save us from "lost updates".
Transaction A starts before Transaction B. Transaction A ends after Transaction B. Both transactions try to update the value 100 to something else. Since transaction A is committed last, the value written to the table will be 115. The value 75 is lost.
The solution is the REPEATABLE READ level. This level, if used by Transaction A, would lock the value 100 after reading it. This would force Transaction B to fail after the step X = X – 25.
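The lost-update sequence above can be traced with plain arithmetic. A hypothetical sketch in Python, where each variable stands for what one transaction has read or written (Transaction A adds 15, Transaction B subtracts 25, as in the example):

```python
# Shared value both transactions start from.
balance = 100

# Transaction A reads the value first...
a_value = balance          # A sees 100
# ...then Transaction B reads the same value.
b_value = balance          # B also sees 100

# B finishes first: X = X - 25, commit.
balance = b_value - 25     # the table now holds 75

# A finishes last: X = X + 15, commit, overwriting B's work.
balance = a_value + 15     # the table now holds 115

print(balance)  # 115 -- B's subtraction is lost
```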
We don't know in advance which transaction would be the last, so both transactions should use REPEATABLE READ.
There is an anomaly similar to "dirty reads", called "non-repeatable reads". The difference is that the other user will not only make the change, but also commit it. If we can see the change before the commit, it is a "dirty reads" anomaly; if we only see it after the commit, it is a "non-repeatable reads" anomaly. REPEATABLE READ will save us from this anomaly, too.
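MonetDB's REPEATABLE READ cannot be demonstrated here directly, but SQLite's WAL journal mode gives a reader a stable snapshot for the duration of its transaction, which illustrates what "repeatable read" means. A sketch with Python's sqlite3 (a stand-in, not MonetDB): the writer commits a change, yet the reader keeps seeing the old value until its own transaction ends.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "wal.db")

writer = sqlite3.connect(path, isolation_level=None)
writer.execute("PRAGMA journal_mode=WAL")  # WAL mode gives readers a snapshot
writer.execute("CREATE TABLE t(Letter TEXT, Number INTEGER)")
writer.execute("INSERT INTO t VALUES ('C', 3)")

reader = sqlite3.connect(path, isolation_level=None)
reader.execute("BEGIN")
first = reader.execute("SELECT Number FROM t").fetchone()[0]   # snapshot taken here

writer.execute("UPDATE t SET Number = 5 WHERE Letter = 'C'")   # committed (autocommit)

second = reader.execute("SELECT Number FROM t").fetchone()[0]  # still the old value
reader.execute("COMMIT")
third = reader.execute("SELECT Number FROM t").fetchone()[0]   # now the new value
print(first, second, third)  # 3 3 5
```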
The strictest isolation level is "SERIALIZABLE". This level will save us from "phantom reads".
Phantom reads occur when one user starts a transaction (1) and then reads all the rows where the gender is female (2). In the meantime, another user starts a transaction (3), deletes one of the rows where the gender is female (4), and commits (5).
If the first user tries to read the same rows again (inside the same transaction), he will see the deletion made by the other user (6). The same thing would happen if rows were inserted; in that case we would see "phantom rows".
How to Assign an Isolation Level to an Individual Transaction?
We saw how to set the default isolation level with SET TRANSACTION. It is similar for an individual transaction:
We can control how our transaction will end. The COMMIT keyword means that we want to make our changes permanent. The ROLLBACK keyword means that we want to undo them.
START TRANSACTION ISOLATION LEVEL READ COMMITTED; UPDATE LetterNumber SET Number = 8 WHERE Letter = 'C'; ROLLBACK; -- the update is undone
START TRANSACTION ISOLATION LEVEL READ COMMITTED; UPDATE LetterNumber SET Number = 5 WHERE Letter = 'B'; COMMIT; -- the update is applied
We can automatically open a new transaction after COMMIT or ROLLBACK. For that we use the AND CHAIN subclause:
START TRANSACTION; SELECT 2; ROLLBACK AND CHAIN; --rollback transaction and start a new one SELECT 3; ROLLBACK; --rollback this second transaction
START TRANSACTION; SELECT 2; COMMIT AND CHAIN; --commit transaction and start a new one SELECT 3; ROLLBACK; --rollback this second transaction
What is a SAVEPOINT?
A savepoint is similar to a checkpoint in a video game.
If your player dies before the checkpoint:
He will have to start again from the beginning:
If your player dies after the checkpoint:
Then he can continue the game from the checkpoint:
The idea of a checkpoint is to save your progress, so you don't have to replay everything. In transaction control, the idea of a savepoint is the same: to avoid repeating the whole transaction when we can repeat only part of it.
How to use SAVEPOINT?
We'll make an example with two savepoints, and then we will see what happens if we apply ROLLBACK, ROLLBACK TO SAVEPOINT SP1, or ROLLBACK TO SAVEPOINT SP2.
START TRANSACTION;
UPDATE LetterNumber SET Number = 2 WHERE Letter = 'A';
SAVEPOINT SP1; -- the name of the first savepoint
UPDATE LetterNumber SET Number = 8 WHERE Letter = 'B';
SAVEPOINT SP2; -- the name of the second savepoint
UPDATE LetterNumber SET Number = 7 WHERE Letter = 'C';
Then, we can use one of these three commands:
ROLLBACK; -- would roll back the whole transaction
ROLLBACK TO SAVEPOINT SP2; -- would roll back to the second savepoint
ROLLBACK TO SAVEPOINT SP1; -- would roll back to the first savepoint
Starting status of our table:
Status before ROLLBACK:
If we use ROLLBACK:
If we go to SAVEPOINT SP2:
If we go to SAVEPOINT SP1:
After rollback to the savepoint, the transaction will still be open. It will only end after a COMMIT or ROLLBACK.
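SQLite understands the same SAVEPOINT / ROLLBACK TO syntax, so the scenario above can be sketched with Python's sqlite3 module (a stand-in for MonetDB): rolling back to SP2 undoes only the last update, and the transaction stays open for a final COMMIT.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
conn.execute("CREATE TABLE LetterNumber(Letter TEXT, Number INTEGER)")
conn.executemany("INSERT INTO LetterNumber VALUES (?, ?)",
                 [("A", 1), ("B", 2), ("C", 3)])

conn.execute("BEGIN")
conn.execute("UPDATE LetterNumber SET Number = 2 WHERE Letter = 'A'")
conn.execute("SAVEPOINT SP1")
conn.execute("UPDATE LetterNumber SET Number = 8 WHERE Letter = 'B'")
conn.execute("SAVEPOINT SP2")
conn.execute("UPDATE LetterNumber SET Number = 7 WHERE Letter = 'C'")

# Undo only what happened after SP2; the transaction itself stays open.
conn.execute("ROLLBACK TO SAVEPOINT SP2")
rows = conn.execute("SELECT Letter, Number FROM LetterNumber ORDER BY Letter").fetchall()
print(rows)  # [('A', 2), ('B', 8), ('C', 3)]

conn.execute("COMMIT")  # the surviving updates (A and B) become permanent
```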
We can delete savepoints with the RELEASE command. After we release a savepoint, we cannot go back to that savepoint any more.
Before release:
RELEASE SAVEPOINT SP2;
RELEASE SAVEPOINT SP1;
RELEASE SAVEPOINT SP1; RELEASE SAVEPOINT SP2;
After releasing a savepoint, the transaction is still ongoing; we can continue executing statements. Savepoints are independent: if we release savepoint SP1, savepoint SP2 will still be alive and valid.
Synonyms
These commands are synonymous; we can use them interchangeably.