MonetDB Archives - Page 4 of 10

0330 Loading Data Using SQL in MonetDB and Timing

MonetDB / By Bizkapish / February 2, 2025 February 8, 2025

Sample Table

We will start mclient with the timer turned on. The timer will measure the time to execute the query.

mclient --timer="clock" -u voc -d voc

Then we will create the sample table:

Inserting Data with INSERT INTO

We can insert data into MonetDB from an application by sending INSERT INTO statements. This works well if we don't load a lot of rows. If we use a lot of consecutive INSERT statements, we can run into performance issues.

INSERT INTO tblSample ( Letter, Number, Calendar ) VALUES ( 'D', 4, '2024-02-04' );
··· 1,000,000 X ···
INSERT INTO tblSample ( Letter, Number, Calendar ) VALUES ( 'ZZZZZZZ', 1000004, '2240-02-04' );

We know that in MonetDB we can insert several rows with a single INSERT statement. This will not save us, because we shouldn't use more than 20 rows per INSERT statement. If we use more than that, performance will decrease.

INSERT INTO tblSample ( Letter, Number, Calendar ) VALUES ( 'D', 4, '2024-02-05' )
                                                        , ( 'E', 5, '2024-02-06' )
                                                        ··· 17 X ··· 
                                                        , ( 'F', 6, '2024-02-07' );    --20 rows at most
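
As a hedged sketch of how an application could stay within that 20-row limit, the helper below groups rows and builds one multi-row INSERT per group, using placeholders instead of literal values. The helper name `multi_row_inserts` is hypothetical, and the sketch assumes a driver that accepts %s-style placeholders with a flat parameter list.

```python
def multi_row_inserts(rows, max_rows=20):
    """Yield (sql, params) pairs, each covering at most max_rows rows."""
    for start in range(0, len(rows), max_rows):
        group = rows[start:start + max_rows]
        # One "(%s, %s, %s)" group per row, joined into a single VALUES list.
        placeholders = ", ".join(["(%s, %s, %s)"] * len(group))
        sql = ("INSERT INTO tblSample (Letter, Number, Calendar) "
               f"VALUES {placeholders};")
        # Flatten the rows so the parameters line up with the placeholders.
        params = [value for row in group for value in row]
        yield sql, params

# 45 sample rows (made-up values) end up in statements of 20, 20 and 5 rows.
rows = [(f"L{n}", n, "2024-02-05") for n in range(45)]
statements = list(multi_row_inserts(rows))
print([len(params) // 3 for _, params in statements])
```

Each (sql, params) pair could then be handed to a cursor's execute call, one statement at a time.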

We can improve performance by following these five pieces of advice:

Disable autocommit. Autocommit commits the data after each INSERT statement. If we can avoid that, we can speed things up.
Prepare the statement. The statement will then be parsed and cached once. After that, each consecutive INSERT will reuse the same statement, just with different parameters.
Use batch processing. Instead of sending a million INSERT statements one by one, we can send 100 batches of 10,000 INSERT statements. This reduces communication latency between the application and MonetDB, reduces memory usage, and minimizes locking of the table.
Disable the optimizer. The optimizer can speed up more complex statements, but there is nothing it can improve in a simple INSERT statement.
Temporarily disable table constraints like primary key, foreign key or unique. We can restore those constraints after the import.
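
The batching advice can be sketched in a few lines of Python. This is only a minimal sketch: the helper name `chunked` is hypothetical, and the rows are made-up values. Each batch could then be passed to something like cursor.executemany, followed by a commit.

```python
from itertools import islice

def chunked(rows, size):
    """Yield consecutive lists of at most `size` rows (hypothetical helper)."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

# One million made-up rows, split into 100 batches of 10,000 rows each.
rows = (("X", n, "2024-02-04") for n in range(1_000_000))
batch_sizes = [len(batch) for batch in chunked(rows, 10_000)]
print(len(batch_sizes), batch_sizes[0])
```

Because the rows come from a generator, only one batch is held in memory at a time.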

SQL benchmark

We'll insert one row, with one INSERT INTO statement. Then we'll see if we can noticeably increase the speed by following the tips above.

INSERT INTO tblSample ( Letter, Number, Calendar ) VALUES ( 'D', 4, '2024-02-05' );

First, we will disable our optimizer.
SET sys.optimizer = 'minimal_pipe';

In order to delete constraints, we have to find out their names. We can do that from the system tables.

We will not remove the "NOT NULL" constraint, because that constraint will not slow us down.

Now, we will start a transaction to disable autocommit.
START TRANSACTION;

After all this, we will again check the timing of our INSERT statement. We are now faster.
EXECUTE 0( 'E', 5, '2024-02-06');

The last thing is to change everything back to the way it was.
COMMIT; -- finish transaction
DEALLOCATE PREPARE ALL; -- delete prepared statement
SET sys.optimizer = 'default_pipe'; -- turn on optimizer
ALTER TABLE tblSample ADD UNIQUE ( Letter ); -- bring back unique constraint
ALTER TABLE tblSample ADD PRIMARY KEY ( Number ); -- bring back primary key constraint

Python Benchmark

We will now try INSERT from a Python script. In the blog post "Connect to MonetDB from Python" we have already seen how to use Python with MonetDB. Below is the script we will use now. This time we will insert 10 rows of data.

import pymonetdb
import time

connection = pymonetdb.connect(username="voc", password="voc", hostname="localhost", database="voc")
cursor = connection.cursor()
insert_statements = [
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('F', 6, '2024-02-06');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('G', 7, '2024-02-07');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('H', 8, '2024-02-08');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('I', 9, '2024-02-09');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('J', 10, '2024-02-10');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('K', 11, '2024-02-11');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('L', 12, '2024-02-12');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('M', 13, '2024-02-13');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('N', 14, '2024-02-14');",
        "INSERT INTO tblSample (Letter, Number, Calendar) VALUES ('O', 15, '2024-02-15');",
    ]
overall_start_time = time.time()
for sql in insert_statements:
    cursor.execute(sql)
connection.commit()
overall_end_time = time.time()
total_time = overall_end_time - overall_start_time
print(f"\n⏳ Total execution time for all inserts: {total_time:.6f} seconds")
cursor.close()
connection.close()

Total execution time is 0.008114 seconds.
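
The script above measures wall-clock time with time.time(). As a hedged alternative, time.perf_counter() is a monotonic clock meant for measuring short intervals. The helper name `timed` is an assumption made for this sketch, and the workload is only a stand-in for the loop of cursor.execute calls.

```python
import time

def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds). Hypothetical helper."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in workload; in the benchmark this would be the insert loop plus commit.
result, elapsed = timed(sum, range(1000))
print(f"result={result}, elapsed={elapsed:.6f} seconds")
```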

Python Benchmark With Accelerations

We can speed up our Python script by using all of the advice mentioned above. This is what our Python script now looks like:

import pymonetdb
import time

connection = pymonetdb.connect(username="voc", password="voc", hostname="localhost", database="voc")
cursor = connection.cursor()
cursor.execute("SET sys.optimizer = 'minimal_pipe';")  # disable the optimizer before the inserts
cursor.execute("ALTER TABLE tblSample DROP CONSTRAINT tblsample_number_pkey;")
cursor.execute("ALTER TABLE tblSample DROP CONSTRAINT tblsample_letter_unique;")
sql = "INSERT INTO tblSample (Letter, Number, Calendar) VALUES (%s, %s, %s);"
data = [    ('P', 16, '2024-02-16'),
            ('Q', 17, '2024-02-17'),
            ('R', 18, '2024-02-18'),
            ('S', 19, '2024-02-19'),
            ('T', 20, '2024-02-20'),
            ('U', 21, '2024-02-21'),
            ('V', 22, '2024-02-22'),
            ('W', 23, '2024-02-23'),
            ('X', 24, '2024-02-24'),
            ('Y', 25, '2024-02-25')
        ]
overall_start_time = time.time()
cursor.executemany(sql, data)
connection.commit()
overall_end_time = time.time()
cursor.execute("DEALLOCATE PREPARE ALL;")
cursor.execute("SET sys.optimizer = 'default_pipe';")
cursor.execute("ALTER TABLE tblSample ADD PRIMARY KEY ( Number )")
cursor.execute("ALTER TABLE tblSample ADD UNIQUE ( Letter );")
connection.commit()
total_time = overall_end_time - overall_start_time
print(f"\n⏳ Total execution time for all inserts: {total_time:.6f} seconds")
cursor.close()
connection.close()

We don't have to explicitly start a new transaction; pymonetdb will do that automatically.

Now our timing is 0.004584 seconds.

Timing

How to Measure Time of the Query Execution

mclient --timer="clock" -u voc -d voc
Timer can be "none" (default), "clock" or "performance".

Below we can see the results for these three modes of the timer switch.

When we use "performance", we get 4 results. "SQL" is time used for parsing. "Opt" is time used for optimizing statement. "Run" is time used for running the statement. "Clk" is total time used.

Query History

Data about executed statements is kept in two tables. Those two tables can be returned with the functions "sys.querylog_catalog()" and "sys.querylog_calls()". In order to work with those tables, we have to log in as administrator.

mclient -u monetdb -d voc
Password is "monetdb".

Data saved in these two tables is persistent between sessions. We can use procedure "sys.querylog_empty()" to clear content from those two tables.

CALL sys.querylog_empty(); -- procedures are started with the "call" keyword

In the current session we can start logging with the procedure "querylog_enable()".

CALL querylog_enable();

After that, I will run statement "SELECT * FROM voc.tblSample;" three times.

We can also read from the "querylog_calls()" table.

We can stop logging of queries before the end of the session with:

CALL querylog_disable();

Threshold

Each time we enable logging, our logging tables will become bigger and bigger. This can make the search for a query troublesome. In order to control the number of statements that will be logged, we can use the "threshold" argument.

CALL querylog_enable(5000);

The threshold will limit the logged statements to only those whose execution time is longer than 5000 milliseconds. This allows us to perform profiling, to find the queries that are sucking up our resources the most.
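
The rule described above can be pictured with a tiny Python sketch. It only mimics the behaviour as stated in this post (log everything, or only statements slower than the threshold); it is not MonetDB's implementation, and the function name and timings are made up for illustration.

```python
def should_log(elapsed_ms, threshold_ms=None):
    """Log everything when no threshold is set, otherwise only slow statements."""
    if threshold_ms is None:
        return True
    return elapsed_ms > threshold_ms

# Hypothetical execution times in milliseconds.
timings = {"quick lookup": 12, "monthly report": 7300, "exactly at limit": 5000}
slow = [name for name, ms in timings.items() if should_log(ms, 5000)]
print(slow)
```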

0025 SystemD, Monetdbd, Mserver5: Clarification

MonetDB / By Bizkapish / January 21, 2025 January 25, 2025

Computer Process

When we start one program (e.g., Excel), we'll start one process.

A process occupies part of the CPU and part of the memory, and it can only see files that it has opened itself. In that way, a process is like a small virtual machine. The Excel process works inside such a small virtual machine and can only access workbooks that we have opened inside that Excel instance (e.g., Book1.xlsx and Book2.xlsx). The purpose of a process is to isolate one program from everything else.

Processes, Daemons and Services

We can divide processes into 4 groups:

User process is a process with a friendly interface toward human users ( like Notepad.exe ).
Daemon process is a process with a friendly interface toward other programs ( like the "Windows Time" (w32time) service ).
Service is a daemon which provides an interface to some essential functionality. For example, Apache server is a service because through it we get our web pages. In the background, Apache server will pull the needed data from the MySQL database, which is another daemon. In this setup, Apache server is a service daemon, and MySQL database is a normal daemon.
Some processes are in between. Microsoft Outlook is predominantly a user process, but it also has a VBA API, so it is partially a daemon.

We can see that the major difference between these processes stems from their intended usage. Usually, daemons will start when the OS starts and they will work in the background without users' interaction. On the other hand, daemons can also be started manually and users can interact with them. We have already seen an example of such a manually started daemon:

monetdbd start /home/fffovde/DBfarm1
monetdbd get all /home/fffovde/DBfarm1

Official Processes

In the "real world" other definitions are used. Something is a daemon or service only if it is officially registered with the operating system. Only processes registered in "Systemd" on Linux systems are called Daemons. In Windows, only processes registered in SCM (Service Control Manager) are called Services.

Systemd and SCM are operating system components used to automatically start official daemons and services in the correct order, control resource usage, access rights, log, and restart failed services. From an OS perspective, this is the only way to distinguish between ordinary and special processes. Since Windows and Linux use such definitions, most people will also use those definitions.

Two Ways to Manually Start Monetdbd Daemon

We will first check that the Monetdbd daemon is not running. The "pgrep" command is "process grep": we are searching for the process by its name. Pgrep should return the ID of the monetdbd process, but our process is not active.
pgrep monetdbd

We can also start the monetdbd daemon with the "start" command. This will fail because our daemon is already running.
monetdbd start /home/fffovde/DBfarm1

The explanation is that when we open monetdbd with systemctl, two things will happen:

Systemctl will start the daemon, but it will not start the "/home/fffovde/DBfarm1" server. This is because systemctl starts the default server, which is at the location "/var/monetdb5/dbfarm".
The server "/var/monetdb5/dbfarm" will occupy port 50000. That is why we are getting the error "cannot remove socket files".
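
One hedged way to check from a script whether a TCP port is already taken is to try connecting to it. The sketch below uses only Python's standard socket module. The function name `port_in_use` is hypothetical, and the demonstration uses a throwaway listener on a free port picked by the operating system rather than a real monetdbd.

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0

# Throwaway listener standing in for a server that already holds a port.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
    listener.listen(1)
    busy_port = listener.getsockname()[1]
    was_busy = port_in_use(busy_port)

print(f"port {busy_port} in use: {was_busy}")
```

On the machine from this post, the call of interest would be port_in_use(50000).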

Mserver5

When our database is open, the Mserver5 process is running. Currently our database is not open. We can check that with: pgrep mserver5.

When we log in to the voc database with mclient, the database will be opened if it is not already open. When our database is open, the Mserver5 process is also running. This is because the Mserver5 process is OUR DATABASE. This process performs all processing requested by clients of the voc database. The mclient console application is used to send our queries to this Mserver5 process.

Starting the Monetdbd Daemon When the Computer Starts up

We cannot start the "/home/fffovde/DBfarm1" server automatically when the computer boots up. We can only do that with the default server "/var/monetdb5/dbfarm". First we will stop the "/home/fffovde/DBfarm1" server to release port 50000:

monetdbd stop /home/fffovde/DBfarm1

Then we will start "/var/monetdb5/dbfarm" server.
systemctl start monetdbd

Now that we have confirmed that our database is working, we will turn our "/var/monetdb5/dbfarm" server into a full-fledged, systemctl-controlled daemon. This will make our server start automatically after each system boot.

systemctl enable monetdbd

Now we can exit everything and we can reboot our computer.

Avoid Entering Credentials Every Time

Running Two Servers at the Same Time

First, we will change the port of the "DBfarm1" server:
monetdbd set port=50001 /home/fffovde/DBfarm1

Cleaning up

Summary

We can open the server manually or automatically. To open it manually, we use "monetdbd start". For automatic startup we use "systemctl". We can only automatically open the server "/var/monetdb5/dbfarm". If we use two servers at the same time, they must have different port numbers.

In MonetDB, the databases are quite independent. In Microsoft SQL server things are different. SQL server can have many databases that are sharing many resources. They are sharing logins, tempdb, resource pools, memory settings, collation. Backups, replication, and monitoring tools (like SQL Agent jobs) are often configured at the server level.

Because of that it is correct to say that Mserver5 process is not just a database, it is a whole server. Monetdbd is not really a server. That daemon is only a managing tool for Mserver5 processes. Almost the only thing databases within the same dbfarm folder share is their port number. So, "voc" and "newDB" are monetdb servers. "/home/fffovde/DBfarm1" and "/var/monetdb5/dbfarm" are just folders for those servers.

Monetdb is a console application used by a user to interact with the monetdbd daemon.

0320 Constraints and Altering of Tables in MonetDB

MonetDB / By Bizkapish / December 22, 2024 December 30, 2024

We will start this session as an admin. The only reason for this is that I want to show that we can move a table to another schema. Password is "monetdb".
mclient -u monetdb -d voc

Sample Tables

Altering Our Tables

We can change the name of a column.
ALTER TABLE tblIncome RENAME COLUMN Turnover to Income;
SELECT * FROM tblIncome;

We can alter our column to accept nulls.
ALTER TABLE tblIncome ALTER COLUMN Income SET NULL;

Besides READ ONLY and READ WRITE, we also have INSERT ONLY.
ALTER TABLE tblCost SET INSERT ONLY;

If we try to delete a row from INSERT ONLY table, that will be prohibited.

We can change default value for some column:
ALTER TABLE tblIncome ALTER COLUMN Income SET DEFAULT 200;
We can delete the DEFAULT value for the Income column:
ALTER TABLE tblIncome ALTER COLUMN Income DROP DEFAULT;

We will delete our new column:
ALTER TABLE tblCost DROP COLUMN NewColumn;

Both of our tables are created in the VOC schema. We can move the tblCost table to the sys schema, and we can move it back:
ALTER TABLE tblCost SET SCHEMA sys;
ALTER TABLE tblCost SET SCHEMA voc;
We can check with the "\d" command that our table has been transferred to another schema.

Removing Constraints

If we want to remove some constraint (primary key, unique constraint, foreign key), we have to know the name of that constraint. In our example we have placed a PRIMARY KEY constraint on the tblIncome table, column ID, and a UNIQUE constraint on the tblCost table, column ID. We will first read from the system table sys.tables to find the IDs of our tables:

Now that we know that our tables have IDs 13938 and 13946, we will search for our columns in the table sys.columns. Our columns have IDs 13932 and 13940.

Next step is to find names of our constraints. We will find them in the "dependency_columns_on_keys" table. We will filter this table by our tables and columns.

We can now create a query that will return the names of the constraints on the specific columns:

Now that we know our constraints names, we can delete them:
ALTER TABLE tblIncome DROP CONSTRAINT tblturnover_id_pkey;
ALTER TABLE tblCost DROP CONSTRAINT tblcost_id_unique;

Constraint on Several columns

When we want to make a constraint that encompasses several columns, we have to make the constraint on the table itself. In that case, our CREATE TABLE statement would look like this:
CREATE TABLE Tab1 ( Col1 VARCHAR(10), Col2 VARCHAR(10), Col3 VARCHAR(10), PRIMARY KEY ( Col1, Col2 ), UNIQUE ( Col2, Col3 ));

We will now create PRIMARY KEY and UNIQUE constraints on our tables by using ALTER TABLE statement:
ALTER TABLE tblIncome ADD PRIMARY KEY ( ID, SubID );
ALTER TABLE tblCost ADD UNIQUE ( ID, SubID );

Naming of Constraints

We saw earlier that the name of a constraint is created automatically by concatenating a) table name, b) column name, c) constraint type ( tblturnover_id_pkey ). If the constraint encompasses several columns, for example id and subid, then the name will be like tblturnover_id_subid_pkey. If we want to give our constraint a name, then we have to use this syntax in the CREATE TABLE statement:

CREATE TABLE Tab2 ( Col1 VARCHAR(10), Col2 VARCHAR(10), Col3 VARCHAR(10), CONSTRAINT PKconstraint PRIMARY KEY ( Col1, Col2 )
                                                                        , CONSTRAINT Uconstraint UNIQUE ( Col2, Col3 ));

We will now give names to our constraints by using the ALTER TABLE statement. We cannot create another primary key constraint, because a table can only have one primary key constraint.

ALTER TABLE tblIncome ADD CONSTRAINT Primarius PRIMARY KEY ( ID, SubID );

We will first delete old primary key constraint, and then we will create a new one, that will have a custom name:

ALTER TABLE tblIncome DROP CONSTRAINT tblincome_id_subid_pkey;
ALTER TABLE tblIncome ADD CONSTRAINT Primarius PRIMARY KEY ( ID, SubID );

On the other side, it is possible to have many UNIQUE constraints on one table.
ALTER TABLE tblCost ADD CONSTRAINT Uniquous UNIQUE ( ID, SubID );

Foreign Key Constraint

This "primary key" > "foreign key" relation is called foreign key constraint, because "primary key" column defines what can be entered into "foreign key" column.
When creating a table, we should use a syntax like this one to create foreign key constraint:

CREATE TABLE Tab3 ( Col1 VARCHAR(10), Col2 VARCHAR(10), Col3 VARCHAR(10), CONSTRAINT PkFk FOREIGN KEY ( Col1, Col2 )
                                                                          REFERENCES Tab2 ( Col1, Col2 ) );

We can omit the part ", CONSTRAINT PkFk", but in that case our constraint will get the default name "tab3_col1_col2_fkey".
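
The default names seen so far ( tblturnover_id_pkey, tblcost_id_unique, tab3_col1_col2_fkey ) appear to follow one pattern. Below is a small Python sketch of that pattern, inferred only from the examples in this post. The function name is hypothetical, and MonetDB may apply rules that this sketch does not cover.

```python
# Suffixes as they appear in the constraint names shown in this post.
SUFFIXES = {"primary key": "pkey", "unique": "unique", "foreign key": "fkey"}

def default_constraint_name(table, columns, kind):
    """Join lower-cased table name, column names and a type suffix with '_'."""
    parts = [table.lower(), *(column.lower() for column in columns), SUFFIXES[kind]]
    return "_".join(parts)

print(default_constraint_name("Tab3", ["Col1", "Col2"], "foreign key"))
print(default_constraint_name("tblCost", ["ID"], "unique"))
```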

We will now add foreign key constraints with ALTER TABLE statement. We will try to link "id" columns. This will fail. Foreign key constraint can only relate to column ( or columns ) that are declared as primary key or as unique columns.

ALTER TABLE tblCost ADD CONSTRAINT FromTblIncomeConstraint FOREIGN KEY ( id ) REFERENCES tblIncome ( id );

We will alter our statement to relate to primary key in the table tblIncome. Current primary key in the table tblIncome is composite key ( id, subid ). We will use that.

ALTER TABLE tblCost ADD CONSTRAINT FromTblIncomeConstraint FOREIGN KEY ( id, subid ) REFERENCES tblIncome ( id, subid );

RESTRICT / CASCADE when Deleting Constraints and Columns

We will try to delete the ID column in the tblCost table.
ALTER TABLE tblCost DROP COLUMN ID;
We will try to delete the primary key constraint in the tblIncome table.
ALTER TABLE tblIncome DROP CONSTRAINT Primarius;

We will fail big. This is because the default mode is RESTRICT. By using CASCADE, we would be able to delete this column and this constraint.

START TRANSACTION;
ALTER TABLE tblIncome DROP CONSTRAINT Primarius CASCADE;
ROLLBACK;
If we roll back our transaction, we won't really delete the constraint. I don't want to delete it.

NULL Handling

This constraint will only have effect when we try to add new null:
INSERT INTO tblIncome ( id, subid, idwithnull, income )
VALUES ( 4, 4, null, 50 );

The default value for this subclause is NULLS DISTINCT. By default, we will consider all nulls to be distinct.
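
The difference between treating nulls as distinct or not can be sketched in Python for a single-column unique check. This is only an illustration of the semantics, with None standing in for NULL. The function name is hypothetical and this is not how MonetDB implements the check.

```python
def violates_unique(existing, new_value, nulls_distinct=True):
    """Return True if adding new_value would break a single-column UNIQUE rule."""
    if new_value is None:
        if nulls_distinct:
            return False  # every null counts as different from every other null
        return None in existing  # nulls compare as equal, so a second one clashes
    return new_value in existing

column = [1, 2, None]
print(violates_unique(column, None))                        # default: nulls distinct
print(violates_unique(column, None, nulls_distinct=False))  # nulls not distinct
print(violates_unique(column, 2))                           # ordinary duplicate
```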

What is Not Supported in Foreign Key Constraint

For foreign key constraint, match can be only SIMPLE, on update can only be RESTRICT, on delete can only be RESTRICT. Other options are not supported so we will ignore them.

ALTER TABLE Tab3 ADD Constraint PkFk2 FOREIGN KEY ( Col1, Col2 ) REFERENCES Tab2 ( Col1, Col2 )
MATCH SIMPLE ON UPDATE RESTRICT ON DELETE RESTRICT;

Error Avoidance

As always, if we use IF NOT EXISTS or IF EXISTS, we will avoid errors. We will always get "operation successful".

CREATE TABLE IF NOT EXISTS employees ( id INT PRIMARY KEY, name VARCHAR(100), department VARCHAR(50), salary DECIMAL(10, 2) );
ALTER TABLE IF EXISTS employees ADD COLUMN hire_date DATE;

0310 Schemas in MonetDB

MonetDB / By Bizkapish / December 14, 2024 December 15, 2024

Difference Between Schema and Database

Separate server applications (like SQL Server and MonetDB in the image) are totally independent. Databases are almost like that:

This is what differentiates databases:
– Each database is a separate process.
– Each database has its own system tables and storage structures.
– For each database we have to make a separate connection.
– We can not query tables that belong to different databases.

This is what connects databases:
– Some settings can be defined on the server application level (CPU, memory, disk quotas, cache size).
– You access databases through the same Hostname/IP Address.

Schemas are not like databases. Schemas are just a way to organize our objects. We benefit from schemas because:

Schemas allow us to organize our objects.
Schemas allow us to differently set security and access control.
Schemas allow us to have objects that have the same name, but they have to be in separate schemas ( MySchema.TableName, YourSchema.TableName ).
We can query tables from different schemas with one SELECT statement, but only if we have access rights.

Schemas in MonetDB

Creation of the New Schema

This SELECT statement will fail. Our current schema is VOC. There is no table "NewTable" in VOC2 schema.
SELECT * FROM VOC2.NewTable;

We will write something into our two tables so we can make a distinction between them:

INSERT INTO NewTable ( Text ) VALUES ( 'VOC');

INSERT INTO VOC2.NewTable ( Text ) VALUES ( 'VOC2');

Ownership of a Schema

We can find owner of the schema by using an information schema view. Administrators are the only ones who can create schemas. That is why we always have "monetdb" in the schema_owner column.

SELECT * FROM information_schema.schemata;

Authorization of a Schema

Notice above, that user voc has the default schema 7110*, and role monetdb has the default schema 2000*. When we log in as such users, this will be our initial current schema.

If we go back to one of the first posts in this series, we will find that we gave authorization to the user "VOC" during creation of the VOC schema:
CREATE SCHEMA "VOC" AUTHORIZATION "VOC";
During creation of the schema VOC2, we haven't authorized anyone. In that case authority will belong to "MonetDB Admin". We can see that in the images above. Authorization for the VOC2 schema belongs to default role 3, and that is the monetdb group of administrators.

After the keyword AUTHORIZATION, we can have either a role or a user name. Only one username or role can be authorized. If we want for several users to control some schema, then we have to give the authorization to the role to which those users belong. After creation of a schema, it is not possible to change its authorization. Because of this, it is always better to give authorization to a role, than to a user. Afterwards we can just add or remove users to that role.

Easy Creation of a Schema for Some Role/User

Renaming of a Schema

ALTER SCHEMA voc2 RENAME TO voc3;
We can only rename a schema if there are no objects that depend on the name of that schema.

We will now create a VIEW, an object that is dependent on schema voc3.
CREATE VIEW aaa AS SELECT * FROM voc3.NewTable;
Because of that view, it is no longer possible to rename this schema.
ALTER SCHEMA voc3 RENAME TO voc2;

Deleting a Schema

We will first change current schema to VOC, because it is not possible to delete current schema:
SET SCHEMA VOC;

DROP SCHEMA voc3; --We can not delete schema because there is a view that depends on that schema.

We will delete our View.
DROP VIEW VOC3.aaa;
But it is still impossible to delete the schema, because the table "NewTable" is in it.

In MonetDB, we can not easily delete the schema because RESTRICT is the default mode:
DROP SCHEMA voc3 RESTRICT;
We have to supply the keyword CASCADE to easily delete schema. This means that schema will be deleted together with all of the dependent objects (tables and views):
DROP SCHEMA voc3 CASCADE;

Avoiding Errors

By using "IF NOT EXISTS" and "IF EXISTS", we can avoid getting error messages.
CREATE SCHEMA IF NOT EXISTS SchemaName;
DROP SCHEMA IF EXISTS SchemaName;

0300 Indexes and Views in MonetDB

MonetDB / By Bizkapish / November 24, 2024 November 27, 2024

Sample Table

Views in MonetDB

Views are named and saved queries. First, we make a SELECT query, we give it a name, then we save it. Whenever we need the logic encapsulated in that query, we can get it by calling the name of a saved view. The name of a view can be now used at any place where we can use a name of a table. Views are often called "virtual tables".

Advantages of views:
– Reuse logic. This also improves consistency.
– Make logic modular.
– Can be used to control what users can and cannot see.
– Views can be used as intermediary between database and application. This will reduce interdependence.

Limitations of views:
– Changes to underlying base tables can invalidate views.
– Proliferation of views that are mutually referenced can lead to complex structures and increased interdependency.

Information_schema.Views

Each database has its own system tables where it stores database metadata. These tables differ between database servers. In order to improve standardization, the SQL committee introduced a standardized set of views, called "information_schema". These views allow us to query different databases using the same queries, and to get the same metadata. The queries below will work in MonetDB, Postgres and many other databases.

SELECT schema_name, schema_owner FROM information_schema.schemata;
SELECT table_name, table_type FROM information_schema.tables;

We are now interested in the view "information_schema.views". We can get information about our view with the query:

There are many more columns in this "information_schema.views" view, but we will talk about them another time.

Dropping the View

It is possible to create a view based on a view.
CREATE VIEW View2 ( Letter3 ) AS
SELECT Letter2 FROM View1;
Now we have a database object that is dependent on View1.

We can solve this in two ways.
We can first delete View2 and then View1.
DROP VIEW View2;
DROP VIEW View1;
The other solution is to use the CASCADE subclause.
DROP VIEW View1 CASCADE;

CASCADE keyword will delete View1 and all of the objects that are dependent on the view View1 (that would be View2).
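
The difference between the default RESTRICT behaviour and CASCADE can be sketched as a walk over a dependency map. This Python sketch only illustrates the idea. The function name and the dictionary layout are assumptions, not MonetDB internals.

```python
def drop(obj, dependents, cascade=False):
    """Return the objects removed, or raise if dependents block the drop."""
    children = dependents.get(obj, [])
    if children and not cascade:
        # RESTRICT: refuse to drop while other objects depend on this one.
        raise RuntimeError(f"cannot drop {obj}: {children} depend on it")
    removed = []
    for child in children:
        # CASCADE: remove the dependent objects first, then the object itself.
        removed.extend(drop(child, dependents, cascade=True))
    removed.append(obj)
    return removed

dependents = {"View1": ["View2"]}  # View2 is built on top of View1
print(drop("View1", dependents, cascade=True))
```

Calling drop("View1", dependents) without cascade raises an error, which mirrors the failed DROP VIEW.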

If we now try to delete View1, we would get an error.
DROP VIEW View1;
We can avoid this error with the IF EXISTS subclause. With this subclause we'll always get "operation successful".
DROP VIEW IF EXISTS View1;

What are Indexes?

Indexes will speed up data reading. Indexes must be updated when data is modified, which can slow down INSERT, UPDATE, and DELETE operations. We read data much more often than we write it, so indexes are a good approach to make our database more performant.

How to Create Indexes in MonetDB?

The great thing is that we do not have to. MonetDB will create optimal indexes for us. The user is still free to create indexes manually, but those indexes will only be considered as suggestions by MonetDB. MonetDB can freely ignore the user's indexes if it finds a better approach.

CREATE INDEX Ind1 ON voc.LetterNumber ( Letter );

Voc.LetterNumber is SchemaName.TableName. The index name must be unique within the schema to which the table belongs.

MonetDB has two special indexes: Imprints and Ordered index. These indexes are experimental and we should avoid using them.

Special Index: Ordered index

There are some limitations on this index:
– We can use only one column for each index.
– After UPDATE, DELETE or INSERT, on the table, this index will become inactive.

The problem is that these limitations are only according to MonetDB documentation. MonetDB will NOT enforce these limitations. Below we can see an Ordered index which is using several columns, which shouldn't be allowed. This is probably the result of the fact that this index is experimental and not fully implemented.

CREATE ORDERED INDEX Ind1 ON voc.LetterNumber( Letter, Number );

When we create this index, it will appear in the sys.idxs table, but we can not be sure whether it is active or not. It seems that the best bet to make this index active is to:

Make the table READ ONLY.

ALTER TABLE LetterNumber SET READ ONLY; ALTER TABLE LetterNumber SET READ WRITE;

Create Ordered index on only one column.
Don't make any further changes to that table.

Creation of this index is expensive, so we should create it ad hoc on the READ ONLY tables.

I will delete the index we have created.
DROP INDEX Ind1;

Special Index: Imprints Index

Imprints index is similar to Ordered index. It is experimental and not fully implemented. Some of the limitations I will talk about are not enforced by MonetDB, and they are just theoretical limitations:

Imprints index can only be implemented on numerical and string columns.
After UPDATE, DELETE or INSERT, on the table, this index will become inactive.
One Imprints index can only be implemented on one column.

This index will most likely work if we apply the same steps as for the Ordered index ( READ ONLY table, one column in index, don't change anything else ).

The idea of the Imprints index on numerical column is to divide that column into segments, and then to store some metadata for each segment. For example, we can store minimum and maximum value for the values in one segment. If someone applies a filter "VALUE > 30", we will be able to avoid searching through segments where the maximum value is 30 or less. This will speed up our filters.

CREATE IMPRINTS INDEX Ind1 ON LetterNumber( Number );
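
The segment idea described above can be sketched in Python. The sketch follows the simplified min/max description from this post rather than MonetDB's actual data structure. The function names, segment size and sample values are all made up for illustration.

```python
def build_segments(values, segment_size):
    """Split the column into segments and remember min and max for each one."""
    segments = []
    for start in range(0, len(values), segment_size):
        chunk = values[start:start + segment_size]
        segments.append((min(chunk), max(chunk), chunk))
    return segments

def filter_greater_than(segments, threshold):
    """Apply VALUE > threshold, skipping segments that cannot contain a match."""
    hits, scanned = [], 0
    for seg_min, seg_max, chunk in segments:
        if seg_max <= threshold:
            continue  # the metadata alone proves that no value qualifies
        scanned += 1
        hits.extend(value for value in chunk if value > threshold)
    return hits, scanned

values = [5, 12, 30, 7, 45, 31, 28, 60, 3, 9, 30, 1]
segments = build_segments(values, 4)
hits, scanned = filter_greater_than(segments, 30)
print(hits, scanned)
```

With these sample values, two of the three segments have a maximum of 30, so only one segment has to be scanned.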

The idea of the Imprints index on a string column is to make LIKE filters faster. This index would make it possible to prefilter our column by using a fast but not totally accurate algorithm. After that, we can apply the correct algorithm to the already reduced set of data.

In our example, an Imprints index on a string column will fail, because MonetDB will enforce the READ ONLY requirement this time.
CREATE IMPRINTS INDEX Ind1 ON LetterNumber( Letter );

After applying "ALTER TABLE LetterNumber SET READ ONLY;", our statement will still fail, because it doesn't work on columns that have fewer than 5,000 rows.

CREATE IMPRINTS INDEX Ind2 ON LetterNumber( Letter );

We can insert 5,000 rows ( 'G', 500 ) into our table with the statement below. In that statement, we are using one system table as a dummy table.
INSERT INTO LetterNumber ( Letter, Number ) SELECT 'G', 500
FROM (SELECT 1 FROM sys.tables LIMIT 100) t1
CROSS JOIN (SELECT 2 FROM sys.tables LIMIT 50 ) t2;

NOTE: For this to happen we will temporarily make our table READ WRITE.
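
The row count of that INSERT comes from the cross join: every one of the 100 rows from t1 is paired with every one of the 50 rows from t2. A quick Python check of that arithmetic, with ranges standing in for the two derived tables, might look like this. It assumes that sys.tables has at least 100 rows, so that LIMIT 100 really returns 100 of them.

```python
from itertools import product

# Stand-ins for the two derived tables: 100 rows and 50 rows.
t1 = range(100)
t2 = range(50)

# A cross join pairs every row of t1 with every row of t2.
rows = [("G", 500) for _ in product(t1, t2)]
print(len(rows))
```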

Now that our table is READ ONLY and has more than 5,000 rows, we can apply an Imprints index on the Letter column.

We will delete these indexes:
DROP INDEX Ind1;
DROP INDEX Ind2;
We will delete 5,000 rows from the table:
ALTER TABLE LetterNumber SET READ WRITE;
DELETE FROM LetterNumber WHERE Letter = 'G';