0140 MonetDB – SET operators

SET OPERATORS

Let's say we have a purple and a green data set. The data set is the result of a query. Using the SET operators, we can treat the rows in those two data sets as mathematical sets.

With joins, we concatenate tables horizontally. If we want to fuse tables vertically then we would use a union. For union to work we need to meet three conditions:
– Two data sets must have the same number of rows.
– Order of columns in one data set must be the same as order of the columns in another data set.
– Columns at the same position must have similar data type.
 
There are two kinds of unions, we have "UNION" and we have "UNION ALL". "UNION ALL" would just glue two data sets together. If the first data set has X rows, and the second one Y rows, final data set would have X + Y rows.
 
"UNION" will return the same thing, but without duplicates.
 
"INTERSECT" means that we are looking for the rows that belong both to the first and the second data set. "INTERSECT" will return only distinct rows.

"EXCEPT" returns only rows from the first dataset, that do not have equivalents in the second data set. Again, only distinct rows will be returned.

Sample tables

We will create two sample tables. Notice that columns in these tables are using compatible data types ( CHAR vs VARCHAR(1), TINYINT vs SMALLINT ).

1) CREATE TABLE aboveTable ( Letter CHAR, Number TINYINT );
2) INSERT INTO aboveTable ( Letter, Number )
  VALUES ( 'A', 1 ), ( 'A', 1 ),
         ( 'B', 2 ), ( 'B', 2 );
1) CREATE TABLE belowTable ( Letter VARCHAR(1), Number SMALLINT );
2) INSERT INTO belowTable ( Letter, Number )
  VALUES ( 'B', 2 ), ( 'B', 2 ),
         ( 'C', 3 ), ( 'C', 3 );

UNION ALL

Our tables are meeting conditions for a UNION ALL.

SELECT * FROM aboveTable UNION ALL SELECT * FROM  belowTable;

UNION

UNION will remove duplicate rows.

SELECT * FROM aboveTable UNION SELECT * FROM belowTable;

INTERSECT

Intersect will give us overlap between two tables.

SELECT * FROM aboveTable INTERSECT SELECT * FROM belowTable;

EXCEPT

Let's see rows that exist in the aboveTable, but not in the belowTable.
 
SELECT * FROM aboveTable EXCEPT SELECT * FROM belowTable;

INTERSECT and EXCEPT with duplicates

If we can use ALL with UNION, is it possible to use it with INTERSECT and EXCEPT? It is possible. This way, all of the duplicates will remain.

SELECT * FROM aboveTable INTERSECT ALL SELECT * FROM  belowTable;SELECT * FROM aboveTable EXCEPT ALL SELECT * FROM  belowTable;

ORDER BY and SET operators

When we use ORDER BY, it must be applied on the last query.
 
SELECT * FROM aboveTable
UNION
SELECT * FROM belowTable ORDER BY Number DESC;

Column names and SET operations

The above SELECT is determining names of columns. Names of columns in above and below SELECT don't have to be the same.

SELECT Letter as L, Number as N
FROM aboveTable
UNION
SELECT * FROM belowTable ORDER BY N DESC;

Connecting More Than 2 Tables with Set Operators

How EXCEPT works

SELECT * FROM aboveTable
UNION ALL
SELECT * FROM belowTable
EXCEPT
SELECT * FROM belowTable;
Chained operations are conducted
in sequence manner. First we make
an UNION, and then we apply EXCEPT.
 
Let's see below steps that are leading to our results.
1) In the first step, UNION ALL will unite our two tables. Next we have to do EXCEPT vs belowTable.2) In the second step, all duplicates from both tables will be removed. Then we would do EXCEPT operation.
 
3) The only row that exist in the first table, and not in the second table is a row "A-1". That is our result.

How EXCEPT ALL works

SELECT * FROM aboveTable
UNION ALL
SELECT * FROM belowTable
EXCEPT ALL
SELECT * FROM belowTable;
When we use EXCEPT ALL, we would
get 4 rows as a result. Let's see below
steps that have leaded to this outcome.
1) In the first step, UNION ALL will unite our two tables. Next, we have to do EXCEPT ALL vs belowTable.2) This time there is no DISTINCT.

Rows from the two tables will destroy each other like matter and antimatter. What remains in the left table will be our final result.

EXCEPT ALL with DISTINCT

Statement below will return table on the right image. Now we have 6 rows in the result data set.

SELECT * FROM aboveTable
UNION ALL
SELECT * FROM belowTable
EXCEPT ALL
SELECT DISTINCT * FROM belowTable;
This time only two rows from the UNION ALL tables will be excluded so we will be left with 6 remaining rows.

UNION AND INTERSECT

Here is one example with INTERSECT. Again, first two data sets will be connected with UNION ALL. Their result will be than intersected with the last data set.

SELECT * FROM aboveTable
UNION ALL
SELECT * FROM belowTable
INTERSECT ALL
SELECT DISTINCT * FROM belowTable;
1) We will start two intermediate data sets. First one is the result of UNION and it will have 8 rows. The other one is created with DISTINCT from the belowTable, and it has 2 rows.
2) Intersection between these two data sets is this:

Two Conclusions

1) SET operators don't have priority among them. They are applied from the top to the bottom. This is different than in some other databases where INTERSECTION has the top priority. This would be the result if we enforce priority of an INTERSECT operator.

2) SET operators without ALL specifiers will first remove duplicates from their operands. SET operators with ALL specifiers will leave their operands untouched. This is true not only for EXCEPT (like in previous examples), but also for UNION and INTERSECT. The message here is that removal of the duplicates is not conducted on the result, but on the operands, and after that SET operators will apply their logic.

Corresponding

THIS IS NOT WORKING IN MONETDB, although documentation claims that it works.

When working with SET operators we need to be careful to order our columns correctly. Their position has to match between tables. This makes things harder. We will face errors many times.
CORRESPONDING clause can fix this problem. CORRESPONDING clause will correctly pair columns so that UNION is successful. Unfortunately, statement on the right side will work in some other databases, but not in the MonetDB, so I will just stop explaining it.

You can read more about this clause on this blog:
https://blog.jooq.org/a-rarely-seen-but-useful-sql-feature-corresponding/
SELECT Number, Letter FROM aboveTable
UNION ALL CORRESPONDING
SELECT Letter, Number FROM belowTable;

0130. MonetDB – SELECT from multiple tables

Sample tables

We will create these two tables in the MonetDB database:

We can create these two tables using these statements. The first statement will create the table. The second statement will populate that table with data. These are the same tables we have used in the previous blog post, so maybe you already have them in your database.

1) CREATE TABLE Employees ( EmployeeID INT, Employee VARCHAR(10 ) );
2) INSERT INTO Employees ( EmployeeID, Employee ) VALUES ( 1, 'Hugo' ), ( 2, 'Gabriel' ), ( 3, 'Aurora' ), ( 4, 'Adam' );
1) CREATE TABLE Sales ( SalesID INT, DateCol DATE,  EmployeeID INT, Product CHAR(1), Qty INTEGER, Price INTEGER );
2) INSERT INTO Sales ( SalesID, DateCol, EmployeeID, Product, Qty, Price ) VALUES
     ( 1, '2024-03-05', 2, 'X', 10, 4)
   , ( 2, '2024-03-05', 3, 'X', 20, 5)
   , ( 3, '2024-03-05', 2, 'Y', 30, 6)
   , ( 4, '2024-03-06', 3, 'Y', 40, 7)
   , ( 5, '2024-03-06', 2, 'Z', 50, 8)
   , ( 6, '2024-03-06', 3, 'Z', 60, 9)

Crossjoin

A cross join is when we combine every row from one table with every row from another table. If one table has X rows and the other Y rows, then the result table will have X * Y rows.

SELECT * FROM Employees CROSS JOIN Sales;
 
In our example, the Employees table has 4 rows, the Sales table has 6 rows, so the resulting table has 24 rows. In the picture we can see that each row of the Employee table is combined with each row from the Sales table.

"EmployeeID" column will appear twice if use SELECT with a star.
Statement with a specific column "EmployeeID" will become ambiguous.
SELECT EmployeeID FROM Employees CROSS JOIN Sales;

We'll have to write the full column name to avoid that ambiguity.
SELECT Employees.EmployeeID FROM Employees CROSS JOIN Sales;

Diagram

Diagram shows two tables that we want to join. There are three basic ways how we can join tables, INNER join, LEFT join, and FULL OUTER join.

INNER JOIN means only matching rows will remain after join. In LEFT JOIN, we will not exclude any row from the left table and such rows will be matched with null values in the right table. A FULL OUTER JOIN means that no one will be left out. All orphan rows will be padded using nulls.

INNER JOIN

Let's filter the CROSSJOIN table so that only the rows where the Emploiees.EmploieeID column is equal to the Sales.EmploieeID column will remain.

SELECT * FROM Employees CROSS JOIN Sales
WHERE Employees.EmployeeID = Sales.EmployeeID;

 
This is the definition of an INNER JOIN. An INNER JOIN is a CROSS JOIN where only columns that meet certain conditions are saved in the result.
For INNER JOIN we have this specialized syntax. It makes it clearer what we are trying to achieve.
 
SELECT * FROM Employees INNER JOIN Sales
ON Employees.EmployeeID = Sales.EmployeeID;
The point of INNER JOIN is to only look for successful employees. Gabriel and Aurora each made three sales and we want to see those results. Hugo and Adam did not make any sales so they will be missing from the score.
 
We combine each row from the Employees table with only the matching rows from the Sales table.

LEFT JOIN

SELECT * FROM Employees LEFT JOIN Sales
ON Employees.EmployeeID = Sales.EmployeeID;

 
LEFT OUTER JOIN is when we want to expose employees without sales. LEFT OUTER JOIN will show us all rows from the Employees table. If any rows from the Employees table do not have matching rows in the Sales table, then they will be accompanied by null values.
 
Nulls will help us easily see which rows from the Employees table do not have corresponding rows in the Sales table.

FULL OUTER JOIN

I will add another Sale to the Sales table. We'll pretend that the former employee made that sale.
INSERT INTO SALES ( SalesID, DateCol, EmployeeID, Product, Qty, Price ) VALUES ( 0, '2024-03-04', 0, 'Q', 15, 3 );

That will help us explain the FULL OUTER JOIN.
A FULL OUTER JOIN will explain all our sales and show employees without sales. Now we can see the complete picture. We can see the source of our sales and we can see who is not making sales.

SELECT * FROM Employees FULL OUTER JOIN Sales
ON Employees.EmployeeID = Sales.EmployeeID;

 
In a FULL OUTER JOIN, matching rows will be cross-joined, but non-matching rows, from the both tables, will be completed with nulls.

Matching condition and aliases

The matching conditions can be more complex than in our examples. It can consist of several columns.SELECT Tab1
INNER JOIN Tab2
ON Tab1.Col1 = Tab2.Col1
   AND (Tab1.Col2 = 3
       OR Tab1.Col2 > Tab2.Col3)

We can notice that an overly complex SELECT statement can lead to hard-to-read code. This is a case where we can use aliases wisely. In practice, we usually use short aliases, consisting of only one letter, such as "a" and "b".

SELECT Tab1 a
INNER JOIN Tab2 b ON a.Col1 = b.Col1
AND (a.Col2 = 3
     OR a.Col2 > b.Col3)
The left table would be given the alias a and the right table would be given the alias b.
This makes things much more readable.
It is also possible to create aliases for the columns.

SELECT *
FROM Employees a( EmpNumber, Name )
INNER JOIN Sales b
ON a.EmpNumber = b.EmployeeID;

Join with more than two tables

We will add one more table to our database in order to explain JOIN with more than two tables.

1) CREATE TABLE Calendar ( DateCol DATE, MonthCol INTEGER, YearCol INTEGER );
2) INSERT INTO Calendar ( DateCol, MonthCol, YearCol )
   VALUES ( '2024-03-05', 3, 2024 ), ( '2024-03-06' , 3, 2024 );

The tables are connected in a chain manner. We can join as many tables as we want.

SELECT * FROM Employees
FULL OUTER JOIN Sales
   ON Employees.EmployeeID = Sales.EmployeeID

FULL OUTER JOIN Calendar
   ON Sales.DateCol = Calendar.DateCol;

Special syntaxes

These special syntaxes are not often used. These syntaxes are based on the fact that tables are usually joined by columns with the same name and data type.

NATURAL JOIN

A NATURAL JOIN is an INNER JOIN between two tables. The condition is based on equality between columns that have the same name and the same data type. Let's say we have an INNER JOIN between the Emploees and Sales tables.

 
INNER JOIN
 
SELECT * FROM Employees INNER JOIN Sales
ON Employees.EmployeeID = Sales.EmployeeID;
An easier way to achieve this is to use NATURAL JOIN. A NATURAL JOIN will give us the same result based on the fact that the EmployeeID column exists in both tables and is of the same data type. There are no other columns with such quality.
 
SELECT * FROM Employees NATURAL JOIN Sales;

USING Clause

The USING clause also assumes that we have some columns in both tables that have the same name and are of the same type. With the USING clause, we can specify any type of join (INNER, LEFT, LEFT OUTER). Let's make a left join between the tables Employees and Sales.

 

 
LEFT JOIN
 
SELECT * FROM Employees LEFT JOIN Sales
ON Employees.EmployeeID = Sales.EmployeeID;
An easier way to achieve this is to use the USING clause. In the USING clause, we only need to specify the columns that exist in both tables and have the same name and data type. The JOIN condition will be based on the equality of those columns.
 
SELECT * FROM Employees LEFT JOIN Sales
USING ( EmployeeID );

0120. SELECT Statement in MonetDB

Sample tables

We will create these two tables in the MonetDB database:

We can create these two tables using these statements. The first statement will create the table. The second statement will populate that table with data.

1) CREATE TABLE Employees ( EmployeeID INT, Employee VARCHAR(10 ) );
2) INSERT INTO Employees ( EmployeeID, Employee ) VALUES ( 1, 'Hugo' ), ( 2, 'Gabriel' ), ( 3, 'Aurora' ), ( 4, 'Adam' );
1) CREATE TABLE Sales ( SalesID INT, DateCol DATE,  EmployeeID INT, Product CHAR(1), Qty INTEGER, Price INTEGER );
2) INSERT INTO Sales ( SalesID, DateCol, EmployeeID, Product, Qty, Price ) VALUES
     ( 1, '2024-03-05', 2, 'X', 10, 4)
   , ( 2, '2024-03-05', 3, 'X', 20, 5)
   , ( 3, '2024-03-05', 2, 'Y', 30, 6)
   , ( 4, '2024-03-06', 3, 'Y', 40, 7)
   , ( 5, '2024-03-06', 2, 'Z', 50, 8)
   , ( 6, '2024-03-06', 3, 'Z', 60, 9)

Simple SELECT statement

The simplest SELECT statement will read the entire table. The star "*" is a symbol that represents all columns. This statement will read all columns and all rows from the table.

SELECT * FROM Employees;
If we don't need all the columns, we have to specify them. After the SELECT keyword, we need to type the list of columns we want to read. Only those columns will be fetched.

SELECT Product, Qty FROM Sales;
If we want to rename columns, we can use "AS Alias" after the column name.

SELECT EmployeeID AS ID, Employee AS Name FROM Employees;
We don't need a table for our SELECT statement. We can use expressions to create values. If we don't provide aliases for these expressions, their columns will have generic names like "%2, %3, %4".

SELECT 3+3, CURRENT_DATE;
We can combine columns and expressions.

SELECT Employee, CURRENT_DATE as Today FROM Employees;
An expression can reference values from columns. In our example, the Multiplication column is calculated by multiplying the value from the EmploeeID column by 11. If we put double quotes around the alias "Multiplication", then this alias will preserve its case in the result. Otherwise, it will be written in lowercase letters.

SELECT EmployeeID, EmployeeID * 11 AS "Multiplication" FROM Employees;

Reminder of the Identifiers

If we create a table with double quotes, like CREATE TABLE Tab1 ( "Col1" CHAR ), then the header of this table will be like this:

The column name will be case sensitive. This will force us to write the SELECT statement with double quotes as well. If we omitted the double quotes, our SELECT statement would not work.

If we want our column name to have any non-standard characters or spaces, then we would always have to use double quotes.

Filtering and Ordering Rows

After the FROM clause we can use WHERE. With WHERE we can define a condition that will filter only matching rows. Within that expression we can reference any column. Expressions can be much more complex than this.

SELECT * FROM Employees WHERE EmployeeID >= 3;
The server will read rows from the table in a way that achieves the best performance. We don't know in advance the order in which the server will read the records, so we should make the server sort the result using the ORDER statement. The ORDER command will have a list of columns by which the rows will be sorted. We can sort using ASC or DESC order, but ASC is the default. The table will be sorted first by EmploeeID and then by Qty, so the order of the columns in the ORDER statement is important.
SELECT * FROM Sales ORDER BY EmployeeID DESC, Qty;
If we add DISTINCT to our statement, we will remove duplicate rows.

SELECT Product FROM Sales; 6 rows

SELECT DISTINCT Product FROM Sales; 3 rows
We can limit the number of rows we want the server to return. For that we use the keywords LIMIT and OFFSET. We will also use ORDER BY in this example to better understand what MonetDB did. MonetDB will skip one row and then read the next two rows.

SELECT * FROM Employees ORDER BY EmployeeID LIMIT 2 OFFSET 1;
Using only LIMIT, we can read a limited number of lines from the beginning. Using only OFFSET, we can read everything except the lines at the beginning.

Grouping our Data

If we want to remove duplicates from our columns, we can use GROUP BY. In this simplest case, we must mention all columns after both SELECT and GROUP BY. This result could be made much easier with the DISTINCT keyword. In practice we use GROUP BY to achieve something different and this will become clearer in the example below.

SELECT DateCol, EmployeeID FROM SALES GROUP BY DateCol, EmployeeID;
GROUP BY will remove rows for non-aggregated columns by removing duplicates. For columns like Quantity and Value, we can consolidate rows by aggregating the values.
SELECT DateCol
     , EmployeeID
     , SUM( Qty ) AS Qty
     , SUM( Price * Qty ) AS Value 
FROM Sales GROUP BY DateCol, EmployeeID;
If one of the columns we are grouping on is an expression, then that expression will simply need to be repeated in the GROUP BY statement.
SELECT DateCol
     , 2 * 3 AS Expression
     , SUM( Qty ) AS Qty
FROM Sales GROUP BY DateCol, 2 * 3;

Using VALUES Clause

We can create a SELECT statement that will return our hard-coded values. It is mandatory to give aliases to the result columns.

SELECT * FROM
( VALUES (0,'cero'), (1,'uno'), (2,'dos'), (3,'tres') )
AS ( Number, Word );

 
Aliases have to be inside brackets.
We can also give such a table an alias.
SELECT * FROM ( VALUES (0,'cero'), (1, 'uno' ) ) AS TableName(Number, Word);

This alias will come in handy in some more complex statements.

0110. CREATE TABLE in MonetDB

Simple start

All tables in the same schema must have unique names. Tables cannot have the same name as a view. When we create a new table, we can omit the schema name. In this case, the table will be created in the current schema.

This is the simplest CREATE TABLE statement. We need to specify table name, column name and column data type.
CREATE TABLE Tab1 ( Col1 CLOB );
If we try to create a table with the same name again, we will get an error.
CREATE TABLE: name 'tab1' already in use
We can avoid that error with clause "IF NOT EXISTS". A new table will only be created if no other table with that name exists. If a table with that name already exists, nothing will happen, but we won't get an error.
CREATE TABLE IF NOT EXISTS Tab1 ( Col1 CLOB );
If we want to create a table in a non-current schema, we must use the fully qualified table name. Of course, we have to have enough privileges for that.
CREATE TABLE sys.Tab1 ( Col1 CLOB );

After the data type we can include some options that better describe our column. Those options are DEFAULT, NOT NULL, PRIMARY KEY, UNIQUE.

CREATE TABLE Tab2 (Col1 CLOB DEFAULT 'zzz' NOT NULL PRIMARY KEY UNIQUE);
In the system table sys.keys, we can now find our constraints for the primary key and for the unique constraint. Both of these constraints belong to table Tab2, which has ID 8271.
SELECT * FROM sys.keys WHERE Table_id = 8271;
There is also a system table sys.columns where we can find columns named "col1". The column in the last row belongs to table Tab2, because table_id = 8271.
SELECT * FROM sys.columns WHERE name = 'col1';
If we read from the table sys.tables, we can search for our tables by using their ID-s.
SELECT * FROM sys.tables WHERE id = 8252 or id = 8271;
It is possible to insert nothing into our table. Such a statement will work.
When we try to read from our table, we will see that the DEFAULT value has been written into it.
If we try to write a NULL value to our table, such an act will fail, because our column is defined as a NOT NULL column.
INSERT INTO Tab2 VALUES (null);
If we try to write the default value again to our table, it will fail. We cannot have two rows with the same value in column Col1, due to primary key constraints.

In the real world, we would never mix DEFAULT and PRIMARY KEY constraints. PRIMARY KEY means that each row should be unique. That's the opposite of what DEFAULT is trying to do. PRIMARY KEY also means that our column does not accept nulls, so there is no need for a formal NOT NULL constraint. UNIQUE is also redundant as PRIMARY KEY will not allow duplicates anyway. In the real world, we would never use all the constraints on the same column.

Creating a Table Based on Some Other Table

Using LIKE operator

Previously, we created a table Tab2 with a column Col1 that has many constraints. Now we want to create a new table that will be a copy of Tab2, but will have a few more columns. We can do it in one step. This statement below will create all the columns found in the Tab2 table and place them in the Tab3 table. Tab3 will also have another column of type INTEGER.

CREATE TABLE Tab3 ( LIKE Tab2, Col2 INTEGER );
We can see on the image, Tab3 inherited "Col1" from the table Tab2. We also added one more column "Col2" in table Tab3.
Constraints on columns in Tab2 will not be inherited. If we read from system table sys.columns, we will notice that table with ID 8280 (Tab3), doesn't have the same constraints as the table 8271 (Tab2). All of the constraints are lost.

Using AS SELECT

By using AS SELECT statement we would create a table based on some SELECT query. We can type:

CREATE TABLE Tab5 ( Zcolumn, Today )
AS ( SELECT Col1, current_date  FROM Tab2 );
The new table will not inherit the constraints from the old column. We can see that we don't have the same restrictions on the 'zcolumn' column as we did on the 'col1' column.
We don't have to provide aliases. We can use original column names.
CREATE TABLE Tab6
AS ( SELECT Col1, current_date AS Today  FROM Tab2 );
If add "WITH NO DATA" clause, then we would get the columns, but without data.
CREATE TABLE Tab7
AS ( SELECT Col1, current_date AS Today  FROM Tab2 )
WITH NO DATA;

Table Constraints

We can place PRIMARY KEY constraint on one column. It won't help us if our table has a composite primary key. If this is the case, we need to place constraints on the table itself. We can write the statement like this:

CREATE TABLE Tab4 ( Col1 CLOB, Col2 CLOB, Col3 INT, Col4 INT,  PRIMARY KEY ( Col1, Col2 ), UNIQUE ( Col3, Col4 ) );

We now have a PRIMARY KEY constraint on the first two columns, and a UNIQUE constraint on the last two columns.

Deleting tables

Now we will remove all the tables that we created with the "DROP TABLE TableName" command. This will delete the tables and all their data.

0100. MonetDB  – URL and Network Data Types

URL Data Type is used for storing URL addresses. We will create a table with this type of data. We can also use something like URL(512) if we want to limit the number of characters.

CREATE TABLE URLtable( URLcolumn URL );

Inside of this column we can store both string and URL data types.

INSERT INTO URLtable ( URLcolumn ) VALUES ( 'https://www.monetdb.org/documentation/user-guide/' );
INSERT INTO URLtable ( URLcolumn ) VALUES ( URL'https://www.monetdb.org/documentation/user-guide/' );

We can create URL data type by using CONVERT and CAST functions, or by using URL prefix:

SELECT CONVERT('https://www.monetdb.org/documentation/user-guide/', url);
SELECT CAST('https://www.monetdb.org/documentation/user-guide/' AS url);
SELECT URL 'https://www.monetdb.org/documentation/user-guide/';

URL Data Type functions

Suppose we have this URL in our table:

'https://me@www.monetdb.org:458/Doc/Abc.html?lang=nl&sort=asc#example'

We can divide this URL into its component parts:

https://me@www.monetdb.org:458/Doc/Abc.html?lang=nl&sort=asc#example
protocol
or
schema
userhostportpath
or
context
queryanchor
or
fragment

We can then read all these parts using the MonetDB URL functions. If a part does not exist, these functions will return NULL. All functions return a CLOB.

To extract the "host" we have two functions, GETHOST and URL_EXTRACT_HOST. The URL_EXTRACT_HOST function accepts a second argument that can exclude the "www" part from the host, if the value of this argument is true.

SELECT SYS.GETPROTOCOL( URLcolumn ) FROM URLtable;https
SELECT SYS.GETUSER(URLcolumn) FROM URLtable;me
SELECT SYS.GETHOST( URLcolumn ) FROM URLtable;
SELECT SYS.URL_EXTRACT_HOST( URLcolumn, false )FROM URLtable
SELECT SYS.URL_EXTRACT_HOST( URLcolumn, true ) FROM URLtable;
www.monetdb.org
www.monetdb.org
          monetdb.org
SELECT SYS.GETPORT( URLcolumn ) FROM URLtable;458
SELECT SYS.GETCONTEXT( URLcolumn ) FROM URLtable;/Doc/Abc.html
SELECT SYS.GETQUERY( URLcolumn ) FROM URLtable;lang=nl&sort=asc
SELECT SYS.GETANCHOR( URLcolumn ) FROM URLtable;example

The path (context) can be further divided into:

SELECT SYS.GETFILE( URLcolumn ) FROM URLtable;Abc.html
SELECT SYS.GETBASENAME( URLcolumn ) FROM URLtable;Abc
SELECT SYS.GETEXTENSION( URLcolumn ) FROM URLtable;html

From the host we can read the domain separately:

SELECT SYS.GETDOMAIN( URLcolumn ) FROM URLtable;org

Using the SIS.ISAURL function, we can check if something is a valid URL.

SELECT SYS.ISAURL( URLcolumn ) FROM URLtable;true

Robots.txt is a text file with instructions for search engines (e.g. Google). It is always located in the root directory of the web server. This function will return the location of Robots.txt.

SELECT SYS.GETROBOTURL( URLcolumn ) FROM URLtable;https://me@www.monetdb.org:458/robots.txt

All the functions above will accept either string or URL data type as its argument. For creation of a new URL, we need arguments that are strings (or integer for ports). Function for creation of a new URL-s, can accept two combinations of arguments, as we can see bellow. Compound argument 'usr@www.a.com:123' is called "authority".

SELECT SYS.NEWURL( 'https', 'usr@www.a.com:123', 'docs/index.html');https://usr@www.a.com:123/docs/index.html
SELECT SYS.NEWURl( 'https', 'www.a.com', 5567, 'docs/index.html');https://www.a.com:5567/docs/index.html

Network Data Type

This data type is used to store IPv4 addresses, such as '192.168.1.5/24'. We can create column with this data type, and we can write strings and network data type in this column.

CREATE TABLE InetTable ( InetColumn INET );
INSERT INTO InetTable ( InetColumn ) VALUES ( '192.168.1.5/24' );
INSERT INTO InetTable ( InetColumn ) VALUES ( INET '192.168.1.5/24' );

For creation of INET data type we can use CONVERT, CAST functions or INET prefix operator.

SELECT CONVERT('192.168.1.5/24', INET);
SELECT CAST('192.168.1.5/24'  AS INET);
SELECT INET '192.168.1.5/24';

Network Data Type Operators

Network data type operators are used to compare two network addresses. Network addresses consist of four numbers between 1 and 255 ( 1.1.1.1 to 255.255.255.255 ). We can add zeros to those numbers, so that each number has three digits. Then we can remove dots.

17.18.203.1   =>017.018.203.001       =>017018203001
221.42.2.56   =>221.042.002.056       =>221042002056

Logic of comparison is simple, if 221042002056   >   017018203001 then 221.42.2.56   >   17.18.203.1 . We can test this with MonetDB operator.

SELECT INET '221.42.2.56'   >  INET '17.18.203.1';true

We can use all mathematical operators. If they can work on numbers, they can work on network addresses.

<<==>=><>

Network Data Type Operators for Belonging

Maximal network address can be 255.255.255.255. Number 255 can be presented as 28. This means that total number of combinations is 28 x 28 x 28 x 28=232. We can say that each number has 8 bits, but total IP address has 32 bits.

We have address for a network, and address for a computer on that network.

Network addressComputer address
221221.42.2.56On large networks we can have many computers, so we need three numbers to label them all.
221.42221.42.2.56On medium networks two numbers are enough to mark all of our computers.
221.42.2221.42.2.56On small networks we only need one number. Here we can have up to 255 computers.

In the second column, in the table above, we see that the computer address is composed of two parts. One part will indicate the network and the other will be for the computer on that network. I used red and black to make a difference. In the real world, this distinction is made using this syntax such as "221.42.2.56/24". This "/24" suffix means that the first 24 bits (the first three numbers) are for the network and the rest for the computer.

Now we can understand operators for belonging. These operators will tell us whether some computer address belong to some network.

SELECT INET '192.168.1.5' << INET '192.168.1/24';trueComputer 192.168.1.5 belongs to network 192.168.1.
SELECT INET '192.168.1/24' <<= INET '192.168.1/24'trueSame as above, but this will also return true if we compare
two same networks. In our example we have the same network
on the both side of the operator.

Of course, we can also use reverse operators >> and >>=, so we have to be careful what is on the left, and what is on the right side of our operator.

Network Data Type Functions

All of these functions will accept only INET data type.

SELECT SYS.ABBREV( INET '10.1.0.0/16')10.1/16This function will remove the trailing zeros,
to shorten network address. Returns CLOB.
SELECT SYS.BROADCAST( INET '192.168.1.5/24')192.168.1.255/24If we want to send message to all of the computers on
the network, we use broadcast address. This is network
part of address, and all other numbers are 255. In our
example we have 192.168.1 + 255 = 192.168.1.255.
Returns INET.
SELECT SYS.HOST( INET '192.168.1.5/24')192.168.1.5This will just extract host from network address
(by removing "/24"). Returns CLOB.
SELECT SYS.LEFT_SHIFT( inet '192.168.1.5'          
                                          , inet '192.168.1/24' )
trueSame as
"SELECT INET '192.168.1.5'  <<  inet '192.168.1.5/24';".
This computer address is from this network.
SELECT SYS.LEFT_SHIFT_ASSIGN(inet '192.168.1/24'
                                                         , inet '192.168.1/24')
trueSame as
" SELECT INET '192.168.1/24' <<= INET '192.168.1/24' ".
SELECT SYS.MASKLEN( INET '192.168.1.5/24' )24This will return "/24" as an integer.
SELECT SYS.NETMASK( INET '192.168.1.5/24' )255.255.255.0This is a different syntax for "/24". They mean the
same thing. Returns INET.
SELECT SYS.NETWORK( INET '192.168.1.5/24' )192.168.1.0/24It will extract network part of an address. Everything
else will be zero. Returns INET.
SELECT SYS.RIGHT_SHIFT( INET '192.168.1/24'
                                           , INET '192.168.1.5' )
trueSame as
"SELECT INET '192.168.1/24' >> INET '192.168.1.5' ; ".
This computer address is from this network.
SELECT SYS.RIGHT_SHIFT_ASSIGN( INET '192.168.1/24'
                                                            , INET '192.168.1/24')
trueSame as
"SELECT INET '192.168.1/24' >>= INET '192.168.1/24' ;".
SELECT SYS.SETMASKLEN( inet '192.168.1.5/24',  16 )192.168.1.5/16Returns INET with "/24" changed to "/16".
The second argument is an INTEGER.
SELECT SYS.TEXT( INET '192.168.1.5' )"192.168.1.5/32"Returns INET as a text. It attached "/32" part.

There is another function. I don't know what it's used for, but I figured out how it works. It seems that this function will show us the maximum number of computers that can be in this network.

SELECT SYS.HOSTMASK( INET '192.168.23.20/30' )0.0.0.3It is calculated like 232 / 230 = 22 = 4. Result is 4 -1 = 3.
SELECT SYS.HOSTMASK( INET '192.168.23.20/24' )0.0.0.255It is calculated like 232 / 224 = 28 = 256. Result is 256 – 1 = 255.
SELECT SYS.HOSTMASK( INET '192.168.23.20/17' )0.0.127.255It is calculated like 232 / 217 = 215 = 27 x 28 = 128 x 256.
Result is 128 -1 = 127 and 256-1=255.