# 0170 Sampling, Analyze, Prepare in MonetDB

Let's create test table:

1) `CREATE TABLE samplingTable ( Letter CHAR, Number INTEGER );`

2) `INSERT INTO samplingTable ( Letter, Number )  VALUES ( 'A', 1 ), ( 'B', 2 ), ( 'C', 3 ), ( 'D', 4 ), ( 'E', 5 )`
` , ( 'F', 6 ), ( 'G', 7 ), ( 'H', 8 ), ( 'I', 9 ), ( 'J', 10 );`

## Two Ways of Sampling

Sampling of records is done with SAMPLE clause. This clause accepts continuous range [0,1) as an argument. Selected number is percent of records that will become part of the sample. It is also possible to select any integer bigger than zero. That number will become number of units in the sample.

Samples have uniform distribution.

## Table Statistics

### Statistics System Table

Statistics about tables is placed in the table sys.statistics. In order to read from this system table, we have to log in as a member of monetdb group.
First, we quit current session (1). Then we have to log in as a monetdb group member (2). Password is "monetdb" (3). Then, we are allowed to read from statistics table (4).

These are the columns in that table:

All the strings from this table are small case. When we refer to them, we have to write them exactly like in the table.

### Table Producing Functions

Table sys.statistics will give us statistics about all of the user tables in our schema. If we want to read statistics of system tables too, then we should use this table producing function.

For reading statistics for one schema, table, or column, we can filter sys.statistics or sys.statistics() tables. Instead of filtering, we should use these table producing functions with arguments which define what data we want to read. This table producing functions are fastest way to get specific statistics.

### Refreshing Statistics

We can refresh statistics for the whole schema, for the one table, or for the one column. The refresh will take a long time if we have a lot of data

Statistics is used by MonetDB, to optimize queries. If we made a lot of deletes, inserts or updates, statistics will become obsolete so we should refresh it. This will make our queries performant again.

## Prepare Statement

If we want to use some statement many times, we can prepare it as well. That means that executing plan for that statement will be cached during the session. Each sequential call of that statement will use cached version, but with different parameters.

As a monetdb user, our default schema is sys. Before going through example, we will first change our current schema to voc.

For example, let's say that we want to insert three rows in our table samplingTable. For that we can use three separate statements.

`INSERT INTO samplingTable ( Letter, Number ) VALUES ( 'O', 100 );INSERT INTO samplingTable ( Letter, Number ) VALUES ( 'P', 101 );INSERT INTO samplingTable ( Letter, Number ) VALUES ( 'Q', 102 );`

For each of these statements, MonetDB will create separate execution plan. To avoid doing the same work three times, we will prepare this statement.

We will now execute our three statements from above, by using this prepared statement. Zero is ID of our prepared statement.

We can release this prepared statement from our session cache.

Prepared statements will be deleted automatically, if any error happens during the user session.