Literals in Power Query (M language)

Table

We can create an empty table by specifying the number of columns. For the values, we just provide an empty list.
#table(4,{})
This is how we create a regular table. All the columns will have an unspecified type.
#table( {"A", "B"}, { {1, 2}, {3, 4} } )
We can assign types to the table columns.
#table(type table [Digit = number, Name = text], {{1,"one"}, {2,"two"}, {3,"three"}} )

Some information about Tables

It is possible to make a union of several tables with the "&" operator. Orphan columns will be filled with nulls.
#table({"A","B"}, {{1,2}}) & #table({"B","C"}, {{3,4}})

Tables are considered equal if, for each column in one table, there is an equivalent column in the other table. The order of columns is not important.

#table({"A","B"},{{1,2}}) = #table({"A","B"},{{1,2}}) // true
#table({"A","B"},{{1,2}}) = #table({"X","Y"},{{1,2}}) // false
#table({"A","B"},{{1,2}}) = #table({"B","A"},{{2,1}}) // true

Number

There are three ways to write literal numbers.
Regular = -1.5
Scientific = 2.3e-5
Hexadecimal = 0xff
There are also two special symbols: one for infinity and one for "not a number". Infinity is what we get when we divide one by zero. If we divide zero by zero, we get "NaN". "NaN" is the only value that is not equal to itself.
Infinity = #infinity
NaN = #nan
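These special values follow standard IEEE 754 floating-point semantics, so the same behavior can be observed in other languages. A quick Python sketch (an analogy, not M syntax) illustrating both rules:

```python
# IEEE 754 special values, analogous to M's #infinity and #nan
inf = float("inf")   # M: #infinity
nan = float("nan")   # M: #nan

print(inf > 10 ** 100)   # True: infinity is larger than any finite number
print(nan == nan)        # False: NaN is the only value not equal to itself
```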

Type

With the "type" keyword we can create type literals.
ListOfNumbers = type { number }
TwoNumberFieldsRecordType = type [ X = number, Y = number ]
DateTimeType = type datetime
TextType = type text

Logic, text and null

Logical literals are true and false. Text literals are wrapped in quotes. Null is a special literal which symbolizes missing data.
TRUESymbol = true
FALSESymbol = false
TextLiteral = "word"
NullSymbol = null

Record

This is how we can write record literal.
Record = [ A = 1, B = 2 ]

Some information about records

Records are equal if, for each field in one record, there is an equal field in the other record. The order of fields is not important.

[ A = 1, B = 2 ] = [ A = 1, B = 2 ]        // true 
[ B = 2, A = 1 ] = [ A = 1, B = 2 ]        // true

We can make a union of several records. If the same field appears in more than one record, the value from the last record with that field wins. In our example, x will be assigned the value 3, not the value 1.

[ x = 1, y = 2 ] & [ x = 3, z = 4 ]  // [ x = 3, y = 2, z = 4 ] 
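The same "last one wins" merge rule applies when merging Python dictionaries, which makes for a convenient analogy (plain Python, not M):

```python
# Merging records: the value from the last record with a given field wins
record1 = {"x": 1, "y": 2}
record2 = {"x": 3, "z": 4}

merged = {**record1, **record2}
print(merged)  # {'x': 3, 'y': 2, 'z': 4} -- x comes from record2
```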

List

A list is created by placing comma-separated values inside curly brackets.
List = {1, 2, 3}

Some information about lists

Two lists are equal if they have the same elements in the same positions.

{1, 2} = {1, 2} // true 
{2, 1} = {1, 2} // false

We can concatenate several lists with the & operator.

{1, 4} & {2, 3} // list concatenation: {1, 4, 2, 3} 

Special Signs

Invisible special characters can be written using their symbolic representation.
TAB = #(tab)
LineFeed = #(lf)
CarriageReturn = #(cr)

I used some unusual Unicode brackets in the "Expression" column to prevent the typed text from being identified as a special character. In the "Result" column, we can see that both the "cr" and "lf" characters cause a line break.

If we want to type the two characters "#(", we have to write them as "#(#)(", because those characters have to be escaped. Two consecutive escapes "#(cr)#(lf)" can be typed as "#(cr,lf)"; the result is the same.

Unicode Signs

We can enter any Unicode symbol in Power Query by using its code.
YenSymbol = "#(00A5)"
CopyRightSymbol = "#(00A9)"
CapitalTheta =  "#(0398)"
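In M, "#(00A5)" embeds the Unicode code point U+00A5 into the text. The equivalent lookup in Python (an analogy, not M syntax):

```python
# Unicode code points, analogous to M's "#(00A5)" escape
yen = chr(0x00A5)
copyright_sign = chr(0x00A9)
capital_theta = chr(0x0398)

print(yen, copyright_sign, capital_theta)  # ¥ © Θ
```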

Binary – list of bytes

Binary data is actually a list of bytes. We can express those bytes with Base64 letters, hexadecimal numbers or ordinary numbers.
BinaryByLetters = #binary("AQID")
BinaryByHexadecimalNumbers = #binary( {0x00, 0x01, 0x02, 0x03} )
BinaryByRegularNumbers = #binary({65, 66, 67})
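The letter form is actually Base64 text: "AQID" is the Base64 encoding of the bytes 1, 2, 3, so the three examples above each produce different bytes. A Python sketch of the same decoding (an analogy, not M):

```python
import base64

# "AQID" is Base64 for the bytes 0x01, 0x02, 0x03,
# the same bytes that M's #binary("AQID") produces
decoded = base64.b64decode("AQID")
print(list(decoded))  # [1, 2, 3]
```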

Date and time

Date and time can be expressed with these literals.

#time(hour, minute, second)
#date(year, month, day)
#datetime(year, month, day, hour, minute, second) 
#datetimezone( year, month, day, hour, minute, second, offset-hours, offset-minutes) 
#duration(days as number, hours as number, minutes as number, seconds as number)
For each argument there is a limit on the values it can take.
1 ≤ year ≤ 9999
1 ≤ month ≤ 12
1 ≤ day ≤ 31
0 ≤ hour ≤ 23
0 ≤ minute ≤ 59
0 ≤ second ≤ 59
-14 ≤ offset-hours ≤ 14
-59 ≤ offset-minutes ≤ 59

Some information about dates and times

We can concatenate a date and a time.
#date(2013,02,26) & #time(09,17,00) // #datetime(2013,02,26,09,17,00)
We can add a duration to a date or to a time.
#datetime(2010,05,20,0,0,0) + #duration(0,8,0,0) //#datetime(2010,5,20,8,0,0)
#time(8,0,0) + #duration(30,5,0,0)     //#time(13,0,0)
A duration can be multiplied by a number.
#duration(2,1,0,15.1) * 2      // #duration(4, 2, 0, 30.2)
Durations can be divided.
#duration(2,0,0,0) / #duration(0,2,0,0)        //24
Dates and times can be converted to numbers and back.
Number.From(#datetime(2020, 3, 20, 6, 0, 0)) // 43910.25
Date.From(43910) // #date(2020,3,20)
Time.From(0.7575) // #time(18,10,48)
Duration.From(2.525) // #duration(2,12,36,0)
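Python's datetime module supports analogous arithmetic, which is a handy way to sanity-check these results (Python, not M):

```python
from datetime import date, time, datetime, timedelta

# Concatenating a date and a time, like #date(...) & #time(...)
dt = datetime.combine(date(2013, 2, 26), time(9, 17, 0))
print(dt)  # 2013-02-26 09:17:00

# Adding a duration, like #datetime(...) + #duration(0, 8, 0, 0)
print(datetime(2010, 5, 20) + timedelta(hours=8))  # 2010-05-20 08:00:00

# Dividing two durations, like #duration(2,0,0,0) / #duration(0,2,0,0)
print(timedelta(days=2) / timedelta(hours=2))  # 24.0
```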


Descriptive Statistics in SPSS

We saw in one of the earlier posts what descriptive statistics are. Let's see how to calculate those statistics in SPSS.

Descriptive Statistics Dialogs

SPSS has two similar dialogs for creating descriptive statistics. Both can be opened from the menu Analyze > Descriptive Statistics (0). There we can choose the option Frequencies (1) to get dialog (2), or we can choose Descriptives (3) to get dialog (4). Dialogs (2,4) are similar, but the Frequencies dialog gives us more options than the Descriptives dialog.

The first step is to select the columns for which we will calculate descriptive statistics. Using the mouse, and the Ctrl or Shift keys, we select some of the columns in the first pane (1). Then we click on button (2) so that the columns are moved to the second pane (3). These are the columns for which we will get our descriptive statistics.

If we have made a mistake, it is possible to move columns from the right pane back to the left pane by the same process (4,5,6). It is also possible to move selected columns to the opposite pane by dragging and dropping them with the mouse (7).

Analyze > Descriptive Statistics > Frequencies

In the Frequencies dialog, we should click on the "Statistics" button (0). This opens a new dialog where we can choose which descriptive statistics should be calculated: Percentiles (1), Central Tendency statistics (2), Measures of Variability (3), and Distribution (4).

The option (5) "Values are group midpoints" is used when our source data is grouped and each group is represented by one value. For example, all people in their thirties are coded as the value 35. In that case this option will estimate the Median and Percentiles for the original, ungrouped data.

Custom percentiles are added by typing them into the textbox (1) and then clicking on "Add" (2). This adds that percentile ("24") to the pane below. If we click on "Change" (3), the new value ("24") replaces the currently selected old value ("15"). If one of the values (4) is selected, we can remove it with the "Remove" button (5).

Now we can click Continue and OK to close all dialogs, and SPSS will calculate our results. All the results will be presented in one tall table (1). At the top we can see the counts of valid and missing values (2). Missing values are nulls. Below are all the other statistics that we are already familiar with (3).

At the bottom of the table, we have the percentiles, all presented together (4,5,6). I've color-coded the percentiles here to help us understand the "Percentile Values" options. If we check the option "Quartiles" (4), we will get the percentiles 25, 50, 75. Option (5) allows us to divide the data into several groups of equal size. If we want 5 groups, the range will be divided using 4 cut points: the 20th, 40th, 60th and 80th percentiles. Last, we see all the custom percentiles we have entered (6).
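The same cut points can be reproduced outside SPSS. For example, Python's statistics.quantiles splits data into n equal groups using n-1 cut points (a generic illustration on made-up data, not SPSS itself):

```python
import statistics

data = list(range(1, 101))  # a simple sample: the numbers 1..100

# Quartiles: 3 cut points at the 25th, 50th and 75th percentiles
print(statistics.quantiles(data, n=4))

# 5 equal groups need 4 cut points (20th, 40th, 60th, 80th percentiles)
print(statistics.quantiles(data, n=5))
```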

If several variables are selected, the results will be presented as new columns in the result table. The order of the columns will be the same as the order of the selected variables.

Analyze > Descriptive Statistics > Descriptives

If we use the Descriptives dialog, the process is similar. First, we select our columns (3) and then we click on button (1). This opens dialog (2) where we can select which descriptive statistics should be calculated.

The final result will be presented in one table where each variable has its results in one row. The order of the rows can be controlled by options (4). The default is to use the "Variable list" order (5).

OCR a PDF Document With Python and Tesseract

We have a PDF document that is just a scanned picture. We want to turn it into a searchable document. We will use the Python pytesseract module to do the transformation.

WinPython

For this demonstration I will use the WinPython distribution. The download is 813 MB, and when decompressed the archive becomes a 4 GB folder. Choose wisely where to place this folder, because later moving or deleting it could take some time.

I am using the WinPython distribution because it has many modules preinstalled, which will save us some effort. It will also keep us away from dependency hell.

We can download WinPython from the page "https://sourceforge.net/projects/winpython/files/". On that page, click on the green button (1) and download the EXE file to your computer (2). This file is a self-extracting archive (3). Choose where you want the file to be extracted. You will get a folder (4). In the subfolder "python-3.10.5.amd64" (5) of that folder, there is a "python.exe" file.

PATH Environment Variable

Windows has a small key-value "database", where each key-value pair is one environment variable. We can open the dialog with those variables by going to the Run dialog (shortcut Win+R) and typing "sysdm.cpl" (1). In the new dialog we go to the Advanced tab (2) and click on the "Environment Variables" button (3); we then see a dialog with a list of all the key-value pairs (4,5) for environment variables. Some variables are available only to the logged-in user (6), while system variables are available to all users (7).

For example, there is a key-value pair TEMP = "C:\Windows\TEMP". This allows every program to easily find where the Windows temporary folder resides. If Microsoft changes the location of this folder, all other programs will still work correctly because they will read the new location from the environment variable. The name of a key is stable, only its value changes, which is why environment variables are used for communication between programs.

There is an environment variable named PATH. We will change the value of this variable so we can easily use our WinPython. In the user variables (1), find the variable named "Path" (2). Click on the Edit button (3), and then, in the new dialog, click on the "New" button (4). This creates a place where we can paste our Python folder (5). The Python folder is the folder where the "python.exe" file resides.

If we don't have a Path variable, we click on the "New" button (6) and fill in the new dialog with the variable name and value (7). Now the Python location is added to the Path variable.

Tesseract

Tesseract is open-source optical character recognition software. Python will use this software for OCR. First, we go to the address https://digi.bib.uni-mannheim.de/tesseract/ and download the software from there. We will download the latest version (1). After the EXE file is downloaded to our computer (2), we start the installation. During the installation we can choose (3) additional scripts (Latin, Cyrillic…) and languages (Serbian, Romanian…). The selected scripts and languages will be downloaded during the installation (4).

The folder with the "tesseract.exe" file should be added to our Path user environment variable ("C:\Program Files\Tesseract-OCR"). Now that both Python and Tesseract are on the path, we can check whether they are correctly installed. Go to the Windows command prompt and type the commands "python --version" and "tesseract -v". We will see which versions are installed on our computer.

There is no need to change the working directory inside the command prompt. Windows will find both Python and Tesseract because they are on the path. This is why we placed them in this environment variable in the first place.
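The same PATH lookup that the command prompt performs can be done from Python with shutil.which, which is a quick way to verify the setup (a sanity check, not part of the installation itself):

```python
import shutil

# shutil.which searches the PATH the same way the command prompt does;
# it returns the full path of the executable, or None if it is not found
for program in ("python", "tesseract"):
    location = shutil.which(program)
    print(program, "->", location if location else "not found on PATH")
```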

Pytesseract

Pytesseract is a Python module which communicates with the Tesseract software. We will use the "pip" module for its installation. Go to the command prompt and type the command "python -m pip install pytesseract" (1). After a successful installation, we can check the installed module with the command "python -m pip show pytesseract" (2).

Poppler

Poppler is an open-source library for PDF rendering. Go to the web site https://blog.alivate.com.au/poppler-windows/ and download the latest Poppler (1). The downloaded file will be a "7z" archive. Extract the files from that archive into a folder of your choice. Poppler's folder has a subfolder "bin" (2). We should add this "bin" folder to the Path user environment variable.

Pdf2image

Pdf2image is a Python module for interacting with the Poppler program. We will install it in the same way as the pytesseract module: "python -m pip install pdf2image" (1) and "python -m pip show pdf2image" (2).

Python IDE

We finally have all the ingredients for our project. Now we will open our favorite Python IDE or text editor. If you don't have one, you can go to the WinPython folder. Inside it there is a file "Spyder.exe". Run this file and you will get the Spyder IDE where we can test our code.

Before running the script, we will first tell Spyder which python.exe to use. Go to Tools > Preferences (1). In the new dialog, type the location of our WinPython python.exe file into "Python interpreter > Use the following Python interpreter" (2).

Now we can paste our python script in the pane (3) of the Spyder, and click the Run button (4). OCR-ed text will appear in the console (5) and will also appear as a TXT file on our desktop (6).

Python Script

This is the code we are using for our OCR. First, we transform our PDF into a sequence of images using the "pdf2image" module. For each page in that sequence, we apply the pytesseract image_to_string method. This creates our final result. At the end, we write the result to the console and to a TXT file.

import pdf2image
import pytesseract

# Render every PDF page as an image, then OCR each image
pages = pdf2image.convert_from_path('C:\\Users\\Sima\\Desktop\\OCRfile.pdf')
with open('C:\\Users\\Sima\\Desktop\\OCRtext.txt', 'w') as file:
    for page_number, page in enumerate(pages):
        detected_text = pytesseract.image_to_string(page)
        print( detected_text )
        file.write( detected_text )

Below, we can see how our PDF document looks as the original (1), in the console (2) and in the TXT file (3).

The sample file contains the Python script, a sample PDF file and the Poppler software.

Basic Type Functions in Power Query

In the "Data Types in Power Query" post we saw which types exist in Power Query. Let's now see which Power Query functions can deal with types. Most of these functions are classified in the "Type" group, but some important ones are placed in the "Value" group.

Basic Functions

In the example below, the first column has data of different types. All the other columns are created by using Power Query functions, except the column "TypeName", which is created by using our custom function. The name of each column is the same as the name of the function used.

Value.Type(value as any) as type     

The Value.Type function receives the values from the first column and returns their types. In Power Query, types are presented only by the green word "Type", so the results in the second column don't help us much. In order to reveal what type each value really is, we will use our custom function "TypeName". You can find that function in the sample file below, or you can see it in this post. This function takes the types from the second column and, for each "Type", reveals the actual type in the third column. Now we can see all the different types (type null, type text, type date…).

Value.Type( [Data] ) 
 
TypeName( [Value.Type] )
 
Type.Is( [Value.Type],type number )
 
Value.As( [Data], type number )

Value.Is( [Data], type number )

Type.Is(type1 as type, type2 as type) as logical

The Type.Is function compares two types and returns TRUE or FALSE. In our example, we compared the types from the column Value.Type with type number. Only the cell in the eighth row returns TRUE, because 999 is a number.

Value.As(value as any, type as type) as any

The Value.As function returns the value from the first argument, but only if the type of that value is compatible with the type given as the second argument. If they are not compatible, this function returns an error. Because we used type number as the second argument, only the cell in the eighth row returns 999; all the other cells return Error.

Value.Is(value as any, type as type) as logical

The Value.Is function compares the type of a value with the type given as the second argument. It returns TRUE or FALSE. In our example, we compare the types of the values in the first column with type number. This is why only the cell in the eighth row returns TRUE.

Is and As Operators

If we don't want to hard-code our data types, we have to use the Value.As and Value.Is functions. If hard-coding the types is OK, we can use the is and as operators. The expressions on the right side create the same two last columns as in the example above.

Not Hard Coded                     Hard Coded
Value.As( [Data], type number )    [Data] as number
Value.Is( [Data], type number )    [Data] is number

Filtering Reports in Cognos

Levels of Filtering

We have source data (1). For each order we have the products on that order and their quantities. We made a report by grouping our data: by "Product" (2), then by "Product type" and "Product line" (3,3). We can filter the data on all three of those levels (1,2,3,3).

Filtering by Data Type

There are specialized dialogs for filtering Dates, Strings and Numbers. Dates can be filtered by range (1) or individually (2). Strings can be filtered by selection (3) or by condition (4). Numbers can be filtered by range (5) or individually (6).

Filtering by Using Multiple Columns

Besides filtering with one column, we also have "Combined" and "Advanced" filters (1). A combined filter is just a combination of simple filters (2). First we filter some Dates, Strings and Numbers by using the Cognos dialogs, and then we combine those filters. The filters can be connected with the logical operators AND, OR, NOT.

An advanced filter allows the creation of really complex filters. Such filters are expressions made of columns, built-in functions and parameters (3). The Cognos Expression builder helps us create such expressions by listing all the functions and columns we can use. We just have to click on the column, function or parameter that we want to appear in our expression. For each function, we get an explanation of its usage, arguments and return values (4).

Filtering by Parameters

It is possible to parameterize a report filter. Parameters can be used in an advanced filter, where we refer to a parameter by its name delimited with question marks (1). The user can provide values for these parameters before the report is executed, but doesn't have to; a default value can be used instead.

Parameter values can be provided by using a variety of different prompts (2).

Other Functionalities Pertaining to Filtering

It is possible to show the user which filters are used in the report (1). We can control which filters will be presented this way and which will not. This helps the user understand what data is shown in the report.

After creating a report in HTML format, the user can further filter the data by using different prompts. It is not necessary to create explicit prompts for this task. We can just allow the user to change or delete some of the filters, and that ability will appear in the HTML report in a Filters pane (2). This ability is only available for simple and combined filters, not for advanced filters.

Video that Explains How to Use Cognos Filters

Watch the video to learn about all the different ways to filter data in a Cognos report.