How to Transform (Recode) Variables in SPSS

Recoding is process of transforming variable values by using some rule. Simple example would be that we want to replace quarter of the year values (Q1,Q2,Q3,Q4) with half of the year values (H1,H2).

As a result we can create new variable (as on the image above) or we can overwrite existing variable. This depends on whether we open dialog Transform > "Recode into Same Variables" or "Recode into Different Variables".

Both options will open similar dialog. The difference is that "Recode into Same Variables" will not have textboxes (3,4) and button (5) because there will be no new variable. This is the only difference so we will explain only "Recode into Different Variables" case.

First, we choose column to recode (1) and we add it to pane (2). Next, we give name and label to the new column (3,4). After that, we click on the "Change" button (5) and name of the new column will be added to pane (6). Now we have to define how new column will be created. For that we use dialogs "Old and new Values" (7) and "If…" (8).

System Missing and User Missing Values

This terms are used in dialog "Old and New Values". If we have some values missing in our data source, SPSS will present such values with a dot (1). Those are called System Missing Values.

Other possibility is for user to declare some values as invalid or impossible. This is done in "Variable view" (2). In new dialog user will enter three discrete values (3), or combination of one range and one discrete value (4). Invalid values will not be specially labeled in "Data view", but rows with such values will be excluded from any further analysis by the SPSS. This is explained in the blog post https://bizkapish.com/spss/spss-data-entry/.

Old and New Values

We want to transform quarter of year values (coded as 1,2,3,4) to half of year values (coded as 1,2). We enter code of a quarter in (1), and code of a half of year in (2). Then we click on button "Add" (3) and new transformation will appear in pane (4). We have to do this for each combination of quarter and half of year. We also have option to change the type of a result value. We can change it from number to string (7), or from string to number (8), if possible.

"Old and new values" dialog has left (5) and right (6) side. On the left side we can choose to transform specific value, missing values, range of vales, or all other values. On the right side we can choose to transform those values into specific value, missing value or just to copy old values. So, anything selected on the left side can be transformed to anything selected on the right side.

Conditional selection

We can make transformations conditional. For that we use "If…" dialog. In this dialog we use columns (1) to create expression (2). Transformation will occur only in those rows where this expression returns true. We can type expression manually, but we can also use any of the controls (3) to add building blocks in our expression. This is better explained in the post https://bizkapish.com/spss/how-to-create-new-variables-in-spss/.

In image above we typed expression "year = 1". This means that only quarters in the year 2007. will be transformed to half years. Rows for 2008. will be missing (presented by dot). If we want to fill cells for 2008. we have to create new transformation, but this transformation has to use the same name of new variable .

Let's say that we want to perform three conditional transformations. Each next transformation will overwrite cells where its condition is fullfiled. If there is overlap in some rows for two transformations, the last transformation will prevail. See the image. In the last step we change our transformation to "9", just to make a difference.

Value Labels

Last step is to associate new code values with their text labels. This is done in usual way, in "Variable view".

Sample data can be downloaded here:

Discover relative path to Power BI Desktop file

Power BI Desktop file is in folder with some Excel files. Those Excel files are used as a data source. If we move folder with all of this files to some other location, refreshing of PBID file will not work any more. PBID will not be able to connect to Excel files because fullpath of those Excel files will be changed. Solution is to find address of a folder where PBID file reside. That way, we will always know where our Excel files are.

When we run a PBID, a process with the name PBIDesktop.exe is created. We can find it in Details tab in Task Manager:

There is a console program named Handle64. That program can list all the files that are locked by specific process. In CMD environment we can type:

C:\Users\Sima\Desktop\handle64.exe -p PBID

This will return all files locked by PBIDesktop process:

Because our PBIX file is locked by PBIDDesktop.exe, its fullpath will be listed in there. We just have to filter this list by using name of our PBIX file and we will find location of our PBIX file. That way we will find folders where our Excel files are located, so we can create relative path.

This console program can be downloaded from the address below, but it is also available at the end of this blog post.
https://docs.microsoft.com/en-us/sysinternals/downloads/handle

All we have to do now is to call this Handle64 program from PBID, and for this we are going to use python script. For python script to work we have to tell Power BI desktop where is pythons home folder. We can do this in Options > Python scripting. If we have python installed we can select its home folder from drop down menu (1). I am using portable version of python so I have to tell PBID explicitly where my python is. I can do that with (2) Browse button.

This is our python script. Blue part will read result of CMD command and encode it in UTF-8. Green part will remove all other text, so that at the end, we only have our fullpath to return.

import pandas
import subprocess
FilesListBytes = subprocess.check_output(r'"D:\Programi\handle64.exe" -p PBID', shell = True)
FilesListString = FilesListBytes.decode('utf-8')
FirstSplit = FilesListString.split('Name of the file.pbix', 1)
SecondSplit = (FirstSplit[0].rsplit('File     ',1))[1]
ReplaceSlashes = SecondSplit.strip().replace("//","/")
df = pandas.DataFrame([[SecondSplit.strip()]],columns=['Files'])
print (df)	

Name of our PBID file will not be hardcoded. We will read it from table that we prepare in advance in our PBID file.

Sample PBID file, whole Power Query script that contains python code, can be found in this attachment. For this script to work on your computer you have to change fullpath of Handle64.exe. That fullpath is in the first line of the script:
Handle64location = """C:\Users\Sima\Desktop\handle64.exe""", change it to where you placed Handle64.exe program.

Comparer functions

Culture.Current

This is a variable that returns current culture. Current culture is described by symbols like "cs-cz" (for Czech language) or "en-gb" (for UK english). Culture is defined in Options, under Regional Settings, both in PowerPivot and Power BI Desktop.

The code below would return string "sr-Latn-RS". This blog post has, bellow, attachment "Language codes.xlsx" which contains list of all language codes.

let
‏‏‎ ‎‏‏‎ ‎‏‏‎ ‎‏‏‎ ‎‏‏‎ ‎‏‏‎ ‎‏‏‎ ‎‏‏‎ ‎Source = Culture.Current
in
‏‏‎ ‎‏‏‎ ‎‏‏‎ ‎‏‏‎ ‎‏‏‎ ‎‏‏‎ ‎‏‏‎ ‎‏‏‎ ‎Source

Comparer.FromCulture()

Imagine that we want to compare two values based on specific culture and case sensitivity. To compare strings "A" and "a" based on "sr-Latn-RS" culture while ignoring case sensitivity, we could use some function that look like this:

SomeFunctionUsedForComparison( "sr-Latn-RS", true, "A", "a" )

Power query can do the same thing but it does it in two steps. First step is to create comparison function:

ComparisonFunction = Comparer.FromCulture("sr-Latn-RS", true)

Second argument is for case sensitivity. Default is false (sensitive). This argument is optional.
After this, we can use our new ComparisonFunction to make comparison when ever we need it.

Second step is to call this function when we need it. We just supply two values to compare.
ResultOfComparison = ComparisonFunction( "A", "a")
We can see on the right, that this new function receives two arguments to compare. Those two arguments could be of any type. As a result, this new function returns a number. That number is "0" for equality and "-1" or "1" for inequality.

Here are three examples of this function. For brevity we will merge two steps into one step:

Comparer.FromCulture("sr-Latn-RS", true)( "A", "a" )Returns "0". Equality.
Comparer.FromCulture("sr-Latn-RS", false)( "a", "A" )Returns "-1". Inequality. "a" < "A".
Comparer.FromCulture("sr-Latn-RS", false)( "A", "a" )Returns "1". Inequality. "A" > "a".

We can see that, because of different case sensitivity, we get different results.

Comparer.Ordinal()

This function is used to compare two values. Comparing is based on ordinal rules. Lets asume that we compare:

æ=UNICHAR(230) 'by excel formulaANDae=UNICHAR(97)&UNICHAR(101) 'by excel formula

If we compare this two values with Comparer.FromCulture, for the result we would get equality.

Comparer.FromCulture( "sr-Latn-RS", true )( "æ", "ae" )Returns "0". Equality.

This is because, culture help us to compare values that have similar meaning.

Comparer.Ordinal() is ortodox function. It compares unicode codes, and not the meaning of symbols. This function will compare code "230" with codes "97" and "101". It will find that this two values are not the same. Let's see three examples for this function:

Comparer.Ordinal("æ","ae")Returns "1". This is because 230 > 97.
Comparer.Ordinal("ae","æ")Returns "-1". This is because 97 < 230.
Comparer.Ordinal("æ","æ")Returns "0". This is because 230 = 230.

As we see, when comparing numbers, we know which number is bigger. That is way we get results "-1,1" depending on which number is larger or "0" if numbers are equal.
"ae" is consider as two unicode numbers, so for concluding sort order we only need code number for letter "a".

Comparer. OrdinalIgnoreCase()

This function is the same as Comparer.Ordinal(), except it is case insensitive.

Comparer.OrdinalIgnoreCase( "A", "a" )Returns 0.
Comparer.OrdinalIgnoreCase( "a", "A" )Returns 0.
Comparer.OrdinalIgnoreCase( "A", "A" )Returns 0.
Comparer.OrdinalIgnoreCase( "A", "b" )Returns -1.
Comparer.OrdinalIgnoreCase( "b", "A" )Returns 1.

Comparer.Equals()

This function can use previous functions as arguments to compare two values. Here are three examples:

Comparer.Equals( Comparer.FromCulture( "nl-NL", true ), "a", "A")Returns "true".
Comparer.Equals( Comparer.Ordinal, "a", "A")Returns "false".
Comparer.Equals( Comparer.OrdinalIgnoreCase, "z", "z")Returns "true".

As we can see, we are using three arguments. First one is function defining how to compare values in other two arguments. Comparer.Equals() returns "true" if values are equal and "false" if they are different.

List of all culture codes and examples for comparer functions can be found in this file:

Filtering data in SPSS

Our table has data about cable television users. We have ID of the customers (1), their gender (2), age group (3) and education level (4). Column "reside" (5) shows how many people live in the household. Column "service_usage" (6) shows what kind of cable service they use. We will see how to filter this data in different ways in SPSS program.

On the image on the right we can see that filtering dialog is opened by clicking on Data > Select Cases (1). This opens dialog where we can select columns to filter (2), we can choose how to filter (3), and we can decide what will happen with filtered data (4).

5 ways to filter data

All cases

"All cases" option means that there will be no filtering. This is the default.

If condition is satisfied

This option opens dialog where we can define formula that will filter data. We can click on column names (1), math and number symbols (2), and we can choose some of built-in functions (3). What we click, will appear in pane (6). We can also type by hand what we want in pane (6). Whole expression can be typed manually, but it is more easier to select elements of expression. Pane (5) will filter functions presented in the pane (3). This will help us to find function we want. When we select one of the functions in the pane (3), we can see its syntax and description in the pane (4).

Now that we created our filter (7), we will choose to "Filter out unselected cases" (8). This option will not hide or delete filtered data (9), but will only mark it as filtered (10). Although still visible, marked rows will not be included in SPSS calculations. This way we don't loose any data and we can after, apply some other filter on our table.

Random sample of cases

Button "Sample" opens dialog where we can use one of two possible random filters. First filter (1) will choose some percent of all the cases randomly. For this to show, I will use smaller table that has only 10 rows (2). If we choose to filter 10% of rows, only one row will be left (3). Other option (4) is to randomly select a limited number of rows from the specified number of first cases. We choose to select 3 rows from the first 5 rows, so only rows 1, 2 and 5 will be selected (5).

Base on time or case range

We can just choose one continuous range of cases. On image left, we can see (1) that we have chosen all cases between case 3 and 7 inclusively. All other cases will be crossed out (2).

Use filter variable

This option asks from us to select one of the columns (1). This column should have rows without data (2). All the rows where that columns doesn't have data will be filtered (3).

What will happen with filtered data?

Filtered data can be marked as filtered out and such rows will be excluded from further calculations (1). There are two more options. First is to create new Dataset(2). SPSS will open a new window. That window will show all the rows, but unselected rows will be crossed out. Other possibility is to delete cases that don't pass filter condition. Our Dataset will be then reduced by deleting not needed rows (3).
You should be careful when deleting the rows because deleting can not be undone.

Sample data can be downloaded from here:

VBA to connect to Power BI Desktop

We have our PBIX file opened (1). Our Excel is also opened. We click on button in the ribbon (2) and new pivot table is added to active sheet (4). This pivot table is connected to our PBIX file. Now we can easily create pivot table by using data from Power BI Desktop.

After closing Power BI Desktop, our pivot table will be non interactive. We can open PBIX file again, but this time Power BI Desktop will have some other credentials. This mean that our pivot table will still be non interactive. We have to click on button (3) "Reconnect" to establish connection again. Now we can use our pivot again.

This post will not explain how to create buttons (2,3). It will only explain VBA code needed to achieve described functionality. Code is simple, so it will work with only one PBIX file. Name of that file is hardcoded. This code has no error handling.

Structure of code

On the top we have two procedures "subCreateNewPivot" (1) and "subReconnect" (2). Beside them we have two groups of functions. "Red Orange" group is used to provide credentials of our PBIX file. "Green" group is used to create pivot table. "subCreateNewPivot" is using both groups, but "subReconnect" is using only the "red orange" one.

As explained in one of previous posts, each time Power BI Desktop is opened, it has different credentials and it writes them into location:
C:\Users\<Username>\AppData\Local\Microsoft\Power BI Desktop\AnalysisServicesWorkspaces (1)
That location has several subfolders (2), each for any of opened Power BI Desktop files. Sometimes there are subfolders that belong to already closed PBIX files, but this subfolders are incomplete and they will be deleted automatically after some time. Inside those subfolders there are two important files. File (3) has name of database in its name ( 601ae6aa-a3c6-4abb-8833-2253863854e4 ). File (4) is ordinary TXT file and it has port number of Power BI Desktop server as its only content.

There are two problems with this setup:
1) Credentials are changed each time Power BI Desktop is opened.
2) We don't know which credentials belong to which opened PBIX file.

Solution is to read all available credentials with function "funcAllCredentials". This function is using two helper functions "funcPort" and "funcDatabaseName". Those helper functions reads values of port number and database name from computer disk. This credentials are then given to function "funcPowerBIpivotDataCredentials" which have to decide which of credentials are the ones we are looking for. We are looking for credentials of "PowerBIpivotData" file. In order to identify correct credentials we are going to read values from tables that we previously created in all of our Power BI Desktop files. Each our PBIX files has table named "NameValue" that looks like (1,2). We will use function "funcPBIXname" to read from each open PBIX file from such table. If, by using some of credentials, we read the value "PowerBIpivotData" (1) from some of the opened PBIX files, that would indicate that those are credentials we need.

Credentials are all we need for "Reconnect" functionality. Procedure "subReconnect" will look for connection that has name starting with "PowerBIpivotData". This indicate connection that was created before, to create pivot tables. Procedure will then update this connection and its name with new credentials. After this, pivot tables will become interactive again.

Pivot table creation

"New pivot" button also needs information about credentials. This button will use those credentials to create new pivot table. Top procedure "SubCreateNewPivot" will call procedure "subNewPivot", and that procedure will create new pivot table, by using provided credentials. First, procedure subNewPivot will chech whether connection already exists. If we previously created some pivot table, then we already have connection and we don't have to create new one. We are using helper function "funcConnectionMissing" to check existance of such connection. If this is not true, then we will create new connection.

New connection will be used to create new pivot table. New pivot table will be created in the top left cell of user selection in the spreadsheet.

VBA code

VBA code is long enough to not be shown in the text of this blog. Bellow is link for downloading Excel file that has all VBA code and buttons in its ribbon tab. PBIX files are also included.