SPSS

How to create new variables in SPSS

We have three columns with the judges' scores (1). We want to create a new column that shows the average score (2). For this, we use Transform > Compute Variable (3), which opens a dialog where we can define the new variable.

In the new dialog, the first thing to do is give the new variable a name (1). Button (2) opens a small dialog where we can define a label (3) for the variable; the label is just a description of the variable. We can also define the variable as Numeric or String (4). In this case, we choose "Numeric". Pane (6) is where we build the expression for the new variable. An expression is made up of column names, mathematical operators, built-in functions, and constants. We can type the whole expression on the keyboard, but we can also pick building blocks from other areas of this dialog, and they will appear in pane (6).

Pane (5) lists the names of all columns. Pane (9) lists the names of functions. The virtual keyboard (7) lets us enter operators and numbers with the mouse. We can use all these elements to build the desired expression in pane (6).
By selecting a function group in pane (8), we can limit which functions are listed in pane (9).
When a function in pane (9) is selected, its syntax and description appear in pane (10).
We created the formula in pane (6). As long as no score is missing, the formula "MEAN(judge1, judge2, judge3)" would give the same result.
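To make the Compute Variable step concrete, here is a small Python sketch (not SPSS) that treats each row as a dictionary and builds the new column in two ways: an explicit arithmetic formula and a MEAN-style function. The column names judge1 to judge3 come from the example; the scores and the `average` / `average_fn` column names are hypothetical.

```python
from statistics import mean

# Hypothetical scores; each dict stands for one row of the Data View.
rows = [
    {"judge1": 8.5, "judge2": 9.0, "judge3": 8.0},
    {"judge1": 7.0, "judge2": 7.5, "judge3": 8.0},
]

for row in rows:
    # Explicit formula, like typing (judge1 + judge2 + judge3) / 3 in the expression pane.
    row["average"] = (row["judge1"] + row["judge2"] + row["judge3"]) / 3
    # Built-in function, like MEAN(judge1, judge2, judge3).
    row["average_fn"] = mean([row["judge1"], row["judge2"], row["judge3"]])

for row in rows:
    print(row["average"], row["average_fn"])
```

With every score present, both columns hold the same values, which is why the two formulas are interchangeable in this case.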

When we click a column name, that name is added to the expression. Using the virtual keyboard, we can add operators and numbers to the expression.
Clicking a function adds that function's syntax to the expression pane. We then have to replace the question marks with the function's arguments, which can be column names, constants, or subexpressions.

The virtual keyboard also has the signs "**", "&", and "|". The sign "&" is logical AND, "|" is logical OR, and "**" is exponentiation: "2**3" means 2³ (2 to the power of 3), which equals 8.

Conditional variables

If any of the values in the columns used to calculate the new variable is missing, the new variable is not calculated for that row, and the missing result is shown as a dot. This applies to our original arithmetic expression; the built-in MEAN function, in contrast, averages only the values that are present.
Even when all values in judge1, judge2, and judge3 are present, it is still possible to create a variable with missing values. The example below shows how.
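The difference between an arithmetic formula and MEAN when a score is missing can be mimicked in Python, using None in place of SPSS's system-missing dot. This only illustrates the behavior described above; it is not SPSS code, and the helper names are hypothetical.

```python
from statistics import mean

def formula_average(a, b, c):
    """Like (judge1 + judge2 + judge3) / 3: any missing operand makes the result missing."""
    if a is None or b is None or c is None:
        return None
    return (a + b + c) / 3

def mean_valid(*values):
    """Like MEAN(...): average only the values that are present."""
    valid = [v for v in values if v is not None]
    return mean(valid) if valid else None

row = {"judge1": 8.0, "judge2": None, "judge3": 9.0}
print(formula_average(row["judge1"], row["judge2"], row["judge3"]))  # None
print(mean_valid(row["judge1"], row["judge2"], row["judge3"]))       # 8.5
```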

Now we have one more column (1). It shows whether the scores belong to a junior or a senior competitor. We want to create a new variable that shows the maximum score, but only for junior competitors (2). We can do this with a conditional variable. First we write the formula for the variable, using the MAX function (3). Then we define the condition in the dialog opened by the "If" button at the bottom of the main dialog (4).

MAX(judge1, judge2, judge3)

In the dialog that opens, we first select the option "Include if case satisfies condition" at the top (1).

In pane (2) we enter the condition. In every row where the condition is satisfied, the new variable gets a value; in every other row, it stays missing. The condition is built the same way as the expression for a new variable: we have the columns (3), functions (5), and buttons (4) to help us create this Boolean expression.
Our junior competitors are coded as "0" (seniors are "1"), so the conditional expression is simply:

Seniority = 0

That is all we need if we want only the junior rows to have values in the new variable.
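The effect of the If condition can be sketched in Python: rows that satisfy `Seniority = 0` get the maximum of the three scores, and all other rows stay missing (None here, a dot in SPSS). The data is hypothetical; the column names Seniority and Maximal follow the example.

```python
# Hypothetical data: Seniority 0 = junior, 1 = senior.
rows = [
    {"judge1": 8.1, "judge2": 8.4, "judge3": 7.9, "Seniority": 0},
    {"judge1": 8.9, "judge2": 8.8, "judge3": 8.5, "Seniority": 1},
]

for row in rows:
    if row["Seniority"] == 0:  # "Include if case satisfies condition": Seniority = 0
        row["Maximal"] = max(row["judge1"], row["judge2"], row["judge3"])
    else:
        row["Maximal"] = None  # condition not met, so the value stays missing

print([row["Maximal"] for row in rows])  # [8.4, None]
```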

After this, we can create another variable with the same name (1). This variable calculates the maximum of judge2 and judge3 (2), but only for senior competitors (3); for the sake of the example, we treat judge1's scores as invalid for seniors. SPSS asks whether we want to change the existing variable (4), and we click YES. SPSS does not replace the old column with the new one; it combines them: rows that meet the new condition get new values, and all other rows keep their old values. In (5) we can see that all rows are now populated. Rows 1, 2, and 5 belong to junior competitors and show the maximum of judge1, judge2, and judge3. Rows 3 and 4 belong to senior competitors and show the maximum of judge2 and judge3. In row 3, the value in the "Maximal" column is 8.8, the maximum of judge2 and judge3. The overall maximum in that row is 8.9, but it is in column judge1, which does not count.
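The way a second variable with the same name fills in only the rows that match its own condition can be sketched with a hypothetical Python helper, `compute_if`. It writes the expression's result only where the condition holds and leaves every other row as it was (or missing, if the column is new). The rows are hypothetical, except that the senior row reuses the 8.9 and 8.8 scores mentioned above.

```python
def compute_if(rows, target, expression, condition):
    """Hypothetical helper: set row[target] = expression(row) only where condition(row) is true."""
    for row in rows:
        if condition(row):
            row[target] = expression(row)
        else:
            # Rows that fail the condition keep their existing value, or become missing if new.
            row.setdefault(target, None)

rows = [
    {"judge1": 8.1, "judge2": 8.4, "judge3": 7.9, "Seniority": 0},
    {"judge1": 8.9, "judge2": 8.8, "judge3": 8.5, "Seniority": 1},
]

# First pass: juniors get the maximum of all three judges.
compute_if(rows, "Maximal",
           lambda r: max(r["judge1"], r["judge2"], r["judge3"]),
           lambda r: r["Seniority"] == 0)
# Second pass with the same name: seniors get the maximum of judge2 and judge3 only.
compute_if(rows, "Maximal",
           lambda r: max(r["judge2"], r["judge3"]),
           lambda r: r["Seniority"] == 1)

print([row["Maximal"] for row in rows])  # [8.4, 8.8]
```

The senior row shows 8.8 rather than 8.9 because judge1 is not part of the second expression.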

We can keep adding conditional variables with the same name. Each one overwrites only the rows that satisfy its condition. If we add a third variable named "Maximal" with the condition "judge2 > 8", it will overwrite values only in rows where that condition is met.

Two caveats to consider

  1. When we click a function, its syntax is added to the expression pane. For example, clicking MAX adds "MAX(?,?)". This does not mean that the function accepts only two arguments. It accepts more, as in "MAX(a,b,c,d,e,…)", but we have to type the extra arguments by hand. The same is true for many other functions.
  2. When we create a new variable, we can give it the same name as one of the original columns. For example, we have the original column "judge1" and create a new variable also named "judge1". SPSS asks whether to change the existing variable. If we click YES, the new variable overwrites the original one, and the original column is lost.

Here is a sample file:

SPSS data entry

Here we will see how to enter data into SPSS manually, or load it automatically from Excel or SQL Server. When we open SPSS, we see Data View (1) and Variable View (2). Data View shows the data and looks like an Excel spreadsheet. Variable View is where we define the structure of the Data View table: its columns, their content, formatting, and allowed values.

Manual entry

To enter data manually, we simply start typing in the Data View cells. SPSS creates column names automatically (VAR00001, VAR00002, and so on), so we have to rename them and set the other column attributes.

Before explaining column attributes, let's recall the measurement scales used in SPSS:
– Nominal scale is used for categorical data ("man/woman/child" or "India/Japan/China").
– Ordinal scale is used for ordered data ("good/neutral/bad" or "before/during/after").
– "Scale" is SPSS's term for data that can be measured in some unit (height, weight, temperature).

Data measured on a nominal or ordinal scale has to be entered into SPSS as codes. This is necessary so that we can use all the statistical tools SPSS offers. Coding means that each category is represented by a number. For example, "small, medium, big" can be represented by the codes "1, 2, 3". Those codes are the values we enter into the program (1). Then, within SPSS, we assign a category label to each code. If we want, we can show those labels instead of the codes (2).
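Coding can be pictured as storing numbers and keeping a separate lookup table of labels. In this Python sketch the dictionary `size_labels`, the `show` helper, and the data are hypothetical; switching between codes and labels is only a lookup, and the stored numbers never change.

```python
# Codes entered into the data (what is actually stored).
sizes = [1, 3, 2, 2, 1]

# Labels attached to the codes, similar in spirit to the "Values" attribute.
size_labels = {1: "small", 2: "medium", 3: "big"}

def show(values, labels, use_labels):
    """Display codes or their labels; a code without a label is shown as is."""
    return [labels.get(v, v) if use_labels else v for v in values]

print(show(sizes, size_labels, use_labels=False))  # [1, 3, 2, 2, 1]
print(show(sizes, size_labels, use_labels=True))   # ['small', 'big', 'medium', 'medium', 'small']
```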

The Value Labels button in the main toolbar switches between the two views shown in the image above.

Codes and their labels are declared in "Variable View", so let's see what options it offers.

Variable View

"Variable View" is where we enter column attributes.
– In "Name" (1) we type the column name. A name can contain letters, digits, and underscores, and should start with a letter.
– "Type" (2) opens a dialog (5) where we choose the data type. As we saw, categorical data should be coded, so almost all of our columns should be declared "Numeric".
– "Width" (3) limits how many characters textual data can have. Longer text is truncated.
– "Decimals" (4) sets the number of decimal places shown in "Data View". This affects only the display; calculations use all available decimal places.
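The difference between displayed decimals and stored values can be illustrated with Python string formatting: rounding happens only in the text that is shown, while calculations keep using the full values. The numbers are hypothetical.

```python
values = [1.26, 2.26, 3.26]

# Show one decimal place, as if "Decimals" were set to 1.
displayed = [f"{v:.1f}" for v in values]
print(displayed)  # ['1.3', '2.3', '3.3']

# Calculations still use the stored, unrounded values.
total = sum(values)
print(f"{total:.2f}")  # 6.78
```

Adding the displayed numbers would give 6.9, while the real total is 6.78, so the display setting does not leak into the results.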

– In "Label" (1) we enter short descriptions of our columns.
– In "Values" (2) we set labels for categorical data. This opens a dialog (5). If the possible codes in a column are "1, 2, 3", we assign a label to each code; for example, "1, 2, 3" can represent "Man, Woman, Child". With the toolbar button explained earlier, the user can then see those labels instead of the bare codes.
– In "Missing" (3) we declare values that are impossible or unacceptable. After data entry, every value that matches one of them is treated as missing: SPSS ignores it and leaves it out of statistical calculations. We can declare up to three discrete values (6), or one range plus one discrete value (7).
– "Columns" (4) is the display width of a column, measured in characters. We can also change column widths with the mouse, the same way as in Excel (8).
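User-missing values can be sketched as a filter applied before any statistic is calculated. In this Python sketch the helper `is_user_missing`, the codes 97 to 99, and the range 900 to 999 are hypothetical; the two calls mirror the two options in the dialog (up to three discrete values, or one range plus one discrete value).

```python
from statistics import mean

def is_user_missing(value, discrete=(), value_range=None):
    """Return True if value matches a declared missing value."""
    # Enforce the same limits as the dialog options described above.
    if value_range is None and len(discrete) > 3:
        raise ValueError("at most three discrete missing values")
    if value_range is not None and len(discrete) > 1:
        raise ValueError("a range allows at most one extra discrete value")
    if value in discrete:
        return True
    if value_range is not None:
        low, high = value_range
        return low <= value <= high
    return False

ages = [34, 99, 27, 950, 41]

# Option 1: three discrete values.
valid1 = [a for a in ages if not is_user_missing(a, discrete=(97, 98, 99))]
# Option 2: one range plus one discrete value.
valid2 = [a for a in ages if not is_user_missing(a, discrete=(99,), value_range=(900, 999))]

print(valid1)  # [34, 27, 950, 41]
print(valid2)  # [34, 27, 41]
print(mean(valid1), mean(valid2))  # the first mean is distorted by 950
```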

The last three column attributes are Align, Measure, and Role. In Align we choose Left, Right, or Center alignment (1). Measure (2) declares the measurement scale of the data. It does not change how most calculations are performed, but declaring the scale is important for other users of the data. In Role (3) we can leave the default value ("Input").

Columns loaded from Excel or a database should be declared in the same way.

Loading from Excel

File > Open > Data (1) in the main menu opens dialog (2). In dialog (2) we set the file type to Excel, then choose the folder and the Excel file itself. After clicking "Open", we get dialog (3), where we choose a worksheet in the workbook and the range of our data. If we don't supply a range, an automatically determined range is used ("A1:G44" in the image). Our data is then loaded, and we can see it in "Data View" (4).
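To see how much data a range such as "A1:G44" covers, here is a small Python sketch that converts it into a column and row count. The helpers `column_number` and `range_size` are hypothetical and handle only simple single-area ranges.

```python
import re

def column_number(letters):
    """Convert Excel column letters to a 1-based number: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def range_size(cell_range):
    """Return (columns, rows) covered by a range like 'A1:G44'."""
    m = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", cell_range)
    if m is None:
        raise ValueError(f"unsupported range: {cell_range}")
    c1, r1, c2, r2 = m.groups()
    cols = abs(column_number(c2) - column_number(c1)) + 1
    rows = abs(int(r2) - int(r1)) + 1
    return cols, rows

print(range_size("A1:G44"))  # (7, 44)
```

If the first row of the sheet holds column names and SPSS is told to read variable names from it, 43 of those 44 rows remain as data.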

Loading from Database

To get data out of a database, we can install "IBM Data Access Pack", IBM's collection of drivers for various databases. We don't have to use IBM's drivers, but they will probably work best for transferring data into SPSS. We start loading by clicking File > Open Database > New Query (1). Then we click the "Add ODBC Data Source" button (2). In the new dialog, on the "User DSN" tab, we click "Add" (3).

"IBM Data Access Pack" adds many ODBC drivers whose names start with "IBM SPSS OEM" (1). We choose "SQL Server Native Wire Protocol". On the next screen we enter the credentials for our database (2). Then we close the dialogs until we are back at the start screen (3), where we can click "Next".

Now we can select a table and its columns (1 => 2) and click "Finish". The selected columns then appear in Data View (3).

Instead of "Finish", we can click "Next" to go through the whole wizard. The wizard lets us define relations between tables, filter data, and rename columns. Best of all, the last screen shows the SQL statement, which we can edit directly; I find it easiest to make changes there. After this step we click "Finish", the wizard closes, and our data is loaded into SPSS.
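The kind of statement the last wizard screen shows (a relation between tables, a filter, and a renamed column) can be tried out with Python's built-in sqlite3 module. SQLite stands in for SQL Server here, and the tables `competitors` and `scores` with their columns are hypothetical; the SQL that SPSS generates for a real database will differ in details such as quoting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE competitors (id INTEGER PRIMARY KEY, name TEXT, seniority INTEGER);
CREATE TABLE scores (competitor_id INTEGER, judge1 REAL, judge2 REAL, judge3 REAL);
INSERT INTO competitors VALUES (1, 'Ana', 0), (2, 'Ben', 1);
INSERT INTO scores VALUES (1, 8.1, 8.4, 7.9), (2, 8.9, 8.8, 8.5);
""")

query = """
SELECT c.name AS Competitor,                 -- renamed column
       s.judge1, s.judge2, s.judge3
FROM competitors AS c
JOIN scores AS s ON s.competitor_id = c.id   -- relation between tables
WHERE c.seniority = 0                        -- filter
"""
cur = conn.execute(query)
names = [d[0] for d in cur.description]  # column names as they would arrive in SPSS
rows = cur.fetchall()
print(rows)  # [('Ana', 8.1, 8.4, 7.9)]
conn.close()
```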