Descriptive statistics

Descriptive statistics is a branch of statistics that describes the characteristics of a sample or population. It is based on brief informational coefficients that summarize a given data set. In descriptive statistics we are mostly interested in the following characteristics:

  • Frequency Distribution (Dispersion) – how often each value occurs.
  • Measures of Central Tendency – values that represent the center of the data.
  • Measures of Variability – how spread out the values are.

Frequency Distribution

Frequency distribution shows how many observations belong to each outcome in a data set. An outcome can be described by a group name (for nominal or ordinal data), or by an interval or range (for quantitative data). The outcomes are mutually exclusive classes, and we simply count how many statistical units belong to each class.

Frequency distribution is usually presented in a table or a chart. There are four kinds of frequency distribution tables, and each has a convenient chart counterpart:

– Ungrouped frequency distribution tables show the number of units per category. Their counterpart is the Bar Chart.
– Grouped frequency distribution tables show the number of units per range. Their counterpart is the Histogram.
– Relative frequency distribution tables show the relative structure of the data. Their counterpart is the Pie Chart.
– Cumulative frequency distribution tables show accumulation levels. Their counterpart is the Clustered Column chart.
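As a quick sketch of an ungrouped and a relative frequency distribution, Python's collections.Counter can tally how many units fall into each class. The sample values here (eye colors) are made up purely for illustration:

```python
from collections import Counter

# hypothetical sample of nominal data: eye colors of 10 people
sample = ["brown", "blue", "brown", "green", "blue",
          "brown", "brown", "blue", "green", "brown"]

# ungrouped frequency distribution: number of units per category
freq = Counter(sample)
print(freq)  # frequency of each class

# relative frequency distribution: share of each class
relative = {color: count / len(sample) for color, count in freq.items()}
print(relative)
```

The same counts would feed a Bar Chart (absolute frequencies) or a Pie Chart (relative frequencies).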

Measures of Central Tendency

Measures of central tendency represent a data set with a value that is in the center of all other values. This central value is the most similar to all other values and is best suited to describe the whole data set with one number. There are three measures of central tendency: Average, Median and Mode. In a normal distribution these three values are the same. If the distribution is not symmetrical, the Average is pulled toward the extreme values more than the Median, and the Mode stays at the peak of the distribution.

Average

The average is calculated by summing all the values and then dividing the result by the number of values ( x̄ = Σ xi / n ).

Median

The median is calculated by sorting a set of data and then picking the value that is in the center of that array. Let's say that all values in the array are indexed from 1 to n [ x(1), x(2) … x(n-1), x(n) ].

If the number of values in the array is odd, the median is the value at index (n+1)/2, so x̃ = x((n+1)/2). In the array [ 9(1), 8(2), 7(3), 3(4), 1(5), -5(6), -12(7) ] the median is 3, at index (7+1)/2 = 4. There is an equal number of values before and after the median in our array: three values before and three values after.

If the number of values is even, the formula is x̃ = ( x(n/2) + x(n/2+1) ) / 2. In the array [ -3(1), 0(2), 1(3), 2(4), 3(5), 4(6), 6(7), 9(8) ] the two middle values sit at indexes 8/2 = 4 and 8/2 + 1 = 5, so we calculate the average of the two middle numbers: (2+3)/2 = 2.5. Again, there is an equal number of values before and after the two central values. As we can see, it does not matter whether the numbers are arranged in ascending or descending order.
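Both cases can be checked with Python's statistics module, which sorts the data internally; this minimal sketch uses the two arrays from the text:

```python
import statistics

# odd number of values: the middle element after sorting
odd = [9, 8, 7, 3, 1, -5, -12]       # 7 values, median at index (7+1)/2
print(statistics.median(odd))         # -> 3

# even number of values: average of the two middle elements
even = [-3, 0, 1, 2, 3, 4, 6, 9]      # 8 values
print(statistics.median(even))        # -> (2 + 3) / 2 = 2.5
```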

Mode

The mode is the most frequent value in a sample or population. One data set can have multiple modes. In the sample ( 1, 3, 2, 3, 3, 5, 4, 4, 8, 4 ) we have two modes: both 3 and 4 appear three times. If we create an Ungrouped Frequency Distribution table, we can easily spot the modes.
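All three measures of central tendency can be computed for the sample above with the statistics module (multimode requires Python 3.8+):

```python
import statistics

sample = [1, 3, 2, 3, 3, 5, 4, 4, 8, 4]

print(statistics.mean(sample))       # 37 / 10 = 3.7
print(statistics.median(sample))     # average of the two middle values -> 3.5
print(statistics.multimode(sample))  # all values tied for most frequent -> [3, 4]
```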

Measures of Variability

Measures of Variability show how spread out the statistical units are. These measures give us a sense of how different the individual values are from each other and from their mean. Measures of Variability are Range, Percentile, Quartile Deviation, Mean Absolute Deviation, Standard Deviation, Variance, Coefficient of Variation, Skewness and Kurtosis.

Range

The range is the difference between the maximal and minimal value. If we have a sample 2, 3, 7, 8, 11, 13, where the maximal and minimal values are 13 and 2, then the range is: range = xmax − xmin = 13 − 2 = 11.

Percentile

Let's say that we have a sample with 100 values ordered in ascending order ( x(1), x(2) … x(99), x(100) ). If some number Z is bigger than M% of the values from that sample, then we can say that Z is the "M percentile". For example, Z could be larger than 32% of the values, that is, larger than x(1) through x(32). In that case x(33) is the "32 percentile".

For the sample [ -12(1), -5(2), 1(3), 3(4), 7(5), 8(6), 9(7) ], the number 7 is larger than 4 of the values, so seven is the "57 percentile". This is because 4 of the 7 numbers are smaller than 7, and 4/7 ≈ 0.57 = 57%. A percentile shows us what part of the sample is smaller than some value.

Be aware that there are several algorithms for calculating percentiles, but they all follow similar logic.

The "25 percentile", "50 percentile" and "75 percentile" are the most used percentiles and they have special names. They are respectively called the "first quartile", "second quartile" and "third quartile", and they are labeled Q1, Q2 and Q3. Quartile Q2 is the same as the median.

Quartiles, together with maximal and minimal values divide our data set in 4 quarters.
xmin    [25% of values]     Q1    [25% of values]    Q2     [25% of values]    Q3    [25% of values]    xmax

Quartile Deviation

The difference between the third and first quartile is called the "interquartile range": IQR = Q3 – Q1. When we divide the interquartile range by two, we get the quartile deviation: QD = ( Q3 – Q1 ) / 2. The quartile deviation is the average distance of Q3 and Q1 from the median.
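Quartiles and the quartile deviation can be sketched with statistics.quantiles. For the seven-value sample from the percentile section, the default "exclusive" method happens to land exactly on sample values; as noted above, other percentile algorithms may give slightly different cut points:

```python
import statistics

sample = [-12, -5, 1, 3, 7, 8, 9]

# n=4 cut points give the three quartiles Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(sample, n=4)
print(q1, q2, q3)      # -5.0 3.0 8.0; Q2 equals the median

iqr = q3 - q1          # interquartile range
qd = iqr / 2           # quartile deviation
print(iqr, qd)         # 13.0 6.5
```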

Mean Absolute Deviation

Mean absolute deviation (MAD) is the average absolute distance between each data value and the sample mean. Some of the raw distances are negative and some are positive, and their sum is always zero. This is a direct consequence of how the sample mean is calculated.

x̄ = Σ xi / n
n · x̄ = Σ xi
n · x̄ – n · x̄ = Σ xi – Σ x̄
0 = Σ ( xi – x̄ )

The formula used for calculating the mean can be transformed to show that the sum of all distances between the values and the mean is equal to zero. This is why we use absolute values when calculating the mean absolute deviation (MAD).
If our sample is [ 10, 12, 14, 24, 32, 33, 36 ], the mean value is 23. The sum of all deviations is ( -13 – 11 – 9 + 1 + 9 + 10 + 13 ) = 0. Instead of the original deviations we now use their absolute values. The sum of all absolute deviations is ( 13 + 11 + 9 + 1 + 9 + 10 + 13 ) = 66. This means that MAD = 66 / 7 ≈ 9.43.
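The whole calculation, including the zero-sum property of the raw deviations, can be sketched in a few lines using the sample from the text:

```python
# mean absolute deviation (MAD) for the sample from the text
sample = [10, 12, 14, 24, 32, 33, 36]
mean = sum(sample) / len(sample)            # 161 / 7 = 23

deviations = [x - mean for x in sample]
print(sum(deviations))                       # 0: raw deviations always cancel out

mad = sum(abs(d) for d in deviations) / len(sample)
print(round(mad, 2))                         # 66 / 7 -> 9.43
```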

Standard Deviation and Variance

The standard deviation is similar to the mean absolute deviation. SD also averages the distances between the data values and the sample mean, but uses a different calculation. To eliminate negative distances, SD squares each distance. To compensate for this increase in deviation, the calculation ultimately takes the square root of the average squared distance. The formula for the standard deviation of a sample is s = √( Σ ( xi – x̄ )² / ( n – 1 ) ).

Variance is just the standard deviation without the root: s² = Σ ( xi – x̄ )² / ( n – 1 ).

The standard deviation is always equal to or bigger than the mean absolute deviation. If we add some extreme values to our sample, the standard deviation will rise much more than the mean absolute deviation.
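A short sketch comparing the sample standard deviation and variance with the MAD for the same sample as in the previous section:

```python
import statistics

sample = [10, 12, 14, 24, 32, 33, 36]

s = statistics.stdev(sample)       # sample standard deviation, n - 1 in denominator
var = statistics.variance(sample)  # variance = s squared
mad = sum(abs(x - 23) for x in sample) / len(sample)   # MAD, mean is 23

print(round(s, 2), round(var, 2), round(mad, 2))
print(s >= mad)                    # True: SD is never smaller than MAD
```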

Coefficient of Variation

The coefficient of variation is a relative standard deviation. It is the ratio between the standard deviation and the mean: CV = s / x̄. This is basically the standard deviation measured in units of the mean. Because it is a relative measure, it can be expressed in percentages.

Let's imagine that the standard deviation of giraffes' height is 200 cm, while the standard deviation of mice's height is 5 cm. Does that mean that the variability of giraffes' height is much bigger than the variability of mice's height? Of course not. We have to take into account that giraffes are much bigger animals than mice. That is why we use the coefficient of variation.

If we scale the images of mice and giraffes to the same height, we can see that the relative standard deviation of their heights is not as different as the ordinary standard deviation would indicate.
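The comparison can be sketched numerically. The mean heights used here (500 cm for giraffes, 8 cm for mice) are made-up numbers for illustration; only the standard deviations come from the text:

```python
# coefficient of variation CV = s / mean, using hypothetical mean heights
giraffe_mean, giraffe_sd = 500, 200    # cm (mean is an illustrative assumption)
mouse_mean, mouse_sd = 8, 5            # cm (mean is an illustrative assumption)

cv_giraffe = giraffe_sd / giraffe_mean
cv_mouse = mouse_sd / mouse_mean

print(f"giraffe CV = {cv_giraffe:.1%}")   # 40.0%
print(f"mouse CV   = {cv_mouse:.1%}")     # 62.5%
```

With these numbers the mouse heights are actually relatively more variable, even though the giraffe standard deviation is 40 times larger in absolute terms.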

Kurtosis and Skewness

Kurtosis and Skewness are used to describe how closely our distribution fits the normal distribution. If Kurtosis and Skewness are zero, or close to zero, we have a normal distribution.

Skewness is a measure of the asymmetry of a distribution. Sometimes a distribution tilts more to one side. If skewness is positive, the distribution is tilted to the right side; if it is negative, it is tilted to the left side.

If the absolute value of skewness is higher than 1, we have high asymmetry. If it is between -0.5 and 0.5, we have a fairly symmetrical distribution. All values in between mean that the distribution is moderately skewed.

The sample skewness statistic is calculated as: skew = n / ( (n–1)(n–2) ) · Σ ( ( xi – x̄ ) / s )³.

Kurtosis measures the flatness of our curve. A distribution is flat when the data is equally distributed. If the data is grouped around one value, the distribution has a peak. Such humped distributions have a positive kurtosis statistic; that is a Leptokurtic distribution. Negative values of kurtosis mean that the distribution is Platykurtic, that is, flatter. The critical values for the kurtosis statistic are the same as for skewness.

The sample kurtosis statistic is calculated as: kurt = n(n+1) / ( (n–1)(n–2)(n–3) ) · Σ ( ( xi – x̄ ) / s )⁴ – 3(n–1)² / ( (n–2)(n–3) ).
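Both statistics can be sketched directly from the sample skewness and excess kurtosis formulas (the same formulas Excel's SKEW and KURT functions use); the test sample [1, 2, 3, 4, 5] is made up for illustration:

```python
import math

def sample_skew(data):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

def sample_kurt(data):
    """Sample excess kurtosis, with the 3(n-1)^2 / ((n-2)(n-3)) correction term."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    term = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    return (term * sum(((x - mean) / s) ** 4 for x in data)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

symmetric = [1, 2, 3, 4, 5]
print(sample_skew(symmetric))   # ~0 for a symmetric sample
print(sample_kurt(symmetric))   # ~ -1.2: flatter than normal (Platykurtic)
```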

Return several values with one measure

Requirement

We have a table like (1). This table shows how many units of product were sold on each date in each region. We want to get a pivot table (2) that shows on how many days sales were bigger than 75, 53 or 27 units. We want to observe those values per year, so we added a slicer for years (3).

Needed calculations

If our data is (1), first we need to group it by Region and OrderDate and calculate the sum of units for each group (2). That table then has to be filtered for values bigger than 75 (3). Then we have to group again by Region and count the rows of the filtered table (4). We also have to repeat this for the values 53 and 27.

Slow solution

For this calculation we could create three measures that look like the one below. If we have a lot of data, after clicking on the slicer we would have to wait a long time for the calculation to finish. Not only do we have a lot of data, but we also have a demanding calculation for each measure, and there are three such measures.

MeasureOver75:=
VAR GroupByRegionOrderDate = GROUPBY( SampleTable; [Region]; [OrderDate]; "SumOfUnits"; SUMX( CURRENTGROUP(); [Units] ) )
VAR FilterOver75AndCount = COUNTROWS( FILTER( GroupByRegionOrderDate; [SumOfUnits] >= 75 ) )
RETURN FilterOver75AndCount

All three measures start with the same step ( "GroupByRegionOrderDate" ). This step is very expensive because it does most of the calculation, and it returns a table. If we could do this step just once, instead of repeating it for each measure, that would be great. We could make a basic measure that returns such a table (1), and then three other measures that would use that table to calculate the final scalar values (2).

Unfortunately, measures can not return a table. They can only return a scalar value.

Better solution

A better solution is to create one measure that returns all three values. We can simply concatenate the three numbers into one string ( "25-32-36" ). That would be the result of our basic measure. Then we can create three measures that extract the separate values. This could be our basic measure code:

BasicMeasureString:=
VAR GroupByRegionOrderDate = GROUPBY( SampleTable; [Region]; [OrderDate]; "SumOfUnits"; SUMX( CURRENTGROUP(); [Units] ) )
VAR BiggerThen75 = COUNTROWS( FILTER( GroupByRegionOrderDate; [SumOfUnits] >= 75 ) )
VAR BiggerThen53 = COUNTROWS( FILTER( GroupByRegionOrderDate; [SumOfUnits] >= 53 ) )
VAR BiggerThen27 = COUNTROWS( FILTER( GroupByRegionOrderDate; [SumOfUnits] >= 27 ) )
VAR Result = BiggerThen75 & "-" & BiggerThen53 & "-" & BiggerThen27
RETURN Result           

It would return the string "25-32-36". Then we can use three additional measures to extract the three separate values:

FirstPartOfString:=VALUE( LEFT( [BasicMeasureString]; 2 ) )       //returns 25
MiddlePartOfString:=VALUE( MID( [BasicMeasureString]; 4; 2 ) )    //returns 32
LastPartOfString:=VALUE( MID( [BasicMeasureString]; 7; 2 ) )      //returns 36

As we can see, we now perform the "GroupByRegionOrderDate" step only once, which makes our calculation much faster. If the final results are numbers, we can get them even faster by using mathematical calculations to extract the values, instead of string manipulation. In our basic measure we could combine the three values into one number like this:

VAR Result = BiggerThen75 * 10000 + BiggerThen53 * 100 + BiggerThen27  //returns 253236

Then we can extract the separate numbers by using the INT and MOD functions like this:

Dates with sale over75 := INT( [BasicMeasure] / 10000 ) //returns 25
Dates with sale over53 := INT( MOD( [BasicMeasure]; 10000 ) / 100 ) //returns 32
Dates with sale over27 := MOD( [BasicMeasure]; 100 ) //returns 36

Solution with math calculations would be significantly faster.
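The pack/unpack arithmetic used by the DAX measures can be sketched in a few lines. It assumes, as the measures above do, that every count fits in two digits:

```python
# pack three two-digit counts into one number, as the basic measure does
over75, over53, over27 = 25, 32, 36
packed = over75 * 10000 + over53 * 100 + over27
print(packed)                     # 253236

# unpack with integer division and modulo, as INT and MOD do in DAX
print(packed // 10000)            # 25
print(packed % 10000 // 100)      # 32
print(packed % 100)               # 36
```

If a count could ever reach three digits, the factors (100, 10000) would have to grow accordingly, or the string version would be the safer choice.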

Sample file can be downloaded here:

Sort Your Pivot Table by Using Slicer

Sometimes different users want a pivot table sorted in different ways. If a column holds full names of people, it is possible to sort that column either by first names or by last names. An additional requirement is the ability to change the sort order using a slicer.

In the example below, we can see that our pivot table can be sorted by "Points" (1), but two other possible sorting orders are by "Last Name" (2) or by "First Name" (3). We will create a solution only for these three specific sorting methods. Points will be sorted from highest to lowest, and first and last names will be sorted alphabetically.

Helper Tables

First, we have to create a table in Power Pivot that will hold the values for our slicer (1). We will call this table "SlicerValues".
Second, we have to create a table in Power Pivot that has columns with the sort order for every row (1). We need two such columns, one for last names and one for first names. This table will be named "SortOrdersTable".

General Idea

We know that it is possible to sort a column in a pivot table by using some other column as the sort order. First we right-click on the column we want to sort, and then we go to Sort > More Sort Options (1). A new dialog will open. In this dialog we select "Descending (Z to A) by:" and then choose which column will define our sort order. In the image below, we decided that the "Sum of Points" column (2) in descending order is used for sorting. Now the "Full Name" column (3) is not sorted alphabetically, but according to the "Sum of Points" column.

We can use the mechanism explained above to control the sorting order of the "Full Name" column. The idea is to create a measure named "SortMeasure" that will be added to our pivot table (1). This measure's values will change according to the selection in the slicer (3,4,5). Our "Full Name" column will be set to sort by the order defined by "SortMeasure" (2).

Now, when the user selects "First Names" (3), "Last Names" (4) or "Points" (5) in the slicer, the "SortMeasure" column values will change (1) and that will sort our "Full Name" column (2). We can see in (3) that the "Full Name" column is sorted by first name, in (4) by last name, and in (5) by points. At the end, the "SortMeasure" column is hidden from the user by hiding the whole spreadsheet column.

Creative process

First we have to create a relation between the "Full Name" columns in the "SortOrdersTable" and "Data" tables (1). "Data" is the name of the table which is the base for our pivot (with the "Full Name" and "Points" columns). The relation has to be "1:N" in the direction "Data" to "SortOrdersTable" (2). This means that in the relation dialog the "Data" table has to be at the bottom. The "Data" table can filter the "SortOrdersTable".

The next step is to create our measure. This measure has to react to the slicer selection. The ALLSELECTED function returns the values selected in the slicer as a column. The IF expressions check whether specific individual slicer values are part of that column. If they are, those terms are selected and we return TRUE(); otherwise we return FALSE(). At the end, the SWITCH function decides what values the measure should return. SWITCH returns the value for the first variable ( FirstNamesSelected, LastNamesSelected, PointsSelected ) that is TRUE(). For example, if FirstNamesSelected is FALSE() and LastNamesSelected is TRUE(), the values from the "Last Name order" column are returned. Those are the values that will be used to sort the "Full Name" column.

SortMeasure:=
VAR FirstNamesSelected = IF( "First Names" IN ALLSELECTED( SlicerValues[SortBy] ); TRUE(); FALSE() )
VAR LastNamesSelected = IF( "Last Names" IN ALLSELECTED( SlicerValues[SortBy] ); TRUE(); FALSE() )
VAR PointsSelected = IF( "Points" IN ALLSELECTED( SlicerValues[SortBy] ); TRUE(); FALSE() )
VAR Result = SWITCH( TRUE(); FirstNamesSelected; SUM( SortOrdersTable[First Name order] ) 
                           ; LastNamesSelected; SUM( SortOrdersTable[Last Name order] )
                           ; PointsSelected; 10000 - SUM( Data[Points] ))
RETURN Result

All the columns with possible results ( "First Name order", "Last Name order", "Points" ) are just wrapped in the SUM function, but the "Points" column has one more detail. In our measure we transform all "Points" values into "10000 – Points". This is because we want ascending sorting for "First Name order" and "Last Name order", but descending sorting for "Points". Because a pivot table column can only be sorted in one direction at a time, we need this transformation so that ascending ordering of our measure always sorts the pivot table correctly. The values of "Points" and "10000 – Points" are inversely related, which solves our problem.

Descending sorting of "Points" is the same as ascending sorting of a transformation "10000 – Points".
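The inversion trick can be sketched outside DAX: sorting ascending by "10000 – points" produces exactly the same order as sorting descending by points. The names and point values here are made up for illustration:

```python
# hypothetical players and their points
players = {"Ann": 320, "Bob": 510, "Cal": 150}

# ascending sort of the transformed value 10000 - points
asc_by_transformed = sorted(players, key=lambda name: 10000 - players[name])

# descending sort of the raw points
desc_by_points = sorted(players, key=lambda name: players[name], reverse=True)

print(asc_by_transformed)                    # ['Bob', 'Ann', 'Cal']
print(asc_by_transformed == desc_by_points)  # True: the two orders are identical
```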

Result

Below we can see our results, and we can observe a few details. Columns "C" and "G" are hidden (1). This is how we hide our "SortMeasure" column from the user. In the green pivot table (2) we can see that while the "SortMeasure" column is in ascending order, the important "Sum of Points" column is actually in descending order, as required.

In the blue and orange pivot tables, two values are selected in each slicer (3). It is not possible to force single selection in an Excel slicer without resorting to VBA. To solve this problem, our measure checks the options in a fixed order, so the first matching option determines the sorting order. That is why both the blue and orange pivot tables are sorted by "First Names", the first of the selected values that the measure checks.

Sample table can be downloaded from here:

Analysis Toolpak and Descriptive Statistics

Analysis ToolPak is an Excel add-in that provides many statistical tools. This add-in comes in two flavors, and each of them has to be enabled in the "Add-ins" dialog (1) before use. They both have the same functionality, but "Analysis ToolPak" is available only through the graphical interface, while "Analysis ToolPak – VBA" is accessible both through the graphical interface and through VBA code.

After we enable "Analysis ToolPak" (1), a button will appear in Data > Analysis (2). Clicking that button opens a dialog with many statistical tools (3). To use any of these tools, we just select it and click OK. A new dialog will appear with options for that specific statistical tool.

Descriptive Statistics

Descriptive statistics is based on brief descriptive coefficients that quantitatively describe or summarize features of a data set. A data set is a set of results of measurements made on a population or a sample. These coefficients comprise measures of central tendency, like mean, median and mode, and measures of variability, like standard error, variance, range and kurtosis.

We select the "Descriptive statistics" tool in the Analysis ToolPak. The "Descriptive statistics" dialog asks us to enter the range that will be used as the source of data (1). In our example, this range has two columns, and we will get results for each of them. The program doesn't know whether the data in the selected range is organized in columns or rows, so we choose Columns (2) in our case. Our data has labels in the first row, so we have to check option (3) to exclude the first row from the calculations.

The final output can be directed to a range in the same sheet or in some other sheet of the same workbook (A). This range is defined by its top left cell. Option (B) means that a new sheet will be created with the provided name and the result will be pasted into cell "A1" of that new sheet. Option (C) will create a new workbook named "Book1", and the result will be pasted into cell "A1" of sheet "Sheet1" in that workbook.

The last options (4,5,6,7) select which results should be calculated. Option (4), "Summary statistics", should always be selected, because without it most of the important indicators (mean, variance…) will not be part of the final result. Option (5) needs a confidence level for calculating the confidence interval for the mean; if it is not checked, we will not get this result. Options (6,7) give us the same results as the SMALL and LARGE functions in Excel.

Descriptive Statistics Results

This is what the results of descriptive statistics look like. We have results for both Col1 and Col2. For each column there is a list of different indicators that describe our data set. We will analyze only the results for the first column.

Alongside the ToolPak results for the first column, the same indicators can be calculated using Excel formulas. All the results are the same, with one potential difference. If there are multiple values in the data set that could be considered the mode ( e.g. 1,1,2,5,5 ), the Analysis ToolPak returns the last one ( 5,5 ), but the MODE function returns the first one ( 1,1 ).

Formulas used in Analysis ToolPak descriptive statistics

Below are the formulas used for calculating some of the descriptive indicators.

– Standard Deviation: s = √( Σ ( xi – x̄ )² / ( n – 1 ) )
– Variance: s² = Σ ( xi – x̄ )² / ( n – 1 )
– Standard Error: SE = s / √n
– Confidence Interval: x̄ ± t · s / √n
– Skewness: n / ( (n–1)(n–2) ) · Σ ( ( xi – x̄ ) / s )³
– Kurtosis: n(n+1) / ( (n–1)(n–2)(n–3) ) · Σ ( ( xi – x̄ ) / s )⁴ – 3(n–1)² / ( (n–2)(n–3) )

Let's calculate the confidence interval by using its formula. Because our sample is smaller than 30 observations, we use the t distribution. For a 95% confidence level with 3 degrees of freedom, the t value is 3.1824. That means that t · SE = 3.1824 * 0.75 = 2.3868. This is the same result as the Excel CONFIDENCE.T function returns. Our confidence interval is ( 2.75 – 2.3868; 2.75 + 2.3868 ), which is ( 0.3632; 5.1368 ).

Kurtosis and Skewness are used to describe how closely our distribution fits the normal distribution. If Kurtosis and Skewness are zero, or close to zero, we have a normal distribution.

Analysis ToolPak Descriptive Statistics through VBA

Descr Subprocedure

Different functionalities of the Analysis ToolPak can be called through VBA subprocedures. Below we can see how to provide arguments and call the procedure named "Descr" for descriptive statistics. All arguments except the first one are optional, and all of them are of type Variant.

Sub subDescrFunction()
    Dim inputRng As Range: Set inputRng = Range("Sheet1!A3:B7")
    Dim outputRng As Range: Set outputRng = Range("Sheet1!K3")
    Application.Run "Descr", inputRng, outputRng, "C", True, True, 1, 1, 95
End Sub
inprng – The input range with our data set.
outrng – The output range, declared with a reference to its top left cell. If we type some word here, the Analysis ToolPak will export the result into a new worksheet with that name. If this argument is False, blank or omitted, a new workbook will be created.
grouped – Is our data set organized in rows or columns? Depending on that, we type "R" or "C". "C" is the default.
labels – Type "True" if our data set has a header. If False or omitted, we claim that there are no labels.
summary – Type "True" to get all the major descriptive statistics. If this argument is False, blank or omitted, those values will not be calculated.
ds_large – Which k-th largest value to get? If this argument is omitted, this value will not be calculated.
ds_small – Which k-th smallest value to get? If this argument is omitted, this value will not be calculated.
confid – The confidence level for calculating the confidence interval for the mean. If this argument is omitted, there will be no result.

VBA project for Analysis ToolPak (ATPVBAEN.XLAM) is protected with the password "Wildebeest!!". There you can find declarations of all VBA procedures.

DescrQ Subprocedure

There are two variants of the subprocedure for descriptive statistics. The first is named Descr and the second is named DescrQ. "Descr" creates the result immediately, using the arguments provided in the code. "DescrQ" first opens a GUI dialog filled with the values provided in the code (1). We can accept those values or change them, and then click the OK button. Only then are the final results created (2). "DescrQ" gives us the opportunity to customize the values provided in the code.

All the code for "Descr" and "DescrQ" is the same. The only difference is the name of the subprocedure.

Application.Run "DescrQ", inputRng, outputRng, "C", False, True, 1, 1, 95
The "DescrQ" version has one bug: ranges in the dialog will be missing the leading apostrophe.
[New Microsoft Excel Worksheet.xlsm]Sheet1'!$D$9:$D$12 'initial formatting
'[New Microsoft Excel Worksheet.xlsm]Sheet1'!$D$9:$D$12 'corrected starting sign

Sample file can be downloaded here:

Power Query and protected sheet in Excel

We made a report and we will give it to a colleague. He will refresh the report periodically and then send it to the end users. In the image below we can see that our colleague has to prepare 4 Access files as the data source and open our report (1). He will then refresh it (2). At the end of the refresh, a MsgBox (3) will appear. After this, our colleague can save the file and send it to other colleagues (4). The idea is to make this process as easy as possible for our colleague.

One more requirement is that the spreadsheet with the tables has to be protected, so that users can not accidentally change or delete some numbers. When we try to build the report as described, we will run into a few problems:
1) Tables on a protected sheet can not refresh.
2) Combining Power Query and VBA requires queries to be synchronous, which prolongs query execution.
3) If we can not use VBA, how do we create the MsgBox (3)?
Let's see how to solve these problems.

Tables on Protected Sheet Will Not Refresh

When we try to "Refresh All" tables that are located on a protected sheet, this is the message we will get. We would have to remove the protection before refreshing our data, and then protect the sheet again afterward. The idea is to avoid this step and automate things.

We could try to solve this problem with VBA. Usually we would lift the protection, make the changes, and then set the protection back, using VBA code similar to the one below:

Sub RefreshTables()
    ThisWorkbook.Worksheets("Queries").Unprotect Password:="fff"
    ThisWorkbook.RefreshAll
    ThisWorkbook.Worksheets("Queries").Protect Password:="fff"
    MsgBox "All Tables Refreshed"
End Sub

The problem is that Power Query queries are executed asynchronously. The queries will still be executing (2), but VBA will not wait for them to finish and will show the final MsgBox immediately (1). By then the sheet protection is already back in place, so after we click the OK button (1), we will again receive the error message (3).

As explained in this post, all we need to do is make the queries synchronous. This is done by unchecking the option "Enable background refresh" in the query properties.
Unfortunately, this makes refreshing last much longer. Instead of all the queries executing at the same time (1), they are executed sequentially, one by one (2). On my computer, this makes refreshing take twice as long.
We could try to protect our cells with a data validation rule that can never be fulfilled. This would prevent users from overwriting the numbers in our cells, but it would not stop them from deleting values. Data validation is only triggered when a cell is in Edit mode and the user is trying to enter some data into it. Edit mode is not needed for deleting cell content; we just select the cell and press the Delete key.

Solution for protected sheet

The best solution is to place the refreshable tables in one sheet that is unprotected and hidden (2). The numbers from that sheet can be mirrored (3), by using formulas, into another sheet which is protected (4). The cells in the protected sheet always keep the same, unchanged formulas (5), so they will never trigger the sheet protection.

The user can now click the Refresh All button in the Data tab (1). The tables in the unprotected sheet will refresh, and the tables in the protected sheet will just reflect those results.

MsgBox without VBA

We said that synchronous queries take longer to execute, so we will avoid VBA. The question is how to create a MsgBox at the end of refreshing that informs the user that refreshing is over. The only way to raise a message in Power Query is the "error" statement. We have to make sure that the "error" statement fires at the end of the refresh, when all the individual queries are finished.

We can create a query like the one below. This query references all the other queries, which means it has to be executed last. At the end of the refresh, this query will show the user the message written in the Error.Record. The query has to be loaded into the spreadsheet in order to work correctly.

let   
  Result = if { FactSales1, FactSales2, FactSales3, FactSales4 } = { "A","A","A","A" } 
    then "A" else error Error.Record( "Finished", "Refreshing is over." )
in
    Result

We can see below that the final query depends on all the other queries (1). This MsgBox query never loads any data into the spreadsheet, so its footprint looks like (2). After all the other queries are refreshed, this query shows a message that looks like (3).

Sample files can be downloaded from here. The file "ContosoSales1.accdb" should be copied three more times to create the files "ContosoSales2.accdb", "ContosoSales3.accdb" and "ContosoSales4.accdb". In the "Parameters" Excel sheet, change the full paths to those MS Access files.