Experience is a great teacher. And in my experience, the beginning has always been tough.

As a BI consultant & trainer, I have worked with various people: from the frontline employees to CEOs. Our interactions gave me great insights into the multiple challenges they face.

I have put together a few pages from my experience and learning; covering Power BI, Excel, Power Query, Power Pivot, Tableau, data analytics & data visualization.

The purpose of the blog is to get you started.

I hope you’ll find them helpful.


Statistics Simplified is the series to express statistics in layman terms.

Photo by RF._.studio from Pexels

Got data points with special causes or outliers in your data set?

How to calculate the average in such cases?

A few common approaches are to either:

  • exclude the record, or

  • replace the extreme values with the median of the data set.

And there is another way: we use Trim Mean instead of a simple average.

What is Trim Mean?

An average of the trimmed or “inner” data set.

Which data points does it trim?

It excludes the data points from both ends.


Excel has function TRIMMEAN which takes two arguments:

  • Array: The data range

  • Percent: The percentage of data points to be excluded from the calculation. It takes values from 0 to 1. If we say percent = 0.2, then it will exclude 20% of the dataset from the calculation. So, if the data set contains 10 data points, then it will exclude 2 data points; 1 from the top and 1 from the bottom (as demonstrated in the image above).

Just like the median, it first sorts the data into an order and then excludes the extreme data points from both ends.

In the example below, TRIMMEAN excluded 2 & 100 from the calculation, as I have supplied 0.2 as a percent.

Points to Remember

  • TRIMMEAN function in Excel takes values greater than 0 and less than 1 in <percent>. Both 0 & 1 returns #NUM! error.

  • TRIMMEAN excludes datapoints in the nearest multiples of 2. If the <percent> is .3 for the data set size of 10, it will remove 2 data points (one from the minimum and one from the maximum side).

Also, see:

Mean vs. Median

Geometric Mean


Statistics Simplified is the series to express statistics in layman terms.

When it comes to understanding data, we prefer representing an entire data set using aggregations like count, sum, average, percentages. These aggregations summarize multiple data points into single points. These individual points typically represent the entire datasets, which makes comparisons and decision-making a more straightforward process.

Arguably, the average is the most popular aggregation when it comes to comparisons.

Reason: Average, or also known as the arithmetic mean, is easy to calculate.

How to calculate the average?

We sum the entire data set and then divide it by the count.

Let us take the following sample dataset: 1, 2, 3, 4, 5

  • Step 1: Calculate sum of all the numbers (1+2+3+4+5) = 15

  • Step 2: Count all the numbers (1,2,3,4,5) = 5

  • Step 3: Divide the output of Step 1 by Step 2 (15/5 = 3)

So, in simple words, we can say that the central data point of this dataset is 3. Or, most of the data points are around 3.

The fact that calculating average is an effortless process, and it is a representation of the entire dataset, makes average the most widely used aggregation.

Even Excel has two formula-less ways of calculating average:

Status Bar

Pivot table

More on Value Field Settings

So what is a Median?

Median is also a representation of the central tendency of the data point. Unlike average, the median is not a calculated value. It is a physical point in the dataset.

Calculating the Median

We sort the data in an order (ascending or descending), and then the middlemost value becomes the median.

Case 1:

When the count of the data set is an odd number

Sample Data: 8, 6, 4, 10, 12

  • Step 1: Arrange the data in an order -> 4, 6, 8, 10, 12

  • Step 2: Find the middle number -> 4, 6, 8, 10, 12

Median = 8

Case 2:

When the count of data set is an even number

Sample Data: 8, 6, 4, 10, 12,2

  • Step 1: Arrange the data in an order -> 2,4, 6, 8, 10, 12

  • Step 2: Calculate the average of the middle two numbers: 2,4, 6, 8, 10, 12

Median = (6+8)/2 = 7

Formulas in Excel

There are two simple formulas for average and median:



Advantage Median?

Vinci is a data analyst and lives in City A. One of his friends told him that data analysts in City B are getting higher salaries and recommends him to move to City B.

Vinci decided to analyze the wages for data analysts in two cities, A & B. He collected sample data for the two cities and calculated the average.

City A: $ 121,012

City B: $ 258,713

By just looking at this, City B appears to be a prospective location for business analysts as the average salary of City B is 114% higher than City A.

Then, he calculated median salaries for these cities:

City A: $ 122,082

City B: $ 121,511

The median salary of City B is, in fact, slightly lower than City A.

How is this possible? Why are the two characteristics of the central tendency of data telling two different stories?

To find out, Vinci decided to investigate the samples he had used for the analysis.

The salary for one of the samples is significantly higher than the rest of the group, resulting in changing the average value of the group salary.

Extreme values in the data set impacts the average. In such cases, the average can be a misleading representation of the dataset.

However, the median remains unaffected by such data points.

If we exclude Jen’s salary from Sample B and then the average wage of City B comes down to $ 119,260. Now City B does not seem to be lucrative enough, in terms of salary.

He should collect more samples to support his decision to move to City B.

In a nutshell, while comparing performances, we should not wholly rely on averages and include other aggregations. Otherwise, we may end up deciding wrong!

Also see: Trim Mean


Statistics Simplified is the series to express statistics in layman terms.

Any data set should be analyzed for its central tendency and variation. Why variation? What benefits will we get by looking at variation?

Let us consider this scenario: you have come across a river which can be crossed on foot as there is no bridge. You do not know swimming, and the current in the river is calm. There is a board at the river’s bank denoting the average depth as 3 feet.

You are 5.8 feet tall.

Will you cross the river?

In our day-to-day lives, we usually look at the average for performance comparison and decision-making.

It is a major flaw of our thought process as we ignore another critical aspect of data property: variation.

And we call such thought process “Flaw of Averages”.

Had there been additional details like maximum depth: 8 ft., would you have crossed the river?

Considering the variation in the data helps in the wiser decision.

What is the variation?

It is a measurement of the distance between the data points within a given data set.

Lower the better

Low variation implies:

  • Performance is efficient and better managed.

  • Less probability of outlier’s in the performance.

  • Better prediction of future values

Measures of Variation

Popular ways to measure variations are Standard Deviation, Inter-Quartile Range (IQR), and Range.

  • Range: Difference between maximum & minimum value.

  • Standard Deviation: Average distance of data points from each other.

  • Inter Quartile Range (IQR): Difference between 75th percentile and 25th percentile, where percentile is the position of data points when arranged in an order. The Median is the 50th percentile.

Also, see Central Tendency