The __previous webinar__ outlined how a pivot table in Excel can be useful for summarizing and analyzing data. However, a regular pivot table has limitations when it comes to data modeling and building complex business logic. Hence, we need something more powerful that can bind multiple data tables from various data sources without losing the simplicity of the pivot table's "drag-drop" features.

It is quite surprising that not many Excel users know about this feature, which has been part of regular Excel since Office 2013. So, I decided to create awareness around this simple yet powerful feature through the weekly webinar series. Following is the first part of the series:

Download sample data file: SuperStore Sales

We briefly discussed the history and basic features of PowerPivot and how it differs from a regular pivot table. This video also highlights the data modeling concepts supported by PowerPivot.

https://youtu.be/o7KoeupEHoI

5 Useful Features of Excel Pivot Tables

Do More with Pivot Tables - Value Field Settings

With a simple dataset containing five related tables, this video walks you through the steps on how to add and build a data model in Excel. We learn how to enable the PowerPivot feature, add tables to the data model, and create relationships.

https://youtu.be/wCGZXxr5LNw

Data Modeling with Power Pivot - Getting Started

DAX is a powerful language introduced with the tabular model of SQL Server Analysis Services. We use DAX to build complex business logic in Power BI, SSAS Tabular models, and PowerPivot. This video shows how to add a measure in PowerPivot.

We can add data from multiple sources to a PowerPivot data model using the Get Data feature in the Data ribbon. It is quite helpful, as we can connect to numerous sources without importing the data into Excel. Since it is not mandatory to load a data table into an Excel sheet to analyze it, we can analyze large datasets with billions of rows without hitting the row limit of an Excel sheet or inflating the file size.

https://youtu.be/A1pVl0Z9euA

Also, see:

In the next webinar, we will talk about introducing DAX into data modeling.

Join the WhatsApp group **BI Simplified** to ask questions, share best practices, and get notifications for future webinars.


The higher the standard deviation, the higher the __variation__.

After the mean, the standard deviation is the most commonly used measure in various statistical tests. A few typical applications of standard deviation:

- Understanding the spread of data and its distribution
- Identifying special causes of variation in the dataset
- Standardizing the dataset (calculating Z value)

When combined with the mean, standard deviation plays a significant role in performing various statistical analyses and tests.

In Excel, we have two formulas for calculating Standard Deviation:

- STDEV.P(range)

- STDEV.S(range)

Where,

P stands for a population (contains complete data of the entire scope)

S stands for a sample (contains data of a segment of the entire scope)

Depending on the dataset we are dealing with (population or sample), we use the appropriate standard deviation formula.
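To illustrate the population/sample distinction outside Excel, here is a small Python sketch (the data values are made up); Python's `statistics` module exposes the same two calculations as `pstdev` and `stdev`:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values

# Population standard deviation (Excel's STDEV.P): divide by N
population_sd = statistics.pstdev(data)

# Sample standard deviation (Excel's STDEV.S): divide by N - 1
sample_sd = statistics.stdev(data)

print(population_sd)              # 2.0
print(sample_sd > population_sd)  # True: the sample estimate is slightly larger
```

The sample version divides by N − 1 (Bessel's correction), which is why it always comes out slightly larger than the population version for the same data.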

Also, see: __Sampling__

Have you come across the following message while establishing a relationship between two tables and wondered what it means?

If yes, then read along.

This article is my effort to decode it in the simplest possible manner.

Ambiguity means something can be interpreted in more than one way. Or, in simple words: which is the appropriate choice? There is uncertainty.

In a data model, we use relationships to propagate filters between tables. Like any robust system, the Power BI data model requires a clearly defined path for filter propagation.

Let us understand with examples.

The simplest example of an ambiguity in a data model is establishing more than one relationship between the two tables.

There can be only one active relationship between two tables (denoted by a solid line). All the other relationships are always inactive (indicated by a dotted line).

In the above example, Orders[Order Date] and Orders[Ship Date] are both related to ftCalendar[Date]. The relationship between Orders[Order Date] and ftCalendar[Date] is active, whereas the other is inactive.

I have added the following measure to the model:

`Order Qty = SUM(Orders[Order Quantity])`

When we use the above measure in a visual along with ftCalendar[Year], it returns the total order quantity by Order Date, grouped by year.

Power BI is propagating the filter through the active relationship between the two tables (Date to Order Date).

Imagine if both the relationships between the Orders table and the Calendar table were active.

Which Order Quantity would the above visual be returning? Ambiguous.

To avoid such scenarios, Power BI simply does not allow us to create any ambiguity in the data model.

Consider the following scenario:

Manager[City] is related to Target[City] (Status – Active)

Manager[City] is related to Customer[City] (Status – Active)

When we establish a relationship between Customer[Customer Segment] and Target[Customer Segment], Power BI does not allow it to be active.

It says an active relationship between the Target and Customer tables would create ambiguity.

For a moment, let’s imagine that the relationship between Customer and Target is active. In that scenario, there will be two paths available for the model to choose for the filter propagation when we apply a filter from the Manager’s table:

Power BI model avoids such ambiguous decisions for the filter propagation.

What is the alternative?

We can temporarily activate an inactive relationship with the help of USERELATIONSHIP. USERELATIONSHIP temporarily activates the inactive relationship (defined in the formula) and deactivates the others to remove ambiguity. For more details, please see:

__DAX: USERELATIONSHIP__

I hope it is making some sense now. Or is it still ambiguous?

Pivot tables take the heavy lifting out of summarizing and analyzing data. They are easy to start with and offer useful features that simplify slicing and dicing data and viewing performance from multiple perspectives. In this webinar, I covered a few essential aspects that can get you started with pivots in no time.

You can download the practice file from here: __Sample Data file__

The current demonstration is on iOS instead of Windows; hence, you may find some of the features presented differently. However, the functionalities and feature names are common to both operating systems.

The webinar is in six parts for convenience.

Start analyzing a large data table in no time. Add interactivity to your table using slicers and calculated fields.

https://youtu.be/I19y4etGT_o

Windows users may also refer to the following article for recommended pivot table settings:

__3 Pivot table settings you should use__

The default calculation for any numerical column in a pivot table is sum, and for a text column, it is count. However, using the pivot table, we can also calculate average, min, max, and standard deviation. All of these metrics are part of descriptive statistics.

https://youtu.be/KJGQ6exXOLY

Related article:

__Statistics Simplified: Central Tendency__

__Statistics Simplified: Variation__

Adding layers to a table and allowing users to drill down and analyze performance from multiple perspectives is a useful pivot table feature.

https://youtu.be/1PXr4aRWGTI

Percentage calculation is a vital aspect of any analysis. It helps in creating a baseline for comparing different metrics or periods. It is quite simple to calculate percentages in a pivot table using value field settings.

https://youtu.be/uO6ZPTWYvcg

Related article: __Do More with Pivot Tables - Value Field Settings__

Create additional categories using numerical fields with the grouping feature in the pivot table. These groups add another dimension to understand performance.

https://youtu.be/pkCWQ9zB7sA

Visual analytics is an essential aspect of any analysis. Learn how to create an interactive dashboard-like experience using conditional formatting to generate heat maps, sparklines to depict trends, and more.

https://youtu.be/6wolvmiySmE

Also, see:

__5 Useful Features of Excel Pivot Tables__

Pareto Chart in Excel (vivran.in)

In the next webinar, we will introduce Power Pivot, where we can use pivot tables with multiple tables and use relationships between the tables.

Join the WhatsApp group **BI Simplified** to ask questions, share best practices, and get notifications for future webinars.

Migrating data sources at times can be tricky in Power BI. This article focuses on how to change the data source from an Excel table to a SQL table in minimum possible steps, considering the following scenarios:

- The name and structure of the data source tables are different in Excel and SQL. It includes a change of column names.
- Migrate and apply all the data transformation steps.
- The migration should not affect the existing data model and the visuals present in the report.

To achieve this, we need to ensure three things:

- Column names remain unchanged,
- Replicate all the data transformations under Applied Steps,
- Query Name remains unchanged.

For our example, we are changing the data source of the Product Subcategory table from Excel to SQL:

Excel table structure:

SQL table structure:

It ensures that we have the same column names:

If you cannot write a native query to change the column names, add a step in Power Query to rename them instead.

This step is the trickiest of the three but not difficult.

Select the query > Home > Advanced Editor and copy the steps that include the data transformations. In this example, we have applied two transformations:

We can find these steps in the Advanced Editor under the same names:

Add a comma after the last line, and paste the copied code:

Update the reference of the previous step. In this case, the name of the last step is “Source”.

Delete the Excel query and assign the same name to the SQL query:

And that should do it.

Let us start with an example. Below is the year-over-year sales data, along with the percentage change from the previous year:

What is the average percentage change from the year 2014 to 2019?

If we calculate the arithmetic mean (aka average) of the % Change, we get 9.6%. It means, each year, there is a 9.6% change from the previous year.

If we apply a 9.6% year-over-year change, we get a different result than 3682 for the year 2019.

There is a difference of 14 units or 0.39%.

It signifies that the arithmetic mean, or simple average, is incorrect in such scenarios.

The geometric mean is applicable in such scenarios. The geometric mean of the above example is 9.5%.

When we apply 9.5% of change from the year 2014, we end up with 3,682 sales in 2019.

In Excel, we use the function GEOMEAN. Just like the function AVERAGE, it takes an array as an input. It requires one adjustment in the formula:

__GEOMEAN(1 + range) - 1__

Adding 1 to each value before taking the geometric mean, and subtracting 1 from the result, handles negative percentage changes, for which a plain geometric mean is undefined.
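A quick Python sketch of the same adjustment (the growth rates here are made up, not the article's data) shows why the geometric mean is the right average for year-over-year changes: compounding the single averaged rate reproduces the overall growth:

```python
import math

changes = [0.12, 0.08, -0.05, 0.15, 0.10]   # illustrative YoY % changes

# Equivalent of Excel's GEOMEAN(1 + range) - 1
factors = [1 + c for c in changes]
geo_mean = math.prod(factors) ** (1 / len(factors)) - 1

# Compounding the geometric mean matches the actual compounded growth
start = 1000.0
actual = start
for f in factors:
    actual *= f
via_mean = start * (1 + geo_mean) ** len(changes)

print(abs(actual - via_mean) < 1e-6)  # True
```

Repeating the same check with the arithmetic mean of `changes` would give a slightly different (and wrong) final value, which is exactly the discrepancy described above.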

Let us see one more example:

Here we see a significant difference between the average (arithmetic mean) and geometric mean (-3.2% vs. -9.9%).

Following is the difference when we apply this to validate the results:

The geometric mean is used in calculating the overall percentage return on stocks or investments over time.

Also, see:

#vivran

Photo by **RF._.studio** from **Pexels**

Got data points with special causes or outliers in your data set?

How to calculate the average in such cases?

A few common approaches are to either:

- exclude the record, or
- replace the extreme values with the median of the data set.

And there is another way: we use Trim Mean instead of a simple average.

An average of the trimmed or “inner” data set.

It excludes the data points from both ends.

Excel has the function TRIMMEAN, which takes two arguments:

**Array**: The data range.

**Percent**: The percentage of data points to exclude from the calculation. It takes values from 0 to 1. If percent = 0.2, it excludes 20% of the dataset from the calculation. So, if the data set contains 10 data points, it excludes 2 data points: 1 from the top and 1 from the bottom (as demonstrated in the image above).

Just like the median, it first sorts the data into an order and then excludes the extreme data points from both ends.

In the example below, TRIMMEAN excluded 2 & 100 from the calculation, as I have supplied 0.2 as a percent.

- The TRIMMEAN function in Excel takes values greater than 0 and less than 1 in *percent*. Both 0 & 1 return a #NUM! error.

- TRIMMEAN excludes data points in the nearest multiple of 2. If *percent* is 0.3 for a data set of size 10, it will remove 2 data points (one from the minimum and one from the maximum side).
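A pure-Python sketch of this behavior (the helper name `trim_mean` and the sample values are made up for illustration):

```python
def trim_mean(values, percent):
    """Mimic Excel's TRIMMEAN: exclude `percent` of the data points,
    rounded down to an even number, split between both ends."""
    n = len(values)
    k = int(n * percent / 2)           # points to drop from each end
    inner = sorted(values)[k:n - k]    # sort first, just like the median
    return sum(inner) / len(inner)

data = [2, 3, 4, 5, 6, 7, 8, 9, 10, 100]
print(trim_mean(data, 0.2))   # 6.5 -> drops 2 and 100, averages the inner 8
print(trim_mean(data, 0.3))   # 6.5 -> still drops only 2 points (even multiple)
```

Note how `percent = 0.3` on 10 points still removes only 2 values, matching the "nearest multiple of 2" rule above.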

Also, see:

When it comes to understanding data, we prefer representing an entire data set using aggregations such as count, sum, average, and percentage. These aggregations summarize multiple data points into a single value. This single value typically represents the entire dataset, which makes comparison and decision-making a more straightforward process.

Arguably, the average is the most popular aggregation when it comes to comparisons.

Reason: the average, also known as the arithmetic mean, is easy to calculate.

We sum the entire data set and then divide it by the count.

Let us take the following sample dataset: 1, 2, 3, 4, 5

- Step 1: Calculate sum of all the numbers (1+2+3+4+5) = 15
- Step 2: Count all the numbers (1,2,3,4,5) = 5
- Step 3: Divide the output of Step 1 by Step 2 (15/5 = 3)

So, in simple words, we can say that the central data point of this dataset is 3. Or, most of the data points are around 3.

The fact that calculating the average is effortless, and that it represents the entire dataset, makes it the most widely used aggregation.

Even Excel has two formula-less ways of calculating average:

More on __Value Field Settings__

The median is also a representation of the central tendency of the data. Unlike the average, the median is not a calculated value; it is an actual point in the dataset.

We sort the data in an order (ascending or descending), and then the middlemost value becomes the median.

**Case 1:**

When the count of the data set is an odd number

Sample Data: 8, 6, 4, 10, 12

- Step 1: Arrange the data in an order -> 4, 6, 8, 10, 12
- Step 2: Find the middle number -> 4, 6, **8**, 10, 12

Median = 8

**Case 2:**

When the count of the data set is an even number

Sample Data: 8, 6, 4, 10, 12, 2

- Step 1: Arrange the data in an order -> 2, 4, 6, 8, 10, 12
- Step 2: Calculate the average of the middle two numbers: 2, 4, **6, 8**, 10, 12

Median = (6+8)/2 = 7
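Both cases (and the averaging steps earlier) can be verified with Python's `statistics` module, using the sample data above:

```python
import statistics

# Average: sum of the values divided by their count (15 / 5)
print(statistics.mean([1, 2, 3, 4, 5]))

# Case 1: odd count -> the middle value after sorting
print(statistics.median([8, 6, 4, 10, 12]))     # 8

# Case 2: even count -> average of the two middle values
print(statistics.median([8, 6, 4, 10, 12, 2]))  # 7.0
```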

There are two simple formulas for average and median:

AVERAGE(range)

MEDIAN(range)

Vinci is a data analyst and lives in City A. One of his friends told him that data analysts in City B are getting higher salaries and recommended that he move to City B.

Vinci decided to analyze the wages for data analysts in two cities, A & B. He collected sample data for the two cities and calculated the average.

City A: $ 121,012

City B: $ 258,713

By just looking at this, City B appears to be a prospective location for data analysts, as the average salary in City B is 114% higher than in City A.

Then, he calculated median salaries for these cities:

City A: $ 122,082

City B: $ 121,511

The median salary of City B is, in fact, slightly lower than City A.

How is this possible? Why are the two characteristics of the central tendency of data telling two different stories?

To find out, Vinci decided to investigate the samples he had used for the analysis.

The salary of one of the samples is significantly higher than the rest of the group, which inflates the group's average salary.

Extreme values in the data set impact the average. In such cases, the average can be a misleading representation of the dataset.

However, the median remains unaffected by such data points.

If we exclude Jen's salary from Sample B, the average wage of City B comes down to $119,260. Now City B does not seem lucrative enough in terms of salary.

He should collect more samples to support his decision to move to City B.

In a nutshell, while comparing performances, we should not rely wholly on averages; we should include other aggregations as well. Otherwise, we may end up making the wrong decision!

Also see: Trim Mean

#vivran

Any data set should be analyzed for its __central tendency__ and variation. Why variation? What benefits will we get by looking at variation?

Let us consider this scenario: you have come across a river which can be crossed on foot, as there is no bridge. You do not know how to swim, and the current in the river is calm. There is a board at the river's bank denoting the average depth as 3 feet.

You are 5.8 feet tall.

Will you cross the river?

In our day-to-day lives, we usually look at the average for performance comparison and decision-making.

It is a major flaw in our thought process, as we ignore another critical property of the data: **variation**.

We call such a thought process the "**Flaw of Averages**".

Had there been additional details like maximum depth: 8 ft., would you have crossed the river?

Considering the variation in the data helps in making a wiser decision.

It is a measurement of the distance between the data points within a given data set.

Low variation implies:

- Performance is efficient and better managed.
- Lower probability of outliers in the performance.
- Better prediction of future values

Popular ways to measure variations are Standard Deviation, Inter-Quartile Range (IQR), and Range.

**Range**: Difference between the maximum & minimum values.

__Standard Deviation__: Average distance of the data points from each other.

**Inter Quartile Range (IQR)**: Difference between the 75th percentile and the 25th percentile, where a percentile is the position of a data point when the data is arranged in order. The median is the 50th percentile.
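These three measures can be sketched in a few lines of Python (illustrative data; note that the IQR here uses the simple median-of-halves convention, which is only one of several ways to compute quartiles):

```python
import statistics

data = [4, 6, 8, 10, 12, 14, 16, 18]
ordered = sorted(data)

# Range: maximum minus minimum
data_range = max(ordered) - min(ordered)      # 14

# Standard deviation (population)
sd = statistics.pstdev(ordered)

# IQR: median of the upper half minus median of the lower half
half = len(ordered) // 2
q1 = statistics.median(ordered[:half])        # 7.0
q3 = statistics.median(ordered[half:])        # 15.0
iqr = q3 - q1                                 # 8.0

print(data_range, iqr)  # 14 8.0
```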

Also, see __Central Tendency__

#vivran

A single value that attempts to describe a set of data by identifying the central position within that set of data.

One point in the data set which balances the entire data set.

Central Tendency is also known as the Measure of Central Location or, more colloquially, the average.

There are three measures of central tendency: Mean, Median, and Mode

The most popular measure of central tendency is Arithmetic Mean, which is also represented with the formula AVERAGE in Excel.

Depending on the data type, we use an appropriate measure of central tendency.

Also see:

#vivran

Identifying data types is crucial in data analytics. Wisdom says that we should know the data type before we start the data analysis process. And the reasons are apparent: if we understand the data type, then we can apply appropriate mathematical aggregations and statistical tests.

We categorize data into two primary categories: **Qualitative** & **Quantitative**

We can understand data types by the following example:

Discrete data types primarily contain count and percentages.

The fundamental difference between a continuous and a discrete data type is that continuous data type is always associated with a unit or a scale, e.g., kilogram, meter, centimeter, degree Celsius, years.

Each data type has its level of measurement:

And depending on the data type, we can decide on the underlying mathematical aggregations:

Also see: __Central Tendency __& __Variation__

__#vivran__

How do we decide if the rice is perfectly cooked?

We randomly pick one grain of rice and check it. Based on our findings from that single grain, we infer whether the entire bowl of rice is perfectly cooked.

Sampling is a process of understanding the behavior of the entire group by learning the behavior of a portion of the group.

The single grain of rice in the above example was a sample, and the process of picking the grain is known as Sampling.

The primary reason is that it is easier to collect data for a sample than for the entire population.

Data collected using various sampling techniques are practical, economical, handy, and adaptable.

There are two popular sampling methods:

- Probability Sampling
- Non-Probability Sampling.

In probability sampling, every member of the population has a chance of getting selected as a sample. We use this technique when we want our sample to be a representation of the entire population. We use this for quantitative analysis.

In the non-probability sample, samples are selected based on specific criteria. Not every individual has a chance of being selected. This technique is applicable in research and qualitative analysis. It helps to get a basic understanding of a small group or population under a specific study.

This article explains the popular methods used in probability sampling.

We randomly pick samples from the entire population. Every member has an equal opportunity of getting selected.

For example, for conducting an employee-based survey, we randomly picked employees within an organization.

We randomly pick the first sample, and then after that, we choose every *nth *item in the data.

In the example below, we pick every *fourth *element from the sample after choosing the second item.

We arranged the employees by the employee ID for the same survey and randomly picked an employee (Emp ID 2). Then we select every 4th employee after that (Emp ID 6, 10, 14…).
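The two methods described so far can be sketched in a few lines of Python (the employee IDs 1–20 are made up for illustration):

```python
import random

population = list(range(1, 21))    # employee IDs 1..20

# Simple random sampling: every member has an equal chance
simple_sample = random.sample(population, 5)

# Systematic sampling: random starting point, then every 4th member
step = 4
start = random.randrange(step)     # random index for the first pick
systematic_sample = population[start::step]

print(len(systematic_sample))      # 5
```

With a start index of 1, `systematic_sample` is `[2, 6, 10, 14, 18]`, which matches the Emp ID 2, 6, 10, 14… pattern above.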

Strata means layers. We pick members for sampling from each layer. For example, if we have four groups, we ensure we select at least one member from each group.

This time for the survey, we arranged data based on employee’s designation (Associate, SME, TL, Manager…). Then we randomly pick employees from each designation group.

We divide the population into different clusters with similar characteristics. Then, we randomly pick entire clusters and include all of their members.

For conducting the survey, we are picking all the employees within a department or team.

I am a fan of data visualization. After a long time, the Power BI team has released a few significant updates to the standard visualizations available in Power BI Desktop. This article highlights three such features.

Stacked charts help in displaying results by multiple dimensions in a visual.

The challenge with this visual had been that it could show the values by segment, but we could not see the total.

With this release, we can now display the total.

It also adds the total in the tooltip:

Q&A visual is quite intuitive as we can ask questions in natural language, and it returns results. I have explained these features in detail in the following articles:

__Power BI Visuals: Q&A (July 2020 Update)__

In this update, we can perform arithmetic operations (addition, subtraction, multiplication, and division) in the Q&A visual. For example, if I want to quickly check what 25% of my total revenue is, I can use the Q&A visual to get the answer:

In this case, Revenue is a measure.

We can perform arithmetic operations on measures as well:

It follows the BODMAS rule

Want to get the insights from a visual quickly?

It is a two-step process now: Right-click > Summarize

We can add this as a visual, and it provides a summary of all the visuals present on the page.

It is the latest visual in the arsenal, and it is currently a preview feature.

Visit the link below for more details on Smart Narrative:

__Microsoft docs: Smart Narrative__

#vivran

A Pareto chart is a visual representation of the Pareto Rule.

The Pareto Rule, which is also popularly known as the 80/20 rule, was coined by the Italian mathematician Vilfredo Pareto. He stated that 20% of the population owns 80% of the land.

Later, the quality guru Joseph Juran adapted it: 80% of the problems are due to 20% of the causes.

In simple words, the Pareto rule is all about prioritizing the vital few from the trivial many.

This video shows the steps involved to create a Pareto chart in Excel.

https://youtu.be/DbPYRlh4bhA
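The arithmetic behind a Pareto chart is simple: sort the categories by contribution in descending order, then accumulate their percentage share. A Python sketch with hypothetical defect counts:

```python
# Hypothetical defect counts by reason (illustrative data)
defects = {"Scratch": 50, "Dent": 30, "Misalignment": 10, "Paint": 6, "Other": 4}

total = sum(defects.values())
running = 0
for reason, count in sorted(defects.items(), key=lambda kv: kv[1], reverse=True):
    running += count
    print(f"{reason}: {running / total:.0%} cumulative")
# Here the top two of the five reasons already account for 80% of the defects
```

The cumulative percentages (50%, 80%, 90%, …) are what the Pareto chart plots as its line, on top of the sorted bars.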

Related articles:

__https://www.vivran.in/post/do-more-with-pivot-tables-value-field-settings__

In the __earlier article__, we learned how to calculate the moving average at a day level (30 days), since the date granularity there was at a day level. This article explains how to calculate the moving average at a month or a quarter level when the date granularity is at a day level.

We have included a calendar table in our data model:

And created a relationship with the Order table:

Unlike the earlier example, we cannot use the AVERAGEX formula due to the granularity. We use the mathematical formula for average (sum of numbers divided by count of numbers).

Let us say we are calculating the moving average of the past three months. For this, we sum the revenue for the previous three months and divide by the month count (3).

To calculate the month count, we use DISTINCTCOUNT on the month column of the calendar table.

```
Moving Average _M =
VAR _CurrentDate =
    MIN ( ftCalendar[Date] ) - 1
VAR _FilterDate =
    DATESINPERIOD ( ftCalendar[Date], _CurrentDate, -3, MONTH )
VAR _Monthly =
    CALCULATE ( [Revenue], _FilterDate )
VAR _MonthCount =
    CALCULATE ( DISTINCTCOUNT ( ftCalendar[Month] ), _FilterDate )
VAR _Average =
    DIVIDE ( _Monthly, _MonthCount )
RETURN
    _Average
```

Similarly, to calculate the moving average of the last three quarters, we replace the month components with their quarter equivalents:

```
Moving Average _Q =
VAR _CurrentDate =
    MIN ( ftCalendar[Date] ) - 1
VAR _FilterDate =
    DATESINPERIOD ( ftCalendar[Date], _CurrentDate, -3, QUARTER )
VAR _Quarterly =
    CALCULATE ( [Revenue], _FilterDate )
VAR _QuarterCount =
    CALCULATE ( DISTINCTCOUNT ( ftCalendar[YearQtr] ), _FilterDate )
VAR _Average =
    DIVIDE ( _Quarterly, _QuarterCount )
RETURN
    _Average
```

Download __sample file__

CALCULATE is the most powerful function in DAX. The filter arguments of CALCULATE can override any existing filter on the same column. For example, the following measure will always return the revenue for the manager Anil, irrespective of the filters applied by the table visual on the Manager column:

`Revenue = SUMX(Orders, Orders[Unit Price] * Orders[Order Quantity])`

```
Revenue Anil =
CALCULATE (
    [Revenue],
    Customer[Manager] = "Anil"
)
```

DAX translates the above measure as follows:

```
Revenue Anil =
CALCULATE (
    [Revenue],
    FILTER (
        ALL ( Customer[Manager] ),
        Customer[Manager] = "Anil"
    )
)
```

It is essentially overriding any external filters applied by the visuals and using its own filter (Manager = "Anil").

If we want this measure to show revenue only when the Manager's name is Anil in the filter context, and blanks for the rest of the managers, we wrap the filter argument of the above measure in KEEPFILTERS.

```
Revenue Anil KEEPFILTER =
CALCULATE (
    [Revenue],
    KEEPFILTERS ( Customer[Manager] = "Anil" )
)
```

Result:

As the name suggests, KEEPFILTERS keeps the existing filter and adds the new filter to the context. It combines the filters applied by the table visual with the additional filter Manager = *Anil*.

As a result, it behaves like an **AND** condition: it returns a value only when both criteria are met, and blank otherwise.

The following measures will help in a better understanding of KEEPFILTERS:

```
Revenue Anil-Rob KEEPFILTERS =
CALCULATE (
    [Revenue],
    KEEPFILTERS ( Customer[Manager] IN { "Anil", "Rob" } )
)
```

Output:

It returns results where both filter conditions (visual & measure) match.

It is different from the following measure:

```
Revenue Anil-Rob =
CALCULATE (
    [Revenue],
    Customer[Manager] IN { "Anil", "Rob" }
)
```

The above measure is essentially translated to:

```
Revenue Anil-Rob =
CALCULATE (
    [Revenue],
    FILTER (
        ALL ( Customer[Manager] ),
        Customer[Manager] = "Anil" || Customer[Manager] = "Rob"
    )
)
```

Let us consider creating a measure that returns the revenue of transactions where revenue is greater than 5000. As we do not have any column for revenue in our model, we will use the FILTER function to achieve the objective.

```
Revenue (Large Orders) =
CALCULATE (
    [Revenue],
    FILTER (
        ALL ( Orders[Unit Price], Orders[Order Quantity] ),
        [Revenue] > 5000
    )
)
```

The above measure will filter the *Order* table where the Revenue (Unit Price * Order Quantity) is > 5000, and then return the revenue of the filtered table.

Following is the output:

So far, so good.

The challenge comes when we filter the above visual on Order Quantity:

The output of the *Revenue (Large Orders)* measure remains unchanged, even though we have selected an Order Quantity between 1 & 10. The ALL statement in the measure is overriding any external filters applied to the visual:

We can rewrite the above measure to apply the filter on the entire fact table:

```
Revenue (Large Orders) 2 =
CALCULATE (
    [Revenue],
    FILTER (
        Orders,
        [Revenue] > 5000
    )
)
```

It may look straightforward and give the desired result, but it can have severe performance implications.

The Orders table could be huge, and scanning it row by row to check the condition (Revenue > 5000) can be a time-consuming operation.

We can wrap our first measure's filter in KEEPFILTERS, which avoids overwriting the existing filters and consumes less memory:

```
Revenue (Large Orders) KEEPFILTER =
CALCULATE (
    [Revenue],
    KEEPFILTERS (
        FILTER (
            ALL ( Orders[Order Quantity], Orders[Unit Price] ),
            [Revenue] > 5000
        )
    )
)
```

KEEPFILTERS returns a much smaller table for the iteration as compared to the entire Orders table.

Download __sample file__

Power Query is a powerful ETL tool for Excel and Power BI. It can connect to multiple data sources and offers easy-to-use data transformation tools. This article shows three tips to enhance the overall experience.

For me, **Applied Steps** under the Query Settings pane is an underrated part of Power Query.

By renaming these steps, we can make this segment of Power Query more informative.

It is simple: select the step > press F2.

Alternatively, you can right-click on the steps > Rename

We can add more details using **Properties**. It provides additional information when we hover over the step.

The formula bar is where we can see the M code for each transformation.

We can use it for a few quick modifications and reduce the number of **Applied Steps** in a query.

In the image below, we are eliminating the steps of renaming a column by modifying the M code in the formula bar:

Apart from this, it also helps in getting familiar with the M query.

To view the formula bar, go to **View** > Check the option *Formula Bar*

The formula bar displays the M code for one step. With the Advanced Editor, we can view the entire M code applied to the table:

Home > Advanced Editor

The M code is difficult to read due to the lack of proper structure, even with the word-wrap feature enabled:

Power Query Formatter provides a neat solution for this. Just copy the entire code from the Advanced Editor, paste it into __PowerQueryFormatter.com__, and format it with one click:

PowerQueryFormatter.com

The output looks neater and more legible.

With a few simple tweaks and modifications, we can enhance the experience of Power Query.

__#vivran__

Merge Query is a powerful transformation tool in Power Query. It is equivalent to JOINS in SQL. Power Query supports six kinds of joins:

While the most used join kind is Left Join (equivalent to VLOOKUP in Excel), other join kinds are equally useful. This article explores three such use cases.

This article assumes that we know how to apply merge queries in Power Query. For more details, please refer to __the article__

We have two tables: Orders and Returns.

Download __Sample Data file__

The *Orders* table has all the orders (5,497 records), and *Returns* holds the status of 572 orders, with the status *Returned*.

We need one table holding details of returned orders.

With the *Orders* table selected in Power Query, perform **Merge Queries as New** on *Order ID* with the *Returns* table, and select **Right Outer Join**:

The output table holds the 572 records of all the orders that have been returned.
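Behind the scenes, **Merge Queries as New** generates a single `Table.NestedJoin` step; for this use case it would look roughly like this (assuming the queries are named *Orders* and *Returns*):

```m
= Table.NestedJoin(Orders, {"Order ID"}, Returns, {"Order ID"}, "Returns", JoinKind.RightOuter)
```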

The reverse of Use Case 1: we need a table holding all orders that have not been returned. This time, we repeat the steps performed earlier with one change: use **Left Anti** as the join kind.

It excludes from the *Orders* table all the records that have a match in the *Returns* table.
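In M, only the join kind changes from the previous use case (again assuming the queries are named *Orders* and *Returns*):

```m
= Table.NestedJoin(Orders, {"Order ID"}, Returns, {"Order ID"}, "Returns", JoinKind.LeftAnti)
```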

Consider the following tables:

The records highlighted in yellow are common to both tables.

The goal is to find if there is a different *Name* assigned in *tblB* for the same *ID*.

For this, we perform Merge Queries twice.

In the first step, we merge queries with **Left Anti** as the join kind to find records that do not have an exact match on both *ID* and *Name*:

**Output**: We have four such records that do not have an exact match with *tblB* on *ID* and *Name*.

Remove the column for *tblB.*

Now we need the *Name* from *tblB* against the corresponding *ID*; hence we merge the queries on *ID* with **Left Outer Join**:

Expand *tblB* to get the corresponding *Name*:

In conclusion, only one record in *tblB* has a matching *ID* in *tblA* but a different *Name*, whereas three records from *tblA* have no match in *tblB* at all.
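The two merges above can be sketched in M as follows (the step names are illustrative, and it assumes *tblA* and *tblB* are loaded queries):

```m
let
    // Keep rows of tblA with no exact match in tblB on both ID and Name
    Mismatches = Table.NestedJoin(tblA, {"ID", "Name"}, tblB, {"ID", "Name"}, "tblB", JoinKind.LeftAnti),
    // The nested tblB column is empty for anti-join results, so drop it
    Removed = Table.RemoveColumns(Mismatches, {"tblB"}),
    // Bring in the tblB Name for the same ID, where one exists
    Matches = Table.NestedJoin(Removed, {"ID"}, tblB, {"ID"}, "tblB", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Matches, "tblB", {"Name"}, {"tblB.Name"})
in
    Expanded
```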


Usually, we start designing a report, dashboard, or presentation slide with a blank white space. Each element, color, or piece of text we add to the page competes for our viewers' attention. It implies that we should be careful about what we put on that space. Limit the ink we use to convey our message.

Let us consider the following example: a simple table with sales details by manager and order priority.

Each element in the table is equally competing for our attention.

Compare it with the following table:

In the above example,

- Numbers are more apparent, as the borders have taken a step back and are not competing with the numbers.
- Cells with zeros appear as blanks. It makes the table cleaner, and the viewers can pay more attention to the sales numbers.
- Numbers have a *thousand separator* and a different format for negative numbers.

This article explains how to hide zeros in an Excel table.

Select the cells > CTRL+1 (for Format Cells) > Select an appropriate format

In the above example, I have selected the Number format with a negative number style and checked *Use 1000 Separator (,)*.

Go to Custom Format > Under type, we find the structure of the format we have selected.

In Excel, the sections of a number format are separated by a semicolon (;).

In the example, Excel has applied formats for **positive** and **negative** numbers, separated by a semicolon (;). All we require is to add a second semicolon (;) after the **negative** format structure. Any structure defined after the second semicolon is for **zeros**.

And if we leave it blank after the second semicolon, that is how Excel is going to display zeros: blank.
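Putting it together, a custom format code of this shape hides zeros; the positive and negative sections shown here are just one possible choice from the dialog:

```
#,##0;-#,##0;
```

The sections read positive;negative;zero, and because the third section (after the second semicolon) is empty, zeros display as blank.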

And that is it.

Clutter is a distraction. Please get rid of it!

One of the clients came up with a small project. They wanted to quickly analyze their sales performance in three countries for the past three years. The data file shared with me was not raw data, but a country-wise monthly summary report.

Below is one of the three tables:

The other two tables are for Germany and India.

Following are significant challenges with the data:

- Data is in three separate tables.
- Yearly data is stored in three separate columns, with the prefix “Sales”.
- The month is in a separate column, in text format.

We are using Power Query to compile and transform the data in under two minutes!

For this example, I would like to compile all three tables into a single table. Since all the tables have the same headers, our job is straightforward. We will use the Append Query function, which is equivalent to UNION in SQL.

Home > Append Query as New

In the Append Query window, select *Three or more tables* and add all the tables under *Tables to append*.

Click OK, and we get a new table with the data from all the three tables.
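The generated step is a single `Table.Combine` call; assuming the three country queries are named *USA*, *Germany*, and *India* (the names are assumptions), it looks like:

```m
= Table.Combine({USA, Germany, India})
```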

Right-click on the other three tables and uncheck the option *Enable Load.*

This step ensures only the table with all the records loads in the data model. Since we do not require the other three tables, we have excluded them from loading to the data model.

The sales data for three years are in three separate columns. For analysis, this is not helpful. What we need is all the sales values in one column, with the corresponding years in another column. For this, we use the *Unpivot* option.

Select the columns for the three years > Transform > Unpivot

For the next step, we need to extract the year value from the column. There is more than one way to do this. In this example, we are using the *Extract* option under Transform:

Select the column > Transform > Extract > Last Characters

Since we need the last four characters, enter 4 in the Extract Last Characters:

Output:
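The unpivot and extract steps above can be sketched in M as follows (the step and column names are assumptions):

```m
let
    // Turn the three year columns into Attribute/Value pairs
    Unpivoted = Table.UnpivotOtherColumns(Appended, {"Month"}, "Attribute", "Value"),
    // Keep only the last four characters, e.g. "Sales 2019" becomes "2019"
    Year = Table.TransformColumns(Unpivoted, {{"Attribute", each Text.End(_, 4), type text}})
in
    Year
```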

Now the challenge is that we have two columns containing date components with a text data type. To meet our objective, we need a single column with a date data type.

Select the Month and Year columns > right-click > Merge Columns, with space as the delimiter

Transform the column from Text to Date
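These last two steps correspond to something like the following M (again, the step and column names are assumptions):

```m
let
    // Merge the month and year columns with a space, e.g. "January 2019"
    Merged = Table.CombineColumns(PreviousStep, {"Month", "Attribute"}, Combiner.CombineTextByDelimiter(" "), "Date"),
    // Let Power Query parse the merged text as a date
    Typed = Table.TransformColumnTypes(Merged, {{"Date", type date}})
in
    Typed
```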

And our table is ready for analysis, in under two minutes.
