Text Transformation with Power Query



Power Query offers an extensive list of tools and features for text transformations. And it is quite vital as many tools used in analytics and data mining, including Power Query, are case sensitive. It means Romeo and ROMEO are not the same for Power Query.


This article shows how Power Query is handy in resolving a few common challenges.

We consider the following data table for the demonstration.

There are a few issues with the dataset:

  • EMP ID have spaces and not formatted correctly

  • There should be one column for the name (First & Last Name combined)

  • Address, city & state in separate columns

Power Query offers a no-code solution to all the above issues.


Download the sample data file here

Removing Unwanted Space(s) & Characters

Select the column > Transform > Format > Trim/Clean

  • Trim: it removes all the trailing and leading spaces in the characters. Equivalent to the Excel formula TRIM.

  • Clean: It removes non-printable characters. Equivalent to the Excel formula CLEAN.

Changing the case of the texts

Select the column > Transform > Format > UPPERCASE


  • lowercase: Converts all the characters to lower case. Equivalent to the Excel formula LOWER

  • UPPERCASE: Converts all the characters to upper case. Equivalent to the Excel formula UPPER

  • Capitalize Each Word: Converts the first letter of each word to the upper case. Equivalent to the Excel formula PROPER



Merging Two (or more) Columns

Select the columns > Transform/Add Column > Merge Columns

Use the right delimiter (in this case, space) and define the name of the output column



Split Columns

The goal is to separate Address, City & State in three separate columns. Power Query offers to split the column by following options






In this case, all of them separated by using the pipe (|) character. Hence, we split the column by using (|) delimiter.


Select the column > Transform > Split

In the Split Column option, we do not get the choice for the "|" separator as a choice. Hence, we use custom.


In most of the cases, Power Query auto-detects the suitable delimiter and present us with the choice


The only thing left is to rename the column (Double-click/Right-click the header and Rename)



Split using the | delimiter


Split vs. Extract

There are a few options in Power Query, which are similar but gives different output. In the case of text transformation, Split and Extract is one of them.


As the name suggests, when used Split Column option, it distributes the existing data by adding separate columns in the data table.











Options available under Extract









In the case of Extract, it just extracts certain characters from the column. Depending on whether we are selecting this under Transform or Add Columns, it replaces the existing value or creates an added column.


Transform vs. Add Column

In Power Query, you find the following features commonly available in the two ribbons: Transform & Add Column

Features under Transform



Features under Add Column



There is only one difference: when selected from Transform, it replaces the value of the existing column with the output. At the same time, Add Column gives the output in a separate column.


So, choose wisely.




Date transformation with Power Query

Drop Me a Line, Let Me Know What You Think

 

contact@vivran.in

+91 9871-641-146

MS EXCEL || POWER BI || POWER APPS || POWER QUERY || TABLEAU || DATA ANALYTICS || TRAINING