Power Query offers an extensive list of tools and features for text transformations. And it is quite vital as many tools used in analytics and data mining, including Power Query, are case sensitive. It means Romeo and ROMEO are not the same for Power Query.
This article shows how Power Query is handy in resolving a few common challenges.
We consider the following data table for the demonstration.
There are a few issues with the dataset:
EMP ID have spaces and not formatted correctly
There should be one column for the name (First & Last Name combined)
Address, city & state in separate columns
Power Query offers a no-code solution to all the above issues.
Download the sample data file here
Removing Unwanted Space(s) & Characters
Select the column > Transform > Format > Trim/Clean
Trim: it removes all the trailing and leading spaces in the characters. Equivalent to the Excel formula TRIM.
Clean: It removes non-printable characters. Equivalent to the Excel formula CLEAN.
Changing the case of the texts
Select the column > Transform > Format > UPPERCASE
lowercase: Converts all the characters to lower case. Equivalent to the Excel formula LOWER
UPPERCASE: Converts all the characters to upper case. Equivalent to the Excel formula UPPER
Capitalize Each Word: Converts the first letter of each word to the upper case. Equivalent to the Excel formula PROPER
Merging Two (or more) Columns
Select the columns > Transform/Add Column > Merge Columns
Use the right delimiter (in this case, space) and define the name of the output column
The goal is to separate Address, City & State in three separate columns. Power Query offers to split the column by following options
In this case, all of them separated by using the pipe (|) character. Hence, we split the column by using (|) delimiter.
Select the column > Transform > Split
In the Split Column option, we do not get the choice for the "|" separator as a choice. Hence, we use custom.
In most of the cases, Power Query auto-detects the suitable delimiter and present us with the choice
The only thing left is to rename the column (Double-click/Right-click the header and Rename)
Split vs. Extract
There are a few options in Power Query, which are similar but gives different output. In the case of text transformation, Split and Extract is one of them.
As the name suggests, when used Split Column option, it distributes the existing data by adding separate columns in the data table.
Options available under Extract
In the case of Extract, it just extracts certain characters from the column. Depending on whether we are selecting this under Transform or Add Columns, it replaces the existing value or creates an added column.
Transform vs. Add Column
In Power Query, you find the following features commonly available in the two ribbons: Transform & Add Column
Features under Transform
Features under Add Column
There is only one difference: when selected from Transform, it replaces the value of the existing column with the output. At the same time, Add Column gives the output in a separate column.
So, choose wisely.