Working with data is challenging. Data is rarely in the right format, in the right place, at the right time.
If you are new to Power BI or upgrading your analysis from Excel getting familiar with the transformation tools will make it easier to tell data stories and for your end-user to gain insight.
Load the data
1 – Rename the query
2 – Rename columns
3 – Remove rows and columns
4 – Remove rows with filters
5 – Remove duplicates
6 – Merge columns
7 – Replace values
8 – Format text
9 – Change data type
10 – Add a custom column
Load the data
Before we can get started shaping and transforming our data we need to either upload it or connect to the data source. Power BI supports many databases and has connectors to support platforms like Google Analytics and Salesforce.
For this example, we’ll be using video game sales data in a csv file. When the data has loaded and we then select ‘transform data’ to be taken to the Data View. From here we can shape our data using tools on the ribbon or right-clicking to access all the options we need.
1 – Rename the query
The query can be renamed in the Query Settings pane on the right-hand side of the canvas. I’ve changed this to ‘Sales’ from the file name that was loaded by default.
The Query Settings pane is also home to the ‘Applied Steps’ list. Every change you make to your data creates a new step on the list. This makes it easy to undo any unwanted changes.
Why is this useful?
If you’re loading in multiple datasets from different sources using a descriptive name makes it easy to find what you’re looking for. This is even more important when you are loading data from a database. If you have datasets named ‘Query1’, ‘Query2’, and ‘Query3’ you and your end-user are going to have a tough time finding what they’re looking for.
2 – Rename columns
By right-clicking the column header you can rename your column to something that’s more user friendly or descriptive.
Why is this useful?
Like a descriptive name for a dataset, a descriptive column name makes it easier to find what you’re looking for. This is especially important when loading datasets with technical names for columns. Have some empathy for your end-user and they’re more likely to trust your data.
3 – Remove rows and columns
You can find options to remove rows and columns under the ‘Home’ tab on the ribbon. There are options to remove a single row or column or remove all but your selected row or column.
Why is this useful?
By removing unnecessary columns it is easier for our end user to get to what they need. Scrolling through unnecessary columns is especially frustrating if you have multiple datasets to search through.
4 – Remove rows with filters
We can remove specific rows that aren’t needed in our dataset using a filter. The menu can help remove columns that are empty, start or end with a specific value.
Why is this useful?
Just like the other steps to clean the dataset, this will make it easier to navigate for our end-user and help them answer their questions quicker.
5 – Remove duplicates
By right-clicking a column and selecting ‘remove duplicates’ we can remove any duplicates in the dataset.
Why is this useful?
This is useful when loading in an unfamiliar dataset that hasn’t come from a trusted source especially if the dataset is large. This could be done in Excel but this gets more difficult with large datasets. This could also be done in a database but if you don’t have permission to load in data this may not be an option for you.
6 – Merge columns
By right-clicking a column and selecting ‘merge columns’ we can merge columns using a separator and renaming it to something useful.
Why is this useful?
This is useful for merging a First and Last Name column, or columns that would be better together. This can also be done with a formula but right-clicking makes this task quicker.
7 – Replace values
You can find the replace values option on the ‘Transform’ tab of the ribbon. The advanced menu gives options to insert special characters or match the entire cell contents.
Why is this useful?
Datasets may have been loaded with typos that need cleaning up or values that aren’t as descriptive as they could be. This is especially useful if product codes or jargon have been introduced and plain English would make it easier for end-users to understand.
8 – Format text
Formatting options can be found in the ‘Transform’ tab of the ribbon and can be used to trim, clean, and change the case of text.
Why is this useful?
Cleansing text fields makes it easier to read and perform further transformations. The ‘Trim’ option is particularly useful to remove any leading or trailing spaces.
9 – Change data types
Clicking on the icon next to the column header shows the data type transformation options.
Why is this useful?
This a quick way to change the data type if it has been loaded incorrectly. Changing a number which has been loaded as text means we can start performing calculations. Making sure dates are stored as dates means we can use these as filters in dashboards.
10 – Add a custom column
The custom column option can be found on the ‘Add column’ tab of the ribbon. A column can be created and named here using a formula to build the logic.
Why is this useful?
This can be useful to add calculations on existing columns, adding text to an existing value, or displaying part of a date. Just remember to check the data type before you start trying to add values together.
These are just some of the beginner-friendly options available to cleanse and shape data in Power BI. There are plenty of other tools available to wrangle your data both within the GUI and the M language but this is a great place to start.
No matter which data source and tools you are using it’s important to keep the user of your data and visualisations in mind. Having empathy for how the data will be used should guide how you transform your dataset.
Further reading
- Doing Power BI the Right Way
- Power Query Is Everywhere
- What is Power BI, Power Query and Power Pivot?
Photo by Gustavo Fring from Pexels
Comments are closed, but trackbacks and pingbacks are open.