How to Structure Source Data for a Pivot Table & Unpivot
This post will show you the proper way to setup or organize your source data for a pivot table.
Pivot Table Overview
Whether you are using Excel or a Google Spreadsheet, pivot tables are a great tool for summarizing and analyzing large amounts of data. They can be huge time savers for creating reports that present your data in a clear and simple format. With the advent of PowerPivot, there is no doubt that pivot tables are the way of the future for Excel.
Before you can create a pivot table, you must have your data laid out in the right structure. This is the most critical step, and also the most common mistake when learning how to create pivot tables. Once you have your data organized correctly, you will become much more proficient at creating reports, analyzing data, and finding trends.
This article will explain:
- The correct vs. incorrect structure for pivot table source data.
- Why it is important to understand this.
- How to convert your reports into the right structure using formulas
(free sample workbook).
Data Table Structure
The first step to creating a pivot table is setting up your data in the correct table structure or format. This is the source data you will use when creating a pivot table. Your source data should be setup in a table layout similar to the table in the image below.
The following is a list of components of a data table. These terms will be used throughout the article.
- Fields – Columns that define the values in the rows.
- Column Header – Name that describes the data in the field.
- Data Records – Rows in the table below the header that contain the data.
- Record Set – One row of data that contains values for each field.
The data table contains a column for each field and rows for each data record. The column fields are named with descriptive attributes that define the values in the record sets (rows). For example, your sales table may contain the following columns: Company, Region, Product, Month, and Sales Amount. These are the descriptive fields that define what values will be in each row of the table. Each row will contain a sales record for a different combination of company, region, product, and month. It’s also important to note that field names (column headers) must be unique throughout the table.
Wrong Data Structure
Sometimes you will receive a report that is structured like the image above, with some page header info, months across the top, and products or accounts listed down the first few columns on the left. This data is in the WRONG structure for a pivot table. The data is already in a summary format, which is what we want the pivot table to produce. However, you may want to use this data as a pivot table source to do your own analysis and produce different slices of the report.
Why is it Wrong?
It is important to understand why the data structure is wrong for a few reasons.
First, it will help you request the data in the proper format. When we receive data in a summary report format like the example above, we usually don't have control over how this report is produced. But somebody in the finance or IT organization does have control. They should be able to produce a report in the table structure you need for your pivot. So understanding why you need it in the correct format will save you time having to manually convert the report. Otherwise, you will basically have to reverse engineer the report to get it in the proper table structure.
Second, it will help you understand how pivot tables work to summarize, filter, sort, and slice your data. The basic understanding will allow you to learn more advanced techniques of adding calculated fields and items.
The job of the pivot table is to summarize your source data table based on the criteria you specify in the filter fields (Report Filter, Column Labels, and Row Labels). You can think of it as a very advanced way to arrange and filter your data. The pivot table is an extremely powerful tool, but can only be used to its full potential if the source data is in the right structure.
Getting the Structure Right – Setting Up Your Source Data for a Pivot Table
In the image above, the sales data table on the right contains all sales amounts in the [Sales $] column. With this format you could easily sum the column to produce the Total Sales $ for all companies, regions, products, and months. You could then start filtering the columns to see only the sales for one month and one region. A pivot table works the same way, and basically filters your table based on criteria you specify in the filter fields.
The basic rule of the data structure is that all values of the same type need to be in one column.
This one rule should hopefully make it easier to quickly determine if your data is in the right structure. If the data you are trying to analyze is spread out over multiple columns, then you will likely need to convert it before creating a pivot table. In the “Original Report” above, the Sales $ are in multiple columns by month (Jan – Apr). This one observation tells us that the data is in the wrong structure.
Converting the Data
We now know that we need to convert our original report into a table so that each value is in its own row (record set). Each value will contain a field (column) for each attribute that defines the value (company, region, product, month). This means that many of the field values will be repeated in the data table.
The following image shows where the values for each field are derived from in the original report. This mapping should help you understand what is needed to convert the report into the correct structure.
The image below shows another view of this conversion. Each part of the report is color coded to make it easy to see how the data is translated to the table. The Value Range (green) on the left side is basically stretched out into one column in the table on the right. All the defining characteristics of the values must be entered in the fields (columns) to the left for each record set (row).
In the original report format, the page and column headers are used to describe multiple values (data points). For example the column header for the month Jan is associated with all the rows below it for the different products. When the data is converted to the proper format, each value will be placed in a separate row, and a month column will be created that contains all the months. The row labels for products will repeat in a similar fashion. The page headers for company and region will repeat on every row of the data table because they are the same for every cell in the value range.
Solution #1 – Unpivot with Power Query
Power Query is a free add-in from Microsoft for Excel 2010 and 2013, and it makes this process really easy. Power Query will transform your data into the correct format with the click a button.
The following screencast shows how to use the Unpivot Columns button in Power Query.
I also have a video that explains the unpivot process in Power Query in more detail.
Video best viewed in full screen HD.
As awesome as Power Query is, you might not be able to get it. It is only available for the Professional Plus versions of Excel 2010 and 2013. Checkout my Complete Guide to Installing Power Query to determine if your version of Excel is compatible.
Solution #2 – Convert the Data with Formulas
If you are unable to use Power Query, then you will need to reverse engineer the report to the correct format before using it in a pivot table. This can be done with lots of copy/paste and transpose. However, there is a faster way using formulas.
The image below shows a sample of how a report can be converted into the correct table structure using a few formulas. A sample workbook that contains all the formulas is available for download below.
The model makes use of the INDEX function to reference the original report, and pull the data into the table. The [Row Index] and [Column Index] are helper columns that contain formulas to return the correct row and column numbers used by the Index formulas in the data table. The sample workbook contains two examples.
- Example 1 is similar to the report format above, with page headers, column headers, and row labels.
- Example 2 does not contain page headers, but does contain two different value types: sales and margin.
The file contains cell comments with more detailed descriptions of the formulas.
Please click the link below and the Excel file that contains the conversion model will be emailed to you immediately. You can use this model as a template to quickly convert your report data into the proper structure for the source data of a pivot table.
You will also have the option to subscribe to my free email newsletter to stay updated with new articles and videos that will help you learn Excel. After confirming your subscription you will be able to download my “10 Excel Pro Tips” eBook. It's all free!
Now that you have a basic understanding of how your source data should look, the next step is to start creating pivot tables (and impress your boss). 🙂
If you haven't already seen it, checkout my free video training series on Pivot Tables & Dashboards. This will help you get started creating pivot tables and show you what a powerful tool they can be.
- Complete guide on How Pivot Tables Work
- 3 Part Video Series on Pivot Tables and Dashboards
- Learn How To Compare Multiple Lists with a Pivot Table instead of using messy formulas.
- Convert a Pivot Table to SUMIFs Formulas with this free VBA macro.
- Learn how PowerPivot can be used instead of VLOOKUP to create a relationship between two tables.
- This tutorial, INDEX Function – A Road Map for Your Spreadsheet will help you learn the INDEX function (it's a must know).
Please leave a comment below with any questions.