How to Import PDF Files into Excel with Power Query

Bottom Line: Learn how to use Power Query to import PDF files into Excel.  This example includes data that spans multiple pages in a complex PDF file.

Skill Level: Intermediate

Video Tutorial

Watch on Youtube & Subscribe to our Channel

Download the Excel Files

You can download the file I use in the video. I've included both the initial and the final workbooks.

Turn a PDF into an Excel Worksheet!

As I've said before, Power Query is Excel's most powerful and magical data tool. And now it has the ability to import a PDF file into Excel. The new feature even detects tables of structured data within PDF pages. And we can use all of the data cleanup features of Power Query to prepare the data before it's imported, and automate the entire process.

Import data into Excel from a complex PDF file

It's fairly easy to do if you're familiar at all with Power Query (if you're not, get started with my overview tutorial here: Power Query Overview: An Introduction to Excel’s Most Powerful Data Tool, and then you can install it here: The Complete Guide to Installing Power Query).

I'll walk you through the process of importing tables from PDF to Excel.

Availability

It's important to note that the From PDF feature is currently available on the Beta Channel (formerly Insiders Fast) for Microsoft/Office 365 subscribers. It will be rolling out to other channels in the coming months. I don't have a timeline on the release, and apologize in advance if you don't have access to it yet.

Power Query Reads Complex PDF Files

In the video, I've used an example PDF that is an exported report on web page speed from GTmetrix. The data set I am interested in spans two pages and looks like this:

Import Data into Excel from PDF file from multiple pages

Steps to Import a PDF into Excel

If you have the new feature you will see From PDF on the From File menu. Again, you must be on the Beta Channel of Microsoft 365 at the time of this publication.

Power Query From File From PDF Menu - Import PDF into Excel

Here are the steps to import the PDF file to Excel:

  1. Starting from the Data tab on the Ribbon, click on Get Data, then From File, then From PDF.
  2. Find the PDF file wherever you have it stored on your computer. Then click on Import.
  3. The Navigator window will appear, showing a list of all the data sets that Power Query has found in your PDF. At the top of the list will be individual sets of data tables, and at the bottom it is broken down by page. If you want to import more than one data set, check the box that says Select multiple items.
  4. Select the item or items that you want to import.
  5. Click on Transform Data.
Power Query Import PDF Navigator Window Tables and Multiple Pages

This brings up the Power Query editor and each table that we've imported is listed as its own query.

It's likely that the queries you've pulled from the PDF have the headers listed in Row 1. To fix this, simply click on the menu icon to the left of your headers and select the option that says Use First Row as Headers. You will do this for each query.

Power Query Promote Headers - Import PDF

You can also do any other cleanup you see fit before loading the data into a worksheet.

If you want to combine these queries into one table before loading it into the workbook, you can use the Append feature in Power Query to stack the data together. See this other tutorial for how to do that: How to Combine Tables with Power Query.

Power Query Append to Combine Stack Multiple PDF Pages

From PDF to Excel Table

Close & Load the query to output the data to an Excel Table in the workbook. As I mention in the video, you might want to first create connections only for all of the queries, then output the appended query to a Table.

Data from PDF File is imported to Excel Table with Power Query

Free Training Webinar on the Power Tools

Right now I'm running a free training webinar on all of the Power Tools in Excel. This includes Power Query, Power Pivot, Power BI, pivot tables, macros & VBA, and more.

It's called The Modern Excel Blueprint. During the webinar I explain what these tools are and how they can fit into your workflow.

Modern Excel Blueprint Training Webinar Excel Campus Jon Acampora

You will also learn how to become the Excel Hero of your organization, that go-to gal or guy that everyone relies on for Excel help and fun projects.

The webinar is running at multiple days and times. Please click the link below to get registered and save your seat.

Click Here to Register for the Free Webinar

Conclusion

This is a super cool feature that saves you from having to manually reconstruct a table that appears in a PDF.

Do you have a PDF that you'd like to import into Excel? If so, you can upload your example file here and we'll consider it for a future tutorial lesson.

Please leave a comment below and let us know what you think of this new feature. Thank you! 🙂

38 comments

Your email address will not be published. Required fields are marked *

  • Brilliant Jon! This is so very cool! Interesting how it looks and feels so similar to Power BI. Thanks!

      • Since I don’t have this feature in Excel yet (just changed to Beta channel), I went into PowerBI desktop and did this procedure with a credit card statement I needed in Excel. Thank you!

  • so as of right now there’s no way for office 365 users to get PDF importing in excel? where can someone access beta channel?

    • Hi Javier,
      That is correct, you will need to be on the beta channel until the feature is rolled out to the channel that you are on. Here is a link with info on how to enroll in the Office Insider’s program. It’s free.

      https://insider.office.com/

      I hope that helps. Thanks again and have a nice day! 🙂

  • Awesome indeed. Good news.

    A question and an observation:

    • Does this depend on the PDF file being “searchable“?

    • Automation: “refreshability” depends on predictability — of the Report layout, and of how the feature identifies content areas, maps their data into tables and numbers them.

    – For a complex, compound dashboard-style layout like the patient you placed on the operating table, the parse is likely to yield inconsistent results from period to period.

    – For a simple but multi-page tabular report, if the number of pages varies, you’re sunk.

    • Hi Jack,

      Great observations. Here are some thoughts:

      The PDF will likely need to have text that is recognizable by OCR. If the text is in an image that is then saved as a PDF file, Power Query probably won’t be able to pick that up. I’m not 100% sure on this though.

      The automation definitely depends on the predictability of the report/data. This is true for any data source, but will probably have great variance with some PDFs files.

      For the example I used, you’re correct that it might be challenging if the page numbers vary. However, you might just need to modify the table number in the Navigation step/formula for the query to pull the correct table. You wouldn’t necessarily need to perform all of the subsequent cleanup steps again.

      A simple multi-page tabular report might actually be easier. The Source step in the query lists all of the objects found in the PDF. You can filter this table for Pages or Tables only, and even exclude specific page numbers. Maybe page 1 is always a cover page…

      Then combine all of the data based on the filtered items. This would allow you to account for a varying number of sheets.

      I’ll do a post on this scenario in the future.

      I hope that helps. Thanks again and have a nice day! 🙂

  • Hi Jon,
    This is a great feature. Thank you for sharing it and for always bringing good stuff about Excel. I was wondering how I can import a report generated in Meditech with Power Query. This report is a free text and when I try to import it with Power Query all I get is a single column with hundreds of rows? Any thoughts?

    Thank you in advance,

    Jose

    • Hi Jose,
      It depends on the file format. If you are able to export the report to a CSV file then it should be more structured in Power Query. You might need to change the default separator (comma, semicolon, etc.) That can be done on the first step of the import process for a CSV file in Power Query.

      I hope that helps. Thanks again and have a nice day! 🙂

  • I didn’t have access to the Beta features with my Office 365 Professional Plus 2016, so I purchased a subscription to Microsoft 365 Personal, then signed up for the “Insider” channel and downloaded and installed everything. It says I am on the Beta channel, but I still don’t see the features used in this video.

    • Hi Mike,
      I’m sorry to hear that. I’m pretty sure the feature is available for all beta channel users. Sometimes Microsoft only releases features to a portion of users on the beta channel, but I don’t believe that is the case for this feature.

      You will also want to check that you are on the beta channel. Here is a screenshot of where you can see that on the File > Account menu in Excel.

      Microsoft 365 Excel Account Beta Channel

      If you don’t see Beta Channel there then the channel might not have been activated properly. You might need to completely restart Excel, the computer, or potentially reinstall Office.

      I hope that helps. Thanks again and have a nice day! 🙂
      You might need to completely close

    • Hi Dennis,
      Great question! A merge will join the tables together, creating a relationship between them. Merges are used when you want to lookup and return data columns to the table. Similar to a VLOOKUP. Here is an article on Merge in Power Query.

      In this case we want to stack the data sets because they contain the same data, just in two different queries. This would be similar to copying and pasting one table below the other. This is referred to as an Append in Power Query. Here is an article on Append in Power Query.

      I hope that helps. Thanks again and have a nice day! 🙂

  • Hi,
    I love the feature, but it isn’t available for me. I’m running MS Office 2016 Professional, but there isn’t any option on the Data menu item for loading a PDF file. Please assist.

  • Hi Jon
    When you append the tables they stack vertically, is there a way to append them horizontally based on a specific criteria in my first table?

  • Hi Jon,
    My firm recently pushed O365 to all of our laptops. Unfortunately, I do not have the “From PDF” feature when I got to import data from file. Do you have any ideas as to why that is? I tried asking our IT department, but I have not heard back from them.
    Thanks very much

  • Hi Jon,

    great explanation, thanks a lot!
    Is there any solution to to get every table/page from a pdf of multiple pdfs from a folder? That would be awesome.

    Kind regards,
    Dennis

  • Hi Jon
    I have a large amount of tables saved in multiple PDF-files. However when trying to convert and combine the tables with power query the columns are not alinged thus resulting in a large clean up. This is the case when choosing the structured tables and/or the unstructered pages. Any suggestions on how to solve the alignment issue?

    Thx.

  • Hello, I have several PDF files to consolidate, but on some of them I want the table N°3, and if the Table 3 doesn’t exist, I want Table N°. Could you help me on this ?
    And by the way thank you for your exellent videos !!

  • How do I handle getting all the PDF from a folder (that is the easy part), where they all have a different number of pages and tables. I want the last table from the last page (this I can’t figure out how to do). Suggestions are more than welcome!

  • Thanks a lot for the presentation. Is this feature(important tables from pdf file) indistributed excel? Which version exactly?

  • Hi John,

    I have multiple PDFs in a folder and each pdf has hundreds of tables. It is time consuming to filter a table based on content. I am looking for a way to filter a specific table in each pdf based on “search phrase” in the table, because neither table number nor page numbers are the same across the pdf files.

    Note: The filter text appears only once in a pdf file (table) and it exists in all the pdfs, which means each pdf will have one matching table

  • It is in my 6 month update corporate Power BI, though not in my Excel. But, at least in the Power BI version, it only seems able to import tables that are pre-formatted as “PDF.TABLES” in the PDF. If you want to import, say bill data, from multiple bills which are in a consistent but not table format even if parts of the pdf look like tables, you appear to get nothing at all. PQ seems to be looking for things called pdf tables, if it can’t find any or those it finds are empty, even if the pdf as such is anything but empty, then bad luck. A way to go with this yet I think. Another thing I suppose, is that pdfs can be saved in fat or thin formats. Thin formats save memory of course, but, may strip the original pdf of this kind of detailed formatting information needed for PQ to manipulate them.

  • I think can confirm my observation toward the end of my previous comment. I tried this on a pdf downloaded from our system which had been uploaded to it as part of the inv paying process. These were saved in a compact format, and frankly, looked a bit low def, but good enough to read visually. But, PQ could find nothing useful in them even though they were 20 page+ multi-account invoices. But, I also have a folder of invoices received by email direct from the same supplier. These are full fat. When accessing the full fat pdf’s, suddenly there’s lots of stuff in there. Still much to do getting decent tabular data from them, but, at least it’s there to be grappled with.

  • When importing a pdf into excel, does it also bring the formulas with it? Or just the results? Thanks!

  • Do you know anyone who might help setup an excel vba code to handle reading in a pdf file, find a row, and then edit row, then save excel file.

  • Hi John,

    I’m a devoted follower! But a have a question! ‍♂️

    Is there a way around when we import data from a pdf file and Power Query says “Table is Empty” ? I understand it could be with an issue with pdf itself but I was hoping there’s another way to extract the data, maybe using OCR? Or something else?

    It’s so frustrating when the little blue symbol comes up!

    I tend to snip and paste, from clipboard, using “photo” in excel DATA tab or pasting in OneNote and extracting from there, but it’s a palaver !

    I wish there’s a better way to do it. I’m talking about Statement of Accounts, for example (supplier invoices) and it could be a manufacturer when the level of invoices are astronomical! anyway..

    Usually it could be over 300 invoices for a calendar month and depending on pdf file size, it could potentially be over 3 pages. Some companies do not have a financial system that generate csv files and I do get stuck for options.

    I can cope for now, but I thought Excel may have something including OCR these days that may help with this issue.

    Kind regards,

    Andrea Philips
    England UK

  • despite trying to convert text columns with number to a number format it does not work – what’s it causing this problem when you convert from pdf?

Generic filters
Exact matches only

Excel Shortcuts List

keyboard shortcuts list banner

Learn over 270 Excel keyboard & mouse shortcuts for Windows & Mac.

Excel Shortcuts List

Join Our Weekly Newsletter

The Excel Pro Tips Newsletter is packed with tips & techniques to help you master Excel.

Join Our Free Newsletter