Parsing and asserting inside a PDF document from Selenium in three easy steps

August 25th, 2017 by Sorin Raschitor in Automated testing, Software Development

The Portable Document Format (PDF) is a file format used to present documents in a manner independent of application software, hardware, and operating systems. Each file encapsulates a complete description of a fixed-layout flat document.

Because of this fixed layout, a PDF differs from an ordinary document file, and the data inside it is more difficult to edit.

The following use case is presented in this Selenium tutorial:

In a software project with Selenium-based automated tests, a test script needs to verify predefined expected results against actual results that are stored in a PDF file.

Why is PDF verification useful in automated tests?

PDFs are commonly used to download data from websites, such as user details, balance sheets or pro forma invoices for individual purchases. In a typical automated test scenario for an eCommerce WebApp, we would create an order, submit it and then verify the result. This also includes asserting that the details in the pro forma invoice (PI) match the result presented in the WebApp.

To assert this information, we need a PDF reader library that parses the entire document and loads its content into an object of type PDF, which we can then inspect.

In order to implement this in Java, the following steps can be used:


  1. Import the PDF reader library (in our case com.codeborne.pdftest) in the test class.

[Image: Import PDF reader library]


If you are using Maven to manage your project dependencies, you can add the library (Maven coordinates com.codeborne:pdf-test) as a dependency in pom.xml:

[Image: Add the following in pom.xml]


For further details on how PDF parsing is done, you can check the library's GitHub repository.

  2. Download the PDF file to be parsed from the WebApp during the automated test. Build the path to the download location in the test script and add initial assertions.

To verify a file, we need to know its name and location so that we can build a path to it.

On a Windows machine, Chrome's default download folder is usually “C:\Users\%username%\Downloads”, although this can easily be changed.

The name of the file should follow a unique but predictable pattern, built from values such as the user name, date of purchase, order number or an incrementing download counter.

So we need to determine exactly how the file name is built, save the relevant details in the earlier steps of the test script, and use them to build a direct path to the downloaded PDF.

In the scenario used for this tutorial, the PDF file name is the concatenation of the user id and the order number.

[Image: The PDF will concatenate the user id and the order number]


If the tests run on separate machines in a Selenium Grid configuration, the username will be different on each one. In the example above, it is therefore recommended to use the Windows username environment variable instead of a hardcoded username. In Java, this is done by calling System.getenv("username") when building the path to the file. The eComm argument is used to retrieve the previously stored order number.
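The path construction described above can be sketched with the Java standard library alone. This is a minimal sketch under assumptions, not the tutorial's original code: the class and method names, the sample user id and order number, and the ".pdf" extension are hypothetical, and the download folder is assumed to be "Downloads" under the home directory of the account running the test. It reads the user.home system property rather than the username environment variable, so that the same code also resolves on non-Windows grid nodes.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PdfDownloadPath {

    /** File name pattern from the tutorial: user id followed by order number (".pdf" is assumed). */
    public static String expectedFileName(String userId, String orderNumber) {
        return userId + orderNumber + ".pdf";
    }

    /** Resolves the expected PDF inside the given download directory. */
    public static Path expectedPdfPath(Path downloadDir, String userId, String orderNumber) {
        return downloadDir.resolve(expectedFileName(userId, orderNumber));
    }

    /** Assumed default: the "Downloads" folder of the account that runs the test. */
    public static Path defaultDownloadDir() {
        return Paths.get(System.getProperty("user.home"), "Downloads");
    }

    public static void main(String[] args) {
        // "jsmith" and "100245" are made-up values standing in for data
        // stored in earlier steps of the test script.
        Path pdf = expectedPdfPath(defaultDownloadDir(), "jsmith", "100245");
        System.out.println(pdf);
    }
}
```

Keeping the download directory as a parameter makes it straightforward to point the test at a custom folder when the browser is configured to download elsewhere.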

After the path to the file has been created, it is recommended to add additional verification steps.

We should verify that the PDF can be opened from the browser to have a quick overview of it. This is simply a simulation of normal user behaviour, through Selenium automated tests.

Once the PDF is displayed in the browser, you can install a listener, take a screenshot of it, store it or send it by email as required.

To assert that the correct URL is present, you can use the code snippet below.

[Image: An error can be returned in case the file is not found/corrupted]
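As a rough sketch of such initial checks, the helpers below use only the Java standard library. All names are hypothetical. The URL string is assumed to come from the browser (in a Selenium test, typically driver.getCurrentUrl()), and the "corrupted file" check is simplified to testing for the "%PDF-" marker that a well-formed PDF file starts with.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class PdfPreChecks {

    /** True when the path part of the URL ends with the expected PDF file name. */
    public static boolean urlPointsToPdf(String currentUrl, String expectedFileName) {
        String path = URI.create(currentUrl).getPath();
        return path != null && path.endsWith("/" + expectedFileName);
    }

    /** True when the content starts with the "%PDF-" header. */
    public static boolean hasPdfHeader(byte[] content) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (content.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (content[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    /** Fails with a descriptive error when the file is missing, empty or not a PDF. */
    public static void assertLooksLikePdf(Path file) {
        if (!Files.isRegularFile(file)) {
            throw new AssertionError("PDF not found: " + file);
        }
        try {
            if (!hasPdfHeader(Files.readAllBytes(file))) {
                throw new AssertionError("File is empty or not a PDF: " + file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read " + file, e);
        }
    }

    public static void main(String[] args) {
        // Made-up URL standing in for the value reported by the browser.
        String url = "file:///C:/Users/tester/Downloads/jsmith100245.pdf";
        System.out.println(urlPointsToPdf(url, "jsmith100245.pdf"));
    }
}
```

Failing early with a clear message ("not found" versus "not a PDF") makes it easier to tell a download problem apart from a content mismatch in the later assertions.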


  3. Create a PDF object and verify portions of its text.

Once we know the location of the file, we can create a PDF object and start the verification. The method loads all the details from the document into an object of type PDF and verifies the key portions that are relevant to our test, such as customer name, order total, order number, etc.

[Image: Create a PDF object and verify portions of the string]
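The verification logic can be illustrated independently of the PDF library. The sketch below assumes the document text has already been extracted into a String (with pdf-test, this would be the text field of the PDF object) and checks several expected values in one pass. The class name, method name and sample invoice values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InvoiceTextCheck {

    /** Returns the expected fragments that do not occur in the extracted PDF text. */
    public static List<String> missingFragments(String pdfText, List<String> expected) {
        List<String> missing = new ArrayList<>();
        for (String fragment : expected) {
            if (!pdfText.contains(fragment)) {
                missing.add(fragment);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Stand-in for the text of a parsed invoice; all values are made up.
        String pdfText = "Pro Forma Invoice\nCustomer: John Smith\n"
                + "Order number: 100245\nTotal: 149.90 EUR";

        // Values that earlier test steps would have stored from the WebApp.
        List<String> expected = Arrays.asList("John Smith", "100245", "149.90 EUR");

        List<String> missing = missingFragments(pdfText, expected);
        if (!missing.isEmpty()) {
            throw new AssertionError("Missing from PDF: " + missing);
        }
        System.out.println("All expected values found");
    }
}
```

Collecting every missing value before failing reports all mismatches in a single test run, instead of stopping at the first one.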


Besides containsText, com.codeborne.pdftest offers other verification methods, such as containsExactText and containsTextCaseInsensitive, which differ in how strictly the expected text is compared with the text of the PDF.
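To make the difference between such matching modes concrete, here is a small illustration on plain strings. It is not the library's code, and how pdf-test implements its matchers is an assumption here. The sketch only shows three strategies the method names suggest: a comparison that tolerates layout whitespace, an exact comparison, and a case-insensitive comparison. All names are hypothetical.

```java
import java.util.Locale;

public class TextMatchModes {

    /** Collapses every run of whitespace (spaces, tabs, line breaks) into a single space. */
    static String collapseWhitespace(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    /** Substring match that ignores differences in whitespace and line breaks. */
    public static boolean containsIgnoringLayout(String text, String expected) {
        return collapseWhitespace(text).contains(collapseWhitespace(expected));
    }

    /** Strict substring match: whitespace and case must be identical. */
    public static boolean containsExact(String text, String expected) {
        return text.contains(expected);
    }

    /** Substring match that ignores both layout whitespace and letter case. */
    public static boolean containsIgnoringCase(String text, String expected) {
        return collapseWhitespace(text).toLowerCase(Locale.ROOT)
                .contains(collapseWhitespace(expected).toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Extracted PDF text often carries line breaks and double spaces from the layout.
        String text = "Order  number:\n100245";

        System.out.println(containsIgnoringLayout(text, "Order number: 100245"));
        System.out.println(containsExact(text, "Order number: 100245"));
        System.out.println(containsIgnoringCase(text, "ORDER NUMBER"));
    }
}
```

A layout-tolerant comparison is usually the practical default for invoices, because the line breaks in extracted text depend on how the PDF was rendered rather than on the data itself.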
