This tutorial explains how to find broken links in PDF files. To do this, I will use a free command-line tool called PDFx. It is not primarily meant for finding broken links, but it has a feature that checks the links in a PDF file. Using this simple utility, you can list all the links in a PDF file and also download the linked files when they are direct links. It shows all the broken links, along with the page number on which each broken link appears.
This PDF link checker requires Python to be installed on your PC, which is very easy to do. After that, simply run the tool from the command line, provide the path of the PDF file whose links you want to check, and it will immediately show which links are broken. You can even save the broken links report to a text file.
Even though there are many programs that check broken links on websites, or even in bookmarks, checking broken links in PDF files is a whole different ball game. I searched extensively for a GUI-based program that could do this, but found only this command-line tool. Despite being a command-line tool, it is pretty easy to use. Also, because it only requires Python, you can use it not only on Windows but also on Mac and Linux.
How To Find Broken Links In PDF Files?
PDFx was originally meant to download all the references given in a PDF file. Apart from that, it comes with a link checker module that helps find broken links inside a PDF file. You can easily find broken links in a PDF file along with the error code and page number.
To use PDFx, you need to install Python on your PC. Once it is installed, follow the steps below to find broken links in a PDF file.
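The "error code" in a broken-link report is the HTTP status code that the web server returns for a link. PDFx's own checking code is not shown in this tutorial, so the following is only a minimal sketch of the idea using Python's standard library. The names `link_status` and `is_broken` are hypothetical, and treating any status of 400 or above (or no answer at all) as "broken" is an assumption, not necessarily the rule PDFx uses.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


# Hypothetical helper: illustrates link checking in general, NOT PDFx's own code.
def link_status(url: str,
                opener: Callable = urllib.request.urlopen,
                timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no server answered."""
    # A HEAD request asks only for the headers, so the page body is not downloaded.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status  # e.g. 200 for a working link
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers (404 Not Found, 403 Forbidden, ...) arrive as HTTPError.
        return err.code
    except OSError:
        # URLError and timeouts are OSError subclasses: DNS failure,
        # refused connection, etc. There is no status code in that case.
        return None


def is_broken(status: Optional[int]) -> bool:
    """Assumed rule: no answer, or any 4xx/5xx status, counts as broken."""
    return status is None or status >= 400
```

For example, `link_status("https://example.com/")` contacts the server and returns its status code. Some servers reject HEAD requests (for instance with 405), so a real checker might retry such links with a normal GET before reporting them as broken.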
Step 1: Open the Windows Command Prompt, type the following command, and press Enter. It installs PDFx on your PC (the installation is handled by Python's package tools, which is why Python must be installed first). If the installation goes well, it will show a success message at the end. On recent Python versions the easy_install command may not be available; in that case, run pip install pdfx instead.
easy_install pdfx
PDFx is now installed on your PC, and you can run it from any folder using the Command Prompt.
Step 2: Navigate to the folder that contains the PDF file in which you want to find broken links. Shift+right-click an empty area of that folder and choose the option to open a command window there.
Step 3: To check all the links, type the following command in the command window and press Enter. PDFx will scan the links and list the broken ones with their error code (such as 404 or 403) and page number. See the screenshot below.
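If the pdfx command is not recognized, the package may be installed while the folder holding Python's command-line scripts (on Windows, typically the Scripts folder inside the Python installation) is not on PATH. This small sketch, with hypothetical function names, checks both things separately using only the standard library. Run it with the same Python you used for the installation.

```python
import importlib.util
import shutil
from typing import Optional


def pdfx_module_installed() -> bool:
    """True if the Python running this script can import the pdfx package."""
    return importlib.util.find_spec("pdfx") is not None


def pdfx_command_path() -> Optional[str]:
    """Full path of the pdfx command if a folder on PATH contains it, else None."""
    return shutil.which("pdfx")


if __name__ == "__main__":
    print("pdfx package installed:", pdfx_module_installed())
    location = pdfx_command_path()
    if location:
        print("pdfx command found at", location)
    else:
        print("pdfx command not found on PATH")
```

If the package is installed but the command is not found, adding that scripts folder to PATH usually makes pdfx available from any folder.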
pdfx [PDF_filename] -c
If you want to save the broken links report in a text file, append > filename.txt to the end of the command, for example: pdfx [PDF_filename] -c > filename.txt
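PDFx itself checks one file per command. As a sketch only, assuming pdfx is on PATH, the following Python script runs the same pdfx [PDF_filename] -c command on every PDF in a folder and saves each console output to its own text file, just as the > redirect does. The script name check_pdfs.py, the helper names, and the "-links.txt" naming scheme are all hypothetical; only the -c option comes from this tutorial.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_command(pdf_path: Path) -> List[str]:
    """The same command as typed by hand: pdfx <file> -c"""
    return ["pdfx", str(pdf_path), "-c"]


def report_path_for(pdf_path: Path) -> Path:
    """Hypothetical naming scheme: manual.pdf -> manual-links.txt in the same folder."""
    return pdf_path.with_name(pdf_path.stem + "-links.txt")


def save_report(pdf_path: Path) -> int:
    """Run pdfx -c on one PDF and send its standard output to a text file,
    like appending '> filename.txt' at the Command Prompt.
    Returns the exit code reported by pdfx."""
    with open(report_path_for(pdf_path), "wb") as report:
        # Only stdout goes to the file, as with '>'; error messages still
        # appear in the console. Raises FileNotFoundError if pdfx is not on PATH.
        completed = subprocess.run(build_command(pdf_path), stdout=report)
    return completed.returncode


def check_folder(folder: Path) -> None:
    """Check every PDF in one folder, writing one report per file."""
    for pdf_path in sorted(folder.glob("*.pdf")):
        code = save_report(pdf_path)
        print(f"{pdf_path.name}: pdfx exited with {code}, "
              f"report saved to {report_path_for(pdf_path).name}")


if __name__ == "__main__":
    # Usage: python check_pdfs.py [folder]  (defaults to the current folder)
    check_folder(Path(sys.argv[1]) if len(sys.argv) > 1 else Path.cwd())
```

Passing the command as a list, rather than one string through the shell, means file names containing spaces need no extra quoting.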
So, this is how you can easily find broken links in PDF files using PDFx. The tool does what it promises, listing all the broken links along with their error codes and page numbers.
Note that it checks only web links, not links to other files.
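The difference between web links and links to other files comes down to the URL scheme: http and https links point to web servers, while file:// URLs or plain file names point to local files. The sketch below illustrates that distinction and how a page number can be attached to each link. It is not PDFx's code: it assumes the text of each page has already been extracted by some PDF library (not shown), and it only scans visible text, whereas real PDFs also store clickable links as annotations. The regular expression and function names are hypothetical.

```python
import re
from typing import List, Tuple
from urllib.parse import urlparse

# Rough pattern for "scheme://..." URLs in plain text (an assumption, not exhaustive).
URL_PATTERN = re.compile(r"""[A-Za-z][A-Za-z0-9+.-]*://[^\s<>"')\]]+""")


def is_web_link(link: str) -> bool:
    """Only http and https links count as web links."""
    return urlparse(link).scheme.lower() in ("http", "https")


def web_links_by_page(pages: List[str]) -> List[Tuple[int, str]]:
    """Return (page_number, url) pairs, numbering pages from 1 like a PDF viewer."""
    found = []
    for number, text in enumerate(pages, start=1):
        for match in URL_PATTERN.finditer(text):
            # Drop punctuation that usually ends the sentence, not the URL.
            link = match.group(0).rstrip(".,;:")
            if is_web_link(link):
                found.append((number, link))
    return found
```

For example, with the page texts "See https://example.com/docs." and "Local copy: file:///C:/docs/a.pdf and http://example.org/x", the function keeps (1, "https://example.com/docs") and (2, "http://example.org/x") and skips the file:// link.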
Conclusion
PDFx makes it pretty easy to find broken links in PDF files. I really like that it gives the error code as well as the page number for each broken link. I wish the output were formatted a bit better, so that I could open it in Excel or as a CSV file. Nevertheless, this is the only software I could find that checks PDF files for broken links at all, so I will take what it gives. If you know of other software that can find 404 errors in PDF files, or check multiple PDF files for broken links at once, do let me know in the comments below.