Find broken hyperlinks in a PDF document with PDFx
PDFx is a free command-line tool for extracting references, links and metadata from PDF files. You can also use it to find broken links in a PDF file by running pdfx -c.
For each URL and PDF reference, PDFx performs a HEAD request and checks the status code. If there are broken links, PDFx prints each one together with the page number where it was found in the original PDF.
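To make the idea of a HEAD-based check concrete, here is a minimal sketch in Python using only the standard library. It is not PDFx's actual implementation; the function names check_url and find_broken and the (page, url) input format are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def check_url(url, timeout=10):
    """Send a HEAD request and return (ok, status).

    status is the HTTP status code, or an error message if no
    HTTP response was received. (Hypothetical helper, not part of PDFx.)
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return True, response.status
    except urllib.error.HTTPError as err:
        # The server answered with an error status such as 404 or 500.
        return False, err.code
    except (OSError, ValueError) as err:
        # DNS failure, refused connection, timeout, or malformed URL.
        return False, str(err)


def find_broken(links):
    """Given an iterable of (page, url) pairs, return the failing ones
    as a list of (page, url, status) tuples."""
    broken = []
    for page, url in links:
        ok, status = check_url(url)
        if not ok:
            broken.append((page, url, status))
    return broken
```

Calling find_broken([(1, "https://example.com/"), (3, "https://example.com/missing")]) would return only the entries whose request failed, each with its page number. A HEAD request is a reasonable choice here because it asks the server for the response headers only, so the check does not download the linked content.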
You can install PDFx with easy_install or pip (for example, pip install pdfx) and then run the pdfx command on a PDF file.
Run pdfx -h to see the help output.
For more examples and information, take a look at the PDFx project page. The code is available on GitHub and is released under the Apache license.
Feedback, ideas and pull requests are welcome! You can also reach me on Twitter via @metachris.