Chris Hager
Programming, Technology & More

Pdfx


Find broken hyperlinks in a PDF document with PDFx

Easily find broken hyperlinks in PDF documents with PDFx, a free tool to extract references and metadata from PDFs.
PDFx is a free command-line tool to extract references, links and metadata from PDF files. You can also use it to find broken links in a PDF file with pdfx -c: for each URL and PDF reference, pdfx performs a HEAD request and checks the status code. If there are broken links, PDFx prints each link together with the page number where it was found in the original PDF.
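To make the idea of a HEAD-based link check concrete, here is a minimal sketch using only the Python 3 standard library. It is not PDFx's actual implementation: the function names head_status and find_broken_links are hypothetical, and treating any status of 400 or above (or a network failure) as "broken" is an assumption made for illustration.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10):
    """Send a HEAD request and return the HTTP status code.

    Returns None if the URL is malformed or the server cannot be reached.
    (Hypothetical helper, not part of PDFx.)
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is still useful.
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or an invalid URL.
        return None


def find_broken_links(links, fetch_status=head_status):
    """Return (url, page, status) for every link that looks broken.

    `links` is an iterable of (url, page_number) pairs. A link counts as
    broken when no status could be obtained or the status is >= 400
    (an assumption for this sketch).
    """
    broken = []
    for url, page in links:
        status = fetch_status(url)
        if status is None or status >= 400:
            broken.append((url, page, status))
    return broken
```

Calling find_broken_links with a list of (url, page) pairs returns only the problematic entries, so the result can be printed one per line with its page number. The fetch_status parameter exists so the check can be exercised without network access, for example by passing a dictionary's get method that maps URLs to canned status codes.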


PDFx v1.0 - Extract metadata and URLs from PDFs, and download all referenced PDFs

I just released PDFx version 1.0, a Python tool and library to extract metadata and URLs from PDFs, and to automatically download all referenced PDFs. The project is released under the Apache license with the source code on GitHub!

Features

- Extract metadata and PDF URLs from a given PDF (file or URL)
- Download all PDFs referenced in the original PDF
- Works with local and online PDFs
- Use as command-line tool or Python package
- Compatible with Python 2 and 3

Quick Start

Grab a copy of pdfx with easy_install or pip and run it:
