
I have an application that needs to convert PDF documents to text and then parse the text to retrieve information. I am using the xpdf utility pdftotext for the conversion.

I am very concerned about attacks on my server through malicious PDF documents: for example, an uploaded PDF that opens a backdoor, or one with embedded commands that could harm us, such as brute-forcing passwords on my database.

Possible Solutions:

Tools like PDFiD, which was suggested in an answer here on SE. However, they are fairly outdated, and I am very nervous about relying on them.

Executing the pdftotext command as a separate user with the least privileges on the machine, so that it cannot view or change anything it does not own and cannot run sudo or su.
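
The second option could be sketched roughly as follows in Python. This is only an illustration under assumptions, not a vetted setup: the account and group name `pdfsandbox` are hypothetical, the resource limits are arbitrary placeholder values, `pdftotext` is assumed to be on the PATH, and switching users this way requires Python 3.9+ on a Unix system with the parent process running with enough privileges (typically root) to change user.

```python
import os
import resource
import subprocess

PDFTOTEXT = "pdftotext"       # assumed to be installed and on PATH
SANDBOX_USER = "pdfsandbox"   # hypothetical unprivileged account
SANDBOX_GROUP = "pdfsandbox"  # hypothetical group for that account


def build_command(pdf_path, txt_path):
    # Absolute paths never start with "-", so an uploaded file name
    # cannot be mistaken for a command-line option.
    return [PDFTOTEXT, os.path.abspath(pdf_path), os.path.abspath(txt_path)]


def apply_limits():
    # Runs in the child process just before exec.
    # The values below are illustrative, not recommendations.
    resource.setrlimit(resource.RLIMIT_CPU, (10, 10))                    # CPU seconds
    resource.setrlimit(resource.RLIMIT_AS, (512 * 2**20, 512 * 2**20))   # address space
    resource.setrlimit(resource.RLIMIT_FSIZE, (50 * 2**20, 50 * 2**20))  # output file size


def convert(pdf_path, txt_path, timeout=30):
    # Raises CalledProcessError on a non-zero exit status and
    # TimeoutExpired if the conversion runs longer than `timeout` seconds.
    subprocess.run(
        build_command(pdf_path, txt_path),
        user=SANDBOX_USER,             # drop to the unprivileged user
        group=SANDBOX_GROUP,
        extra_groups=[],               # drop supplementary groups as well
        preexec_fn=apply_limits,
        env={"PATH": "/usr/bin:/bin"},  # do not leak the parent's environment
        stdin=subprocess.DEVNULL,
        timeout=timeout,
        check=True,
    )
```

The sandbox account would need read access to the uploaded file and write access to the output directory only. The timeout and limits are there so that a malformed file cannot tie up the server indefinitely.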

I am looking for ideas as to how to secure myself against such potential attacks.

Abhishek
  • I'd be a little surprised if either of those issues applied to `pdftotext`, but I think your suggestion of executing it as a user with minimal privileges is a good one. – AndrolGenhald Oct 03 '17 at 13:55
  • You will find a bunch of options in [this SO post](https://stackoverflow.com/q/6187250/3545273). In theory, you should audit the code for possible vulnerabilities that could be targeted by specially crafted PDF files. My advice: test some of them and choose one that has a good reputation and is actively maintained. – Serge Ballesta Oct 03 '17 at 14:04
  • @SergeBallesta Thanks for the response. Auditing the code and studying the structure of PDF files would be the definitive solution to this problem, but given time constraints, I was looking for more of a guideline for deploying this securely. – Abhishek Oct 04 '17 at 05:20
