I have an application that needs to convert PDF documents to text documents and then parse them to retrieve information. I am using the xpdf utility pdftotext to achieve that.
I am very concerned about attacks on my server that exploit vulnerabilities in PDF handling: for example, an uploaded PDF that opens a backdoor, or one that embeds commands which could hurt us, such as brute-forcing passwords on my database.
Possible Solutions:
Tools like PDFiD, which is suggested in an answer here on SE. However, they are fairly outdated, and I am very nervous about relying on them.
Executing the pdftotext command as a different user with the least privileges on the machine, so that it cannot view or change anything it does not own, and cannot issue sudo or su commands.
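To make the second option concrete, here is a minimal sketch of how I imagine it, assuming a Linux server, Python 3.9 or later, and a parent process that is allowed to switch users. The account and group name pdfworker, the limit values, and the helper names build_command and convert are all hypothetical and not part of xpdf. It shows the idea only, not a complete sandbox.

```python
import os
import resource
import subprocess

# Hypothetical limits; tune them for real workloads.
CPU_SECONDS = 10
MAX_MEMORY_BYTES = 512 * 1024 * 1024
MAX_OUTPUT_BYTES = 50 * 1024 * 1024


def build_command(pdf_path, txt_path):
    """Build the argv list for pdftotext (no shell is involved).

    Paths are made absolute so an uploaded file name that starts
    with "-" cannot be mistaken for a command-line option.
    """
    return ["pdftotext", os.path.abspath(pdf_path), os.path.abspath(txt_path)]


def _apply_limits():
    # Runs in the child just before exec: cap CPU time, address
    # space, and the size of any file the child may write.
    resource.setrlimit(resource.RLIMIT_CPU, (CPU_SECONDS, CPU_SECONDS))
    resource.setrlimit(resource.RLIMIT_AS, (MAX_MEMORY_BYTES, MAX_MEMORY_BYTES))
    resource.setrlimit(resource.RLIMIT_FSIZE, (MAX_OUTPUT_BYTES, MAX_OUTPUT_BYTES))


def convert(pdf_path, txt_path, account="pdfworker"):
    """Run pdftotext as an unprivileged account (hypothetical name)."""
    subprocess.run(
        build_command(pdf_path, txt_path),
        user=account,            # drop to the unprivileged user
        group=account,           # assumes a group with the same name exists
        extra_groups=[],         # drop supplementary groups as well
        preexec_fn=_apply_limits,
        env={"PATH": "/usr/bin:/bin"},  # assumed location of pdftotext
        stdin=subprocess.DEVNULL,
        capture_output=True,
        timeout=30,              # wall-clock limit; the child is killed on expiry
        check=True,              # raise if pdftotext exits with an error
    )
```

A call would look like convert("/srv/uploads/in.pdf", "/srv/work/out.txt"), with the work directory writable only by the unprivileged account. This limits what a compromised pdftotext process can touch, but it does not by itself block network access or kernel-level exploits.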
I am looking for ideas as to how to secure myself against such potential attacks.