Every year, tons of exploits are created that use the PDF file format to target most of the PDF viewers out there. PDF files are powerful and can use features such as JavaScript. That is useful, but it comes at the cost of a large attack surface. The company "owning" PDF is also very restrictive about how we, the people, can use it (e.g. creating our own PDF viewer isn't going to make Adobe happy).

Is there a file format that can match PDF's document-formatting capabilities but is more secure?

By more secure I mean a document file format that has fewer than 2% of the exploits published each year for PDF (e.g. for the top 3 PDF viewers combined), a smaller code base, and no JavaScript functionality, because JavaScript is inherently insecure (look at why and how it was made and you'll see very quickly that JS is insecure). A good post about why PDF is inherently insecure: What are the security risks associated with PDF files?

I want people to post options so there is at least one place on the internet where PDF file format competitors are set against the PDF file format.

schroeder
linker
  • Can you provide some examples of what format capabilities you are looking for? – forest Mar 14 '21 at 10:44
  • *"... create our own PDF viewer isn't going to make Adobe happy ..."* - Pardon? There are several open source PDF viewers out there. Some basic research, such as reading [Wikipedia:PDF](https://en.wikipedia.org/wiki/PDF), would have shown you that *"PDF was standardized as ISO 32000 in 2008 and therefore no longer requires royalties for its implementation."* Downvoted because of lack of basic research. Also, what formatting __exactly__ do you need? Replicating __everything__ that is already there in a different file format would not make much sense, especially since the PDF spec is open. – Steffen Ullrich Mar 14 '21 at 11:14
  • @forest "I want people to post options so there is at least one place on the internet where PDF File format competitors are set against the PDF file format." And Steffen, I did not ask for a complete replacement. Please read the post and delete the comment. There are PDF viewers but I am looking for a PDF format replacement. I do general stuff, some for university (aka mathematics and physics) some for writing, some for fun. This question is not just for me, it's a compilation of sorts. Again read the post. – linker Mar 14 '21 at 11:41
  • "I want people to post options" - this contradicts the purpose of this site. I'd suggest you find some other platform where such questions are allowed. – mentallurg Mar 14 '21 at 16:17
  • Questions asking for lists are off-topic since the answers could go on endlessly. There has to be a hope for a single acceptable answer. – schroeder Mar 14 '21 at 17:03

2 Answers

There are a number of ways to distribute documents, each with its pros and cons. A simple comparison of some common formats:

Format          Pro                            Con

MS Word .doc    Everyone can read it           Allows macro viruses

ODT             Everyone can read it           Several exploits known

PDF             Everyone can read it           JavaScript etc.

HTML            Everyone can read it           No fixed document layout,
                                               JavaScript

PostScript      Almost everyone can read it    Not as flexible as PDF
                (or at least print it)

Markdown        Everyone can read it           Only rudimentary layout possible

DjVu            Readers available; Chrome      Not standard on Windows,
                displays it                    readers must be installed explicitly

EPUB            Almost everyone can read it    Allows scripting (epubjs
                                               or even JavaScript)

LaTeX, groff    Good formatting                Requires that the document
                                               is processed before it can
                                               be read

The obvious choice would be PostScript.

Alternatively, you can do the conversion yourself. On a Linux system, pdf2ps or pdftops will create a PostScript file, which you can then convert back to PDF with ps2pdf. That produces a relatively clean PDF file, stripped of all sorts of PDF niceties.
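
As a rough sketch of that round trip, here is a small Python wrapper around the tools named above. It assumes pdf2ps and ps2pdf (shipped with Ghostscript) or pdftops (shipped with Poppler) are installed and on the PATH. The function names `build_roundtrip_commands` and `roundtrip` are made up for this illustration, and whether the result is clean enough for your purposes is something you would still have to check yourself.

```python
import shutil
import subprocess
from pathlib import Path


def build_roundtrip_commands(src_pdf, cleaned_pdf, to_ps="pdf2ps"):
    """Return the two argv lists for a PDF -> PostScript -> PDF round trip.

    The intermediate .ps file is placed next to the cleaned PDF
    (a choice made for this sketch, not a requirement of the tools).
    """
    if to_ps not in ("pdf2ps", "pdftops"):
        raise ValueError("to_ps must be 'pdf2ps' or 'pdftops'")
    src = Path(src_pdf)
    dst = Path(cleaned_pdf)
    intermediate = dst.with_suffix(".ps")
    return [
        [to_ps, str(src), str(intermediate)],     # PDF -> PostScript
        ["ps2pdf", str(intermediate), str(dst)],  # PostScript -> PDF
    ]


def roundtrip(src_pdf, cleaned_pdf, to_ps="pdf2ps"):
    """Run the round trip and delete the intermediate PostScript file."""
    commands = build_roundtrip_commands(src_pdf, cleaned_pdf, to_ps)
    # Fail early if one of the external tools is missing.
    for argv in commands:
        if shutil.which(argv[0]) is None:
            raise FileNotFoundError(f"{argv[0]} is not installed or not on PATH")
    try:
        for argv in commands:
            # check=True raises CalledProcessError if a tool exits non-zero.
            subprocess.run(argv, check=True)
    finally:
        # The intermediate path is the last argument of the first command.
        Path(commands[0][2]).unlink(missing_ok=True)
```

Calling `roundtrip("untrusted.pdf", "cleaned.pdf")` would run pdf2ps and then ps2pdf; pass `to_ps="pdftops"` to use the Poppler tool for the first step instead.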

----EDIT----

Added some more formats.

As stated in the comments (@schroeder), this is also about user experience. It also depends on whether you're the recipient of the document or the sender and, if you are the recipient, whether you have a central point where you can cleanse the document.

Ljm Dullaart
  • It's a very good list and the best answer thus far. One question, however: why is nobody, including you, posting about formats such as DjVu? Is there no other explicit document editing language? PostScript seems promising, and your method of stripping a PDF of redundant things by conversion was really out-of-the-box thinking and a good idea! – linker Mar 14 '21 at 14:16
  • DjVu is less commonly used, so many recipients will not recognize it as a formatted document. Also, Microsoft has no out-of-the-box reader on Windows (Win10 does not recognize the `.djv` or `.djvu` extension and the W10 app store has no results for DjVu), although Google's Chrome seems to be able to display them. – Ljm Dullaart Mar 14 '21 at 14:38
  • One alternative you may want to consider is **EPUB**. There are many tools to handle it. Libre Office will export to epub and Calibre will convert PDF to EPUB. Ebook readers that handle epub are common. An epub is a ZIP file of HTML, CSS, and JavaScript so conceivably it could be weaponized as well. – user10216038 Mar 14 '21 at 16:22
  • And what about LaTeX? Or ODX? Or? Or? And then you want a feature-rich editor. This quickly ceases to be about security and very quickly becomes about user experience. – schroeder Mar 14 '21 at 17:10

A lower number of published exploits is not an indicator of a more secure codebase. Instead, I would recommend looking for a more secure PDF viewer (i.e. one written more securely and with a number of unwieldy interactive features disabled) rather than a different format altogether. If you really do want a different format with comparable features, I'd actually suggest HTML. It has much more functionality than mere rich text, can match the layout of most PDFs, and is far simpler than PDF (at least when it doesn't have CSS, JavaScript, or anything else like that). HTML engines' code is very well-written as it parses untrusted input by design. The format is also more strictly specified.
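
To make the "HTML without JavaScript" condition concrete, here is a minimal sketch that flags the most obvious places where script can hide in an HTML document, using only Python's standard `html.parser` module. The class and function names (`ActiveContentFinder`, `find_active_content`) and the list of flagged tags are assumptions chosen for illustration. It is a rough check, not a sanitizer, and it does not look at CSS at all.

```python
from html.parser import HTMLParser


class ActiveContentFinder(HTMLParser):
    """Collects tags and attributes that can carry script in an HTML document."""

    # Elements that embed or load active content (illustrative, not exhaustive).
    ACTIVE_TAGS = {"script", "iframe", "object", "embed"}

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names already lowercased.
        if tag in self.ACTIVE_TAGS:
            self.findings.append(f"<{tag}> element")
        for name, value in attrs:
            if name.startswith("on"):
                # Inline event handlers such as onclick or onload.
                self.findings.append(f"{name} handler on <{tag}>")
            elif value and value.strip().lower().startswith("javascript:"):
                # javascript: URLs in href, src, and similar attributes.
                self.findings.append(f"javascript: URL in {name} on <{tag}>")


def find_active_content(html_text):
    """Return a list of human-readable findings; empty if nothing was flagged."""
    finder = ActiveContentFinder()
    finder.feed(html_text)
    finder.close()
    return finder.findings
```

An empty result from `find_active_content(text)` only means none of these specific patterns were seen, so it should be treated as a first filter rather than proof that a document is passive.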

Unfortunately, although there may be direct alternatives to PDF that have a simpler format which is, at least in theory, harder to exploit, the reality is that such formats are much more obscure and their parsers are unlikely to be written with security in mind. You might look into PDF/A, which supports a subset of standard PDF features and may lend itself to a more secure implementation, perhaps?

forest
  • There are few, if any, PDF viewers that are designed around being secure for a multitude of reasons, one being that the format and definition of the PDF File format is diffuse. Secondly, I wrote some examples of why PDF is insecure, not that one could say "This format is more secure!" because that would be a ridiculous statement. I am asking about something that objectively has a good amount of contributors, backers, is open source, has less attack surface (smaller code base), etc. Something that is, objectively, harder to find lots of exploits for. Please reread the post and delete ur – linker Mar 14 '21 at 10:29
  • HTML is very feature rich and originally designed for a document-style. The issue with this however is the fact that HTML is a nightmare and awful by itself and has lots of branches to other code formats (JS and PHP to name two of the lot). It isn't designed to be a document-viewer format, hence using it for document viewing poses lots of design flaws. – linker Mar 14 '21 at 10:31
  • @linker JS is a separate format and wasn't even supported by the original HTML. An HTML render engine will not support JS. As for PHP, that is server-side (as are all CGI scripts) and has no bearing on its security. A pure HTML render engine is quite simple. But you could substitute any popular markup language. HTML is just the largest and with the most audited codebases. And we cannot give anything that is _objectively_ harder to find exploits for. There simply exist no drop-in alternatives that are large, open source, well audited, and feature rich. – forest Mar 14 '21 at 10:37
  • _one being that the format and definition of the PDF File format is diffuse_ – That is why I suggested PDF/A, which is better-defined. Still not ideal, though. – forest Mar 14 '21 at 10:40
  • Good thing I asked for a PDF format replacement, not a sub format of PDF. Sure if one is to use PDF one should create a PDF/A-4 format. Though, is this common knowledge? Is this something way different from PDF? Not really. Might fit in a list or a comment but the post is not directed and made for my post/question. – linker Mar 14 '21 at 11:44
  • @linker: I have the feeling that you don't really know what you are talking about. You are mixing problems of the format with problems of implementations, but want a different format even if the implementations are the problem. You expect something as feature-rich as PDF but at the same time more secure, which kind of ignores where the problems of the format come from in the first place (complexity due to many features). You expect "open source", which means an implementation, yet want a different file format, which is just a specification. That's not a useful base for constructive answers. – Steffen Ullrich Mar 14 '21 at 11:48
  • @linker: I've read the post and it is not clear to me. I therefore explicitly asked in a comment what formatting you really need, but there was no reaction from you. Would something much simpler like Markdown be sufficient? That is unknown, because the requirements are not precise enough. – Steffen Ullrich Mar 14 '21 at 15:53
  • @linker stop attacking people asking for clarification. I've read your post and I have no clue what you want or what your exact specifications are. – schroeder Mar 14 '21 at 17:07