19

I'm afraid to open a PDF book. When checking the file via pdfid, I get this:

PDF Header: %PDF-1.6  
 obj 4175  
 endobj 4174  
 stream 3379  
 endstream 3379  
 xref 0  
 trailer 0  
 startxref 1  
 /Page 794  
 /Encrypt 0  
 /ObjStm 6  
 /JS 3  
 /JavaScript 0  
 /AA 6  
 /OpenAction 0  
 /AcroForm 1  
 /JBIG2Decode 0  
 /RichMedia 0  
 /Launch 0  
 /EmbeddedFile 0  
 /XFA 0  
 /Colors > 2^24 0  

I also checked the file with VirusTotal, which reports it as clean. But antivirus scanners do not always detect a problem, right?

So I have these questions:

  1. Which of these (AA, ObjStm, XFA, etc.) are really dangerous? I have read here what these entries mean, but I still don't know how to react to them. If possible, explain with simple examples.
  2. Can I safely read the pdf after using the pdfid -d command?
  3. /JS and /JavaScript indicate that the PDF document contains JavaScript. Almost all malicious PDF documents that I’ve found in the wild contain JavaScript (to exploit a JavaScript vulnerability and/or to execute a heap spray). Of course, you can also find JavaScript in PDF documents without malicious intent.

    /AA and /OpenAction indicate an automatic action to be performed when the page/document is viewed. All malicious PDF documents with JavaScript I’ve seen in the wild had an automatic action to launch the JavaScript without user interaction.

    What is the difference between /JS and /JavaScript, and between /AA and /OpenAction, if they indicate the same thing?

Anders
stackflow
  • 3
    Possible duplicate of [How to detect malicious JavaScript in a PDF file?](https://security.stackexchange.com/questions/130711/how-to-detect-malicious-javascript-in-a-pdf-file) – Conor Mancone Oct 19 '17 at 20:29

2 Answers

12

Analyzing a malicious PDF can be very tricky, because attackers are becoming more and more creative in how they infect people.

But let's keep this simple. Here are some characteristics that can indicate that a PDF is malicious.

JavaScript based exploits

The PDF specification supports JavaScript programming and makes a number of JavaScript functions available to programmers in the form of APIs.

Due to its flexibility and ease of use, JavaScript is widely used in malicious PDFs, both to exploit a vulnerable JavaScript API and to set up the PDF reader program’s memory with malicious code (a technique known as a heap spray).
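As a rough illustration of what the counts in the question mean, the sketch below counts a few PDF name keys in the raw bytes of a file, in the spirit of what pdfid reports. It is a simplification and not pdfid's actual implementation: `KEYWORDS`, `decode_name_escapes` and `count_keywords` are illustrative names, the scan does not look inside compressed object streams (/ObjStm), and it can over-count when a keyword happens to appear inside binary stream data. In a typical script action, `/S /JavaScript` names the action type and `/JS` holds the script itself, while `/OpenAction` (document level) and `/AA` (additional actions on pages, form fields, or the document) are two different places from which such an action can be triggered.

```python
import re

# A few of the name keys discussed in the question (illustrative subset).
KEYWORDS = ["/JS", "/JavaScript", "/AA", "/OpenAction"]


def decode_name_escapes(data: bytes) -> bytes:
    """Undo #xx hex escapes in PDF names, e.g. /J#61vaScript -> /JavaScript.

    Assumption: applied blindly to the whole file, which is good enough
    for a sketch but may also rewrite bytes inside binary streams.
    """
    return re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        data,
    )


def count_keywords(data: bytes) -> dict:
    """Count whole-name occurrences of each keyword in the raw bytes."""
    data = decode_name_escapes(data)
    counts = {}
    for kw in KEYWORDS:
        # The lookahead stops /JS from matching a longer name such as /JSx.
        pattern = re.escape(kw.encode("ascii")) + rb"(?![A-Za-z0-9])"
        counts[kw] = len(re.findall(pattern, data))
    return counts


# Hand-written fragment: an open action that points at a script action
# whose type name is obfuscated with a hex escape.
sample = (
    b"1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /S /J#61vaScript /JS (app.alert(1)) >>\nendobj\n"
)
result = count_keywords(sample)
print(result)
```

For a real file, the same function could be called on the bytes read with `open(path, "rb").read()`, where `path` is whatever file is being checked. A non-zero count only says that the feature is present, not that it is malicious.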

Non-JavaScript based exploits

Although the majority of malicious PDFs observed in the wild use JavaScript, either for the exploit or to set up the memory for further exploitation, we have observed other techniques used as well. One alternative to using JavaScript is to embed Flash objects in the PDF instead.

Source: the PDF document "The Rise of PDF Malware".

Here is also a nice cheat sheet for analyzing malicious documents.

Also take a look at 'How can I tell if a PDF file I was sent contains malware?'

Mirsad
  • There's the additional issue of exploiting the PDF engine itself, and this doesn't need to require JavaScript or Flash objects. You should try fuzzing a popular PDF library one day. It's an eye-opening experience. – forgetful Oct 20 '17 at 06:02
  • The part I haven't been able to find, but really want to know, is this: "what percentage of PDFs with javascript are malicious?". I realize that is probably a very difficult question to answer, but I feel it is necessary to **really** answer the OP (and now I'm curious). His question is about whether or not a PDF with javascript is malicious. You and his security expert are both saying that malicious PDFs use javascript. That part isn't really a surprise though. The question is whether or not the use of javascript automatically makes a PDF suspicious. – Conor Mancone Oct 20 '17 at 09:16
7

After a little looking, it appears that the tool you are using to investigate this PDF document is a standalone Python(?) tool written by a "security researcher". I put that title in quotes simply because I know nothing about him, other than the fact that he claims to be a security researcher and likes putting his name on his website.

Perhaps someone who is more of a PDF expert can come by and give some better information, but from what I have seen so far, his tool doesn't seem very helpful for deciding whether a particular PDF file contains malicious JavaScript. Considering that both JavaScript and actions are part of the Adobe standard for PDF files, it seems crazy to assume that a PDF file might be malicious just because it contains JavaScript or actions. He doesn't state that himself, but he does add the rather useless qualifier that "every malicious PDF file I have seen contains javascript/actions". Here is an equally true statement: "Every malicious website I have seen contains javascript". Do I therefore disable JavaScript in my browser or avoid pages with JavaScript? Obviously not. From my perspective, the biggest problem here is a researcher who perhaps doesn't understand the difference between correlation and causation.

That being said, it is possible that this document contains malicious JavaScript. The best way to find out would be to extract the JavaScript in question and see what it actually does, without running it. Since the tool in question already parses PDF files, it may be possible to get that information out of it. Then again, you might have to find another tool or attempt it yourself.
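A minimal sketch of the "attempt it yourself" route, assuming the scripts sit in streams compressed with the common FlateDecode (zlib) filter: it inflates every stream it can and flags those containing a few JavaScript-looking strings, without executing anything. The names `inflate_streams`, `find_script_like_streams` and the `HINTS` list are illustrative choices, not part of any existing tool. Encrypted files, other filters, filter chains, and scripts stored as plain strings instead of streams are not handled.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

# Arbitrary strings that often show up in script code (a heuristic only).
HINTS = (b"eval(", b"unescape(", b"String.fromCharCode")


def inflate_streams(data: bytes):
    """Yield (index, inflated_bytes) for every stream zlib can decompress."""
    for index, match in enumerate(STREAM_RE.finditer(data)):
        try:
            # decompressobj tolerates the end-of-line bytes before "endstream".
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not zlib data (other filter, encrypted, or raw)
        yield index, inflated


def find_script_like_streams(data: bytes):
    """Return [(stream_index, [matched hints])] for streams worth reading."""
    hits = []
    for index, inflated in inflate_streams(data):
        found = [hint for hint in HINTS if hint in inflated]
        if found:
            hits.append((index, found))
    return hits


# Build a tiny fake object holding a compressed script, for demonstration.
script = b"app.alert(String.fromCharCode(72, 105));"
packed = zlib.compress(script)
sample = (
    b"5 0 obj\n<< /Filter /FlateDecode /Length %d >>\nstream\n" % len(packed)
    + packed
    + b"\nendstream\nendobj\n"
)
hits = find_script_like_streams(sample)
print(hits)
```

Because object streams (/ObjStm) are ordinary compressed streams, inflating them this way also exposes objects that a plain keyword scan of the file would miss. Any flagged stream still has to be read by a human to decide what it does.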

If none of those options appeal to you I would try to look at this as a risk/benefit analysis:

  1. Do you have any reason to distrust this PDF file?
  2. Did it come from a reputable source?

If it came from a reputable source and you have no reason to distrust it, I would probably just open it. If you are worried, you can always open it in a virtual machine, or find a PDF reader that doesn't process JavaScript. You can also try to find a way to remove any JavaScript from the PDF before viewing it. I imagine that this is what pdfid -d is supposed to do, but since I know nothing about the tool, that question would be best directed to its author.

If you are on Linux, something as simple as:

pdf2ps input.pdf - | ps2pdf - output.pdf

may work. This converts the file from PDF to PostScript and back to PDF. Basically, it prints the file, which (I believe) removes all meta information. I imagine that pdf2ps doesn't have a built-in JavaScript library, so I think it is safe to assume that any malicious JavaScript will be removed in this process.
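For scripting the same round trip, here is a hedged sketch that wires the two commands together from Python. It assumes Ghostscript's `pdf2ps` and `ps2pdf` are on the PATH; `build_pipeline` and `rewrite_pdf` are hypothetical helper names, and the file names are placeholders. Note that Ghostscript itself still has to parse the untrusted file, so this reduces the risk in the viewer without eliminating risk altogether.

```python
import shutil
import subprocess


def build_pipeline(src: str, dst: str):
    """Return the argv lists for: pdf2ps SRC - | ps2pdf - DST."""
    return ["pdf2ps", src, "-"], ["ps2pdf", "-", dst]


def rewrite_pdf(src: str, dst: str) -> None:
    """Re-render src into dst by piping pdf2ps into ps2pdf."""
    to_ps, to_pdf = build_pipeline(src, dst)
    for tool in (to_ps[0], to_pdf[0]):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")

    first = subprocess.Popen(to_ps, stdout=subprocess.PIPE)
    second = subprocess.Popen(to_pdf, stdin=first.stdout)
    # Close our copy of the pipe so pdf2ps sees a broken pipe
    # if ps2pdf exits early.
    first.stdout.close()
    second_status = second.wait()
    first_status = first.wait()
    if first_status != 0 or second_status != 0:
        raise RuntimeError("PDF -> PS -> PDF conversion failed")


# Only the command construction is shown here; nothing is executed.
commands = build_pipeline("input.pdf", "output.pdf")
print(commands)
```

Calling `rewrite_pdf("input.pdf", "output.pdf")` would run the conversion. Running pdfid on the result afterwards is a reasonable way to check whether the /JS and /AA counts actually dropped to zero.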

Then again, all of this is an "off the top of my head" answer, so your best bet is to ask another question about how to safely remove JavaScript from a PDF file. I'm sure that is a much more concrete (and easily answered) question than "How to know if a PDF file is infected?".

Conor Mancone
  • Firefox with NoScript. Block all JavaScript by default, only enable the minimum required for websites and only from verified/trusted sources. Extra scrutiny for third-party site scripts. – user1258361 Oct 20 '17 at 02:42
  • Didier *is* **the** PDF security expert. – Martin Schröder Oct 20 '17 at 05:38
  • @user1258361 The most secure data in the world is data stored only on a single USB drive which is placed in a safe, encased in 5 feet of concrete, and then sunk to the bottom of the ocean. It also happens to be the most useless data in the world. – Conor Mancone Oct 20 '17 at 09:13
  • 1
    @MartinSchröder To be fair, it wasn't my aim to cast doubts upon his abilities as a security researcher. However, my science and statistics background balked at his very uninformative statement of "every malicious PDF file I have seen uses actions/javascript". I'm sure it is a true statement, and I'm sure he is **the** PDF security expert, but it is a useless statement in this context. I would still love to see some statistics about the usage of actions/javascript in **non**-malicious PDFs. Otherwise, his statement is meaningless. – Conor Mancone Oct 30 '17 at 17:11