How is it possible to get infected with malware by opening a file on a Mac or Windows machine?

Question

Corporate security training keeps saying "download a file from the web or email attachment and open it and you might become infected". I know this used to be the case on old Windows machines in the 90s, but is it still the case on any computer? Obviously if you open a shell file or executable file or app that might be a problem, but at least on Macs, Apple has that warning popup.

Are they basically suggesting that there might be some exploitable holes in the software we use "regularly" (like excel or Apple numbers, or Apple preview for PDFs), and they can exploit those loopholes to install something somehow? The loophole would be unknown to the company providing the software but known to the attacker? That's the only way I can see them getting access to your computer, is there another way? I would assume in today's world, there is 0% chance of getting "infected" by opening a PDF or .xlsx or .doc file on a Mac, but is that not true?

As a bonus question, if it is still true today that opening a "normal" file might install malware, what is the recommended approach to avoiding this, assuming you want to be able to open these files (and assuming you've checked it's from reputable sources, etc.).

"I would assume in todays world, there is 0% chance of getting "infected" by opening a PDF" -- have you looked this up? Have you confirmed this assumption? Have you seen the multitude of emerging products being developed to combat this *growing* threat? — schroeder, Oct 05 '21 at 08:00
"Apple has that warning popup." -- and people like to dismiss pop-ups. There's even a name for it: "warning fatigue". — schroeder, Oct 05 '21 at 08:11
PDF is as a format notorious for being an attack vector into PCs — Hobbamok, Oct 05 '21 at 09:23
This related question might be useful: https://security.stackexchange.com/questions/97856/can-simply-decompressing-a-jpeg-image-trigger-an-exploit — Murphy, Oct 05 '21 at 12:24
Only tangentially related, but there is a class of web vulnerabilities/attacks called drive-by attacks which use the fact that a web browser downloads stuff automatically when you visit a page (html, js, images etc) and can infect a machine without the user knowingly downloading or opening the file — Jonathan Twite, Oct 05 '21 at 15:38
Even if the software designed to handle the file type were bug free -- [**the virus scanner**](https://www.cvedetails.com/vulnerability-list/vendor_id-215/Trend-Micro.html) that must open and inspect all files is a huge attack surface! And hey, it runs with maximum permissions! — Peter - Reinstate Monica, Oct 05 '21 at 17:06
"The loophole would be unknown to the company providing the software but known to the attacker?" extremely common — theonlygusti, Oct 05 '21 at 17:20
To expand on @Theonlygusti 's point - it's common enough to have its own security exploit category. If you want to know more about why they appear, look up ["Zero Day Exploits"](https://security.stackexchange.com/questions/33314/what-is-zero-day), sometimes written as 0-day exploits. The basic premise is that the vulnerability exists, but it is so difficult to find that only someone actively looking for something to exploit will catch it. — Alexander The 1st, Oct 06 '21 at 06:21
"Are they basically suggesting that there might be some exploitable holes in the software we use "regularly"?" Yes of course, constantly, all the time. That's the reason (supposedly) for the non-stop stream of critical security updates. The more complex the file type, the more exploitable holes the program will have. As PDF is a complex file type, PDF viewers of course have holes. But even simple Windows Notepad can be exploited: https://threatpost.com/researcher-exploits-microsofts-notepad-to-pop-a-shell/145242/ — Boann, Oct 06 '21 at 12:51
A common saying in security that may shed some light: the defender (software developer) must know and correct every vulnerability to win the security war, the attacker needs to know only about one. — user1532080, Oct 07 '21 at 02:34

score 73 · Accepted Answer · answered Oct 04 '21 at 20:20

Simple Instructions Over "Correct" Instructions

You may be a security expert, or at least a very knowledgeable person when it comes to computers, but the vast majority of people - even those who work with computers on a daily basis - are not. I know entirely too many people who think computers are basically a box full of plastic and magic.

Explaining to these people which file extensions are more likely to be dangerous and which ones are less likely to be dangerous will probably lead to a lot of confusion. I assure you that a significant number of people who work in an office can't tell the difference between a PDF document and a Word document, so explaining the risk of each is not very productive.

As such, broad statements like "Don't open files from e-mail attachments unless they are from a trusted source" are useful still, even if they are not 100% technically correct.

Which Files Are Dangerous?

Basically, all of them. Always presume that a file is dangerous, even if you can't imagine how it possibly could be. Here is a list of some common file types and how they could be dangerous:

PDF Files: PDF is a complex file format and as of the time of this writing, over 1500 exploits related to PDFs exist in the CVE database.
Office Documents: One of the most prominent attacks in Office documents is macros. The general idea is that you send someone an office document, claim that it contains some important information, then create the document in such a way that it only displays the supposed information if macros are enabled. For example, you can steal NTLM hashes like that.
Spreadsheets: Also related to Office applications, you can create a malicious spreadsheet, which executes OS commands when being opened. This attack is called CSV Injection.
ZIP Files: ZIP files can be quite dangerous. For one, they can cause Denial-of-Service attacks through something like a zip bomb or place arbitrary files on a machine through zip slipping.
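To make the ZIP risks above a bit more concrete, here is a minimal sketch in Python of the checks one might run before extracting an archive. The function name `check_archive` and both limits are made up for illustration, and the sizes it compares are the ones the archive declares about itself, which a crafted file can misstate, so treat this as a first filter rather than a guarantee.

```python
import zipfile
from pathlib import Path

# Illustrative limits (assumptions, not recommendations); tune for your own files.
MAX_TOTAL_BYTES = 100 * 1024 * 1024  # refuse archives declaring more than 100 MiB
MAX_RATIO = 100                      # refuse members declaring more than 100x expansion


def check_archive(archive_path, dest_dir):
    """Return a list of reasons the archive looks unsafe (empty if none were found)."""
    dest = Path(dest_dir).resolve()
    problems = []
    total = 0
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            # Zip slip: a member name such as "../evil.txt" or "/etc/passwd"
            # resolves to a path outside the destination directory.
            target = (dest / info.filename).resolve()
            if target != dest and dest not in target.parents:
                problems.append(f"path escapes destination: {info.filename}")
            # Zip bomb: compare declared sizes before extracting anything.
            total += info.file_size
            if info.compress_size > 0 and info.file_size / info.compress_size > MAX_RATIO:
                problems.append(f"suspicious compression ratio: {info.filename}")
    if total > MAX_TOTAL_BYTES:
        problems.append(f"declared total size {total} exceeds limit")
    return problems
```

Calling `check_archive("download.zip", "extracted")` returns an empty list when nothing suspicious was found, which, like any scanner result, does not prove the archive is harmless.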

While there are indeed measures to mitigate some of these risks, oftentimes these include asking the user if they want to do something risky. 9 times out of 10, they will say yes. Not because they understand that the action they're about to take is risky, but because their computer asks them so often if they want to do something and they're used to playing the little game where they have to find the button that makes the computer do what they want to do.

How to Mitigate This Risk?

There is no perfect one-size-fits-all solution. If there were, we wouldn't have to worry about malware. It depends largely on the technical expertise of who you are talking to.

When talking to an expert, I would say "Trust your gut!". Your instinct is the most advanced part of the brain, optimized over millions of years through the most brutal optimization process in existence - you do well to use it.

If you have a bad feeling with a file, don't open it. And if you have to, do it in a VM on an airgapped machine, which you completely scrub afterwards.

When talking to the average user, I would repeat the same handful of security tips you have heard a million times. Don't open files from untrustworthy sources, have an up-to-date anti-virus, etc. etc. You've heard it a million times before.

So are there then standard libraries for scanning PDFs, Microsoft Office files, etc. for malware? That are perhaps open source :)? – Alien Oct 04 '21 at 21:02
@Alien [ClamAV](https://www.clamav.net/) is one way, but don't see "Didn't find anything" as "Nothing is there" – Oct 04 '21 at 21:09
@Alien: Also, be aware that that's not a panacea. The problem with those formats is that they are highly complex and highly arcane. The exact same reasons that make it likely that libraries for processing them may have exploitable bugs also mean that libraries *scanning* them have a hard time fully understanding them … and may have exploitable bugs themselves. Remember that "scanning for malware" is essentially equivalent to "solving the Halting Problem". – Jörg W Mittag Oct 05 '21 at 06:14
PDF is based on PostScript, which is a full-blown Turing-complete programming language. While PDF is somewhat more restrictive, it is still very complex. MS Office files essentially started out as barely more than memory dumps of the running program. They are so complex that even Microsoft doesn't fully understand how they work. The (not so) "new" XML-based "Office Open XML" file formats (`.docx`, `.xlsx`, `.pptx`, etc.) still try to cover all the functionality of the old formats, so they are also very complex: the spec is well over 7000 pages, and there are three mutually incompatible … – Jörg W Mittag Oct 05 '21 at 06:22
… versions. The spec is so complex that it was only discovered after it was thoroughly reviewed and standardized that it is self-contradictory and thus un-implementable if read strictly. At some point, they tried to make a backwards-compatible update to the standard, but actually made it backwards-**in**compatible in the process. – Jörg W Mittag Oct 05 '21 at 06:25
@JörgWMittag I think PDF is more like SVG with PostScript syntax than it is like PostScript. It has no programmability and there's no halting-problem obstacle to analyzing it. Not that that means it's safe. – benrg Oct 05 '21 at 06:49
I'd also add that there's an additional issue about "Which files are dangerous?" - the [Right-to-Left Override Attack](https://cybriant.com/what-is-a-right-to-left-override-attack/) vector is something that would allow a larger set of even completely trustable file types to become untrustworthy. i.e. "The file 'trustworthytexe.txt' is a text file, ergo it's trustworthy." – Alexander The 1st Oct 05 '21 at 07:40
You miss one massive file type: executables. Even though you are trying to get into the perspective of an average user, you still have "the curse of knowledge". Attackers send and people run executables that are attached to emails. – schroeder Oct 05 '21 at 08:03
How to mitigate: browser and email client isolation. CDR. Sandboxing. There's quite a lot, actually, being actively developed as one-size-fits-all solutions. – schroeder Oct 05 '21 at 08:05
And I think you meant "Concise over complete explanations" as your first line. The instruction is correct, but the reasons are complicated. It's the reasons that are glossed over for the average user. – schroeder Oct 05 '21 at 08:08
@Nelson I think every statement you made is incorrect ... – schroeder Oct 05 '21 at 16:37
"they're used to playing the little game where they have to find the button that makes the computer do what they want to do" - the real largest threat in the cyber security space – TCooper Oct 05 '21 at 21:25
@MechMK1 While there are certainly many security issues related to PDF files, the 1500 number is a bit unfair since quite some of these CVEs are unrelated to PDF and are only found because some companies always reference PDF files with security advisories and their paths contain the string "pdf". – Marcel Krüger Oct 05 '21 at 23:15
@MarcelKrüger It's really more of a rough estimate. Even if I wrote 700 instead of 1500 - the point is that PDFs are vulnerable. – Oct 06 '21 at 10:38
@benrg I'm pretty sure JavaScript is enabled by default in Acrobat Reader... – user3067860 Oct 06 '21 at 12:39
@MechMK1 even as a rough estimate it's quite bad. I checked the first 10 links on that search and none of them were PDF related. Unless you can support the 700 number, I would remove the link and rephrase that paragraph. As it is now this part is simply incorrect. – Andrew Savinykh Oct 07 '21 at 03:09
@MechMK1 You said "but don't see "Didn't find anything" as "Nothing is there"". As I recall, there's a theorem that essentially stated that it was impossible to write a program that could detect all programs. Pretty much a statement that an anti-virus/anti-malware program could never be created such that it could detect all malware that exists. I feel like I'm butchering the theorem, but does that ring a bell? I definitely remember hearing about it in college. – Ungeheuer Oct 07 '21 at 05:08
@Ungeheuer https://en.wikipedia.org/wiki/Rice%27s_theorem – Martheen Oct 07 '21 at 10:11
@Ungeheuer Correct. It's always possible to write a program that exhibits a semantic characteristic, that will be misidentified as not having that characteristic. Simply put, just because ClamAV or any other AV says something is benign doesn't mean it actually is benign. – Oct 07 '21 at 12:29

score 13 · Answer 2 · answered Oct 04 '21 at 20:46

Just downloading a file is unlikely to be dangerous, but making any use of a downloaded file can be. Even "unused" files are routinely used without your explicit knowledge. For example, downloaded files are routinely inspected by your antivirus software, and thumbnail images may be generated from downloaded images. These uses can't be guaranteed to be 100% safe.

There was a case where a widely deployed jpg library was the attack vector - all you had to do was view the image - even though viewing images is generally considered to be safe.

Imagine the embarrassment on Microsoft's part if malware successfully used Windows Defender as an attack vector. AFAIK this has never happened, but it could.

ddyer

An example of a thumbnail generator exploit: in 2017 there was one in Linux where a [thumbnail generator could execute VBScript](https://thehackernews.com/2017/07/linux-gnome-vulnerability.html) on the host computer's Wine emulator. – emptyother Oct 05 '21 at 18:52
It absolutely has happened - at least as far as a researcher proof of concept, if not in the wild: https://docs.microsoft.com/en-us/security-updates/SecurityAdvisories/2017/4022344 Windows Defender automatically scans email attachments _before the user even opens the email_ and that scan process had a vulnerability in it. – Chengarda Oct 06 '21 at 05:36

JimmyJames · Answer 3 · 2021-10-05T16:26:52.730

Are they basically suggesting that there might be some exploitable holes in the software we use "regularly" (like excel or Apple numbers, or Apple preview for PDFs), and they can exploit those loopholes to install something somehow?

That's probably the concern. As an example, have you ever used WinRAR or heard of it? Did you know that it had a code execution vulnerability for 19 years that was only discovered in 2019? After it was made public, it was actively exploited. Is it possible that this flaw was exploited at some time between 2000 and 2019? It's pretty hard to prove that it wasn't. Assume a user has WinRAR installed and opens a file 'foo.txt.ace', which looks like foo.txt to the user because the corporate policies don't allow showing file extensions (why, I'll never understand).
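As a rough illustration of how a mail filter or download hook could flag names like that, here is a small Python sketch. The function name `filename_warnings` and the `DOCUMENT_LIKE` list are hypothetical choices for this example; the bidirectional control characters are the Unicode code points behind the right-to-left override trick mentioned in the comments on the accepted answer.

```python
# Extensions this sketch treats as "looks like a harmless document" (an assumption).
DOCUMENT_LIKE = {"txt", "pdf", "doc", "docx", "xls", "xlsx", "jpg", "png"}

# Unicode bidirectional controls: U+202A..U+202E (includes the right-to-left
# override, U+202E) and the isolate controls U+2066..U+2069.
BIDI_CONTROLS = {chr(cp) for cp in range(0x202A, 0x202F)} | {
    chr(cp) for cp in range(0x2066, 0x206A)
}


def filename_warnings(name):
    """Return human-readable warnings for a file name that may be deceptive."""
    warnings = []
    if any(ch in BIDI_CONTROLS for ch in name):
        warnings.append("contains bidirectional control characters")
    parts = name.lower().split(".")
    # "foo.txt.ace": a document-like extension followed by a different real one.
    if len(parts) >= 3 and parts[-2] in DOCUMENT_LIKE and parts[-1] not in DOCUMENT_LIKE:
        warnings.append(f"double extension: displayed as .{parts[-2]}, really .{parts[-1]}")
    return warnings
```

A check like this only catches the two naming tricks it looks for; it says nothing about what is inside the file.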

The reality is that there are likely vulnerable applications installed on every Windows machine. The question is whether anyone knows about those vulnerabilities. It's really not a good plan to wait until someone exploits it by say, installing ransomware, to worry about it. Users need to know that this is a potential risk.

And yes, Excel is widely known to be a great way to deliver malware. If you can get the user to enable macros it's pretty much game over.

spicy.dll · Answer 4 · 2021-10-07T14:32:15.780

Untrustworthy files can always lead to an exploit if opened, no matter the OS or program

That is because all files must be parsed by some program to be useful. Programs are written by humans, who are making mistakes no human can even recognize as a mistake yet, whereas files are made by humans and machines as well. Therefore, you can never truly trust any program to be 100% secure. macOS is only less targeted due to its lesser market share and Apple's sandbox. However, the walls are only so high and being less targeted only gives the illusion of security (a.k.a. security theater) through obscurity (Apple uses proprietary software).

File interpreting programs that have their own scripting language, like Word or Excel, allow the attacker to social engineer the user into running their script, which can be the equivalent of executing a program an attacker just sent you. This is the most common method ransomware operators use to gain access to your systems, apart from just buying credentials off the dark web. This usually requires some action apart from opening the file.
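For the newer Office formats, one cheap pre-check is possible because `.docx`, `.xlsx`, `.docm`, `.xlsm` and friends are ZIP containers, and an embedded VBA project is normally stored in a part named `vbaProject.bin`. The sketch below is a minimal Python illustration under that assumption; `has_vba_project` is a made-up helper name, and it does not handle the legacy binary `.doc`/`.xls` formats at all.

```python
import zipfile


def has_vba_project(path):
    """Return True if an Office Open XML file contains a VBA project part."""
    if not zipfile.is_zipfile(path):
        # Legacy binary Office formats (or non-Office files) are not ZIP
        # containers; this sketch cannot say anything about them.
        return False
    with zipfile.ZipFile(path) as zf:
        return any(name.lower().endswith("vbaproject.bin") for name in zf.namelist())
```

A `False` result only means no VBA project part was found. It does not rule out other document-based tricks or plain parser exploits.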

Other file parsing programs that don't implement scripting can still be vulnerable to undiscovered flaws in their parsers that can lead to exploits that are more difficult to find and develop but require no user input apart from opening the file in the program.
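Many of those parser flaws come down to trusting a size or offset stored inside the file itself. The Python sketch below uses an invented record format (a 4-byte big-endian length followed by that many payload bytes) purely to show the checks a careful parser makes. Python is memory-safe, so skipping these checks here would only produce wrong data; in a C or C++ parser the same omission is the classic route to a buffer overflow.

```python
import struct

MAX_PAYLOAD = 1 << 20  # hypothetical upper bound for one record (1 MiB)


def read_record(buf, offset=0):
    """Parse one record and return (payload, offset_of_next_record).

    Format (made up for this example): 4-byte big-endian length, then payload.
    """
    if len(buf) - offset < 4:
        raise ValueError("truncated header")
    (length,) = struct.unpack_from(">I", buf, offset)
    # Never trust the declared length: bound it, then check it against the
    # data that is actually present before reading.
    if length > MAX_PAYLOAD:
        raise ValueError("declared length exceeds limit")
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError("declared length runs past end of data")
    return buf[start:end], end
```

For example, `read_record(b"\x00\x00\x00\x03abcXYZ")` yields the payload `b"abc"` and the offset `7`, while a header claiming 255 bytes in front of a 3-byte payload raises `ValueError` instead of reading past the data.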

Technically, any web browser is a file interpreting program with a scripting language. Any website you visit is essentially your computer downloading, rendering, and executing several files provided by a server that can't be trusted. This is why browsers like Chrome and Firefox have to update every 2 weeks with security patches. They are heavily targeted and are constantly handling untrusted files.

The only way to 100% guarantee an unknown file won't hurt you is to not allow any program to interpret any part of the file

@TobySpeight Makes sense. I've edited my answer accordingly. Thank you. — spicy.dll, Oct 07 '21 at 14:33

score 7 · Answer 5 · answered Oct 04 '21 at 20:04

Yes, that's correct. The primary concerns are executable / script files, or malicious document files that exploit vulnerabilities in the applications that read them (e.g. Office, PDF reader, browser, etc.) in order to execute code on the victim's computer.

Being on Mac OS only really offers you security through obscurity - most people use Windows, so most malware is written for Windows. That said, remote code execution vulnerabilities are just as commonplace in Mac OS applications as they are in Windows applications. Malware for Mac OS is quite common these days. There was a surge in banking trojans targeting Mac users around 5-10 years ago, and the trend has continued since - presumably attackers realised that if you're willing to drop a stack of money on a designer laptop, you might have cash worth stealing, and since "Mac OS doesn't have malware!" is a common misconception the userbase tends to be less cautious.

The primary way to protect yourself is to install software patches and operating system updates in a timely fashion. If the vulnerability is patched, an attacker can't use it to break into your system. In addition, be wary of emails from people you don't know, especially if they have attachments. If in doubt, don't click. You should also be cautious about unexpected or unusual emails from people you do know, not only due to the potential for malware being spread via people's email contacts, but also because an attacker who compromises an email account or who registers one with a similar spelling might use it to trick you into sending them money or giving them sensitive information. A common trick is for an attacker to spoof an email from a CEO to someone in finance, asking them to buy gift cards for a client and email the codes back. It's always better to double-check directly with the person you think you're talking to (e.g. in person or over the phone) before actioning an odd request.

score 4 · Answer 6 · answered Oct 05 '21 at 11:31

There are no-click exploits that show up from time to time and are exploited before the vendor has patched them, and no amount of user training will help there, as clicking "Enable scripts/editing" is not required.

There is also an entirely separate class of vulnerabilities that allow for code execution if you just preview the file in Explorer or your email client.

Both kinds of vulnerability are rare, the second even more so, but they do occur from time to time, especially with the introduction of new software.

score 2 · Answer 7 · answered Oct 05 '21 at 22:04

Given the experience we have so far with software, it is rather uncommon that files aren't executable. Just because a file doesn't claim to be executable doesn't mean it isn't. Any piece of software complex enough will have bugs that cause it to execute code where no code execution was meant: it will, erroneously, interpret data files as code. This happens with such a regularity that it's not reasonable to treat "data as data". Data and code are the same thing, that's the rule. The exceptions are when things are done well enough to prevent it.

That's the reality of it.

Kuba hasn't forgotten Monica

That's not really true. The main way to do that is buffer overflows that are impossible in JVM or .NET or pretty much any modern programming language besides C/C++ unless you specifically turn off bounds checking, trading safety for speed. – prosfilaes Oct 08 '21 at 05:22
@prosfilaes 1. Most people don't rewrite common libraries like libjpeg to run natively in CLR or JVM. 2. If one is writing a low-level file format handling library, could there be a likelihood that performance is at least a teeny-weeny-smidgen-of-a-bit important? 3. Buffer overflows aren't the only way to get some modicum of control over the victim's machine. Sometimes you don't even care about running disallowed code outright, but about leaking sand out of the sandbox using code that you can legitimately run at two privilege levels. One bit of such code may well be some JS in a PDF file :) – Kuba hasn't forgotten Monica Oct 08 '21 at 18:06
Which seems both narrowing "any piece of software complex enough" to exclude a lot of real life systems and expanding "it will, erroneously, interpret data files as code", which is a lot narrower than "some modicum of control over the victim's machine." Mel, from "The Story of Mel", derided using assembly because performance was important and an assembler impaired the optimizations he could do by hand; the idea of using a HLL like C and imagining your code was efficient... – prosfilaes Oct 09 '21 at 00:43
Concretely, The Computer Language Benchmarks Game's Spectral Norm benchmark show the C program as the fastest, with the runner-up Rust program taking 1.8 times as long. But it also shows the C program most people would write as taking 14 times as long, 7 times slower than the fastest Rust versions and slower than naive implementations in Ada, Go, Swift or Dart. Also, during this comment, I opened a PDF file instantaneously and then had to step through it slowly because the UI had no "whole words only" or "preserve search case" options. Optimization is often misspent. – prosfilaes Oct 09 '21 at 00:57

score 1 · Answer 8 · answered Oct 07 '21 at 03:21

Adding to the previous answers that mention the antivirus as an attack surface, while I'm not aware of it having ever happened, your file system is an attack surface too. And of course, the system calls used to deal with files.

In other words, saving the file on your disk could trigger something. File systems are usually very robust, but it is still possible. This article digs into a remote code exploitation vulnerability in NTFS (while not conclusively being able to exploit it).

On top of that, while again unlikely, the file name alone could trigger an attack if some software was processing it (think "PowerRename" from the MS power tools).

And finally, for the worst part, it is also not impossible that simply writing the data to your storage will trigger a vulnerability in the storage device. This is a lot less likely, and would work on a much smaller range of hardware. You are unlikely to be the target of such an attack, as they are extremely complicated. Here's an article dealing with dumping a disk firmware, opening the way to looking for vulnerabilities using the binary code... Note: I am myself unaware of such an exploit (as in, an exploit triggered by writing data) in HDD/SSD firmware. I'm aware of attacks on HDD firmware, but not triggered that way.

"antivirus as an attack surface, while I'm not aware of it having ever happened" Well, then you should spend some time reading up on this whole range of attacks of which apparently you haven't yet had the joy to learn. — Nobody, Oct 07 '21 at 13:40
No, I meant I'm not aware of any successful attack on the FS (for instance, an attack on ntfs3g driver, Windows NTFS driver...). I however pasted a link to a vulnerability, reported by Microsoft, but the article concludes they couldn't exploit the vulnerability identified. — user1532080, Oct 07 '21 at 14:25
Oh, ok, then I misunderstood. I can't remember successful attacks on fs drivers either. — Nobody, Oct 07 '21 at 18:42

score 0 · Answer 9 · answered Oct 07 '21 at 18:46

I would assume in today's world, there is 0% chance of getting "infected" by opening a PDF or .xlsx or .doc file

I don't even know where to start in explaining how ludicrous that sentiment is.

Just look up "Excel VBA exploits"

There is literally an article named "MALICIOUS MACROS FOR SCRIPT KIDDIES"

With minimal effort even the most inexperienced person can craft and send you a malicious file.
