80

I have always firmly believed that obfuscation is essentially useless. Obfuscated code is not impossible to read, only harder to read, and I assumed that a sufficiently skilled attacker could bring it back into a more readable state.

However, OWASP recommends the usage of obfuscation for mobile clients, which makes me wonder if there is more credibility to obfuscation than I had given to it.

Hence my question: Does obfuscation give any measurable security benefit? Specifically, a benefit that outweighs the added cost, complexity and reduced performance.


Note: When I say "obfuscation", I am talking about deliberate steps taken to prevent reverse engineering. Compiler optimizations, even though they make the assembly less easy to read, are done for the purpose of improving performance, not to prevent reverse engineering.

smci
  • In my experience, there's not a whole lot of measurement and empirical evidence gathering in the security world. It's mostly a lot of "well it SHOULD work like this", anecdotal experiences, and extrapolation. Personally I think code obfuscation is more about an attempt to protect business interests and code secrets than it is security. – Steve Sether Oct 09 '19 at 14:52
  • @SteveSether I thought the same way, but given that I consider OWASP a credible source, I wanted to see if perhaps my assertion was wrong. –  Oct 09 '19 at 14:53
  • One small benefit of obfuscation is information destruction: things like spoken language, coding habits, etc. The process (ultimately the code) can be re-understood, but identifiers are lost. Although, I can't think of a legitimate reason for this. Additionally, some obfuscation (e.g. Java back in the day) can introduce features in the output that cannot be easily rebuilt with the current technology/decompilers. This made Java excruciatingly cumbersome to decompile, thus deterring even experienced users from inferring the code. That's an obfuscation-is-better-than-its-competitors situation. – Nathan Goings Oct 09 '19 at 20:45
  • I find it fascinating that the possibility to deobfuscate depends heavily on the code style. Like how important the variable names are. The code could be written in a relatively low level language, and use very high level patterns. Like object oriented classes roughly on the level of java, all implemented partially as much as needed. Also, the use of design patterns, spanning even longer regions of code. All this structure is mostly expressed in terms of variable names. Two values of the same type could be of different classes, expressed in the name. Different classes can even be the same. – Volker Siegel Oct 10 '19 at 11:20
  • "OWASP recommends the usage of obfuscation for mobile clients" - that might just be because, for instance, minifying JavaScript obfuscates it, but the real objective is just to make it smaller, transferring less information and speeding up internet traffic and web page load times – Mawg says reinstate Monica Oct 10 '19 at 12:48
  • @Mawg Yet OWASP does not specifically recommend obfuscating JavaScript –  Oct 10 '19 at 12:49
  • What's this question actually for? The linked OWASP page literally gives the threat that obfuscation is supposed to help protect against and lengthy descriptions of both the threat and solutions. – Delioth Oct 10 '19 at 14:38
  • Think of it as a lock on your front door. It in no way prevents someone from busting in, but if the next house over does not have a lock, or it's not engaged, then why would the person want to mess with your door? If they're sufficiently motivated to target you then even an electric fence won't stop them. A broken lock is also defensible in court, as it proves the person had a motive to get in rather than being able to claim they were invited inside. – MonkeyZeus Oct 10 '19 at 16:12
  • @Mawg The recommendation is in a box titled "How Do I Prevent 'Reverse Engineering'?". That page is all about reverse engineering, not performance. – Barmar Oct 10 '19 at 17:35
  • I think whether it gives a security benefit is separate from whether it gives a measurable security benefit. I believe it gives a security benefit, but I do not know how to measure the benefit. – emory Oct 10 '19 at 19:00
  • True Javascript obfuscation: Translate it all to [JSFuck](http://www.jsfuck.com/). – Gloweye Oct 11 '19 at 07:49
  • @Gloweye Except that there are automated tools like [JSUnFuck](http://codertab.com/jsunfuck) that completely reverse it –  Oct 11 '19 at 07:52
  • It's the same as having two identical cars parked close together, one with a visible alarm installed and the other without. If a thief comes to steal one of them, which one do you think he will choose? Of course, if he wants that specific one, he will take it anyway. It's just a good practice that usually costs you nothing, so do it ;) As for JS, I usually think of it as minification rather than obfuscation. And if the source can be minified to half its original size, it is a really good idea to do it. – Fis Oct 11 '19 at 17:12
  • _"...a benefit that outweighs the added cost, complexity and reduced performance."_ -- it's addressing the wrong side of your question, but I'll add to that (or remind that it's included in "cost", in addition to the "reduced performance" aspect) the difficulties experienced by users when obfuscation itself introduces bugs into the code. This is one of the biggest problems with technical measures intended to enforce copyright: they invariably make the user experience poorer. Legit users, that is. People who hack around and bypass the copy protection suffer no such ill effects. – Peter Duniho Oct 12 '19 at 05:13
  • I'm shocked that anyone would ever consider it necessary to *intentionally* obfuscate code! – Hot Licks Oct 12 '19 at 22:36

5 Answers

122

There are two benefits to code obfuscation:

  1. It weeds out the shallow end of the attacker pool. Script kiddies who struggle to make sense of your code will go somewhere else.
  2. It increases effort required of skilled attackers. No matter how skilled they are, obfuscation is cheaper than de-obfuscation, and the result is generally less comprehensible than the original (variable names will remain generic, for example, where the originals were descriptive).

@SteveSether is doubly right in his comment - actual measurements will be almost impossible to find, and many code bases are obfuscated for proprietary reasons* rather than security reasons.

But for both security and proprietary reasons, code obfuscation's value is tied to its asymmetric quality - it's cheaper to obfuscate than it is to de-obfuscate.


*By "proprietary reasons" I mean "the desire to keep one's code and algorithms more private, or harder to reproduce, in the interest of maintaining competitive advantage in the market." Companies and individuals are both prone to this tendency.
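
The remark above about generic variable names can be illustrated with a toy sketch. This is not how any particular obfuscator works; it is a minimal identifier-renaming pass written for illustration, and the `RenameIdentifiers` class, the `obfuscate` helper and the `monthly_payment` sample are all hypothetical. It assumes Python 3.9+ (for `ast.unparse`) and only handles snippets that use no builtins or globals, since it renames every name it sees.

```python
import ast


class RenameIdentifiers(ast.NodeTransformer):
    """Toy pass: replace every argument and variable name with v0, v1, ..."""

    def __init__(self):
        self.mapping = {}

    def _generic(self, name):
        # Hand out generic names in order of first appearance.
        if name not in self.mapping:
            self.mapping[name] = f"v{len(self.mapping)}"
        return self.mapping[name]

    def visit_arg(self, node):
        node.arg = self._generic(node.arg)
        return node

    def visit_Name(self, node):
        # Caveat: this would also rename builtins and globals, which is
        # why the sketch is limited to self-contained snippets.
        node.id = self._generic(node.id)
        return node


def obfuscate(source):
    """Return the source with descriptive identifiers replaced."""
    tree = RenameIdentifiers().visit(ast.parse(source))
    return ast.unparse(tree)


# Hypothetical sample whose names document its intent.
ORIGINAL = """
def monthly_payment(principal, annual_rate, months):
    monthly_rate = annual_rate / 12
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -months)
"""

OBFUSCATED = obfuscate(ORIGINAL)
print(OBFUSCATED)
```

The renaming takes one mechanical pass, while the reader of the output has to work out again what each `v`-numbered name stands for. The original names are simply gone, so no tool can restore them; at best a deobfuscator can invent new ones.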

gowenfawr
  • A nice case study for code obfuscation could be PostScript's `eexec` operator, as described in https://www.linuxjournal.com/content/protect-your-postscript-files-being-converted-pdf, and eventually explained in "7.2 eexec Encryption" of [Adobe Type 1 Font Format](https://www.pdfa.org/norm-refs/Type1Fonts.pdf). So it did not work in the long run. – U. Windl Feb 15 '23 at 23:56
12
  • For as long as I have seen obfuscated code (mostly in viruses and rootkits) on practically anything that can receive data from the Internet (mail, FTP, web, DNS, etc., in requests, logs and file transfers), the human time needed to deobfuscate it well enough to find the essential information, such as the server address, admin ID and hashed password of a botnet, or the sensitive strings and library calls of a virus, has mostly been counted in minutes.

    So in terms of protection against unknown code, this is not a big job (if not trivial).

  • On the other hand, rebuilding editable sources from this kind of code can take a lot of time: days, weeks or even more if the code base is big. That said, deobfuscation gets quicker and more efficient the further it progresses, like a room getting brighter as light comes in.

  • About OWASP's recommendation, I agree: defeating obfuscation takes human effort, which has a cost, and that makes piracy less attractive.

  • About the measurability of the security benefit... sorry, but I can't measure it! It depends on who might be interested in hacking your code, which part of your code, and why.

Overall, my own recommendation is: using obfuscation is not inherently a bad idea, but it should not be considered a big security improvement!

To be clear: never assume that obfuscating code to hide secret keys or functions makes them more secure than if they were not obfuscated!
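
The "minutes" claim for finding embedded strings is easier to picture with a small sketch of what a `strings`-style scan does. This is an illustrative approximation, not the actual `strings` tool; the `printable_strings` helper and the sample bytes (including the host name) are made up for the example.

```python
import re


def printable_strings(blob: bytes, min_len: int = 6):
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(blob)]


# Hypothetical payload: binary noise with two readable values embedded.
SAMPLE = (
    b"\x00\x01\x7fELF\x02\x90\x90"
    + b"c2.example.net:6667"
    + b"\x00\xff\xfe"
    + b"admin_id=root"
    + b"\x00\x13\x00"
)

FOUND = printable_strings(SAMPLE)
print(FOUND)
```

A scan like this only finds values stored verbatim. If the value is encoded or computed at run time, the static scan misses it, and the analyst has to run the code under a tracer instead.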

  • This might be true for malware, but for entire applications or games, the code becomes harder to reverse engineer. Certainly not a matter of minutes. – Hugo Oct 10 '19 at 09:08
  • @HugoZink I think he means the case that values of interest can be found unobfuscated. An IP address must be stored as value, for example. Basically `gzip -d obfuscatedfile | strings | less` – Volker Siegel Oct 10 '19 at 11:28
  • @VolkerSiegel I disagree. It can be generated algorithmically, or it can be decrypted from an encrypted string, forcing the attacker to work out the mechanisms of the decryption algorithm and where the key is stored, etc. – Jon Bentley Oct 10 '19 at 12:22
  • @HugoZink I meant the case "Certainly not a matter of minutes", that was not clear. I meant to say: There exists a method that can be done in "minutes" and often leads to results. Not objecting you. – Volker Siegel Oct 10 '19 at 13:33
  • @HugoZink I mean: *`establishing the goal of obfuscated code (mostly malware)`* is *trivial*, but building *editable source code* from a big *obfuscated application* may take *`a lot of time`* (*`by contrast`*)... Sorry for my not-so-good English. – F. Hauri - Give Up GitHub Oct 10 '19 at 14:20
  • @VolkerSiegel I often use `smjs <(sed s/exec/print/...) >deobfuscated.raw` ... Replace `print` by `cat` or `echo` for shell, php, perl, etc... (In contained sandbox, of course;) – F. Hauri - Give Up GitHub Oct 10 '19 at 14:40
  • @JonBentley Playing with sandboxes (`qemu`, `lxc`) is very efficient. See my previous comment. – F. Hauri - Give Up GitHub Oct 10 '19 at 14:43
  • @JonBentley You can have the most wonderfully obfuscated/encrypted/generated IP address in the world, and it might take a team of 100 engineers 30 years to decipher it, yet eventually software has to supply it unencrypted to a system call like `connect()`. And those system calls are easy to log with `strace` and catch with `gdb`, at least on Linux. – Iwillnotexist Idonotexist Oct 11 '19 at 05:52
  • @IwillnotexistIdonotexist At least in the case of malware the objective is largely to hide the ip address from static analysis of the program like antivirus software might do. By the time they are calling `connect()` it provides minimal extra value to know where they connected to. In fact this could easily be observed outside of the affected machine by monitoring network packets. – trognanders Oct 11 '19 at 08:12
  • @trognanders Simply running the obfuscated code while tracing it with `strace`, `gdb` or `tcpdump` sheds a lot of *light*, as almost all strings, addresses, ports, etc. become readable. – F. Hauri - Give Up GitHub Oct 11 '19 at 09:52
  • @VolkerSiegel Would most people think of searching for the value `2144929806` as an IP address? You can ping this address and get back packets from google.com. – doneal24 Oct 11 '19 at 16:48
  • @doneal24 No. Except when the context suggests it. But I was talking about a quick hack that has a chance to give useful results in a short time. And maybe I'm a bit traditional, but in my world, IP addresses are written down in dot decimal notation if it's IPv4. That's a string constant on code level, something that is often excluded from obfuscation. Certainly, you can tell the obfuscator to exclude a specific long int value. Or are you thinking of string constants representing the decimal address? (Again, all this is not really relevant) – Volker Siegel Oct 11 '19 at 17:28
  • "A few minutes" may be true if you're just looking for a hidden string, but definitely not for reverse engineering complex algorithms or software. As a famous example, Minecraft is obfuscated. It took a large team of talented reverse-engineers months to fully deobfuscate the code; and [until just a month ago](https://www.reddit.com/r/programming/comments/cznq2s/minecraft_now_releases_obfuscation_maps_for/) it took several hours of effort to [recreate the obfuscation maps after each release](https://minecraft.gamepedia.com/Programs_and_editors/Mod_Coder_Pack) – BlueRaja - Danny Pflughoeft Oct 12 '19 at 00:12
  • @BlueRaja-DannyPflughoeft I wrote: *`building editable sources from this kind of code could take a lot of time`*! – F. Hauri - Give Up GitHub Oct 12 '19 at 06:38
  • @Aaron I wrote: *`... building editable sources from this kind of code could take a lot of time...`* – F. Hauri - Give Up GitHub Oct 12 '19 at 06:39
  • My previous comment seems to have been deleted. Not sure why, as there was nothing bad about it. I'll just re-add that this answer is very misleading and drastically underestimates the effort needed to perform these activities. – Aaron Oct 14 '19 at 14:05
  • @Aaron Fair enough. The deleted comment was aggressive and wrong! My answer is correct. I have played a lot with many hacking environments: understanding the goal of obfuscated code takes minutes, and producing editable source code takes weeks or more. This is true, and I have experienced it! – F. Hauri - Give Up GitHub Oct 15 '19 at 06:50
  • @F.Hauri I won't argue the point, but I will just re-emphasize that you are claiming to do in minutes what most people cannot do in that time even given original source code to look at, let alone obfuscated assembly. This assumes non-trivial software, of course: Office Word, not Calculator. I will remain very, very skeptical, but if you are not exaggerating then good fortune to you. – Aaron Oct 16 '19 at 18:55
  • @Aaron Again, I am not talking about building editable source code. But for revealing the destination IP, secret strings, etc., yes, the time has to be counted in minutes. And if I know how to do this, I'm not alone! I know many people who are able to do the same. – F. Hauri - Give Up GitHub Oct 17 '19 at 05:53
9

Another point in favor of obfuscation is that it makes it harder for attackers to deny their reverse-engineering activity.

If you have a server which lets in any client that sends it a "Hello foobar" string, and someone exploits it, it may be hard to prove in court that the offender really had the intention to attack, and not just misunderstood your license agreement and assumed this was allowed. If your client authenticates with the server using an obfuscated secret key (contained within the client itself), you gain little in terms of security, but someone exploiting your server will have a hard time proving that they got that key by chance, and not via an intentional reverse engineering effort.

Dmitry Grigoryev
5

Obfuscation increases the time cost of reverse engineering a program significantly. While perhaps it is quick to extract some small secrets from an obfuscated program, the work to make a non-obfuscated version of that program rivals simply rewriting it. Extracting a novel algorithm is possible but non-trivial.

Essentially obfuscated code can be reasoned about, but not reused.

Code obfuscation is the topic of considerable CS research... your axiom that obfuscation is essentially worthless would be contentious.

I would suggest the book Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection by Christian Collberg and Jasvir Nagra.

trognanders
  • Sometimes even working with poorly written **source code** is bad enough that you are better off rewriting it. And sometimes understanding someone's algorithm in original source code can be non-trivial. Making it 10 or 100 times worse (or more)… OP might as well leave their home and vehicle doors unlocked and hanging wide open and their valuables in plain sight, as locks don't slow people down as much as code obfuscation. – Aaron Oct 11 '19 at 15:52
2

It increases the likelihood that, when the exploitable bugs in your software are found and exploited, it will be by highly motivated and likely well-funded attackers who specifically want to target you (or whoever is using your software) rather than skript kiddies, ransomware, etc.

For the most part, I would think you'd rather the bugs in your software be found by whitehat or grayhat researchers, with skript kiddies and ransomware as a second choice, and state-level attackers and such as the worst-case. But you need to make that call.