547

A developer, let's call him 'Dave', insists on using home-brew scripts for password security. See Dave's proposal below.

His team spent months adopting an industry standard protocol using Bcrypt. The software and methods in that protocol are not new, and are based on tried and tested implementations that support millions of users. This protocol is a set of specifications detailing the current state of the art, software components used, and how they should be implemented. The implementation is based on a known-good implementation.

Dave argued against this protocol from day one. His reasoning was that algorithms like Bcrypt, because they are published, have greater visibility to hackers, and are more likely to be targeted for attack. He also argued that the protocol itself was too bulky and difficult to maintain, but I believe Dave's primary hangup was the fact that Bcrypt is published.

What I'm hoping to accomplish by sharing his code here, is to generate consensus on:

  1. Why home-brew is not a good idea, and
  2. What specifically is wrong with his script
/** Dave's Home-brew Hash */

// user data
$user = '';
$password = '';

// timestamp, random #
$time = date('mdYHis');
$rand = mt_rand().'\n';

// crypt
$crypt = crypt($user.$time.$rand);

// hash
function hash_it($string1, $string2) {
    $pass = md5($string1);
    $nt = substr($pass,0,8);
    $th = substr($pass,8,8);
    $ur = substr($pass,16,8);
    $ps = substr($pass,24,8);

    $hash = 'H'.sha1($string2.$ps.$ur.$nt.$th);
    return $hash
}

$hash = hash_it($password, $crypt);
tylerl
  • 82,665
  • 26
  • 149
  • 230
nallenscott
  • 4,739
  • 3
  • 13
  • 8
  • 664
    ... and now Dave's protocol is published? – Mechanical snail Dec 18 '12 at 15:14
  • 23
    Doesn't the call to `date` mean that the same password will have two different hashes at different times? – jdm Dec 18 '12 at 15:36
  • 5
    @jdm I'd guess he stores `$time` and `$rand` as the salt in the db. If not, this is clearly broken. – Polynomial Dec 18 '12 at 15:40
  • 149
    Dave might have had a point if he ended up with a better and more secure approach. He did not. The reason we use published algorithms is so that we all benefit from the collective wisdom of others and don't have to rely on our personal and evolving understanding of a concept. – schroeder Dec 18 '12 at 16:01
  • 3
    @schroeder Agreed. My first reaction was to consult the current state of the art, and build on that. I would love to own this kind of IP, but the reality is we do not have the skills, techniques, devices, manpower to 1) create the algorithm, 2) test the algorithm, and 3) maintain the algorithm ... – nallenscott Dec 18 '12 at 16:14
  • 259
    Dave has a fundamentally false premise, that the security of an algorithm relies on (even partially) its obscurity - that's not the case. The security of a hashing algorithm relies on the limits of our understanding of mathematics, and, to a lesser extent, the hardware ability to brute-force it. Once Dave accepts this reality (and it really is reality, read the Wikipedia article on hashing), it's a question of who is smarter - Dave by himself, or a large group of specialists devoted to this very particular problem. – Chris B. Behrens Dec 18 '12 at 16:33
  • 10
    @ChrisB.Behrens Otherwise known as [Kerckhoff's Principle](http://en.wikipedia.org/wiki/Kerckhoffs%27s_principle). – Polynomial Dec 18 '12 at 16:40
  • 9
    Please tell Dave not to reinvent the wheel. – Lucas Kauffman Dec 18 '12 at 14:53
  • 110
    Worst of all **his algorithm could be brute forced in a matter of minutes likely for your entire userbase.** – Ramhound Dec 18 '12 at 18:24
  • 16
    In addition to what has already been said, I'd point out that *published algorithms are designed to remain secure even though they have been published*. That means they have to be *better* than algorithms that can rely on the adversary not knowing how they work. – zwol Dec 18 '12 at 19:13
  • 11
    The issue is not that Dave is dumb; he probably isn't. The issue is that his algorithm hasn't undergone years of brutal field-testing by armies of experts. That is the **only** way to show that a hashing algorithm is secure. – Nathan Long Dec 18 '12 at 21:05
  • 247
    The issue is not that Dave is dumb. We're all, in a sense, dumb. It's that he's fallen victim to the [Dunning-Kruger effect](http://en.wikipedia.org/wiki/Dunning–Kruger_effect) and not only has no idea how dumb he is, but is *completely convinced* that he knows better than the experts in the field. This is the real danger. Dumb and aware of it is fine, and how we should all strive to be. Dumb and oblivious is bad, but is comparatively easy to remedy. Dumb and embarrassingly overconfident is both extremely dangerous and frequently impossible to fix. – Stephen Touset Dec 18 '12 at 23:19
  • 18
    "I have already researched and created a fairly standard protocol built around Bcrypt. My protocol is not new, but is based on tried and tested implementations that support millions of users." You're almost as wrong as Dave here. The devil is in the details of the implementation. Not writing your own algorithm / protocol is good, but __you shouldn't be implementing it either__. Unless you're an expert, you shouldn't be trusting password hashing to __any code you wrote yourself__. The only responsible thing to do is use a known-good implementation, not just a known-good protocol. – agf Dec 19 '12 at 16:22
  • 2
    Please keep in mind this http://security.stackexchange.com/questions/6095/xkcd-936-short-complex-password-or-long-dictionary-passphrase if you choose password. – Michał Kuliński Dec 19 '12 at 20:13
  • 7
    Please tell me you're not 'Dave' – Scott Hillson Dec 20 '12 at 19:16
  • 7
    @hillsons Haha, no. But thanks for allowing me to clear that up ;) – nallenscott Dec 20 '12 at 20:37
  • If a system is used to guard high-value targets, then the number of people trying to break it will be greater than if it isn't used to guard anything of widely-recognized significance. If the system might not actually be secure, the likelihood of it being cracked may thus be higher than the likelihood of someone bothering to crack an obscure system nobody cares about. – supercat Feb 02 '14 at 20:42
  • Allow him to listen to the "security now" podcast by Steve Gibson on twit: http://twit.tv/sn, for example show #444 https://www.grc.com/sn/sn-444.htm where they talk about "Telegram"'s implementation (about half time into the show) – Marcel Sep 05 '14 at 13:20
  • 4
    Dave talks about not wanting to use published algorithms, yet he uses MD5 (a published algorithm) on PHP (open source project) – Cole Tobin Feb 21 '15 at 06:56
  • @StephenTouset great article on the Kruger effect – DotNetRussell Jul 31 '15 at 15:11
  • 3
    Dave is dangerously bad at cryptography. Like, you know, almost everyone. That's why it's really best to use vetted standards that were developed by experts. – Scott C Wilson Nov 24 '15 at 16:29
  • 1
    @supercat That's very dangerous thinking there. Low profile targets get hacked all the time. It doesn't matter that you're website isn't a bank or an email server; hackers are looking to crack any userbase they can because they then have a list of emails and passwords that might be reused on a high-profile target – BooleanCheese Jul 02 '19 at 20:29
  • @BooleanCheese: My point was not to suggest that low-value systems are generally secure as a result of being low-value systems, but rather that they have different security risks from high-value systems. Suppose, for example, one needs to have a device that can authenticate things that plug into it without being able to connect to the Internet on a regular basis. If the device uses a certificate chain rooted with one of today's root-level certificate authorities and may have no way to check for revocation, a breach of any CA could breach the device. If it used a certificate... – supercat Jul 02 '19 at 21:05
  • ...scheme rooted to its manufacturer, it would be vulnerable to any weaknesses in that scheme, but that's a different set of risks. – supercat Jul 02 '19 at 21:07
  • I would like security even when the whole user / password database is leaked, and all your source code. Thanks. – kutschkem Jan 05 '23 at 09:14

11 Answers11

525
/** Dave's Home-brew Hash^H^H^H^H^Hkinda stupid algorithm */

// user data
$user = '';
$password = '';

// timestamp, "random" #
$time = date('mdYHis'); // known to attackers - totally pointless
// ^ also, as jdm pointed out in the comments, this changes daily. looks broken!

// different hashes for different days? huh? or is this stored as a salt?
$rand = mt_rand().'\n'; // mt_rand is not secure as a random number generator
// ^ it's even less secure if you only ask for a single 31-bit number. and why the \n?

// crypt is good if configured/salted correctly
// ... except you've used crypt on the username? WTF.
$crypt = crypt($user.$time.$rand); 

// hash
function hash_it($string1, $string2) {
    $pass = md5($string1); // why are we MD5'ing the same pass when crypt is available?
    $nt = substr($pass,0,8); // <--- BAD BAD BAD - why shuffle an MD5 hash?!?!?
    $th = substr($pass,8,8);
    $ur = substr($pass,16,8);
    $ps = substr($pass,24,8); // seriously. I have no idea. why?
    // ^ shuffling brings _zero_ extra security. it makes _zero_ sense to do this.
    // also, what's up with the variable names?

    // and now we SHA1 it with other junk too? wtf?
    $hash = 'H'.sha1($string2.$ps.$ur.$nt.$th); 
    return $hash
}

$hash = hash_it($password, $crypt); // ... please stop. it's hurting my head.

summon_cthulhu();

Dave, you are not a cryptographer. Stop it.

This home-brew method offers no real resistance against brute force attacks, and gives a false impression of "complicated" security. In reality you're doing little more than sha1(md5(pass) + salt) with a possibly-broken and overly complicated hash. You seem to be under the illusion that complicated code gives better security, but it doesn't. A strong cryptosystem is strong regardless of whether the algorithm is known to an attacker - this fact is known as Kerckhoff's principle. I realise that it's fun to re-invent the wheel and do it all your own way, but you're writing code that's going into a business-critical application, which is going to have to protect customer credentials. You have a responsibility to do it correctly.

Stick to tried and tested key derivation algorithms like PBKDF2 or bcrypt, which have undergone years of in-depth analysis and scrutiny from a wide range of professional and hobbyist cryptographers.

If you'd like a good schooling on proper password storage, check out these links:

Polynomial
  • 133,763
  • 43
  • 302
  • 380
  • 15
    Could you explain why this is 'BAD BAD BAD'? I thought repeated hashing or processing the hash inputs can help thwart rainbow table or brute force attacks. How does this introduce vulnerabilities? – jdm Dec 18 '12 at 15:35
  • 77
    @jdm The "BAD BAD BAD" part refers to the pointless shuffling of the MD5 hash, since it offers zero extra security, but in this case the "repeated" hashing is badly designed. It relies on MD5, crypt (whatever that ends up being configured to) and SHA1 all at once, but only really does 3 computations. That's nowhere near enough to defeat GPU-based bruteforce. The whole thing brings nothing but obscurity, and is probably less secure than `sha1(md5(pass+salt))` with a decent salt. It's important to use a proper key derivation algorithm (see the first link). – Polynomial Dec 18 '12 at 15:38
  • 31
    Thank you. I am equally frustrated, believe me. Dave is a great guy, and has done excellent work in the past. His reaction to all of this took me by surprise. He's missing some fundamental concepts that could be picked up with a simple Google search. I'm not trying to be vindictive or cruel, but I'm done trying to explain this to him. This is exactly the kind of feedback he needs ... – nallenscott Dec 18 '12 at 15:44
  • 54
    @nallenscott To be fair, I went through the same phase when I was learning to code - I wanted to do everything myself, and quickly fell into the trap of writing my own "encryption" and hashing. Then I actually learnt about real security, and discovered much of what I'd previously written was horribly broken. Massive wakeup call! – Polynomial Dec 18 '12 at 15:48
  • @Polynomial Absolutely. And that's why I'm here, to help expedite that process for Dave. Somewhere in the answers and comments is an epiphany that will drive him down the path of enlightenment. Yours, in particular, was a real shocker :) – nallenscott Dec 18 '12 at 16:22
  • 1
    @Polynomial - I too used to design my own crypto and hash algorithms, but that is when I was in middle school. It was definitely helpful in supplying me with the needed information to understand the "real" algorithms that I would later learn. – Casey Dec 12 '13 at 16:44
  • 75
    @Casey: offcourse, always invent your own. Make your own linked list, your own crypto, your own database, your own mergesort. Implement the algorithms and datastructures to understand them better. Then throw the implementation away - the existing are almost always better. If not, get on the project and improve them! Your peers will review your changes and give feedback. If you're wrong, you'll learn, if you're right, your changes benefit all humankind ;) – Konerak Jan 24 '14 at 09:07
  • @Polynomial Great post. Please do the line-by-line destruction of [this question](http://security.stackexchange.com/q/71934/36971), as well, if you have time. – dberm22 Jan 16 '15 at 15:46
  • 3
    I like the overriding message behind this answer but I'm sorry to say this answer seemingly **assumes** an attacker has access to the code - which is not a *given*. Could you explain why, if the attacker does **not** have the code, why this algorithm is the same as `sha1(md5(pass) + salt)`? Clearly not knowing the *process* of Dave's jumbling of the initial `md5` hash results in extra security? I know hashes/encryption *should* still be secure given the code is available to all... but in that case, why not just say the `$user` and `$password` is visible in code...? – monster Nov 24 '15 at 14:07
  • 2
    @monster See [Kerckhoff's Principle](https://en.wikipedia.org/wiki/Kerckhoffs's_principle). Never design a system where you assume that the attacker doesn't know the process. A cryptographic function should rely upon the secrecy of the key, not the secrecy of the algorithm. – Polynomial Dec 01 '15 at 16:47
  • @Polynomial Thanks for the link - I had read it before (I think it was you who posted the link in another thread I had read). Like I said, I like the answer, and perhaps I'm being a bit picky, but if we have access to the code we can easily see the `$user` and `$password` so no matter what cipher you use it's irrelevant? If you look at Dave's code you can see he is manually assigning `$user` and `$password` - if you have the code you have that! Again, probably just being picky. – monster Dec 03 '15 at 11:50
  • 4
    @monster Having access to the code is not the same as having access to the *values of variables at runtime*. That's an entirely different scenario. Just having an offline copy of the code should not affect the security of the cryptographic system. – Polynomial Dec 03 '15 at 17:22
  • @Polynomial, I agree. What I'm saying is he is assigning both variables `$user` and `$pass` in his code to ``. He is assigning the variables to a string in his code and we can clearly see the string. – monster Dec 16 '15 at 10:53
  • @monster That's a common way of denoting placeholder variables, hence why the comment says "user data". They'll be filled in with actual values from the calling code. – Polynomial Dec 16 '15 at 20:10
  • Doesn't shuffling hashes arbitrarily provide a tiny bit of security? You can't download and use precalculated hashtables on it, right? (I'm probably wrong.) – Mateen Ulhaq Feb 26 '16 at 18:41
  • 1
    @MateenUlhaq No, because if you know what the shuffle is (which you have to assume an attacker does; i.e. Kerckhoffs's principle) then you can trivially un-shuffle and crack it regardless. It should also be noted that standard cryptographic hashes are TERRIBLE for storing passwords. – Polynomial Feb 28 '16 at 23:57
  • 4
    The call to `summon_cthulhu()` at the end is actually the least dangerous line of code. Poor password protections giving a false sense of security do far more damage than an uncontrolled Great Old One. – Ti Strga Dec 20 '16 at 22:30
  • I'd like to emphasize something that was glossed over: It's perfectly fine to roll your own. Go ahead! Do it! As long as it's not in prod. – Nic May 14 '17 at 03:12
  • 1
    Just wondering, wouldn't the fact that he is using SHA1 on an MD5 hash make this just as vulnerable as an MD5 hash (that hasn't been salted) on its own? That is, an attacker would only need to find an MD5 collision for everything to fall apart right? – DividedByZero Apr 07 '18 at 15:58
  • @DividedByZero Yes, with or without salt. – Vaelus Nov 10 '18 at 01:04
  • "and why the \n?" you asked confused.... That's what obscurity is about ^^ – Zaibis Nov 12 '18 at 13:07
  • 2
    @monster A clever attacker can derive the algorithm by creating an account with a password they know and then figuring out the steps after they steal the DB. That's assuming they can't just steal the application itself and decompile it (or view it directly if it's a script), though. Attackers are infinitely more clever at deriving secrets like this than defenders want to believe. That's why defenders should apply Kerckhoff's principle: doing so keeps us from assuming the attacker is stupider or less efficient than they really are. – jpmc26 Nov 21 '18 at 21:58
251

Advantages of a public protocol:

  • Probably written by smarter people than you
  • Tested by a lot more people (probably some of them smarter than you)
  • Reviewed by a lot more people (probably some of them smarter than you), often has mathematical proof
  • Improved by a lot more people (probably some of them smarter than you)
  • At the moment just one of those thousands of people finds a flaw, a lot of people start fixing it

Disadvantages of a public protocol:

  • If indeed a flaw is found,
  • AND made public
  • AND not fixed fast enough

then attackers will start searching for vulnerable sites/applications. BUT:

  • They'll probably go after more important targets first
  • When the flaw is exposed, you know and can lock-down
  • Not all flaws are "critical" (most are like "collisions are possible under certain conditions" or "the time to bruteforce is not 1012 years, but 106 years")

Security by obscurity is deemed outright bad. Inventing your own falls under that category.

MC Emperor
  • 103
  • 5
Konerak
  • 3,938
  • 2
  • 16
  • 16
  • 13
    For well known algorithms if you're not a professional cryptographer probably should be replaced with definitely. – Dan Is Fiddling By Firelight Dec 18 '12 at 15:55
  • 76
    "Probably written by smarter people than you" - you're assuming 'Dave' admits the existence of such. – AakashM Dec 18 '12 at 16:33
  • 9
    Any password could be called "security by obscurity". But in a good system, it's been proven that the password is the **only** thing you must keep secret. An untested system almost certainly has other flaws, so that while you're protecting the password, somebody gets in using an attack you didn't expect. Which is why you want an algorithm which has been attacked brutally and found to be strong. – Nathan Long Dec 18 '12 at 21:01
  • 1
    @NathanLong: An important point about passwords/keys: if one suspects that one's secrets have been compromised, one can re-establish security easily and with confidence by e.g. rolling 25 transparent dice and using the values to generate a password or key (64+ bits of entropy); any die roll will be essentially as good as any other, provided only that the enemy doesn't find out what it was. – supercat Sep 12 '14 at 20:31
  • 7
    If by *smart* you mean people more knowledgeable about cryptography... – monster Nov 24 '15 at 14:53
172

If Dave is really "your" developer, as in you have the authority to fire him, then you have the authority to direct him to use a more well-documented scheme, and you should.

In cryptography, the fewer secrets that are required to be kept, the better. This applies especially to "hard-coded" secrets, such as the hash function itself, which are not secrets as soon as the code containing the secret leaves your building.

This is why open standards for cryptography are a good thing; if the scheme has been around for years or even decades, with the basic algorithm and various implementations publicly known, and still nobody's found a way to get in, it's a good scheme.

Case in point, Polynomial provided a very good breakdown of what's wrong with the scheme, but here are the major failings of the proposed hash algorithm:

  • MD5 is completely broken; While a preimage attack isn't made much easier by the primary ways that MD5 is broken (it's extremely vulnerable to known-plaintext strong collisions; knowing the message and the digest, one can produce a collision in 2^24 time), the hashing algorithm is way too fast to be secure against modern distributed cracking hardware. It should never be used in applications requiring data security.

  • SHA-1 is considered vulnerable; Again, there's not an elegant vector for a preimage attack, but there is a strong collision attack (2^61 time for a 160-bit hash), and the hashing algorithm is fast enough and the keyspace small enough that current hardware can feasibly brute-force it.

  • More hashes don't necessarily mean better hashing. Hashes are a deterministic process; they stand in for a "random oracle" in theoretical cryptography, but since they must always produce the same output given the same input, they cannot add any entropy to what is already inherent in the input.

    Stated simply, using a secret combination of multiple hashes as Dave is doing, even if that combination is unknown, doesn't add a significant amount of work to a brute-force (the relative order of three hashes has 6 possibilities, increasing the complexity of trying all possibilities by less than 3 powers of 2), and that extra complexity disappears as soon as an attacker can decompile your code and discover the relative order of the hash functions.

  • Passwords are inherently low-entropy. The best "user-consumable" (non-gibberish) passwords max out in complexity at about 50 bits of entropy, meaning they can be cracked, if you have some idea of what you're looking for, in 2^50 or better time. That makes passwords easier to crack than other things you'd normally hash (like certificate digests), requiring additional "proof of work" to increase their security.

  • This scheme is not adding any significant proof of work; three hashes instead of one, in the grand scheme of things, adds the equivalent of about 1.25 bits of entropy to the scheme in extra required work; you could do better by just requiring passwords to be one character longer.

BCrypt, in contrast to all of the above, has as its main advantage an extremely slow key derivation function or KDF. Blowfish, the encryption cipher that forms the basis of BCrypt, uses an "SBox"; a large array of predetermined initial values, which is modified by the key and/or IV to produce the initial state of the cipher through which the plaintext is fed. In BCrypt, this SBox setup is made more complex by taking the smaller password and salt values and running them through the "unkeyed" cipher a predetermined number of times, "warming up" the cipher into a state that theoretically could only be produced by performing the same number of iterations of setup, using the same password and salt. This "warmed-up" cipher is then fed a constant plaintext to produce the hash value. In CS terms, the hash forms a "proof of work"; a computational task that is difficult (time- and/or resource-consuming) to perform, but easy to check for correctness (does the produced hash match the one we already have?).

That "predetermined number" of iterations of SBox setup is configurable; this is BCrypt's real power. A number of "rounds" of key setup are specified when hashing, and for n rounds, 2^n iterations of key setup are performed. That means that incrementing the number of rounds makes the hash perform twice as much work and take twice as long on the same hardware to produce the requisite proof of work. BCrypt therefore can easily keep up with Moore's Law, which has been the bane of many hash functions that came before.

BCrypt's main theoretical weakness is its low memory usage; while computation time can be exponentially increased, the amount of memory required remains effectively constant (the SBox doesn't get any bigger as the iterations increase and no "intermediate results" are required to be kept around). As such, while BCrypt stymies the pure pipelined computational power of a GPU, it remains vulnerable to cracking by more sophisticated "field-programmable gate arrays" (basically a massive, highly modular set of "soft-wired" logic units, that are the current state-of-the-art in highly distributed computing), because the low memory constraints mean you can just throw more relatively-inexpensive circuit boards at the problem.

A newer algorithm, SCrypt, combats FPGA crackers by exponentially increasing the memory demands as well as the computational expense of calculating the hash, so the limited memory available to each FPGA quickly makes them infeasible as well (distributed cracking would basically require large numbers of CPU/FSB/RAM combinations, essentially full computers, hooked together). The only problem there is that SCrypt is only 2 or 3 years old, so it doesn't have the cryptanalytic pedigree of BCrypt (13 years old). And really, if you have to worry about an FPGA-armed cracker getting a password in a feasible amount of time (like before you can change all of them anyway), you have pissed off some very powerful people, and have already exposed another serious vulnerability (allowing an attacker to obtain your stored password hashes in the first place, so they can crack one "offline").

Bottom line, use BCrypt for password hashing. It's 13 years old, based on a 20-year-old cipher algorithm, an in that time no known vulnerabilities have been found that would aid a password cracker. It's slow as molasses, and can be configured to always be so, provided you combine it with a requirement for users to change their password every 90 days or so, so new passwords keep getting hashed with more expensive configurations.

KeithS
  • 6,758
  • 1
  • 22
  • 39
  • 23
    First, SHA1 is 160 bits. Second, being broken for collision attacks (2^(n/2) to brute force) doesn't matter the slightest in the domain of cracking a hashed password (where you worry about (first) pre-image attacks - 2^n to brute force); see: http://en.wikipedia.org/wiki/Preimage_attack . For crypto-signatures a collision broken hash (and SHA1 is broken in 2^61 time) can't be used (as signed messages could be altered). (Though the simplicity and speed of MD5 and the existence of large rainbow tables does mean its bad for passwords.) Agree with your arguments for bcrypt though. – dr jimbob Dec 18 '12 at 17:49
  • 1
    Deleted long discussion on preimage attacks, collisions, hashing strength etc – Rory Alsop Dec 19 '12 at 08:42
  • For more on collision attacks vs pre-image attacks inspired by the old form of this question, see the [second half of my answer](http://security.stackexchange.com/a/25598/2568). – dr jimbob Dec 19 '12 at 18:27
  • 12
    I use MD5 to create nice colours in my charting scripts - not for securing passwords – mplungjan Mar 19 '13 at 07:19
  • Perhaps the most optimal would be to run your hash through both SCrypt *and* BCrypt and then you get the advantages of both, and vulnerabilities in one of them will not trivialize your hash. – Muhd Dec 12 '13 at 20:11
  • @Muhd - The problem with your supposition is that both algorithms are designed to add a nontrivial but acceptable amount of time complexity to the calculation (and the amount of complexity can be varied to achieve this balance). Running them in series means that you either double the time a user must wait for a legitimate password verification, or the strength of each algorithm must be reduced by half. Complexity is the enemy of security; the more it interferes with the status quo, the more likely your system will be "attacked" by your own users. – KeithS Dec 16 '13 at 23:48
  • @KeithS Using the same amount of time, and presuming there is no attacker with an exploit to either algorithm, you will have greater security using half the time for SCrypt than for using all the time for BCrypt. SCrypt is orders of magnitude more powerful, provided there is no vulnerability. Also, "complexity" is not the enemy of security as long as the people building/maintaining the system can understand it, though it may be the enemy of convenience. And there is no inconvenience to the end user with password hashing as it is all behind the scenes. – Muhd Dec 17 '13 at 00:56
  • 4
    @mplungjan Nice idea! – Mark K Cowan Nov 12 '18 at 13:36
  • 3
    Oh my God I hate companies so much when they require resetting a password every 90 days, it's so frustrating. Do not do this. Your company is not special, unless you're Facebook or Google, it's not even likely that you users visit your page that often, you're asking your users to reset their password nearly every single time they visit your website, which is absurd. – Nicholas Pipitone Nov 13 '19 at 07:48
104

To be fair to Dave, in terms of homebrew password security this is one of the better cases as all it just a little obsfuscation (and really not much) masking hash = SHA1(salt + MD5'(Password)) where MD5' does a reversible swap of the order of the bytes of the MD5 hash. Now the username/time/random/crypt-part is just used to generate a salt, and the only requirement we have of salts is that they just need to have a very good likelihood of uniqueness; so while its over-complicated there's really no use talking about it.

Again hash = sha1($salt+md5'($password)). Rearranging the MD5 adds no security (swap(00112233445566778899aabbccddeeff) becomes ccddeeff8899aabb0011223344556677) the swap does not prevent you from using md5 rainbow tables after (e.g., look at the code and reverse the swap). However, the presence of the unique salt makes the rainbow tables infeasible.

Now to be critical, this boils down to a simple cryptographic hash function applied twice; which is better than storing plaintext password (see plaintext offenders ), and better than storing an unsalted hash (see linkedin leak ). However, in the age of cheap massively-parallel GPUs, this is too weak for modern use. Anyone with a bit of general-purpose GPU programming experience that got on the live server somehow (to obtain the hashes with their salt) can likely see his source code, try out their own password as a test case and then can brute-force any particular password at a rate of billions of attempts per second per GPU.

So if any user is using a password on a list of a million or so previously leaked password (say from linkedin), the attacker can near instantaneously crack it. If some user's password is a random 8-letter from the char set A-Za-z0-9; it would take about 60 hours on average to break per GPU (so if you had 60 GPUs; would take 1 hour). Using common cracking techniques exploiting forms of common passwords could speed it up dramatically. Its also worth noting that since $password passes through a 128-bit MD5 hashing function, there is absolutely no benefit in using more than 128-bits of entropy in the passphrase (though to be fair that's a very secure password; like a 10 word diceware passphrase or random 22 character alphanumeric password).

Really, you should be using iterated cryptographic hash functions; that is something like bcrypt or PBKDF that slows down the brute-force rate of attackers by a large constant factor (say 105) (so instead of 60 hours to crack a random 8-char password from a 62-char set (A-Za-z0-9); it takes 6 million hours (~700 years to likely be broken) with a single GPU, and this gets better with stronger passwords (e.g., a 10 char password would take ~3 million years; so even with a million GPUs it would take 3 years). So a little keystrengthening moves relatively weak password (8 random chars from 62-charset) out of the range of feasibility of being cracked by an attacker. For more on the upper limits of using a simple keystrengthened password see this answer.

Collision Attacks vs Pre-Image Attacks (or Why Collision Attacks on a hash function are irrelevant for password hashing)

KeithS's answer, while it gives good advice (use bcrypt and not a simple cryptographic hash function to hash your password) originally criticized MD5 and SHA1 for the wrong rationale (don't use MD5 as its vulnerable to collision attacks). There's a subtle difference between pre-image and collision attacks and the arguing in the comments was too condensed, so I am elaborating here. I highly suggest reading the wikipedia article on pre-image attacks.

A pre-image attack says given a specific 128-bit hash written in hexadecimal: h=ad2baf26a87795b3c8a8366a08b44112, a specific hash function H, please find any message m such that h=H(m); note that there are N=2128 different distinct hashes. Now if your cryptographic hash function is not broken, each bit in your hash has a 50% chance of being 0 or 1 for a random message m. Then I will need on average to generate hashes for roughly N=2128 hashes before I am lucky enough to find any message m such that h=H(m) (this message may not be the same input that was used to originally generate the hash -- but this still falls under the category of 'pre-image' attack).

A collision attack says find me any two messages m1 and m2 such that H(m1) = H(m2). Note that m1, m2, (and H(m1)) are all free to change. This case is a much easier problem since if I generate M hashes for M different messages, I am not just comparing the M messages to one specific hash (so have M chances of a finding a collision), now I have M*(M-1)/2 pairs of hashes, so roughly M^2 chances of having a collision. So in this case, I will need roughly to generate roughly sqrt(N)~264 hashes before its likely that one of them will collide with another one on an ideal 128-bit hash.

Let's look at the two types of birthday problems. The collision problem translates into the common birthday "paradox"; how many people do you need in a room before its quite that two people share a birthday with N=365 days in a year. The answer is a paradox as you only need about sqrt(N) ~ 23 people before its likely that two people share a birthday (as with 23 people in a room you have 253 different pairs of people that could match). (I do know that sqrt(365) != 23: I am using approximate math not focusing on insignificant constant factors. Re-doing the calculation with sqrt(365) ~ 19 people in the room then P(two share birthday) = 19! * comb(365,19)/365**n = 37.9%, which while not strictly 50% still means its fairly likely to happen). Note for the collision birthday problem, you can't have N+1=366 people in a room and there be a chance of not having a collision (ignoring leap days); at best the first 365 people had different birthdays and the last person generated a collision.

The pre-image problem is a very different question, how many people do I need in a room before it is quite likely that someone has one specific birthday (say B=December 18th). In this case, I need roughly N ~ 365 people before its likely to happen. E.g., with 365 people in the room you have a P(somebody has birthday B) = 1 - (1 - 1/365)^(# people) so for # people = 365 you have a 63% chance that someone's birthday will be some fixed date B. In this case, you can easily imagine having any number of people being in a room and there being no one who has a birthday on one specific date. (Say you only invited people into the room if there birthday was not a given date; there's no limit to the number of people you could invite).

When a hash function like MD5/SHA1 is broken for collision attacks that means you can generate a collision in less work than the brute force time of sqrt(N) ~ 2numbits/2. For MD5 it only takes about ~ 2^24 time to generate a collision; for SHA1 it takes ~ 2^61 time. That means collision attacks on MD5 are extremely easy to make; but practical attacks on SHA1 are still difficult. But collision attacks only matter if you don't care what hash you are trying to match. These collision attacks are quite relevant for some applications like cryptographically signing messages to ensure message integrity, so beware of using MD5/SHA1 in those cases. However, when you have a unique salt, and a specific hash you are trying to match to authenticate, collision attacks do not matter.

dr jimbob
  • 38,936
  • 8
  • 92
  • 162
69

enter image description here

See the related Security Meme post

While this may seem very simplistic, the rules hold true - designing crypto algorithms and implementing them correctly/securely is very hard. Even the ones designed by experts and picked at by thousands of people over years have holes discovered in them eventually.

So Do Not Roll Your Own Crypto is good advice for everyone (possible exceptions include notables like Rivest, Adleman, Pornin, Shamir)

Rory Alsop
  • 61,474
  • 12
  • 117
  • 321
  • 4
    Funny and agree with the sentiment, but its irrelevant as he didn't write his own cipher. He took existing crypto-hash functions `MD5` and `SHA1` along with custom permutation function `dumb_perm` (`dumb_perm('00112233445566778899aabbccddeeff')` goes to `'ccddeeff8899aabb0011223344556677'`), so `hash = SHA1(salt++dumb_perm(MD5(pw)))` and created their salt in an overly complicated manner. While they've increased their maintenance costs for no gain in security, they are not creating their own cipher--the flaw is that simple hashes are too quick nowadays, so key-strengthening is necessary. – dr jimbob Dec 19 '12 at 16:41
  • I think the greater problem is that the question's example is more of a case of key weakening than it is of key strengthening. It may not be a new cipher, but it's falling for the same pitfalls. – Jeff Ferland Dec 19 '12 at 18:08
  • 1
    @drjimbob cipher != hash – bradley.ayers Dec 19 '12 at 21:09
  • @bradley.ayers - I understand the difference (ciphers are used for encryption; hashing is not encryption as its not reversible.) The meme image first used the word "cipher", and I didn't criticize Rory as cryptographic hash function like MD5 and SHA1 are based on algorithms similar to block ciphers: http://en.wikipedia.org/wiki/Cryptographic_hash_function#Hash_functions_based_on_block_ciphers However, Dave didn't create his own cipher/hash functions or anything similar. He merely did a dumb permutation of an MD5 at one step of a weak hashing scheme (`sha1(salt+md5(pw)`). – dr jimbob Dec 19 '12 at 21:17
  • 3
    _One does not simply write one's own cipher_ sounds better – mplungjan Mar 19 '13 at 07:21
  • It's worth pointing out that Rivest and friends didn't roll their own crypto. They devoted significant chunks of their life rolling crypto for everyone else. Personally, I would not put it past a large firm with billions to protect to hire some professional crypto gurus to roll crypto (why does that sound like a drug term) for use within the company. – corsiKa Oct 08 '13 at 21:58
44

While we can find plenty of flaws with Dave's algorithm, it really isn't horrible because it isn't 100% home brew; he does use hashing protocols that (albeit weak) are based on solid principles. On the other hand, he takes steps that increase complexity for the developer but do little to improve the security of his algorithm.

But the reason I am adding another answer here is to bring up the age-old argument that there is value in obscurity. The primary line of defense should always be strong, widely-accepted hashing algorithms but there is no harm in adding even the tiniest amount of obscurity as an additional layer of defense.

That layer of obscurity should never be considered a defense in itself, simply an additional factor that reduces the chances of compromise. Since a successful attack normally requires a number of factors to fall in place, even small defenses can have a great impact in reducing your overall risk.

I can't tell you how many times I have seen hash dumps that I have tried to crack where I simply get nothing because some Dave somewhere thought of this dumb non-standard algorithm that I can't crack short of doing an intense cryptoanalysis or breaking in to the web servers to get the algorithm. You can't deny that these dumb algorithms are a valid defensive factor.

Now if Dave took his dumb algorithm and added it as a layer to stronger algorithms such as PBKDF2 or bcrypt, then he has obscurity with the backing of solid security practices. Also, obscurity doesn't need to be as complex as Dave's code, all you really need is one bit to be different to be obscure. Remember this is not a defense, just a layer of obscurity.

Better ways to accomplish obscurity are using an HMAC, concatenating a server-side salt constant, and/or using a large, non-standard number of iterations. None of these measures will stand on their own but they certainly do address specific attack vectors that contribute to your overall risk.

tl:dr; security through obscurity is bad, but solid security plus a reasonable amount of obscurity is a valid strategy.

Mark Burnett
  • 2,810
  • 13
  • 16
  • 3
    Indeed Dave's algorithm is essentially salted HMAC with a little twist. It appears to be as strong as HMAC. For example attacker won't be able to use someone else's rainbow table. My biggest concern with this system is where salt is stored. –  Dec 20 '12 at 09:50
  • 2
    @qarma HMAC is using the secret (i.e. the password) twice, this algorithm is using it only once. – Paŭlo Ebermann Apr 06 '13 at 13:43
30

OK, fire Dave. At the very least hit him with a very large clue-bat. Open protocols are good because anyone can look and attempt to find vulnerabilities and structural problems, and implement fixes. The visibility improves the protocol. Good security means that everyone can know how the system works and it is still secure.

GdD
  • 17,321
  • 2
  • 41
  • 63
  • 1
    Do you still belief that in all open systems all holes are found and fixed and therefore they are the most secure systems, when considering the leaks of the last few years that have revealed that the likes of NSA and CIA keep vulnerabilities to themselves and develop exploits off them for various barely legal reasons? If they can do it (the so-called 'good guys'), can't the so-called 'bad-guys' do it too ('Chinese/Russian hackers' and so on)? – a20 May 14 '17 at 05:27
  • You are confusing design with implementation @a20, in any case I don't think being closed helps, most of the vulnerabilities that the US gov't are sitting on are in closed systems, for instance MS Operating Systems. – GdD May 15 '17 at 06:59
  • "most of the vulnerabilities that the US gov't are sitting on are in closed systems" .. fair point. What did you mean that I'm confusing design with implementation? – a20 May 16 '17 at 08:01
  • @a20 I'm really late to this party, but I'm guessing that GdD was referring to the fact that those real-world vulnerabilities you're talking about are rarely fundamental flaws in a cryptographical standard, more often they're flaws in the way a software product uses or implements the standard. For example, if I write an implementation of an open protocol that is vulnerable because I used a timestamp instead of a crypto-PRNG as a seed, that vulnerability is evidence of a weakness in my implementation, not in the open protocol necessarily. And closed systems are arguably more susceptible here. – David Schwartz Jul 01 '20 at 23:22
19

Convince him with good reasoning. Don't berate him.

You have to think about why we are hashing passwords: The reason is to protect the original password by making the hashing process take a lot of CPU time to execute. Brute force is the way the passwords are typically recovered.

If the attacker is able to steal your password database then they've managed to access your database. If they have access to the database then they probably have access to the code that produces these passwords as well. After reviewing the code they can start brute-forcing the passwords because the hashing algorithm takes too little time to execute.

The idea behind well-known hashing methods and practices like using PBKDF2 with 10,000 iterations is that they take a lot of time to execute on current hardware.

If you have some time on your hands you could also write a brute-forcing program and show him how many attacks per second you can execute with his algorithm vs prevailing standards.

Sarel Botha
  • 1,155
  • 7
  • 8
  • 5
    Just because they can access the DB, does not mean they have access to his code. – jmoreno Dec 18 '12 at 16:48
  • 1
    @jmoreno Still doesn't justify his useless scheme. – Thomas Dec 18 '12 at 20:39
  • 2
    @Thomas: true, and the reason for rejecting bcrypt is simply unsupportable, but that doesn't mean that your code can't add (or remove) value from using the standard libraries and practicies. – jmoreno Dec 18 '12 at 22:02
  • 1
    @jmoreno In general, security by obscurity is perceived as a crutch for poor cryptography practices, and it often is. That doesn't mean it doesn't have value - it does have its uses, but in this situation it is not warranted. – Thomas Dec 18 '12 at 22:13
  • @jmoreno, right, which is why I said 'probably'. The main point is that the hashing must take time. – Sarel Botha Dec 19 '12 at 14:56
11

I've written several hashing algorithms. There's nothing wrong with it, if you know what you're doing. In a very slight way, he's right about the fact that tried-and-true algorithms may be a little more vulnerable to attack. So if you can create a good algorithm, you're golden. The only problem is that his totally sucks, for a number of reasons.

  1. // timestamp, random # $time = date('mdYHis');

    I don't even feel like I need to explain this one. He was trying to create a seeded "random" number when it doesn't make any sense to do so. The hashes will change day to day. I cannot imagine this working in any reasonable scenario.

  2. He says that he want's to stray away from known hashing algorithms, yet he uses MD5 (the known-est of them all, and a fairly weak one too) for the bulk of "his" algorithm. His method is only a very slight derivative of plain ole MD5 hashing. Which brings me to:
  3. He's shuffling his MD5'd result. Uh, why? As a fun little activity, try asking him to explain his reasoning there. And I'll give you a tip to help you out: Whatever his answer is -- it's wrong. Shuffling a hashed value is like shuffling a deck of all-blank cards. There is no extra benefit from it whatsoever.

There are more things wrong with it, too, but these are the big ones. Tell him to get back to his usual work, and just use bcrypt (or scrypt, which is probably even a little more secure to do the fact that it's really CPU intensive)

Phillip Schmidt
  • 229
  • 1
  • 4
  • 3
    Yes you can write your own, but most people who think they are qualified to write their own are not qualified to write their own. It's easy to judge how smart we are, but difficult to judge how dumb we are. – Mark Burnett Dec 21 '12 at 19:44
  • 2
    I think the reason for the shuffling is to make the algorithm different from anything an attacker (who doesn't know the algorithm) expects. Using a application-specific secret salt input (aka "pepper") has this same effect and is more reliable. – Paŭlo Ebermann Dec 22 '12 at 02:04
  • Regarding #3: A shuffled MD5 hash cannot be looked up in a rainbow table. – Douglas Held Jul 03 '19 at 21:09
10

Steganography would benefit from Dave's operational secrecy better than Cryptography. This might be what Dave was intuitively aiming for.

Cryptography

Each cryptographic solution benefits from widest possible exposure because relying on a secret other than the private key makes operational security for a crypto-consumer much harder. A key is a single easy-to-manage, portable, predictable, well-defined security risk that crypto-experts have focused great deal of effort on ensuring that all risk in the algorithm has been squeezed into only the key.

Keys can be made unguessable (high-entropy) as to require brute-force or side-channel attacks. Secrets in other aspects of an algorithm can not be made as unguessable because any other aspect of an encryption algorithm be generalised into a universal class of (possibly weaker) encryption algorithm + a constant value. This constant value representing all residual invariant entropy in the algorithm's structure or obfuscation. This value is never particularly large or entropic in a home-brew algorithm and is inherently constant in any non-self-modifying computer program. An attacker can either acquire the source and remove this (minor) entropy directly or run the ciphertext through a generalised class of algorithm. NSA for example would be doing precisely this for unknown variants of common algorithms in use by foreign powers.

Steganography

Each steganographic solution benefits from the least public exposure because hiding the location of secrets requires the attacker to customise their search in a fashion potentially unique for that victim; and scales to the amount possible hiding places and methods of hiding at the victim's disposal.

This could be as simple as having a fake hash password column and hiding the real one in four (64bit) time_t columns across various tables that record user information. This can be as complex as hiding a Portable Linux emulation inside a BluRay movie.

Steganography ultimately relies on creating or modifying an artefact that has a public semantic meaning or purpose and secondary private semantic meaning or purpose. An person or organisation benefits from hiding secondary semantic meanings in much the same way as any lie by omission. Be that "My sock is boring and doesn't contain diamonds" or "These are the hashes you are looking for, ignore that honeypot or that socket connection to an audit server".

In Combination

So Dave should use public vetted high-level libraries like BouncyCastle instead of a home-brew of low level crypto-primatives; but should feel free to make the hacker navigating the server ecology feel like they are reading a brainfuck program that's missing some characters.

Assuming full support documentation is kept in a secure offline location for Dave's successors...

LateralFractal
  • 5,173
  • 18
  • 41
8

I'm going to try a different approach to answering this. I'm not going to tell you something is bad, and not explain why. I'll explain exactly why it's bad, step-by-step, and show you how to break it, but I won't go too deeply into Rainbow Tables... it's assumed you know what that is.

Here's why your developer's hash is incorrect: because it just introduces a few factors that could be utterly brute-forced once the method is discovered, even if he were to randomize the shuffling.

How? Because if I have access to your server, I also know the method which your developer is using to hash your password because I've stolen that code too, and I can rearrange everything to suit my little crackpot application.

Let's say I hacked your company, and gained access to your database so that I can obtain these hashes. I know WHEN users register, correct? I know when they last changed their password because, unless you're a complete newbie, you should be logging that information to the database.

$time = date('mdYHis');

This timestamp is probably around the same time the the user registered his or her password, or changed it. We can know this because the last time it was updated should be stored in the database. The time you registered should be stored in the database. Part of the crypt has been defeated.

$rand = mt_rand().'\n';

Cool, a random number generator. You know, nothing is really random on computers. Random number generators aren't truly random. This is the Mersenne Twister. All I need are 624 different combinations to figure out all future numbers. I've stolen your "salt" from the database as well, and I know how you're calculating it.

Because I know the dates on which your users registered, I know how to recreate the correct number for the Mersenne Twister routine at that specific time. I can write a little cracking method to extract the exact number that would likely happen at that time. Your salt is no longer random. But you know what? It's irrelevant. I know the exact format you're using for that salt, and I likely got it from your database anyway. Then I can use Rainbow Tables to utterly annihilate your salt in under a second because mt_rand() alone produces a maximum of 10^10 hashes, and a minimum of 10^9, which means 11 billion possibilities, and since all of them are numbers, and I know the format of your salt, they will be cracked in less than a second.

So now I know who this username belongs to because I jacked your database. I know when they registered, I know when they last changed their password, and I know your random number. I see you, Dave, a thief on the roof. My new satellite link has both infrared and the x-ray spectrum. I see your heart beating. I see you are afraid.

Now that $crypt has been defeated, let's take a look at function hash_it().

function hash_it($string1, $string2) {
    $pass = md5($string1);
    $nt = substr($pass,0,8);
    $th = substr($pass,8,8);
    $ur = substr($pass,16,8);
    $ps = substr($pass,24,8);

    $hash = 'H'.sha1($string2.$ps.$ur.$nt.$th);
    return $hash
}

Now let's take a look at some sample input: hash_it(Dave, testpassword1);

Results:

$pass = "b7e055c6165da55c3e12c49ae5207455"
$nt = "b7e055c6";
$th = null;
$ur = "3e12c49a";
$ps = "e5207455";

$hash Input: "H" + "username + "RegistrationDate" + "1604716014" + "e5207455" + "3e12c49a" + "b7e055c6"(tldr: "HDaveRegistrationDate1604716014e52074553e12c49ab7e055c6")

$hash = 'H'.sha1("DaveRegistrationDate1604716014e52074553e12c49ab7e055c6);

`$hash output: "6b347a1521b6b84501806268614abe1e7324c703"; (same thing every time)

Here's how an attack might work:

  1. Break Mersenne Twister (mt_rand()) after 624 observations, or just Rainbow Table the salt.
  2. Grab Dave's REGISTRATION_DATE from the database, or the USER_LAST_MODIFIED_DATE
  3. Grab Dave's SHA1 hash
  4. Initiate brute force attack (it becomes a static sha1 hash after we've broken your "security") with the following parameters:
    • "Username" + "RegistrationDate" + YourRandomNumber + DictionaryAttack[]
  5. Harm your customers.

And that's the story of why you shouldn't roll your own cryptography unless you really know what you're doing. Imagine if I were an actual cracker, and not someone who just went backwards through your code.

Mark Buffalo
  • 22,508
  • 8
  • 74
  • 91
  • 1
    And what if you don't have access to the code as you presume? – monster Nov 24 '15 at 09:51
  • 1
    @monster Of course I have access to the code. It's PHP, and he's trying to protect his database, which I've already stolen. The point of hashing passwords is to protect your users once you've been breached. If I can get a hold of your database, getting a hold of the rest of the important files on the web server is a trivial task. – Mark Buffalo Nov 24 '15 at 11:56
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/32054/discussion-between-mark-hulkalo-and-monster). – Mark Buffalo Nov 24 '15 at 15:21
  • 2
    @MarkBuffalo `Of course I have access to the code. It's PHP, and he's trying to protect his database` There are many situations where SQLi may allow for dumping the database without obtaining the PHP files. Of course, that also means that Dave should just be using a pepper, since it accomplishes the same threat model but isn't so... broken. – forest Dec 12 '17 at 06:51