3

An abstract representation of my question looks like this:

byte[] encrypted = AES.Encrypt(bytesToEncrypt, password);
byte[] hash = SHA256.Hash(password);

If an attacker obtains the values of both encrypted and hash, does that give them an advantage they wouldn't have if they only had one or the other?

Luc
user875234
  • 2
    Well, if you choose a random key for AES and a long enough random string as the password for SHA256, then no. – Aria Jun 09 '18 at 19:57
  • 1
    What do you mean by "computing both"? Do you mean a side channel? Do you mean knowing the encrypted data and the password hash? Do you also mean knowing the original unencrypted message? – mentallurg Jun 09 '18 at 20:17
  • 1
    @mentallurg I mean encrypting but then also hashing the password. The attacker would have the encrypted byte array and the hashed password. – user875234 Jun 09 '18 at 20:51
  • 1
    @user875234: Any additional info simplifies attacks. SHA256 is designed to be very fast. If you publish the hash, it gives a better chance for an attack. If this does not answer your real question, then change your question so that everyone better understands what you need. – mentallurg Jun 09 '18 at 20:58
  • @mentallurg AES is also designed to be very fast. I think the speed difference is small enough to be negligible. A strong password is what you really need. – Luc Jun 10 '18 at 09:43
  • The fact that the AES encryption uses a password instead of a key, and the fact that a secure hash is used instead of a password hash indicates to me that there are things seriously wrong *within* your question. – Maarten Bodewes Jun 14 '18 at 09:58
  • @MaartenBodewes here's the implementation, if you don't mind taking a look https://github.com/smchughinfo/steganographyjr/blob/master/App/SteganographyJr/SteganographyJr.Cryptography/AES.cs – user875234 Jun 19 '18 at 00:33
  • May take some time as I am on a kayaking trip... hard to read crypto code between 2m breakers ;) – Maarten Bodewes Jun 19 '18 at 22:00

2 Answers

3

This is safe if the password is safe. For example, 16 randomly generated bytes as the password are fine, or perhaps a strong password that was run through a good key-derivation function (KDF) such as bcrypt, scrypt or argon2 (the slower the better, since an attacker can then only make guesses at a slow rate).

You are giving the attacker more data to work with, so it is almost inevitable that you give away some advantage. There are two things I can think of:

  • Break one, get the other

    Your scheme is broken if either someone finds a preimage attack on SHA2, or someone finds some attack in AES that leads to key recovery.

    If an attacker only has the hash and not the encrypted contents, they can't use an attack on AES to crack your hash (and vice versa).

    Attacks that completely break AES or SHA2 are extremely unlikely, though. Both algorithms are extremely well-studied, and a full break usually comes only after earlier attacks have substantially weakened an algorithm. If either of the two is broken, we're in pretty big trouble anyway.

  • Identifying a correct password guess

    When cracking your encrypted bytes, an attacker will get random garbage as output upon each incorrect try. If your input (bytesToEncrypt) is also indistinguishable from random (for example if it's a Bitcoin transaction ID), an attacker will never know when they got the correct answer. (Well, perhaps they could check all Bitcoin transactions in history and find a matching one... but you get the idea.)

    Now, instead, they can find an input that corresponds to the hash. Once that is found, they know that the answer is right, and they can use it to decrypt the message.

    Usually, though, it's possible to check when a decryption is correct, so this is only a small advantage. And this assumes that your password can be guessed in the first place.

As said, both are very unlikely or infeasible if your password input is secure, e.g. 16 random bytes or a very strong and KDF'd password.

So your scheme is basically safe, but it's only a small part of a larger application. If this is an important application, you should have it reviewed by a professional to be sure it's implemented correctly. (Note: that means paying someone, not just asking on a code review forum or something.)

Also, you call your inputs to AES and SHA2 "password", implying it's user-specified. A KDF is not visible in the code you posted, so you need to add that. Use a slow KDF, such as one based on bcrypt, scrypt or argon2.
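As a rough illustration of the KDF step, here is a minimal Python sketch using the standard library's `hashlib.scrypt` (this needs a Python build with OpenSSL scrypt support). The function name `derive_key`, the salt size, the example password and the cost parameters are assumptions chosen for illustration; they are not taken from the code in the question.

```python
import hashlib
import os

def derive_key(password: str, salt: bytes, length: int = 32) -> bytes:
    """Derive a fixed-length key from a password with scrypt (illustrative parameters)."""
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=2**14,  # CPU/memory cost (assumption; raise it as far as your hardware allows)
        r=8,
        p=1,
        dklen=length,  # 32 bytes fits an AES-256 key
    )

salt = os.urandom(16)  # random salt, stored alongside the ciphertext
key = derive_key("correct horse battery staple", salt)
```

The derived `key`, not the raw password, would then be what goes into AES. The same password and salt always give the same key, so decryption only needs the stored salt and the password.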

Luc
  • Not just a preimage attack, but a preimage attack where _m = m'_ (which normally isn't relevant for a 1st preimage). That'd be much harder since a single digest can have many, many possible preimages. – forest Jun 10 '18 at 01:52
  • @forest If I understand you correctly, you mean when `H(a) == H(b) && a != b`? Because I think the author is trying to protect the encrypted bytes, so finding a collision is not an issue: only the original password will work for AES to recover the contents. – Luc Jun 10 '18 at 09:41
  • I mean that recovering the original message given only _h_ with _f(m) = h_ is not necessarily possible even with a 1st preimage attack. All it guarantees is that you can find an _m'_ such that _f(m) = f(m')_ without necessarily meaning _m = m'_. So what's the use if you can "reverse" the hash to get a full preimage, yet the message is not the original message you hashed? – forest Jun 11 '18 at 02:40
  • @forest Ah, yes. Does that have a particular name? – Luc Jun 11 '18 at 06:01
  • I don't think so. If the attack is against a keyed hash function, then it would be classified as a key recovery attack. Otherwise it's just a first preimage attack where you happen to have found the original input. This is much more likely if you are doing a dictionary attack or similar than if you exploit an actual preimage vulnerability. – forest Jun 11 '18 at 06:04
2

give the attacker an advantage

Depends. What's the attacker trying to do?

Given that you call one of the pieces of data involved “password”, I assume that the goal of the attacker is to find this password, and your goal is to keep it secret.

It's not clear what you mean by AES.Encrypt(bytesToEncrypt, password). I assume that you mean that you're using one of the standard block cipher modes of operation using AES as the block cipher, with bytesToEncrypt being the key and password being the message to encrypt, or perhaps the other way round. Below I'll write AES.Encrypt(key, message) where key is the encryption key and message is the string to encrypt, since that's the order of arguments used in virtually all APIs out there.

If you reveal SHA256.Hash(password), then the attacker can find the password by calculating SHA256.Hash(p) for many values of p until they find one with a matching hash¹. They may either do the calculations themselves, or leverage calculations done by others. This could be devastating if the hash is something like b9f195c5cc7ef6afadbfbc42892ad47d3b24c6bc94bb510c4564a90a14e8b799, less so if it's something like 3d5cfec95acbabf275d544e138fdfa02ad3945adc681dfa0cec75a127a9ff6aa, but it is a realistic threat for any password that's memorable.
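The guessing loop described above can be sketched in a few lines of Python. The function name `crack_sha256`, the word list and the "leaked" hash are hypothetical and only serve to show the shape of the attack; a real attacker would use far larger lists and much faster hardware.

```python
import hashlib

def crack_sha256(target_hex: str, candidates):
    """Return the first candidate whose SHA-256 digest matches target_hex, else None."""
    for candidate in candidates:
        digest = hashlib.sha256(candidate.encode("utf-8")).hexdigest()
        if digest == target_hex:
            return candidate
    return None

# Hypothetical leaked hash and word list, for illustration only.
leaked = hashlib.sha256(b"letmein").hexdigest()
wordlist = ["123456", "password", "letmein", "qwerty"]
found = crack_sha256(leaked, wordlist)
```

Because each guess costs only one fast hash computation, the only real defence here is a password that does not appear in any list the attacker can enumerate.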

If you reveal AES.Encrypt(key, password) to the attacker, then the attacker can know two things:

  • They will know the exact or approximate length of the password, depending on which encryption mode you used.
  • If the attacker also manages to obtain bytesToEncrypt, then they will know the password.
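To make the length leak in the first point concrete, here is a small Python sketch of how ciphertext length relates to plaintext length. It assumes two common setups, CTR mode and CBC mode with PKCS#7 padding, and ignores any IV or nonce stored with the ciphertext; the helper name `ciphertext_len` is made up for this example.

```python
BLOCK = 16  # AES block size in bytes

def ciphertext_len(plaintext_len: int, mode: str) -> int:
    """Ciphertext length, excluding any IV or nonce, for two common setups."""
    if mode == "ctr":
        # Stream-like mode: ciphertext is exactly as long as the plaintext.
        return plaintext_len
    if mode == "cbc-pkcs7":
        # PKCS#7 always adds between 1 and 16 bytes of padding.
        return (plaintext_len // BLOCK + 1) * BLOCK
    raise ValueError(f"unsupported mode: {mode}")

for n in (8, 15, 16, 17):
    print(n, ciphertext_len(n, "ctr"), ciphertext_len(n, "cbc-pkcs7"))
```

Under these assumptions CTR reveals the exact length, while a padded block mode only reveals which 16-byte bucket the length falls into.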

If you wanted to calculate AES.Encrypt(password, message), then you'd run into the problem that this doesn't actually make sense. An AES key must be a string of either 16 bytes, 24 bytes or 32 bytes. No other string length is possible: the algorithm just isn't defined for other string lengths. If you use a password that has the requisite byte length, that's a very bad idea because keys are supposed to be generated randomly. If you use a password which consists of printable characters and has patterns that make it memorable, you aren't using AES the way it was designed. You open the way to related-key attacks: it's possible to learn things from relationships between keys, such as “this key consists only of ASCII characters” (i.e. the most significant bit of each byte is 0). AES has known weaknesses against related-key attacks, which is not a problem in practice because no sane protocol involves related keys. Using a password as a key is not a sane protocol, though. In addition, you're obviously revealing something about the length of the password, and you're exposing it to the same kind of brute-force attack as the SHA-256 case if message is known.
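A quick Python calculation shows how much is lost when a printable password is used directly as a key. The example password and the alphabet size of 95 printable ASCII characters are assumptions for illustration, and the entropy figure is an upper bound that holds only if every character is chosen uniformly at random; a memorable password has far less.

```python
import math

VALID_AES_KEY_SIZES = (16, 24, 32)  # bytes: AES-128, AES-192, AES-256

def is_valid_aes_key(key: bytes) -> bool:
    """AES is only defined for these three key lengths."""
    return len(key) in VALID_AES_KEY_SIZES

def max_entropy_bits(length: int, alphabet_size: int) -> float:
    """Upper bound on entropy for `length` symbols drawn uniformly from an alphabet."""
    return length * math.log2(alphabet_size)

password_key = b"MyS3cretPassw0rd"  # hypothetical 16-character ASCII password

print(is_valid_aes_key(password_key))       # length check only, says nothing about quality
print(all(b < 0x80 for b in password_key))  # top bit of every byte is 0 for ASCII
print(max_entropy_bits(16, 95))             # printable ASCII, best case
print(max_entropy_bits(16, 256))            # 16 uniformly random bytes
```

Even in the best case, 16 printable characters fall well short of the 128 bits a random 16-byte key provides, and the fixed top bit in every byte is exactly the kind of structure a randomly generated key does not have.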

Finding the password through its hash and finding the password through its encryption are pretty much independent. AES and SHA-256 are unrelated designs. Knowing the encryption does reveal the size and therefore makes the search for a password with a matching hash easier, but only marginally so. Knowing the hash doesn't help breaking the encryption, short of actually finding the password.

Revealing either the encryption or the SHA-256 hash of a password is a terrible idea, since each of them has glaring weaknesses (one breaks down if the key leaks, the other is vulnerable to brute force attacks). Passwords should only be stored in the form of a slow, salted hash, and the hash should be kept secret for good measure.

¹ In theory they wouldn't know whether what they found is the actual password or some different string with the same hash. However, there is no known way to find two strings with the same SHA-256 hash, and it would take a brute-force search running on all currently existing computers approximately the lifetime of the universe to have a non-negligible chance of finding one.

Gilles 'SO- stop being evil'
  • Regarding your two bullet points: I assumed that (in the example) the second parameter of AES.Encrypt() is the key, not the data to encrypt. If I'm not mistaken, in CTR mode the ciphertext would be as long as `bytesToEncrypt`, and otherwise it's `ceil(len(bytesToEncrypt)/16)*16`. (And possibly one more block with certain padding.) The key should also be safe even if the plaintext is known. – Luc Jun 09 '18 at 23:15
  • @Luc Oh, I assumed that `bytesToEncrypt` was a strange name for a key (most APIs put the key before the message to encrypt), but you're right, it makes more sense that user875234 is trying to use a password as a key. – Gilles 'SO- stop being evil' Jun 09 '18 at 23:37
  • Is the link target for the 3d5cfec95... link intentional? – user Jun 10 '18 at 14:12
  • @MichaelKjörling No. I generated two random inputs but I only meant to use one of them here. – Gilles 'SO- stop being evil' Jun 11 '18 at 08:51