1

How many FLOPs does one AES-256 operation take in ECB mode? How many AES-256 operations in ECB mode can a maximized Cray XE6 perform per second?

Cray states that the XE6 can be scaled to over 1 million processor cores, but the brochure doesn't state the exact maximum. With 1 million processor cores, working at the cabinet peak level of 12.2 to 20.2 teraflops, a Cray XE6 can do 3.97 to 6.57 petaflops.

nlovric
  • 2
    AES doesn't use floating point operations, but works on integer or bitwise operations. So FLOPs might not be the right unit to measure the performance of a processor needed to do AES encryption/decryption. – Paŭlo Ebermann Nov 20 '11 at 13:16
  • Yes, I know. But, computing power is measured in FLOPs, not in cycles, especially for supercomputers. – nlovric Nov 20 '11 at 18:14
  • 3
    What I'm getting at is that a processor might be optimized for floating-point operations (which is quite necessary for some applications which supercomputers are used for, like weather prognoses), but still be relatively bad for the operations used in cryptographic algorithms. So only a benchmark will tell us how many AES operations can be done, not a FLOP/s number. Also, you might get at the limit of your I/O capabilities, if you don't just want to use this to crack an AES-encrypted message. – Paŭlo Ebermann Nov 20 '11 at 18:24

3 Answers

7

The cores in a Cray XE6 are AMD Opterons -- that's the kind of thing you find in a basic PC. You could expect each core to compute one AES block in a bit more than 300 clock cycles; but if the cores support the AES-NI opcodes, then this can drop to 30 clock cycles per AES-256 instance. Assuming that the whole thing runs at 3 GHz, you can then hope for up to 10^8 AES-256 operations per second per core. With one million cores, that's 10^14.

Note that:

  • A full CPU is something quite huge if you just need an AES implementation. In particular, the floating-point operations that Cray boasts about are totally useless for attacking the AES.

  • Most of the price of a Cray XE6 is about the interconnection of the CPUs: bunches of helper buses and controllers, which are not used for AES cracking.

  • Even at 10^14 operations per second, you would still need more time than the lifetime of the Universe to actually crack a key.

Buying a Cray XE6 to crack AES would be a huge waste of money.
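The arithmetic behind these figures can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: the 3 GHz clock, the 30 cycles per block, and the one million cores are the round assumptions used above, the function name is made up for illustration, and key schedule and I/O costs are ignored.

```python
# Back-of-the-envelope estimate; every input is a round assumption, not a benchmark.
CLOCK_HZ = 3e9                         # assumed clock rate (3 GHz)
CYCLES_PER_BLOCK = 30                  # assumed cost of one AES-256 block with AES-NI
CORES = 1_000_000                      # assumed number of cores
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def aes_ops_per_second(clock_hz, cycles_per_block, cores=1):
    """AES-256 block operations per second, ignoring key schedule and I/O."""
    return clock_hz / cycles_per_block * cores


per_core = aes_ops_per_second(CLOCK_HZ, CYCLES_PER_BLOCK)
machine = aes_ops_per_second(CLOCK_HZ, CYCLES_PER_BLOCK, CORES)

# Worst case: trying every one of the 2^256 possible keys.
years = 2**256 / machine / SECONDS_PER_YEAR

print(f"per core:   {per_core:.1e} ops/s")
print(f"machine:    {machine:.1e} ops/s")
print(f"2^256 keys: {years:.1e} years")
```

Under these assumptions the exhaustive search comes out on the order of 10^55 years, which is what "more than the lifetime of the Universe" means in practice.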

Thomas Pornin
  • +1 for reading some greater depth into the question than I did. Indeed, that kind of interconnect work would be wholly overkill for the task. Also, I hadn't considered mentioning the AES instruction set as I was only aware of it on Intel processors... it seems AMD released the first AES instruction set Opterons only last month. – Jeff Ferland Nov 20 '11 at 19:13
  • Is it cheaper to buy 16,125 regular PCs with 4 16-core Opteron 6282 SEs - one million Opteron 6282 SE cores, buy the network equipment necessary to interconnect them, and run all of them, or buy and run one Cray XE6 with one million Opteron cores? It seems to me that the power consumption is lower for a one-million-cores Cray XE6 than for 16,125 regular PCs + network equipment. Also, a government would probably go for a Cray XE6 instead of 16,125 regular PCs, because a Cray XE6 can also be used for other tasks that cannot be efficiently performed with 16,125 interconnected regular PCs. – nlovric Nov 21 '11 at 16:29
  • @nlovric: you can ask Cray for a quote, but my bet is that the Cray will be vastly more expensive. I certainly hope that the government of the country in which I live would not buy a Cray or thousands of PCs to crack AES keys, because even with a billion Cray XE6s the chances of finding an AES key are too ridiculously small to be even contemplated. – Thomas Pornin Nov 21 '11 at 17:21
  • @Thomas Pornin: They aren't ridiculously small. Most people use passwords up to 10 characters in length if they aren't specifically prevented from doing so. Therefore, it is useful to have such computing power available. See my calculation [here](http://pastebin.com/6i1D7Siv/). Have you ever heard of AccessData's Distributed Network Attack (DNA)? Apparently, it's able to break a TrueCrypt container in feasible time for computer forensics purposes. However, their technical staff was unable to provide me with the exact computational cost required to break a TrueCrypt container. – nlovric Nov 21 '11 at 19:10
  • 1
    @nlovric: oh, so you are talking about password-based key derivation. So that's not about AES at all, in fact. TrueCrypt uses a scheme which relies on HMAC, which itself is based on a hash function (SHA-512, RIPEMD-160, or Whirlpool). No AES here (Whirlpool can claim to be vaguely related to AES, but not enough to benefit from hardware implementations of AES). For such jobs, GPUs will give you a much better performance/cost ratio than any Opteron-based cluster. – Thomas Pornin Nov 21 '11 at 19:31
2

The Cray XE6 accepts Opteron 6200-series chips, all of which support the AES Instruction Set. In AES-NI Performance Analyzed (which covers Intel, not AMD), Patrick Schmid and Achim Roos found that AES-NI has a throughput of 3.5 cycles per byte. If we extrapolate that to the 128-bit (16-byte) AES-256 block, we get 56 cycles per AES-256 operation. The Opteron 6282 SE works at 3.1 GHz in All Turbo mode. Assuming that the Opteron 6282 SE's AES Instruction Set has the same performance as Intel AES New Instructions, an Opteron 6282 SE core might do ~55,357,143 AES-256 operations per second. Therefore, a Cray XE6 with one million Opteron 6282 SE cores might do ~55,357,142,857,143 AES-256 operations per second. The figure does not take into account necessary I/O operations.

Therefore, brute-forcing an AES-256-ECB encryption key in a known-plaintext attack, trying all possible keys, on a Cray XE6 with one million Opteron 6282 SE cores would take up to ~66,282,862,563,751,221,625,826,507,369,649,000,000,000,000,000,000,000,000 years. However, if the encryption key is derived from a 10-character pass phrase consisting only of English lowercase letters a-z (26 ^ 10 = 141,167,095,653,376 possible combinations), it would take that same Cray XE6 up to ~2.55 seconds to complete a non-dictionary known-plaintext attack. If the encryption key is derived from a 10-character pass phrase that may contain English lowercase letters, English uppercase letters, numbers, and 22 other characters (84 ^ 10 = 17,490,122,876,598,091,776 possible combinations), it would take that same Cray XE6 up to ~87.76 hours to complete a non-dictionary known-plaintext attack.
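For anyone who wants to rerun the numbers, here is a minimal Python sketch of the calculation. It assumes the figures stated above (3.5 cycles per byte, 3.1 GHz, one million cores), a year of 365.25 days, and exactly one AES-256 operation per candidate key, with no key-derivation or I/O cost. The variable and function names are only illustrative.

```python
# Rough estimate under the assumptions stated in the text; not a benchmark.
CLOCK_HZ = 3.1e9                       # Opteron 6282 SE, All Turbo mode
CYCLES_PER_BYTE = 3.5                  # Intel AES-NI figure, assumed to carry over to AMD
BLOCK_BYTES = 16                       # one 128-bit AES block
CORES = 1_000_000                      # assumed core count
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length

cycles_per_block = CYCLES_PER_BYTE * BLOCK_BYTES          # 56 cycles
ops_per_second = CLOCK_HZ / cycles_per_block * CORES      # whole machine


def worst_case_seconds(keyspace):
    """Seconds needed to try every candidate, one AES operation each."""
    return keyspace / ops_per_second


lowercase_seconds = worst_case_seconds(26**10)        # a-z only, 10 characters
mixed_hours = worst_case_seconds(84**10) / 3600       # 84-symbol alphabet, 10 characters
full_key_years = worst_case_seconds(2**256) / SECONDS_PER_YEAR

print(f"26^10 pass phrases: {lowercase_seconds:.2f} s")
print(f"84^10 pass phrases: {mixed_hours:.2f} h")
print(f"2^256 keys:         {full_key_years:.3e} years")
```

Changing `CYCLES_PER_BYTE` (or setting `cycles_per_block` directly) shows how sensitive the results are to the per-block cost, which is the one input here that has not been measured on the actual hardware.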

I calculated the duration of non-dictionary known-plaintext attacks on AES-256-ECB with one million cores @ 3.1 GHz using Intel AES New Instructions. You can see it here. I guess it would still be useful to keep something with one million cores @ 3.1 GHz with Intel AES NI around to brute force up-to-10-character pass phrases in a known-plaintext attack.

nlovric
  • 1
    Note that Intel's AES-NI instructions perform one AES round in 2 clock cycles. AES-256 has 14 rounds, hence a minimum of 28 cycles per block. For actual encryption of data, you have to take care of data input and output, and also any kind of MAC (AES-NI instructions include some opcodes to help with the GCM mode, but that's not free). Hence the higher cycle count you get. But for attacks, you need not care about the MAC, and you can probably optimize things. On the other hand, a key search will need to repeatedly run the key schedule, which is slower. An actual benchmark is sorely needed. – Thomas Pornin Nov 21 '11 at 12:06
  • 1
    Thanks for the details! I'd find it easier to read if the numbers were in scientific notation, without commas. And note that you'd probably get much better cost effectiveness and power efficiency if you used GPUs to do that sort of attack. See e.g. my calculations related to the GPUs optimized for bitcoin hashing at [How to securely hash passwords? - IT Security - Stack Exchange](http://security.stackexchange.com/questions/211/how-to-securely-hash-passwords/3700#3700) – nealmcb Nov 21 '11 at 15:08
  • You want Xe+Y notation? What should the measuring unit be? I'll recalculate the entire table once I determine how the 28 clock cycles per block affect the calculation. Does anyone know the performance ratio between Intel AES-NI and the AMD AES instruction set used on the Opteron 6282 SE? I know that it's more of a GPU thing nowadays, but I have no experience working with GPUs for that sort of thing. I'll read your article. – nlovric Nov 21 '11 at 15:31
  • How can I PM other users? Can anyone list the source(s) of the "28 cycles per AES-256 block" claim, because my source states 3.5 cycles per byte, which is exactly twice as much? – nlovric Nov 21 '11 at 15:52
  • I have contacted Cray to try to find out what the exact maximal number of cores for a Cray XE6 is. – nlovric Nov 21 '11 at 16:16
1

We don't know, and if you don't get your hands on it and benchmark it, we can't tell you. In theory, you could pick up a single Opteron processor and try it out on a desktop. When working on that scale, the individual machine instructions matter greatly and tailored assembly code for a heavy lifting operation may be appropriate. Time spent writing to memory, etc. will also matter.

Jeff Ferland