15

What are rainbow tables and how are they used? Gives a very precise answer about what rainbow tables are and how they are used. I had always confused hash-tables and rainbow tables. My question is regarding the size of the rainbow tables. Now, for a hash table, the size of the file would be :

let n = ( size of the input plain text file ) (Assuming one line per plain-text)

so size(hash table) = n + (bytes in hash)*h + n ( for separation) Bytes

On the other hand, is there a similar mechanism to estimate the size of a rainbow table ? I am sure there is, since tools used to generate a rainbow table usually have a size estimate on them.

How is the size of a rainbow table estimated, given:

charset, chain length, min and max length of plain text.

This would give a better understanding of how and why rainbow tables are better.

sudhacker
  • 4,300
  • 5
  • 23
  • 35

2 Answers2

9

RainbowCrack is probably what you would be using to generate rainbow tables. Rainbow tables are always generated over a keyspace, such as alpha-numeric 5-9 bytes long and the chain length and count which will affect the rate and the size of the resulting tables. If you have an input file, then it's not a rainbow table, it's some other lookup table. A rainbow table is a speical type of lookup table with neat properties. Such as the size of the hash function (sha256 vs sha512) doesn't affect the size of the rainbow table.

There are some matlab scripts floating around to calculate the size of rainbow table, however this site is easier to use.

Jonathan Cross
  • 1,618
  • 1
  • 13
  • 25
rook
  • 47,004
  • 10
  • 94
  • 182
  • 3
    Cool calculator. Note that rainbow tables are much trickier beasts compared to simple hash tables. For example, rainbow tables are probabilistic and can never ensure 100% coverage of the full key space (which is why the calculator contains a "total success rate" parameter). It's possible to increase the success rate for a given size table (by increasing the number of tables), but this increases the time it takes to search for a specific hash within the table. For details, read Oechslin's classic paper on rainbow tables [(PDF)](http://lasec.epfl.ch/pub/lasec/doc/Oech03.pdf). – David Wachtfogel Sep 11 '12 at 19:59
  • @David Wachtfogel good point. – rook Sep 11 '12 at 20:05
  • From my understanding, the rainbow table contains 2 columns, 1.plaintext(starting seed) 2.Hash(final hash). The plain text size depends on the charset that we had given initially and I assume it varies in length from 5-8 or whatever we specify. I don't understand why you say that type of Hash doesnt play a role in the size. size(SHA512) > size(SHA256) => size(2nd column) would increase. Correct ? – sudhacker Sep 12 '12 at 14:19
  • @asudhak nope, that is a lookup table, not a rainbowtable. – rook Sep 12 '12 at 16:19
4

A rainbow table is characterized by its average chain length. It is a parameter which is chosen at table construction time. Let's call it t.

If the table covers about N tentative passwords, then it will have a size of about N/t "entries", where an entry is a "chain end". The encoded size of a chain end depends on a lot of details, but it will typically be as large, or somewhat larger, than a field which can hold the integer value N. In other words, if N is close to 240, each entry will need at least 5 bytes, but probably a bit more than that, say 8 bytes.

To save on storage cost, you will want to make t as big as possible. However, you cannot make it as big as you want, because it raises other costs. Briefly stated:

  • The table construction cost is about 1.7*N evaluations of the hash function.
  • The storage cost is N/t chain ends.
  • The CPU cost when attacking is about t2 evaluations of the hash function.
  • The lookup cost when attacking is about t random accesses in the tables.

A modern hard disk allows about 100 lookups per second, so if you want to keep attack time below one minute, you cannot have t higher than 6000. You can have much higher values for t if you use SSD (which allow for many thousands of random accesses per second), but storage cost increases because SSD are quite expensive. Also, if t becomes too high, the quadratic CPU cost can become prohibitive.

The difference between a rainbow table, what existed before (Hellman's time-memory trade-off), and the modern variants which mix ideas from Hellman, Oechslin, Rivest, Biham and a few others, is a factor of at most 2 in CPU and lookup costs.

Thomas Pornin
  • 322,884
  • 58
  • 787
  • 955