6

I get the gist of it. It's like a middle ground between brute force attack and lookup table, it stores the starting plaintext and ending hash for each chain where a chain is made by reduction and hash.

What I don't get is:

  1. It's said that rainbow tables solve collisions, but why are collisions such a big deal to begin with?
  2. It's said that rainbow tables solve collisions by using a different reduction function for each column in the chain, but how does this prevent collisions? Aren't reduction functions just random characters you take from the hash? So what difference does it make if you take the first 8 characters instead of the last 8?
Gilles 'SO- stop being evil'
  • 51,415
  • 13
  • 121
  • 180
User104163
  • 409
  • 2
  • 6
  • 11

2 Answers2

5

The current highest voted answer doesn't really seem to give a proper response to your question. I'll try to answer both your questions simultaneously.

Same reduction function in every column

Say, you use the same reduction function for every column and have a basic table with 2 rows and 3 columns.

P1  --R-->  P1'  --R-->  P1'' --R-->  P1'''
P2  --R-->  P2'  --R-->  P1'  --R-->  P1''

Here, an --R--> represents a hashing followed by a reduction. And P1, P2, P1', ... represent passwords. As you can see, there was a collision in the second chain. The value P1' has already been encountered in the first chain.

Notice what happens afterwards.

Since the hashing followed by the reduction of P1' is exactly the same as in the first chain, we get a value that has already been computed. If we continue this even further, the part of the second chain starting from P1' becomes an exact copy of the part of the first chain starting from P1'.

So in effect, this is why collisions are bad. The second chain merged into the first. We have duplicate results in our table, resulting in wasted storage space and computation time.

Different reduction function in every column

This time, let's see what happens if we use a different reduction for every column. A reduction is represented by --RX--> where X is the column number.

P1  --R1-->  P1'  --R2-->  P1'' --R3-->  P1'''
P2  --R1-->  P2'  --R2-->  P1'  --R3-->  P2''

Again, P1' was encountered in both chains. However, since the reduction functions are different, the value calculated after P1' in the second chain won't result in P1'', as it does in the first chain. This effectively solves the merge issue from the first example.

Note that this doesn't solve every chain merge though. Watch what happens in the following example:

P1  --R1-->  P1'  --R2-->  P1''  --R3-->  P1'''
P2  --R1-->  P1'  --R2-->  P1''  --R3-->  P1'''

This time a collision happens in the first column of both chains. Since it happens in the same column, every next reduction function will be the same and the chains are merged once again. The probability of this happening is lower though.

Schoolmeister
  • 66
  • 1
  • 2
1

Collisions are the only problem with Rainbow Tables. Ironically collisions are seen as a bad thing for hashing algorithms, but in the case of Rainbow Tables a hashing algorithm which generates collisions fairly regularly will be more secure.

It's said that rainbow tables solve collisions, but why are collisions such a big deal to begin with?

A given hash may be generated by multiple plaintexts (this is called a collision), which is a big problem for chains because it causes chains which start different to converge into one. Also you get loops, which are caused when a hash is reduced to a plaintext that was hashed at a previous point in the chain.

It's said that rainbow tables solve collisions by using a different reduction function for each column in the chain, but how does this prevent collisions? Aren't reduction functions just random characters you take from the hash? So what difference does it make if you take the first 8 characters instead of the last 8? The way collisions are handled is what sets Rainbow Tables apart from its predecessor which was developed in 1980.

The predecessor solved the problem of certain plaintexts never being reduced to by using many small tables. Each small table uses a different reduction function. This doesn't solve the problem completely, but it does help.

To solve chain merges and loops each chain ended at a "distinct point"; a hash which was unique in some way, eg hashes where the first 4 characters are 0. The chains keep on going until it reaches a distinct point. If two chains end up at the same distinct point then there has been a collision somewhere in the chain, and one of the chains is discarded. If a chain is generated for an unusually long time without reaching a distinct point a loop is suspected (where a chain of hashes ends up reducing and hashing to a previous hash in the chain). The problem with this is that if there is a collision there is potentially a whole branch which has to be cut off and won't make it into the chains, and a loop will cause all the hashes which came before the loop in the chain to be discarded. Also all the time spend generating that chain will be wasted, and by ending only at distinct points you have chains of variable length. This means that you may have to keep checking for a hash within especially long chains long after the other chains have ended.

I got the proper inspiration from here: http://kestas.kuliukas.com/RainbowTables/

Lucian Nitescu
  • 1,822
  • 1
  • 14
  • 29
  • I already read that link. It doesn't explain anything. – User104163 Mar 24 '16 at 16:23
  • 1
    can you describe what you did not understand? :) – Lucian Nitescu Mar 24 '16 at 16:28
  • "A given hash may be generated by multiple plaintexts (this is called a collision), which is a big problem for chains because it causes chains which start different to converge into one." This explains what a collision is, not why it's a problem. – User104163 Mar 24 '16 at 16:32
  • Also I don't think the last paragraph is about rainbow tables. – User104163 Mar 24 '16 at 16:34
  • 1
    Ok so: you have A, B, C as starting points. A starts first and has the following value (poetically example) 3123212553633474373473734734737 and B starts with 489676236 and ends with "33474373473734734737" the same "value" the C starts with 49186481624 and ends with "33474373473734734737" and you waste time generating because yo have common points with A. – Lucian Nitescu Mar 24 '16 at 16:42
  • 1
    Is there something specific you are trying to understand? – Cameron Does Things Mar 24 '16 at 16:45