85

I have 2FA set up on my bank account. When I log in, I receive a six-digit code as an IM on my phone that I enter into the website. These codes always seem to have a pattern to them, something like 111xxx, 123321, xx1212, etc.

I'm thinking that these codes are intentionally easy to remember at a single glance. Is there a common business practice/best practice that dictates these codes have a pattern to them to make them easier to remember?

Bob Kaufman
  • 891
  • 1
  • 6
  • 7
  • 31
    I use a lot of 2FA codes and I have never noticed such a pattern. Sure, there are repeated digits sometimes, but they don't seem to occur often enough to suggest that something strange is going on. It might be instructive to keep track of your codes and do a statistical analysis on the digits once you've accumulated a hundred or so. – David Z Dec 11 '17 at 23:07
  • 8
    I use both texted 2FA and also authenticator app codes and while I notice no pattern in the authenticator app codes, I have noticed ones texted to my phone are often easily memorable. – Toby Smith Dec 12 '17 at 01:31
  • 14
    As a side note (this being the infosec Stack), factors sent as SMS messages are not considered secure second factors. A lot of systems don't offer anything better, but if you do have an option to use an authenticator app or something that requires device enrollment, that would be better. One example article on the topic: https://techcrunch.com/2016/07/25/nist-declares-the-age-of-sms-based-2-factor-authentication-over/ – Todd Wilcox Dec 12 '17 at 16:44
  • 3
    If you think of a pattern, chances are it won't apply to your code, but if you think of a code, chances are the code has *some* kind of pattern. One time I had to generate a random wi-fi password for my mum; I kept generating them until I found one that I thought had no patterns she would comment on. Then she said "Did you choose that one because it has your initials in it?" – user253751 Dec 13 '17 at 05:53
  • 1
    @TobySmith I notice patterns in those much more frequently in the app ones than in the SMS ones! – Tim Dec 14 '17 at 03:11
  • I spent 10 minutes typing up an identical question before SO recommended yours. I'm a bit sad you beat me to it, but happy I'm not the only crazy one. :) – THE JOATMON Mar 04 '20 at 15:03
  • 1
    While looking this up myself, and stumbling upon this post, I also found a good write-up on it from Wired, including interviews with a Security Engineer and a Psychologist. https://www.wired.com/story/2fa-randomness/ I would've added this as a comment, but I don't have the reputation for that. – JediWombat Apr 28 '22 at 23:10

5 Answers

128

I have noticed this too, and I think it is a result of the human brain's tendency to apply patterns to random noise. This seems to be more common when specifically trying to remember a string of numbers.

ScarySpider
  • 1,118
  • 1
  • 7
  • 7
  • 4
    I'd say this is it. These [OTP](https://en.wikipedia.org/wiki/One-time_password#Methods_of_generating_the_OTP) codes are usually randomly generated and try to be as secure as possible; introducing patterns in their creation would surely be a security flaw. Also, notice how short these numbers are. This, combined with what @ScarySpider mentioned, makes them easy to remember. – r41n Dec 12 '17 at 15:54
  • 26
    And once you start noticing patterns, [confirmation bias](https://en.m.wikipedia.org/wiki/Confirmation_bias) takes over. – MooseBoys Dec 12 '17 at 20:10
59

Roughly 85% of six-digit random numbers will have at least one repeated digit, and about 41% will have the same digit twice in a row. (I am happy to be corrected on my math.)
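Both figures can be checked by counting, assuming each of the six digits is drawn independently and uniformly from 0 to 9 (an assumption about the generator, not something any bank has confirmed). A minimal sketch; the function names are made up for illustration:

```python
from math import perm

def p_any_repeat(n_digits: int = 6, base: int = 10) -> float:
    """Probability that at least one digit value appears more than once."""
    # Complement: all digits distinct, i.e. base * (base - 1) * ... choices
    # out of base ** n_digits equally likely codes.
    return 1 - perm(base, n_digits) / base ** n_digits

def p_adjacent_repeat(n_digits: int = 6, base: int = 10) -> float:
    """Probability that some digit equals its immediate neighbour."""
    # Complement: each of the n - 1 later digits differs from the one before it.
    return 1 - ((base - 1) / base) ** (n_digits - 1)

print(f"repeat anywhere: {p_any_repeat():.4f}")
print(f"adjacent repeat: {p_adjacent_repeat():.4f}")
```

Under that assumption the first value is 1 - 151200/1000000 and the second is 1 - 0.9^5, which is where the two percentages come from.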

These keys are generated using the standard TOTP algorithm. The article summarizes this implementation, showing there isn't any effort to generate a memorable number:

According to RFC 6238, the reference implementation is as follows:

  • Generate a key, K, which is an arbitrary byte string, and share it securely with the client.
  • Agree upon a T0, the Unix time to start counting time steps from, and an interval, TI, which will be used to calculate the value of the counter C (defaults are the Unix epoch as T0 and 30 seconds as TI)
  • Agree upon a cryptographic hash method (default is SHA-1)
  • Agree upon a token length, N (default is 6)

Although RFC 6238 allows different parameters to be used, the Google implementation of the authenticator app does not support T0, TI values, hash methods and token lengths different from the default. It also expects the K secret key to be entered (or supplied in a QR code) in base-32 encoding according to RFC 3548.

Once the parameters are agreed upon, token generation is as follows:

  1. Calculate C as the number of times TI has elapsed after T0.
  2. Compute the HMAC hash H with C as the message and K as the key (the HMAC algorithm is defined in the previous section, but also most cryptographical libraries support it). K should be passed as it is, C should be passed as a raw 64-bit unsigned integer.
  3. Take the least 4 significant bits of H and use it as an offset, O.
  4. Take 4 bytes from H starting at O bytes MSB, discard the most significant bit and store the rest as an (unsigned) 32-bit integer, I.
  5. The token is the lowest N digits of I in base 10. If the result has fewer digits than N, pad it with zeroes from the left.

Both the server and the client compute the token, then the server checks if the token supplied by the client matches the locally generated token. Some servers allow codes that should have been generated before or after the current time in order to account for slight clock skews, network latency and user delays.
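The quoted steps can be sketched with nothing but the Python standard library. This is an illustration of the RFC 6238 procedure as summarized above, not the code any particular bank or app runs; the function names `hotp` and `totp` and their parameter names are chosen here for readability.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """Counter-based token: steps 2 to 5 of the quoted procedure (SHA-1 default)."""
    # Step 2: HMAC with C packed as a raw 64-bit big-endian unsigned integer.
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Step 3: the 4 least significant bits of H give the offset O.
    offset = mac[-1] & 0x0F
    # Step 4: 4 bytes starting at O, with the most significant bit discarded.
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    # Step 5: lowest N decimal digits, left-padded with zeroes.
    return str(value % 10 ** digits).zfill(digits)

def totp(key: bytes, now: Optional[float] = None,
         t0: int = 0, step: int = 30, digits: int = 6) -> str:
    """Time-based token: step 1 derives the counter, then defers to hotp()."""
    if now is None:
        now = time.time()
    # Step 1: number of whole intervals TI elapsed since T0.
    counter = int(now - t0) // step
    return hotp(key, counter, digits)

print(totp(b"12345678901234567890"))
```

With the ASCII test key `12345678901234567890` from RFC 4226, `hotp(key, 0)` gives `755224`. Nothing in the procedure looks at the decimal digits until the final modulo, so there is no place where a "memorable" code could be preferred.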

Michael
  • 707
  • 4
  • 4
  • 20
    There's no reason for an IM-delivered OTP to use this sort of algorithm. Most likely it's just six random digits from /dev/random. – Sneftel Dec 11 '17 at 22:37
  • 5
    @Sneftel This does have the advantage that the server doesn't have to store the OTP; it just calculates it when one gets entered. It also handles the fact that the code is only valid for a short window. If using a random number, you'd have to store what you generated and the expiry date for it. Obviously either one works equally well in the absence of a second factor, though. – Chris Hayes Dec 12 '17 at 00:02
  • @ChrisHayes Not having to store a six digit number for a couple of minutes, along with an expiry date, is an immaterial advantage. These are the sorts of things that need to be stored for the session anyway. – Sneftel Dec 12 '17 at 06:49
  • 7
    You cannot possibly know if an unnamed bank uses TOTP or not, so I would hedge that statement. But +1 for doing the math. – Anders Dec 12 '17 at 08:45
  • -1 for unfounded claims as presented as facts – ilkkachu Dec 12 '17 at 13:02
  • 5
    @Sneftel While it's 100% true that "an IM-delivered OTP" doesn't *have* to use the algorithm described above, *generally* when people refer to 2FA and OTP, that algorithm is what is implemented and used. It is a pretty safe assumption to make. But, yes, if you must be pedantic about it then, true, it doesn't *have* to be referring to RFC 6238. What that leaves us with, then, is why the digits seem to have a pattern, and on that I agree, under both explanations (RFC 6238 or /dev/random), with the [currently accepted answer](https://security.stackexchange.com/a/175278/3992). – RobIII Dec 12 '17 at 13:44
  • 4
    IME texted codes are more likely to be generated with HOTP than TOTP, if anything. It is difficult to know how soon the code will arrive and have to be accepted; you also do not want two requests in the same time period generating the same code. – otus Dec 13 '17 at 06:26
  • @otus, good point. I read "IM on my phone" as a notification coming from the bank's mobile app, which would probably be TOTP, but upon re-reading I realize OP probably meant a text message. – Michael Dec 13 '17 at 15:11
  • 1
    **Man** I just did the math myself for a comment, scrolled down, and found that you beat me to the punch. From one random person on the internet to another: my math agrees with your math. That should be answer enough for anyone: If the internet says it is true, then it is true. – Conor Mancone Dec 13 '17 at 20:17
23

On my phone I had around 90 verification codes from various companies. 62 of these were 6 digits long. Here's the count of each digit:

Possibly a slight skew towards 1,8 and 9? Almost certainly just noise in the data (62 is a small sample).

What about double digits?

[Two graphs of double-digit pair counts; images not available]

The first graph is only the double digits on the 2-digit boundaries (i.e. AABBCC) - so we'd expect each pair to appear around 1.86 times across the 186 possible digit placements. The second is any placement (i.e. XXX99X counts as a double digit). We'd expect each pair around 3.1 times across the 310 placements.

There doesn't seem to be any obvious skew with lots more double digits than non double - double digits are shown in orange. In the latter data, we would expect around 31 double digits, and we get 27. That seems reasonable.
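The expected values quoted here follow from simple counting, assuming uniformly random, independent digits. A rough sketch of that arithmetic (variable names are illustrative), which also estimates how far the observed 27 doubles sits from the expected 31:

```python
from math import sqrt

N_CODES = 62           # six-digit codes in the sample
CODE_LEN = 6
N_PAIRS = 100          # possible ordered digit pairs, 00 through 99
P_DOUBLE = 10 / 100    # 10 of those pairs are doubles (00, 11, ..., 99)
OBSERVED_DOUBLES = 27  # figure reported in the answer

aligned_slots = N_CODES * (CODE_LEN // 2)  # AABBCC boundaries: 3 per code
sliding_slots = N_CODES * (CODE_LEN - 1)   # any adjacent pair: 5 per code

expected_per_pair_aligned = aligned_slots / N_PAIRS
expected_per_pair_sliding = sliding_slots / N_PAIRS
expected_doubles = sliding_slots * P_DOUBLE

# For uniform digits each adjacent placement is an independent 1-in-10 trial,
# so the number of doubles is binomial and its spread is easy to compute.
std_doubles = sqrt(sliding_slots * P_DOUBLE * (1 - P_DOUBLE))
z_score = (OBSERVED_DOUBLES - expected_doubles) / std_doubles

print(expected_per_pair_aligned, expected_per_pair_sliding, expected_doubles)
print(f"z = {z_score:.2f}")
```

The observed count lands less than one standard deviation below the expectation, which is consistent with plain random digits.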

Of course, this doesn't rule out other "non random" patterns - but to be honest humans are likely to be searching for patterns - look at these numbers, all taken from my 2FA app: 365 595, 111 216, 566 272, 468 694, 191 574, 833 043.

Tim
  • 950
  • 1
  • 7
  • 16
14

I hope that this is just random chance in your case. If there is a pattern, it weakens the whole point of having a second code.

No, they are not intentionally supposed to be easy to remember and there is no generalized business case for it unless they had feedback that their users were having trouble typing in 6 numbers. Then someone might have done something silly, but I really hope not.

schroeder
  • 125,553
  • 55
  • 289
  • 326
14

It's also to do with the way humans tend to think of randomness. In true randomness, repeated digits and repeated patterns occur a lot more often than we expect they should. When humans are asked to create sequences of digits that "look" random, they tend to avoid repeating patterns or digits (as well as other quirks, like over-using "7", and under-using "0" and "2", etc). If you ask someone to choose a "random" number between 1 and 100 it'll very often contain a 7, and quite often be 37 (or 17). You can study lottery numbers people pick manually as (often) people are trying to pick something random-looking (on the false belief that random-looking numbers are more likely to win in a random draw).

If a human is trying to emulate a random coin toss, they will alternate between heads and tails a lot more than they will repeat the last result, making it possible to predict their next value with fairly good certainty (>50% chance their next value will be the opposite to their last).
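To see why over-alternation is exploitable, here is a toy simulation. The 60% switch rate is an invented figure used purely for illustration, not a measured property of human behaviour, and the function names are hypothetical.

```python
import random
from typing import List

def tosses_with_switch_rate(n: int, p_switch: float, rng: random.Random) -> List[int]:
    """Generate 0/1 'tosses' that flip to the other side with probability p_switch."""
    seq = [rng.randint(0, 1)]
    for _ in range(n - 1):
        flip = rng.random() < p_switch
        seq.append(1 - seq[-1] if flip else seq[-1])
    return seq

def guess_opposite_accuracy(seq: List[int]) -> float:
    """Hit rate of a predictor that always guesses the opposite of the last toss."""
    hits = sum(1 for prev, cur in zip(seq, seq[1:]) if cur != prev)
    return hits / (len(seq) - 1)

rng = random.Random(0)  # fixed seed so the run is repeatable
human_like = tosses_with_switch_rate(100_000, 0.6, rng)  # assumed human bias
fair_coin = tosses_with_switch_rate(100_000, 0.5, rng)   # true coin: no memory

print(f"human-like: {guess_opposite_accuracy(human_like):.3f}")
print(f"fair coin:  {guess_opposite_accuracy(fair_coin):.3f}")
```

The "guess the opposite" predictor does no better than chance against the fair coin, but its hit rate against the alternation-biased sequence tracks the switch rate.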

A repeated digit or two-digit sequence would be quite common in a truly random 6-digit number (e.g. a ~41% chance of a consecutive repeated digit, and a ~85% chance of a repeated digit anywhere), and very uncommon in a "random" 6-digit number you ask a human to come up with.

thomasrutter
  • 1,608
  • 12
  • 17
  • 1
    Picking a truly random number is a legitimate strategy in a lottery as you're less likely to share the prize with lots of other people. Of course, as most lotteries have a "give me truly random numbers" option, trying to do it manually is a bit stupid. – Richard Tingle Dec 11 '17 at 23:58
  • That is correct, and not to be confused with a "random-looking" (to a human) number which is a disadvantage in a lottery. When it comes to lottery entrants superstition plays a big role, people have "lucky" numbers, etc. – thomasrutter Dec 12 '17 at 00:29
  • @fjw OK, but how do you know people pick random-looking numbers because they have a false belief that random-looking numbers are more likely to win, as opposed to picking random-looking numbers based on the justified belief that random numbers maximise their gains (coupled with our inability to pick genuinely random numbers)? (+1 to the answer in any case) – Relaxed Dec 14 '17 at 17:09
  • The "default" is not to pick your own numbers and have them generated randomly. To specifically choose to pick your own numbers is always done under the belief that you can somehow do better than a random number generator, whether that is to maximise gains or probability of winning. If that belief is that you can get them more "random" or you can pick better "random" numbers than automatically generated numbers, that's false. Superstition plays a big role and also perhaps some paranoia: that the automatically generated numbers are somehow rigged against you. – thomasrutter Dec 14 '17 at 22:43
  • I'd always imagined that most people playing the lottery (since they're presumably unaware or unconcerned that they are throwing their money away in the long term) would play with a fixed set of their own personal favourite "lucky" numbers, which would likely skew winning numbers towards certain popular ones. Either way, I would say that choosing numbers that everyone else is less likely to pick could well be a better strategy than using purely random numbers every time, due to less sharing of any wins. – screwtop Jan 20 '22 at 01:04
  • ...which makes me wonder if there are studies of people's favoured numbers/combinations. Here's some good further reading: https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2012.00540.x – screwtop Jan 20 '22 at 01:10