117

Imagine a typical 4-digit PIN scheme containing the digits [0-9]. If I choose my PIN at random, I will get one out of 10 * 10 * 10 * 10 = 10,000 codes. Based on my own experience, more than half of the time a random sequence of four digits will contain some property or pattern that significantly lowers its entropy: single digit used in more than one position, ascending/descending pattern, etc. (Yes, yes, a 4-digit PIN only has something like 13 bits of entropy max to begin with, but some random codes are even more awful.)

If I were to abide by a rule where I only use a PIN that has a unique digit in each position, I believe the number of codes available to me becomes 10 * 9 * 8 * 7 = 5,040 (somebody please correct me if I got that wrong). I have almost halved my key space, but I have also eliminated many of the lower-entropy codes from consideration.
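The two counts above can be checked by brute-force enumeration. This is a minimal sketch in plain Python; the variable names are illustrative and not part of the original question.

```python
from itertools import product

DIGITS = "0123456789"

# Every 4-digit PIN, leading zeros included.
all_pins = ["".join(p) for p in product(DIGITS, repeat=4)]

# PINs in which no digit appears more than once.
unique_digit_pins = [pin for pin in all_pins if len(set(pin)) == 4]

print(len(all_pins))           # 10 * 10 * 10 * 10
print(len(unique_digit_pins))  # 10 * 9 * 8 * 7
```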

At the end of the day, did I help or hurt myself by doing that?

EDIT: Wow, lots of great responses in here. As a point of clarification, I was originally thinking less in terms of an ATM/bank PIN (which likely has an aggressive lockout policy after a number of incorrect guesses) and more in terms of other "unsupervised" PIN-coded devices: programmable door locks, alarm system panels, garage door keypads, etc.

smitelli
  • 56
    You've hurt your password. The most secure one is one that is purely random. – Awn Apr 04 '17 at 06:11
  • 21
    A machine doesn't know 2232 is more likely because it's easier. So when you brute force, 2232 is just as hard to guess as 3569. Also, after three attempts the card is blocked. So even a pin with three identical digits is fine as it is just as unlikely as any other combo. – user3244085 Apr 04 '17 at 07:36
  • 35
    Any security in a 4 digit PIN is probably going to come from the issuer being willing to lock out the pin after a probably single digit number of attempts. Therefore my intent is to avoid the first couple things an attacker would try. Avoiding encoded data about self like birthdays is probably more important than patterns. – Weaver Apr 04 '17 at 07:45
  • The entropy of the password only becomes lower if you sort them by magnitude, ie. you assume the best worst case brute force, which tries 0000, 1111, 2222, etc and then 111 – HopefullyHelpful Apr 04 '17 at 07:58
  • 90
    @Eclipse not so. If it is known that 20% of the population use 1111 or 1234 you are best not being part of that 20%, even if your random number generator comes up 1234 by pure one in 10,000 chance. – nigel222 Apr 04 '17 at 10:04
  • 10
    Ruling out all numbers with repeated digits, all numbers with all four digits in an ascending or descending sequence, all numbers that look like years (19xx/20xx), and all numbers that look like dates (0101..0131, 0201..0229, etc) leaves 4,785 "valid" PINs, almost half the total keyspace. The only remaining problem is if they've used a significant date (birthday/anniversary) etc in the first nine days of January to September, which could be almost any number and thus can't be mechanically ruled out. – Random832 Apr 04 '17 at 15:01
  • Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackexchange.com/rooms/56640/discussion-on-question-by-smitelli-when-choosing-a-numeric-pin-does-it-help-or). – Rory Alsop Apr 06 '17 at 10:06
  • 4
    IIRC, there was a data leak a while ago that contained PINs, and [this heat map](https://i.stack.imgur.com/OmnDV.jpg) was made from the leaked data. This indicates the most popular pins. Upd: [The full analysis](http://www.datagenetics.com/blog/september32012/) – ikegami Apr 06 '17 at 18:40
  • 3
    Isn't 'numeric PIN' tautological? And, any 4 digit number could be regarded as random - 1/10,000 - as in each occurs only once as itself. Numeric lock default is often 0000 from the manufacturer. – Tim Apr 07 '17 at 11:46
  • 2
    In general, *any* restriction on the choice of characters in a character-based key or passcode "weakens" security, at least in concept. Rules such as requiring a digit, both upper and lower case, and a special character in a password are intended to keep stupid people from being even stupider (but likely only force them to write the code on a piece of tape on their debit card). – Hot Licks Apr 08 '17 at 03:19
  • 1
    The heat map image linked in ikegami's reply above answers this question intuitively and without the need to explain complex math. – gth Apr 10 '17 at 03:37
  • Actually, reduced entropy doesn't have an effect when brute forcing if the attacker doesn't know you're excluding those PINs. – mzcoxfde Apr 11 '17 at 07:43
  • 1
    My pin all sevens but I won't tell you in what order. – Pieter B Apr 11 '17 at 14:11
  • Interesting fact: The German Enigma machine never encoded a letter to itself, meaning there was only 33554432 (2 to the 25th power) possible solutions, rather than 67108864 (2 to the 26th power). This made it loads more easy to crack. http://math.stackexchange.com/questions/1209481/how-does-the-enigma-machine-ensure-that-no-letter-is-substituted-for-itself – marcellothearcane Apr 12 '17 at 08:35

15 Answers

151

The thing is, with a 4 digit pin, entropy isn't really important. What's important is the lockout and the psychology of the attacker.

The keyspace is so small that any automated attack (without lockout) would exhaust it almost instantly.

What you're worried about is an attacker guessing the pin before the account locks. So assuming a sane lockout (say 3-5 incorrect attempts), you want your PIN to be outside the 3-5 most likely to be chosen PINs.

Personally I'd avoid any 4 digit repeating sequence and anything starting 19XX which would be a year of birth.

Now smart alecs will say "ahh but if you do that the attackers will know not to try those", but that only applies if a) the majority of the user population follow that advice (hint, they probably won't) and b) the attackers know that the user population has followed that advice.

Some great analysis of this (link courtesy of @codesinchaos).

Edit 2 - For a far more mathematical take on this I'd recommend reading @diagprov's answer

Rory McCune
  • True as it is, it still is a Q&A site. OP asked a question in which nothing suggests OP has any influence on the lockout time. Also the question uses some math. If the question was about outsmarting the other party, then we can bid. I would state any sequence on the numerical pad making a pattern, like `1254`, `7618` is much easier to match than `2776`. But how to evaluate the answers then? – techraf Apr 04 '17 at 09:33
  • 17
    I guess I was evaluating it based on OPs last sentence asking if it helped or hurt their security. To me what's important in PIN security is not choosing a commonly used PIN. If there's no lockout you're stuffed anyway, so the only factor in your control is choosing an uncommon one not a random one. – Rory McCune Apr 04 '17 at 09:35
  • 1
    Ok, does it hurt or help the security to exclude repeated digits if the PIN was 8-digit long? – techraf Apr 04 '17 at 11:17
  • 5
    Well with 8 digit PINs you're still at a stage where unconstrained brute-force is trivially easy, so you'd be assuming that lockout + not choosing common PINs would be the best defence. The answer to your question would depend on whether repeating digits were amongst the most common forms of 8 digit PIN, whether the attacker would be aware of the restriction before making an attack and whether the PIN is system or user generated. – Rory McCune Apr 04 '17 at 11:25
  • I re-read your answer a few times and I cannot see the answer to the question "*did I help or hurt myself by doing that?*" - that would be perfectly ok, but in your comment you actually stressed the fact that you are addressing this very question. – techraf Apr 04 '17 at 11:45
  • 5
    READ THAT LINK. Sorry for shouting. – dave Apr 04 '17 at 12:15
  • " but that only applies ..." more importantly; it still leaves him with 9890 combinations to try. – Taemyr Apr 04 '17 at 13:09
  • 1
    @RоryMcCune That DataGenetics article was a great read, thanks. – smitelli Apr 04 '17 at 15:15
  • a) and b) are unnecessary because the more important point is, even if attackers know not to try those, they'd still have no way to know how to guess your random pin in 3 attempts. – Goose Apr 04 '17 at 18:06
  • The part about smart alecs reminds me of the old outsmart... I'll pick 5000 because it's right in the middle, so either end they start it will take them a maximum amount of time to hit it on average. But what if they use a binary search? They will start at 5000 and guess it in one go. Ok, so that would rule out 5000, 7500, 2500, 3750, etc. So I'll pick 4999 or 5001 as that is the longest average path for both a binary search and a linear search. But wait, suppose they start at the middle and do a linear search from there... etc. – Michael Apr 06 '17 at 20:07
  • 1
    @Michael Binary search has no applicability here. No, really, it doesn't. Think it through before responding. Finding a random PIN is O(n). A binary search is a O(logn) operation that starts with a *known key* and searches an *ordered subset* of the possible keys. If it's not a subset, then you can use direct indexing, which is O(1). "5000, 7500, 2500, 3750, etc." -- a binary search will only look on one side of the initial try or the other, not both. That's the whole point of binary search, – Jim Balter Apr 07 '17 at 03:50
  • @JimBalter That was kind of my point... you can try to pick a PIN based on what scheme you think the attacker will use to try to guess PINs, but it falls in the old "outsmarted yourself" category. – Michael Apr 07 '17 at 15:52
  • 1
    ^ No, the point is that you don't understand what a binary search is or when it's applied. – Jim Balter Apr 07 '17 at 21:18
  • 8068: The #1 most popular PIN of datagenetics readers. – Jason C Apr 08 '17 at 05:57
  • 2
    @Michael I've got no idea about pin security, but binary searches really aren't applicable here. A binary search only ever makes sense if you get feedback on if a value is "bigger than" or "less than" what you've tried. I know of no pin system where the error message is "sorry, the password you tried was a larger number than the correct password" - the attacker only ever gets "is the right password" or "is not the right password" feedback. – daboross Apr 08 '17 at 21:50
84

I'm going to barge in and talk about entropy and probability for a little bit and hopefully this will help you understand.

Firstly, what is probability? This is actually an open question amongst statisticians, but here's the frequentist definition: we say that if a fair coin is flipped, it has probability 0.5 of coming up heads. However, if you flip a coin you might observe that the first five results are all heads, which does not look right. So, the frequentist says that if you were to flip the coin "enough" times, you would eventually find that one in two of the coin tosses are heads.

The key is that probability says nothing about what will actually happen. A high-entropy password could be guessed on the very first try by simple luck, regardless of possible outcomes and so on.

Now what is entropy? If you started saying "well it's the number of possible outcomes..." you might be right in a generating-some-random-data context, but this is the perfect example of where you really need to understand what is going on underneath.

Firstly, let's talk about self-information. This is a random variable (which means there are a number of possible outcomes) that varies over the probability of each outcome (and then we take -log2(P(X)) to encode it into "bits" of information). So we need to assign each outcome a probability.

As others have pointed out, some variations of PIN choice are more likely. All the same numbers (1111, 2222, 3333, ...), Birthdays (20XX, 19XX) and so on. You should assign higher probability to these numbers because simply put people are more likely to pick them and are certainly not going to pick a random sequence. How you assign probability to other numbers is entirely up to you and really depends on how much you know about the process of choosing a pin.

Now, entropy, or to keep @codesinchaos happy, Shannon entropy specifically, is the mean of the self-information distribution. It's the expected (probability-weighted average) value of self-information given the probabilities of each choice. What does this mean? As the current top-voted answer says, it is a measure of the choice process and how good it is, not the pin itself.
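In symbols, writing P(x) for the probability that PIN x is chosen, the standard definitions of self-information and its mean look like this. The notation is an assumption added for illustration; the answer itself only describes them in words.

```latex
I(x) = -\log_2 P(x), \qquad
H(X) = \mathbb{E}\left[ I(X) \right] = -\sum_{x} P(x) \, \log_2 P(x)
```

For a uniform choice over N PINs, P(x) = 1/N for every x, so H(X) = log2(N), which is about 13.29 bits for N = 10^4.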

What happens when you take out high probability choices like 1111, 2222, 3333? These outcomes give very low self information (-log(P(X)) is small for large probabilities, since we expect them to occur) and so removing them moves the distribution to the right, i.e., moves the location of the distribution towards the centre. This will increase its mean. So, removing choices most people would otherwise make with high probability actually increases entropy.

Let's look at entropy in a different way: if you were going to guess PINs, in what order would you try them (assuming no lockout)? You would begin with the most likely PINs for certain. What entropy is saying is that if you repeated this experiment enough times (i.e. tried to guess the PIN of a large number of cards whose PINs were chosen with the exact same logic) then a lower entropy choice would give you, the attacker, success more quickly.
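The guessing order described above can be sketched in plain Python. It assumes the top-20 frequencies quoted in the edit further down this answer, and an attacker who tries PINs strictly from most to least popular. The function name is illustrative.

```python
# Top-20 PIN frequencies (percent), copied from the table quoted later in this answer.
TOP_20_PERCENT = [10.713, 6.016, 1.881, 1.197, 0.745, 0.616, 0.613, 0.526,
                  0.516, 0.512, 0.451, 0.419, 0.395, 0.391, 0.366, 0.304,
                  0.303, 0.293, 0.290, 0.285]

def success_within(guesses, percentages=TOP_20_PERCENT):
    """Fraction of human-chosen PINs a best-first attacker hits
    within the given number of guesses (at most 20 with this data)."""
    return sum(percentages[:guesses]) / 100

for k in (1, 3, 5, 20):
    print(k, round(success_within(k), 4))

# For comparison: 3 guesses against a uniformly random PIN.
print(3 / 10_000)
```

Under these assumptions, three guesses succeed against roughly 18.6% of human-chosen PINs, versus 0.03% against a uniformly random one.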

Again, this remains a question of what might happen in the theoretical case of many cards, not what might happen because the attacker gets lucky.

Here is your executive summary:

  1. What entropy becomes depends on how you assign probabilities to the outcome space.
  2. Without a doubt, if you leave humans to choose PINs, they will choose certain values with much higher probability than others.
  3. This means you can't assume the underlying distribution is uniform and say "entropy==number of outcomes".
  4. If you take out the highest probability poor-choice options, entropy goes up.
  5. Entropy, like probability of guessing correctly, says absolutely nothing about whether an attacker will get lucky and guess your PIN correctly. It simply says that in theory better entropy gives your attacker a harder time.

Now, to round out my answer, let us look at practicalities. If we are going to compare to passwords, or hash function output choices, or random data, PINs suck. If you give an attacker and defender free choice of PIN guess and no other information, the number of guesses to be right 50% of the time (birthday paradox) is ridiculously low. PINs would make lousy hash functions.

However, humans cannot memorise 128-bits of data very well, especially when drunk and trying to pay for a kebab using chip-and-pin. PINs are therefore a pragmatic compromise and with three guesses as a limit, aside from an attacker getting very lucky, you should be safe.

TL;DR Removing the choice of more likely PINs from your possible choices improves your chances when faced with an attacker that will not be guessing at random (i.e. most attackers).


Edit: I think this discussion warrants some mathematics now. Here is what I am going to assume in my calculations:

  1. We are using 4-digit PINs
  2. The data from Raesene's link is correct, i.e. that:

     #1     1234    10.713%
     #2     1111    6.016%
     #3     0000    1.881%
     #4     1212    1.197%
     #5     7777    0.745%
     #6     1004    0.616%
     #7     2000    0.613%
     #8     4444    0.526%
     #9     2222    0.516%
     #10    6969    0.512%
     #11    9999    0.451%
     #12    3333    0.419%
     #13    5555    0.395%
     #14    6666    0.391%
     #15    1122    0.366%
     #16    1313    0.304%
     #17    8888    0.303%
     #18    4321    0.293%
     #19    2001    0.290%
     #20    1010    0.285%
    
  3. I am also going to assume that any PIN not mentioned in this list has an equal chance of being chosen from the remaining, "unallocated" probability (1-total probability consumed above). This is almost definitely incorrect, but we only have so much data.

To compute this, I used the following sage code:

def shannon_entropy(probabilities):
    contributions = [p * (-1*log(p,2)) for p in probabilities]
    return sum(contributions)

Computes the actual shannon entropy for a given set of probabilities.

import itertools

# Keep the outcome count an integer so it is valid as a repeat count below;
# the division is done in floating point.
total_outcomes = 10^4
probability_random_outcome = 1.0 / total_outcomes
probability_random_outcome
maximum_entropy = -log(probability_random_outcome, 2)
maximum_entropy

# Uniform distribution: 10^4 outcomes, each with probability 1/10^4.
maximum_entropy_probability_list = list(itertools.repeat(probability_random_outcome, total_outcomes))
maximum_entropy_calculated = shannon_entropy(maximum_entropy_probability_list)
print(maximum_entropy)
print(maximum_entropy_calculated)

Demonstrates my function accurately computes maximum entropy, by taking a list of 10^4 probabilities, each at 1/10^4.

Then

probability_list_one = [10.713/100, 6.016/100, 1.881/100, 1.197/100, 0.745/100, 0.616/100, 0.613/100, 0.526/100,0.516/100, 0.512/100, 0.451/100, 0.419/100, 0.395/100, 0.391/100, 0.366/100, 0.304/100, 0.303/100,0.293/100,0.290/100,0.285/100]

outcome_count_one = 10^4 - len(probability_list_one)
print("Outcome count 1:", outcome_count_one)
probability_consumed_one = sum(probability_list_one)
print("Probability consumed by list: ", probability_consumed_one)
probability_ro_one = (1-probability_consumed_one)/outcome_count_one
entropy_probability_list_one = probability_list_one + list(itertools.repeat(probability_ro_one, outcome_count_one))
entropy_one = shannon_entropy(entropy_probability_list_one)
entropy_one

Here, as I said above, I take those 20 probabilities and assume the rest of the probabilities are distributed evenly between the remaining outcomes, by extending the list with each probability set evenly. The computation is performed.

probability_list_two = [6.016/100, 1.881/100, 1.197/100, 0.745/100, 0.616/100, 0.613/100, 0.526/100,0.516/100, 0.512/100, 0.451/100, 0.419/100, 0.395/100, 0.391/100, 0.366/100, 0.304/100, 0.303/100,0.293/100,0.290/100,0.285/100]

outcome_count_two = 10^4 - len(probability_list_two)-1
print("Outcome count 2:", outcome_count_two)
probability_consumed_two = sum(probability_list_two)
print("Probability consumed by list: ", probability_consumed_two)
probability_ro_two = (1-probability_consumed_two)/outcome_count_two
entropy_probability_list_two = probability_list_two + (list(itertools.repeat(probability_ro_two, outcome_count_two)))
entropy_two = shannon_entropy(entropy_probability_list_two)
entropy_two

In this instance, I remove the most likely PIN, 1234 (the 10.713% entry), and recompute entropy.

From these results, you can see that randomly choosing a PIN has 13.2877 bits of entropy. Repeating this experiment with one PIN removed gives us 13.2876 bits.

Choosing a PIN given those probabilities of choice for those 20 PINs and otherwise choosing randomly means your choice has 11.40 bits of entropy, out of a possible 13.2877 bits. From this base, blocking PIN 1234 (the 10.713% entry dropped from the second list in the code) and otherwise allowing the remaining 19 obvious PINs and all other PINs chosen with equal probability has entropy 12.33 bits, out of a possible 13.2876 bits.

I hope this explains why many of the answers are saying entropy is going down, rather than up. They're considering maximum possible entropy, rather than the average entropy (Shannon entropy) of the system taking into account the probability of each choice. A better measure is the Shannon entropy, since it takes into account the probability of each choice being made overall and so how an attacker will likely proceed in attacking.

As you can see, blocking PIN 1234 significantly increases Shannon entropy, at a slight cost to the maximum possible entropy. If you want to argue about entropy, basically, removing the most popular PIN massively helps.

For reference that XKCD comic calculates entropy of poor passwords at about 28 bits and entropy of good ones higher, at 44 bits. Again it depends on what assumptions are being made as to the probabilities of certain choices but this should also show that PINs suck in terms of entropy and the N-tries limit for small N is the only sane way to proceed.

Public sage worksheet

diagprov
  • 4
    I appreciate the explanation. +1 – jpmc26 Apr 04 '17 at 23:42
  • 2
    Note that Shannon entropy is not a great measure of password strength. With the numbers used above, Shannon entropy would model the password `1111` as guessable in 10 tries on average (since it assumes the attacker will try random passwords with the given probability distribution), but actually it will be guessed in 1 try on average since the attacker will just pick the most likely password first. The latter can be captured by the concept of [guessing entropy](http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-63-2.pdf#page=114). (The logic of the answer applies still.) – Tgr Apr 07 '17 at 22:38
  • 4
    Also, I don't think it makes sense to invoke the birthday paradox here. That is only relevant for collisions, not attempts to guess a fixed target. – Tgr Apr 07 '17 at 22:41
  • 1
    @Tgr But isn't diagprov talking about a collision when the users and attackers both randomly select PINs? – jpmc26 Apr 11 '17 at 06:19
  • @jpmc26 still don't see how it would make sense to model that as a collision. – Tgr Apr 11 '17 at 09:10
  • @Tgr Modeling it as a random choice by both defender and attacker is a best case scenario for the defender. diag's point is that even in this best case, the attacker doesn't have to work very hard to find your PIN no matter how good it is. They just need to guess a few times, something trivial to do with modern technology, and the odds of guessing right go up very quickly as the attacker continues to guess or if they guess against multiple defenders. When these "collide," the attacker wins. (The birthday paradox is really about how quickly the probability grows as you increase attempts.) – jpmc26 Apr 11 '17 at 09:48
  • 1
    That's just wrong though. If the defender picks a random number, or the attacker is guessing randomly, the expected number of guesses for a 50% chance is 5000, and the chance of success is linear with the number of guesses. (FWIW if one of the two uses a uniform random distribution, the choice of distribution/algorithm by the other makes no difference.) Multiple guesses by the attacker can in no way "collide" with each other. – Tgr Apr 11 '17 at 10:24
  • @Tgr to address your comments, what you've misunderstood is that entropy is not the strength of an individual password, but rather of the method. The method in this case is "let many humans choose their own PIN" and stopping them making certain choices improves this method. Of course the most ideal method is random generation of all PINs. What this means is that the PIN 1111 has different self-information (which would often be called "entropy" even though it isn't) depending on the generation method. This is, at least, purely mathematically speaking. – diagprov Apr 26 '17 at 14:07
  • Now my remarks on collisions are to compare PINs to Passwords, well, in a way what I wanted to say was that very simply the output space is too small to even really be considering entropy. Again we are talking about the method and in the best case all of the defenders are randomly generating their PINs. Then the attacker, also guessing randomly, will only need approximately 141 (according to wikipedia's formulae) guesses to correctly guess someone's PIN, 50% of the time this game is played. – diagprov Apr 26 '17 at 14:17
  • In short, I have explained the mathematics of entropy and that, given the generation method, removing obvious PINs like 1111 improves it based on what we expect people to do, but decreases it if we imagine people behave perfectly. But the real defence here lies in the 3 tries limit and luck that the attacker, on stealing your card, will not guess your particular PIN on his first attempt. – diagprov Apr 26 '17 at 14:22
  • @Tgr But the defender only selects *once*. The attacker selects as many times as they choose. – jpmc26 Jun 23 '17 at 22:44
30

This really depends on how the PIN is created:

  • If the PIN is generated, make sure the distribution is uniform and don't exclude any combinations. That will maximize the entropy.

  • If the PIN is chosen by a human operator, it makes perfect sense to exclude some combinations. I wouldn't go as far as rejecting half of the combinations, but if you do, you should also consider rejecting PINs starting with 0, 1 and 2 (think birth years and dates), then PINs corresponding to physical key layouts like 2580 and 1379, and so on and so forth. Just make sure you stop before you end up allowing a single 8068 PIN which this study has found to be the least probable.

What you should do for human-chosen PINs is exclude the most common combinations: 1234 and 1111 alone account for almost 17% of all PINs in use, and the 20 most popular PINs account for almost 27%. Those include each digit repeated 4 times and popular combinations like 1212 and 4321.

Edit: on second thought, I think you should exclude the most common combinations in any case. Even if your PIN is randomly generated, the attacker may not know that, in which case they will most probably try those combinations first.

Dmitry Grigoryev
  • 12
    Your edit is much better than your original comment. What matters is *the attacker's algorithm*. Since attackers must check the whole set of combinations, not just the subset that this particular person is choosing from, it's the entropy of the former, not the latter, that matters. – Jim Balter Apr 05 '17 at 06:56
27

Entropy is a property of the password generation method, not the password.

If you decide to eliminate repeated digits - this decision lowers the entropy compared to generating a random sequence.

In fact, anything you come up with will have lower entropy than generating a random sequence.
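To put numbers on this: assuming each method picks uniformly at random from its allowed set, the entropy is simply the base-2 logarithm of the set size. A small Python check, with illustrative variable names:

```python
import math

all_pins = 10 ** 4                  # any four digits
unique_digit_pins = 10 * 9 * 8 * 7  # no digit repeated

print(round(math.log2(all_pins), 2))           # bits for a fully random PIN
print(round(math.log2(unique_digit_pins), 2))  # bits with the unique-digit rule
```

Under that uniform assumption, the unique-digit rule costs about one bit (roughly 13.29 versus 12.30 bits).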


And if you believe a randomly-generated password 1111 has a low entropy and is thus easier to brute-force, just go to any gambling place and bet on 1 four times in a row - it should be a sure win.

techraf
  • 60
    While the sequence "1111" is just as likely as any other, I would propose that it is a sequence more likely to be guessed by attackers, and therefore you'd be more secure if you avoided it. – Tom Bowen Apr 04 '17 at 07:40
  • Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackexchange.com/rooms/56589/discussion-on-answer-by-techraf-when-choosing-a-numeric-pin-does-it-help-or-hur). Any further comments should be made there - any made here will be deleted. – Rory Alsop Apr 05 '17 at 11:11
  • 4
    I would also note - observation is a factor. It's much easier to tell that you press the same key 4 times, and which key it is than if you're pressing several. – Sobrique Apr 06 '17 at 13:49
  • 1
    As you say, entropy is a property of the password generation method, and when the most common method for generating PINs is by human choice, it is worthwhile to avoid the most common passwords. If I write a program for generating a random password of random characters of a random length, I have really good entropy in theory, but if that program spits out "password" by chance, it is still a good idea to discard that result since the attacker won't know or care that I generated it randomly – Kevin Apr 07 '17 at 15:32
  • 3
    So, if you wanted an 4 digit pin number for your car, and the random pin generator chose "1111", you would use that? If we're talking about brute forcing where pins are guessed at random, you're right that it wouldn't make any difference, but in the real world using "1111" would be a lot like always leaving your car unlocked. – Jojodmo Apr 08 '17 at 06:05
  • @Jojodmo Please quote the relevant sentence from the answer that made you think I would do so. – techraf Apr 08 '17 at 06:06
  • "_Entropy is a property of the password generation method, not the password._" Entropy is a property of the attack surface, not the password generation method. This conceptual mistake leads to a very poorly reasoned defense of including "1111" when that's literally the very first PIN most attackers will try. See [@Tgr's comment](https://security.stackexchange.com/questions/155606/when-choosing-a-numeric-pin-does-it-help-or-hurt-to-make-each-digit-unique#comment296396_155639) for a better characterization of the entropy. – Nat Apr 09 '17 at 16:18
17

Restricting your available pool of numbers reduces the number of possible solutions, making it less secure.

Repeating digits is a common human weakness when choosing pin codes, which means it will be tried first by attackers. Thus, ruling out repeated numbers increases security.

As is often the case, the decision has both upsides and downsides depending on the specific attack you're defending against. You probably shouldn't over-think it, and consider wider-perspective changes (like not using a 4-digit pin, or adding a second factor, or having lockouts on incorrect tries) if you want to increase the security of the system.

Xiong Chiamiov
  • I think the second point is a bigger concern. But don't avoid repeated digits. Avoid "obvious" patterns: 1234, 4321. Of course if I had Bruce Schneier's credit card I might try any random combination which wasn't an obvious pattern instead. If you are less well known then I think you are better off avoiding "easy" passwords. – dave Apr 04 '17 at 12:13
  • 3
    @dave That way lies [paradox](https://en.wikipedia.org/wiki/Interesting_number_paradox). . – David Richerby Apr 04 '17 at 13:18
  • 3
    @dave So, if the attacker knows Bruce Schneier, then his account is *safer* with a common, obvious PIN. Interesting. – jpaugh Apr 04 '17 at 21:50
  • @DavidRicherby True! I guess the idea of "uninteresting" integers is only compelling when you think of the integers as only partially ordered set (with none of the "uninteresting" integers having an ordering defined). – jpaugh Apr 04 '17 at 21:54
  • 6
    "Restricting your available pool of numbers reduces the number of possible solutions, making it less secure." -- This is quite mistaken, because the pool has not been reduced, only what the OP chooses to take from the pool. This difference is essential because *the attacker must address the whole pool*, and doesn't know that the OP restricted their choice unless the attacker is reading this page, knows who the OP is, and tailors their attack accordingly. – Jim Balter Apr 05 '17 at 06:51
10

It looks to me that most of the other answers are focussing on the wrong type of attack.

Since we are dealing with a very specific scenario (manual PIN input) we can optimize the PIN generation for the possible attack scenarios.

Dictionary Attack

From what I can gather from the question, we are talking about manual PIN input, so the attacker has to type out each PIN they try. So a brute force attack might take quite a while (say, you need two seconds for each PIN you try, it will take almost three hours on average). This is possible, but not the smartest approach.
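The timing estimate above is simple arithmetic. Here is a short Python sketch, assuming the two seconds per attempt used in the text and an attacker who never repeats a PIN; the function name is illustrative.

```python
KEYSPACE = 10 ** 4     # all 4-digit PINs
SECONDS_PER_TRY = 2    # assumed manual entry speed, as in the text

def hours_to_try(attempts, seconds_per_try=SECONDS_PER_TRY):
    """Wall-clock hours needed to key in the given number of PINs."""
    return attempts * seconds_per_try / 3600

average_case = hours_to_try(KEYSPACE / 2)  # PIN found halfway through, on average
worst_case = hours_to_try(KEYSPACE)        # every PIN tried

print(round(average_case, 2), round(worst_case, 2))
```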

So when a brute force attack is unfeasible, you can instead try a dictionary attack. Here you try the most common PINs first. If this fails you can always resort to a brute force attack later. With dictionary attacks, entropy does not matter that much anymore. Here it matters whether the password is in the dictionary or not. Since the attacker most likely does not have a real dictionary of common PINs, they will have to come up with their dictionary on the fly. This would mean, the dictionary will probably be rather short and pattern-orientated. Possible PINs in the dictionary would be:

  • Consecutive PINs (e.g. 0123 or 1234)
  • PINs with four times the same digit (e.g. 2222)
  • maybe also PINs with only three times the same digit

By eliminating these few passwords you don't lower your keyspace size by much but you can easily defend against dictionary attacks. Similar strategies are used by websites that don't allow you to use common or easy to guess passwords (e.g. using the username as the password)
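To get a feel for how small such an on-the-fly dictionary is, here is a rough sketch that enumerates the three patterns listed above. The helper names are hypothetical, and "consecutive" is assumed to mean strictly ascending or descending by one without wrap-around (so 7890 is not counted).

```python
from itertools import product

ALL_PINS = ["".join(p) for p in product("0123456789", repeat=4)]


def is_consecutive(pin: str) -> bool:
    """True for strictly ascending or descending runs such as 1234 or 4321."""
    steps = {int(b) - int(a) for a, b in zip(pin, pin[1:])}
    return steps == {1} or steps == {-1}


def has_three_of_a_kind(pin: str) -> bool:
    """True when some digit appears three or four times (covers 2222 too)."""
    return any(pin.count(d) >= 3 for d in set(pin))


dictionary = {p for p in ALL_PINS if is_consecutive(p) or has_three_of_a_kind(p)}
remaining = len(ALL_PINS) - len(dictionary)
print(f"{len(dictionary)} PINs in the dictionary, {remaining} left to choose from")
```

Under these assumptions the dictionary holds 384 PINs, which leaves 9,616 of the 10,000 codes available.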

Brute Force Attack

Next we can try to optimize against possible brute force attacks. This might help a lot less for a higher cost, so it might not be worthwhile. There are two main ways a human attacker can perform a brute force attack: either just enter random PINs, or start with 0000 and count up (or 9999 and count down). So PINs like 0001 or 9998 might be a bad choice, since someone performing a brute force attack might find them rather quickly. So maybe exclude PINs starting with 0 or 9.


Following these rules you should not lose too many possible passwords, but you might be able to strengthen your PIN against the most common attack strategies for this specific scenario.

Dakkaron
  • 220
  • 1
  • 8
  • "So maybe exclude PINs starting with 0 or 9." I have to disagree. Excluding these PINs would only make the brute force attack take a little longer, but the attacker will eventually pass through the needed PIN number, sooner or later. Excluding them will only reduce entropy. – mzcoxfde Apr 11 '17 at 07:36
  • 1
    First of all, given enough performance, every key is bruteforceable. The whole point of using longer keys is to slow down the attacker long enough to make an attack infeasible because it takes so long. Same goes here. If it takes your brute force attacker at least 1000 tries (at 2 seconds a try that is about half an hour), that is a lot better than e.g. 100 tries. Also, if there is a limit on how fast keys are entered, this can be improved drastically, e.g. with a 10 second wait between each try you are at a minimum of 2.7 hours. That is a long time to try to crack some lock without getting caught. – Dakkaron Apr 11 '17 at 13:49
6

Don't have enough to comment but have a recommendation. PINs that are typed using a single button (or a simple pattern of buttons) are more easily observed by a shoulder surfer. In elementary school the teacher thought they did me a favor by making my password 4321, and some jerk watched my finger move in a straight line and told everyone my password.

I advise making a list of weak PINs that are susceptible to this, and then subtracting them from fully randomly generated PINs.

northerner
  • 283
  • 1
  • 9
  • 3
    This! Depending on what a PIN is used for: which keyboard is used and who can observe, a single digit PIN can potentially be easily guessed. If you realize all 4 digits were the same and it was in the upper corner, then it's just a matter of trying 1111, 2222, 4444 and 5555. Seemingly random finger movement make it much harder to guess. – Matthieu M. Apr 04 '17 at 12:00
  • @MatthieuM. OTOH, if they are all across the keypad, did you press 1-1-6-8, 1-6-6-8, 1-6-8-8, 1-1-8-8, etc.? I think that is much harder for some 'shoulder surfer' to discern than 4-3-6-9. – Shane Apr 04 '17 at 19:27
5

It depends on the implementation. Eliminating consecutive numbers will reduce the keyspace by 0.1% but has some benefits to physical security that may make it worth the tradeoff.

A lot of good clever answers here, main point being that instead of making it more secure, you're making the keyspace smaller (however negligibly, 10 out of 10.000).

The top answers however fail to touch on the physical aspect of entering a pin. Visual and thermovisual extraction are a real danger these days. In other words, bad guys shoulder-surfing your pincode either with their eyes, a telescope, a skimming camera on the ATM or even thermal imaging cameras.

That last one is more recent, and especially nasty as a skimmer can walk up to a pin pad and look at the heat signature, even if you covered the pad well.

Having a consecutive pin will hurt security in this area; it reduces the complexity of the physical location of the numbers by a horrible amount. Even if you covered your hand, chances are the attacker will guess the button you pressed four times before a lockout happens. On a phone, if there is a big grease spot on the zero, that's the one I'll try first.

J.A.K.
  • 4,783
  • 13
  • 30
  • 4
    "A lot of good answers here, main point being that instead of making it more secure, you're making the keyspace smaller (however negibly, 10 out of 10.000)." -- no, those are bad answers, because restricting the set of keys you choose from does not reduce the keyset, which is the set you *could* choose from and that any attacker must account for, unless you told them that you're omitting certain combinations and they tailor their attack just for you. – Jim Balter Apr 05 '17 at 07:03
4

On my phone, the PIN deliberately uses one of the numbers twice in a row, in order to make it harder to guess because:

  • The amount of "grease spots" does not match the number of digits
  • A "shoulder surfer" will have a (little) harder time to distinguish double tapping from single tapping

Addendum: The phone in question allows for a custom length of the PIN, thus an attacker (not observing PIN entry) does not know the number of digits in use.

Marcel
  • 3,536
  • 1
  • 19
  • 37
  • 3
    If the grease spots on your phone tell me that your n-digit pin uses n different digits, there are n! possible pins. If the grease spots tell me that it uses (n-1) digits, and I know that your phone makes you double tap one of the digits, there are (n-1)(n-1)! possibilities, _which is a smaller number_. Epic fail. – David Richerby Apr 04 '17 at 13:26
  • I would also hazard a guess that, in some cases, it would be easy to see which digit was the repeated one by how different the smudge looks. I've certainly seen door locks with metal buttons with one digit worn way down compared to the others. – smitelli Apr 04 '17 at 15:01
  • 1
    If I see a big grease spot on the zero of a phone, I'll bet money on what their pin is. – J.A.K. Apr 04 '17 at 17:02
  • @DavidRicherby Except the grease spots *don't* tell you that it uses (n-1) digits. Looking at grease spots doesn't tell you if the password was 1-1 or 1-1-1-1-1-1. You also don't know if there was a double tap or not. Allowing repeats give you a space of n!^n! Any number that you pressed can be in the final sequence any number of times. – Shane Apr 04 '17 at 19:37
  • 1
    @Shane If it's a PIN, it's of fixed length, which I can find out by guessing digits. And we should always assume that the attacker knows any system-enforced rules such as "The PIN must consist of three digits with one of them repeated." Also, I'm not sure where you get n!^n! from. If the password has length $k$ and there are $n$ options for each digit, there are at most $n^k$ different possibilities. If there is no bound on the possible length, there are infinitely many possible passwords. – David Richerby Apr 04 '17 at 20:18
  • @DavidRicherby You are right, n!^n! was a dumb thing to say. The point is, if someone is secretly watching you press the buttons, they know what buttons you pressed and in what order. Without repeating keys they *already know your password*. If you used repeat keys, they don't. If I know they pressed 1-4-7, I don't know if their password is 1147, 1447, 144447 or 1477. If I know they pressed 1-4-6-9, I know their password is 1469. That is the difference between your card being locked and someone stealing your money. – Shane Apr 04 '17 at 21:06
  • 1
    @DavidRicherby A 4-digit pin with the digits [1,2,3,4] has 24 possibilities. A 4-digit pin with the digits [1,2,3] has 36 possibilities. – Joe Frambach Apr 04 '17 at 22:06
  • 1
    @Shane PINs have fixed length. If it's a four-digit PIN, it's 1147, 1447 or 1477. Oh noes, on average it takes two guesses instead of one. – David Richerby Apr 05 '17 at 00:56
  • 3
    @JoeFrambach No, a 4-digit PIN with digits 1, 2, 3 has 18 possibilities: 1123, 1132, 2113, 3112, 2311, 3211, 2213, 2231, 1223, 3221, 1322, 3122, 3312, 3321, 1332, 2331, 1233, 2133. – David Richerby Apr 05 '17 at 00:57
  • @DavidRicherby You are missing 1213, 1312, 2131, 2132, 2123, 3123, 3213, for starters – Joe Frambach Apr 05 '17 at 02:26
  • As the OP: Thanks for this extensive debate, seems a very controversial topic. The device in mind when writing the answer allows for custom lengths of the pin; mine is actually longer than 4 digits. – Marcel Apr 05 '17 at 06:01
  • @JoeFrambach The answer is talking about PINs where the repeated digit must occur twice in a row, though I didn't mention that in all of my comments. – David Richerby Apr 05 '17 at 07:43
  • 1
    This answer doesn't appear to attempt to actually *answer* the OP's question. It just looks like good advice. – I say Reinstate Monica Apr 05 '17 at 14:53
  • @ Those extra guesses are the difference between your card being locked and your bank account being empty. – Shane Apr 05 '17 at 16:36
4

You should make a specific list of “weak keys” in advance, that are what someone would try guessing. This includes important dates, addresses, etc. and may include 1111 if people actually would try that when guessing.

Then make a random draw, and filter against the (short) list. If the list is not short but systematic, (e.g. no repeated digits, no legal dates) then you wind up with too few possibilities which starts making it easier to guess again.
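The draw-and-filter procedure can be sketched as below. The blocklist contents are purely illustrative (a real one would hold your own dates and addresses), and `draw_pin` is a hypothetical helper name; the random source is Python's standard `secrets` module.

```python
import secrets

# Hypothetical personal blocklist: obvious patterns plus dates that matter to you.
WEAK_PINS = {"0000", "1111", "1234", "4321", "2580", "1984"}


def draw_pin(blocklist: set[str]) -> str:
    """Draw uniformly random 4-digit PINs until one is not on the blocklist."""
    while True:
        pin = f"{secrets.randbelow(10_000):04d}"
        if pin not in blocklist:
            return pin


pin = draw_pin(WEAK_PINS)
print(pin)
```

With a short list, the loop almost never needs a second draw and the number of possible results stays very close to 10,000.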

JDługosz
  • 1,139
  • 2
  • 7
  • 12
4

In 2012, a researcher compiled a list of the most popular PIN codes from a number of data breaches. What he found was that the most popular PINs were either:

  1. Sequences, such as 1234 or 7777
  2. Dates, like 2001
  3. Pop culture references, such as 0007 (from James Bond's code number 007)
  4. Easy to type, such as 2580 (numbers which lie in a straight line on many keypads)

Here is the top 20 (which does not include all examples given above): Top 20 most popular PIN codes. 1234, 1111 and 0000 comprise the top three.

So, sequences are bad. So are PINs that are easily memorable without a mnemonic. However, PINs that use the same digit more than once are not necessarily bad.

Case in point:

At the other end of the scale, the least frequently used number I found in my dataset was 8068. Out of all the combinations of numbers this appeared to be the least interesting. It's not a date in history, it's not a pattern, it's not a birthday, it's not easy to type. It's the perfect pin … or it would have been until now.

So now you know.

user2428118
  • 2,788
  • 16
  • 23
1

There may be some attacks that vary in effectiveness, depending on the number of repeated digits.

For example, let's say someone applies a light dust to the keypad of your teller machine. You put your card in, cover your hands as you type in your pin, check your balance, then wander off. As you go, someone picks your pocket and gets your card.

They now have your card, and can see which buttons have their dusting disturbed - they know the digits, but not the order.

If they see you pressed the digits 2, 3, 6, and 8, then your pin could be one of the following:

2368, 2386,   2638, 2683,   2836, 2863, 
3268, 3286,   3628, 3682,   3826, 3862,
6238, 6283,   6328, 6382,   6823, 6832,
8236, 8263,   8326, 8362,   8623, 8632

24 possibilities. With 3 guesses, they have a 1/8 chance of guessing right.

Here are the possibilities for a 4-digit pin that uses only three distinct digits (2, 3 and 6), so that one of them is repeated:

2236, 2263, 2326, 2336, 2362, 2363, 
2366, 2623, 2632, 2633, 2636, 2663, 
3226, 3236, 3262, 3263, 3266, 3326, 
3362, 3622, 3623, 3626, 3632, 3662, 
6223, 6232, 6233, 6236, 6263, 6322, 
6323, 6326, 6332, 6362, 6623, 6632

There are 36 of these. The odds of guessing this in 3 attempts are 1/12. Better odds for you!

Let's try this again, this time with only two distinct digits (2 and 3):

2223, 2232, 2233, 2322, 2323, 2332, 2333, 
3222, 3223, 3232, 3233, 3322, 3323, 3332

14 combinations, over 1/5 chance of guessing with 3 tries.

Obviously, with only one digit, there is only one solution, and it can be guessed straight away.
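The counts above can be reproduced by brute force. This is a small sketch, assuming the attacker knows exactly which buttons were pressed and that the PIN has four digits; `candidate_pins` is a name invented for the example.

```python
from itertools import product


def candidate_pins(pressed: str, length: int = 4) -> list[str]:
    """All PINs of the given length that use every pressed digit at least once."""
    return [
        "".join(p)
        for p in product(sorted(pressed), repeat=length)
        if set(p) == set(pressed)
    ]


for pressed in ("2368", "236", "23", "2"):
    n = len(candidate_pins(pressed))
    chance = min(3, n) / n  # three guesses before the card is locked
    print(f"{len(pressed)} distinct digits: {n} candidates, "
          f"3 guesses succeed with probability {chance:.3f}")
```

It reports 24, 36, 14 and 1 candidates, matching the lists above.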

Of course, if the digits in your pin are 1, 6 and 9, I'm going to guess that you were born in 1961, 1966, 1969, or 1996 - if I see you walk off, I should be able to guess if you're 21 or 48ish, which means 3 guesses is probably all I need.

AMADANON Inc.
  • 1,501
  • 9
  • 9
0

You are right that rules to remove patterns would hurt your key space, but they don't lower anything's entropy, as the entropy comes from your hardware: your TRNG machine is allowed to output a repeating digit now and again.

On the math: if you don't want to see a repeating digit like the 9 in '0919', then your math is right, but if you only mean an adjacent repeat like the 9 in '0991', then you are left with 10 * 9 * 9 * 9 = 7290 out of 10000.
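Both readings of "no repeating digit" can be counted by enumerating all 10,000 codes. This is just a sketch; the variable names are arbitrary.

```python
from itertools import product

pins = ["".join(p) for p in product("0123456789", repeat=4)]

# Reading 1: every digit distinct ('0919' is rejected).
all_distinct = [p for p in pins if len(set(p)) == 4]

# Reading 2: only adjacent repeats are rejected ('0991' is out, '0919' stays).
no_adjacent_repeat = [p for p in pins if all(a != b for a, b in zip(p, p[1:]))]

print(len(all_distinct), len(no_adjacent_repeat))
```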

But depending on what you are thinking of with "four digits containing some property or pattern", you can whittle it down further: remove any patterns like '12', '321' or '34', remove any non primes, and then you only have a handful (~300 out of 10000) of uninteresting numbers.

daniel
  • 774
  • 3
  • 12
  • Why remove non primes? I'd think primes are more mathematically interesting. – timuzhti Apr 05 '17 at 03:44
  • 1
    I was going to remove anything divisible by 2, then anything divisible by 3, as they are easier to remember, and just took it all the way. But you are right, some people might memorize primes under 10000, so they should be removed from the pool too. This leaves us with numbers that are non prime and also are not divisible by other numbers :D – daniel Apr 05 '17 at 07:02
-3

Avoid using the same digit consecutively. The delay between presses of the same digit will be shorter than the delay between different digits (since your finger is already on the button), giving important information away.

EL Dendo
  • 115
  • 2
  • 1
    Please state the reason for downvoting my answer... – EL Dendo Apr 04 '17 at 14:36
  • 4
    I didn't downvote, but this seems to assume a particular threat model that isn't brought up a concern. That is, it assumes the attacker has some mechanism of timing your key pressing patterns without actually observing the device. I'm vaguely aware that technology exists to make observations like this when typing in a keyboard, but it's not clear to me that the PIN would go in a keyboard to begin with. – jpmc26 Apr 04 '17 at 23:54
  • 3
    @jpmc26 It certainly has been brought up; see, e.g., the comment by David Wallace under the OP's question. As for your required assumption, watching someone's fingers or even just their hand from afar while the view of the actual keypresses is blocked qualifies. As for "keyboard", the OP wrote in their edit "programmable door locks, alarm system panels, garage door keypads, etc." – Jim Balter Apr 05 '17 at 07:18
  • In fact I was talking out of experience: I was typing in a code while a colleague couldn't see my hands but could hear me tapping a keyboard. Afterwards he told me which consecutive digits were the same. – EL Dendo Apr 06 '17 at 07:44
-4

4 digit pins are easily cracked in seconds with the right equipment. Why use such a laughably short pin when exploits like Reaver have proven that 4 digits is not enough (assuming your pin can be brute forced)?

And if it can't be brute forced, what are you worried about? The odds of someone guessing it are still 1 in 10,000, regardless of whether the pin is 1743 or 1111. You're always taking a risk with a shorter pin, and it would be ill advised to not accept repeated sequences as you're drastically lowering the entropy.

If your mind is set on this though, you could just make sure your generator keeps producing values until it has a unique digit in each position. This method would seem most logical for what you're trying to accomplish, and won't reduce the entropy if you still allow 1111, for example, to be generated although not used (repeatedly generating numbers until it finds one with unique digit in each position, and throwing away ones which are not)
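As a rough reference for the entropy figures being discussed, the sketch below compares a uniform choice over all 10,000 PINs with a uniform choice over the PINs that survive the "unique digit in each position" filter. It assumes the standard formula for a uniform distribution, log2(N) bits for N equally likely outcomes. Whether the smaller figure matters in practice depends on whether the attacker knows the rule, which is the point debated in the comments.

```python
import math
from itertools import product

all_pins = ["".join(p) for p in product("0123456789", repeat=4)]

# PINs that the generate-and-reject loop would eventually return.
accepted = [p for p in all_pins if len(set(p)) == 4]

# For a uniform choice among N outcomes the Shannon entropy is log2(N) bits.
bits_unrestricted = math.log2(len(all_pins))
bits_filtered = math.log2(len(accepted))
print(f"{bits_unrestricted:.2f} bits vs {bits_filtered:.2f} bits")
# prints "13.29 bits vs 12.30 bits"
```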

Dylan
  • 115
  • 1
  • 8
  • 3
    "_won't reduce the entropy if you still allow 1111, for example, to be generated although not used_". Completely false. It doesn't matter whether you never consider PINs with repeated digits, or reject them if they are generated: the net effect is a "PIN space" of half the potential size, so reduced entropy. – TripeHound Apr 04 '17 at 07:37
  • 3
    lol no, if you choose 1234 for you PIN there's a much higher chance that an attacker will guess it in the first couple of shots, given that it's the most common PIN. Remember attackers are human. – Rory McCune Apr 04 '17 at 08:19
  • 1
    @TripeHound But that's assuming the adversary knows repeated sequences are not allowed... And technically the entropy would remain the same. You're wrong. – Dylan Apr 04 '17 at 08:38
  • 2
    @Dylan No, you're wrong. The scheme you've described is just rejection sampling to produce the uniform distribution over some subset of the 10^4 possible pins. The entropy of a distribution depends only on the distribution, not on the method you use to sample from it. (For example, the entropy of the distribution of numbers generated by rolling a 4-sided die is exactly the same as the entropy of the distribution generated by rolling a 6-sided die and re-rolling 5s and 6s, because it's the same distribution. This is essentially what you're doing here, only with a 10,000-sided die.) – David Richerby Apr 04 '17 at 13:31
  • Dylan is right (not about 1234, but about the subsetting) and TripeHound is completely wrong. The "PIN space" is the set of possible pins, the one an attacker must account for, not the subset that the OP selects from, which is unknown to the attacker (unless they are reading this page, know the OP, and tailor their attack accordingly). David Richerby makes the same mistake, mistaking a filtered subset of the sample space with the sample space, which remains the same. – Jim Balter Apr 05 '17 at 07:07
  • @JimBalter While an attacker won't _always_ know that the attack space has been restricted, _sometimes_ they will (especially for any moderately popular site). One of the underlying principles of security is not to rely on obscurity, so you've got to plan for the worst and assume they _do_ know. – TripeHound Apr 06 '17 at 13:42