My colleague Mike Sierchio wrote a cool post on password strength, and the concept of entropy. As he points out, entropy isn't always entropy. That confusion is apparently not uncommon, as it's been asked about on IT Security Stack Exchange as well. So what's really going on?
Let's step back for a sec and fill in some context. What are we trying to do? We'd like some way to measure how hard it is to guess our passwords, a number that serves as a heuristic standard of password strength. But there are two fundamentally different things we might want to measure:
How hard would it be for someone to guess your password with essentially no knowledge of how you created your password?
How hard would it be for someone to guess your password if they knew the process used to generate it? This of course assumes that there is a process, for example some script that does some Math.rand-ing and produces a password string.
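To make that second question concrete, here's a minimal sketch (the function names are mine, not from any particular tool) of such a process: a script that draws each character uniformly at random from a fixed character set. For a process like this, the guessing difficulty is fully determined by the process itself, namely length × log2(size of character set) bits, regardless of which particular string it happens to spit out.

```python
import math
import secrets

def generate_password(charset: str, length: int) -> str:
    """A toy password-generating process: each character is drawn
    uniformly and independently from charset."""
    return "".join(secrets.choice(charset) for _ in range(length))

def process_entropy_bits(charset: str, length: int) -> float:
    """Entropy of the *process* above: length * log2(|charset|) bits,
    since every character contributes log2(|charset|) bits independently."""
    return length * math.log2(len(charset))

charset = "abcdefghijklmnopqrstuvwxyz0123456789"  # 36 symbols
password = generate_password(charset, 10)
print(password)                                   # some random 10-char string
print(process_entropy_bits(charset, 10))          # about 51.7 bits
```

Note that the entropy figure here never looks at the generated string at all; two runs producing "aaaaaaaaaa" and "q7x2mz0ahk" get the same score, because the score belongs to the process.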
The term "entropy" has been used to refer to both kinds of calculations, but they're clearly entirely different things: the former essentially takes a string as input, while the latter takes a random process as input. Hence, "entropy is not entropy."
Alright, well if entropy isn't entropy, let's see what entropies are. We'll look at the standard mathematical formulation of random-process-entropy, which comes from information theory. And we'll look at the function used to calculate particular-string-entropy in some password strength tester (e.g. http://rumkin.com/tools/password/passchk.php). And that's all we're going to do: we'll look at how the calculations are done, without dwelling too much on the differences between the two approaches or what their use cases are.
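As a preview of the information-theoretic side, the standard formula is a short one: for a random process with outcome probabilities p_1, …, p_n, the entropy is H = -Σ p_i · log2(p_i), measured in bits. A minimal sketch of that formula (the function name is mine):

```python
import math

def shannon_entropy(probs) -> float:
    """Shannon entropy in bits of a distribution given as a list of
    probabilities: H = -sum(p * log2(p)). Zero-probability outcomes
    contribute nothing, so they're skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin flip is maximally unpredictable for two outcomes: 1 bit.
print(shannon_entropy([0.5, 0.5]))    # 1.0
# A heavily biased coin is much easier to guess, so it carries far less.
print(shannon_entropy([0.99, 0.01]))  # about 0.08
```

The intuition for passwords: the harder the process's output is to predict, the higher this number, which is why it serves as a measure of guessing difficulty for question two above.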