Fifty Thoughts to the Dollar: February 2013

Let us construct a very simple economy with only two individuals. Let's call this world Simpletonia. These individuals, or Simpletons, will be known as Gordon and Peter.

In this world, there is only one type of Resource, which is non-perishable and has zero storage costs. It is necessary to consume 1 unit of Resource a day in order to maintain living standards. Resource is located in one of two types of Deposits on this world - one low-yield and one high-yield - from which the Resource must be gathered before it can be consumed.

The first type of Deposit yields 2 Resources at the end of a day of continuous gathering. The second Deposit yields 1000 Resources as a lump sum at the end of a year of continuous gathering. Needless to say, each Simpleton can only gather from one deposit at a time.

It is clear that the second Deposit offers a higher yield than the first (1000 Resource / 365 days = 2.7 Resource a day, which is higher than 2 Resource a day). Thus, each Resource-maximizing Simpleton should gather from the high-yield Deposit as soon as possible. In a pre-debt economy, this cannot happen until each Simpleton has stockpiled enough Resource by gathering from the low-yield Deposit for a whole year. In a debt economy, Peter can immediately start gathering from the high-yield Deposit while Gordon gathers from the low-yield Deposit. This happens because instead of saving his extra unconsumed Resource in storage (where it will be a zero-yield asset), Gordon can invest in Peter by lending him 1 Resource a day.

Would you rather be indebted or debt-free? The debtor (in this example, Peter) isn't necessarily worse off than the creditor (Gordon). Peter can probably default with little or no immediate consequence since he has no tangible assets for Gordon to seize. However, this will be sub-optimal for Peter in the long term, as Gordon will be very unlikely to lend to Peter again, and thus they will revert to the pre-debt economy in which access to the high-yield Deposit is delayed by a year.

In fact, this is very similar to the famous Iterated Prisoner's Dilemma from game theory. Just as in that game, tit-for-tat is the close-to-optimal strategy (if you cooperate -> I will cooperate in the next round, if you don't cooperate -> I will not cooperate in the next round). Thus, reputation is very important in a debt economy, which is why individuals have FICO credit scores, governments and corporations have bond credit ratings, and Argentina hasn't had access to international credit markets since they defaulted in 2002.

This is a highly simplified model, but we can see that the debt economy will do vastly better than the pre-debt economy: they have access to the high-yield Deposit, and thus more Resource, an entire year earlier. A few other things are immediately evident: debt as a whole nets to zero. How much everyone owes cannot be more than how much everyone is owed. One Simpleton's debt is another's asset. Gordon's surplus is Peter's deficit.

Even though debt always nets to zero, economic wealth isn't a zero-sum game. It is clear that Gordon and Peter will both be richer from this arrangement. Debt, in and of itself, isn't bad, because there necessarily always is someone in debt.

Finally, why is this relevant? If Gordon's surplus is Peter's deficit and we rename Gordon "Government" and Peter "Private Sector" and , then Government surplus = Private Sector deficit. Conversely, Government deficit = Private Sector surplus. Thus, if the private sector as a whole wants to pay down its debts through generating surpluses (as it has been trying to since the 2008 financial crisis), the only way for this to happen is for the government to incur deficits. In fact, some simpletons even believe that this is justification for the government to actively pursue a policy of large deficits in order to aid in private sector deleveraging.

With the recent news of Twitter losing up to a quarter of a million passwords and The New York Times' computer systems being compromised now sounds like a good time to discuss some computer security and why you might not have to panic about all those accounts for which you've used the same password (thanks to bcrypt!). This post focuses on what happens once someone steals your password file and not on how to protect that file, which is a whole different story.

Firstly, what does it mean when someone breaks into a computer system and steals your password? In the Twitter case, it means that someone was somehow able to grab a salted and hashed password. I'll explain what that means in a bit, but what the attacker gets is a list of passwords that look roughly like this:

$2a$10$vI8aWBnW3fID.ZQ4/zo1G.q1lRps.9cGLcZEiGDMVr5yUP1KUOYTa

Although you would never know it from looking at it, this is what your password actually looks like in a website's databases. Now, that looks nothing like your password that you type into login pages, and that is because they are salted and hashed. Hashing is a fancy way of saying that a password has been transformed in such a way that it is basically impossible to know the original text from the ciphertext (result of the transformation). For instance, using the md5 hashing algorithm, we can turn the input 'coolcat' into "f7f70fefd4bed17191df6ec8bc24c63d" (try it out with this online md5 generator). Notice that the hash for 'coolcat2' is "0054cc598c64e0780ddcc5ed7798c1c6", which is radically different, even though they're off by just one character. This is what hash algorithms are like: give me something and I'll give you random gibberish. Note: I use hashing function and hashing algorithm interchangeably.

Cryptographic hashing functions, which are just hashing functions that are used to deal with stuff that needs to stay secret need to satisfy some basic rules (ripped from Wikipedia):

it is easy to compute the hash value for any given message
it is infeasible to generate a message that has a given hash
it is infeasible to modify a message without changing the hash
it is infeasible to find two different messages with the same hash

When taken in the context of passwords, these four rules make a lot of sense. The first means that it should be easy to take a password like 'coolcat' and get the hashed result, because that's how your password is checked. When you type 'coolcat' into that password box to log in to Twitter a hashing algorithm takes it and makes it into the hashed result, and if that result matches what they've stored when you signed up, then they know the password you typed is correct, because of rules #3 and #4. These two rules basically mean that you can't come up with a different password that somehow also hashes to the same thing as your password, because that would be whats called a collision (if this did not hold, it could mean that 'coolcat' and 'dumbcat' could both be used to grant access to your account, allowing random racist tweets!). Number 2 is a little more subtle, but it means that given a hash you can't go from that hash to the original password (it is nearly impossible to find the inverse of the hash function).

The tl;dr is that when you hash a password, it becomes impossible to know what the original password was...unless. That unless is unless you try hashing that password. Going back to our Twitter example, this means that the hashed passwords the attackers got are useless until they crack them.

To figure out the original passwords, what the attackers now need to do is to try different combinations of letters, numbers and symbols until they find a hash that matches one they stole - when that happens, they know your password. Do this enough and eventually you'll have all 250k passwords, but how long does that take? Given md5, assuming that all passwords are alphanumeric and 8 characters in length (2,821,109,907,456 - I had to write it out, for effect) then it takes...5.64 seconds. This is based on slightly out of date estimates by Coda Hale on his blog (whose post was an inspiration for this one) that for $300/hr you can rent a CPU to do 500 billion passwords per second.

So what's the point of hashing (and this article) if it only takes 5.64 seconds to crack 2.8 billion potential passwords? I know I've been rambling a bit, but stick with me because shit is going to get real.

Passwords were originally hashed using algorithms like md5, or SHA-256, and that was fine, because computers were really slow, so it would take a thousand years to crack any meaningful number of passwords. As we've been able to cram more power onto tiny silicon chips, we've also made our hashing algorithms obsolete. This is where bcrypt comes in.

Whereas md5 or other cryptographic hash functions are designed to consume as much data as quickly as possible (md5 is often used to verify quickly the integrity of large files, such as videos), brcrypt was designed to do this very slowly! It was designed to be intentionally slow and intentionally difficult to parallelize to compensate for ever increasing hardware speeds, and was designed to protect your password even when the hashed password was stolen.

To learn more about this, lets look at the hashed example I provided earlier, which is actually what a bcrypt hashed password looks like:

$2a$10$vI8aWBnW3fID.ZQ4/zo1G.q1lRps.9cGLcZEiGDMVr5yUP1KUOYTa

Here's an explanation of the password from Stack Overflow:

2a identifies the bcrypt algorithm version that was used.
10 is the cost factor; 2¹⁰ iterations of the key derivation function are used (which is not enough, by the way. I'd recommend a cost of 12 or more.)
vI8aWBnW3fID.ZQ4/zo1G.q1lRps.9cGLcZEiGDMVr5yUP1KUOYTa is the salt and the cipher text, concatenated and encoded in a modified Base-64. The first 22 characters decode to a 16-byte value for the salt. The remaining characters are cipher text to be compared for authentication.
$ are used as delimiters for the header section of the hash.

Lets forget about the version and the salt for now, though I've been putting off the salt for a while. The real thing we need to focus on here is the cost factor. bcrypt takes two inputs to generate a hashed password: the password itself, and a cost factor. The cost factor, as stated above, determines how expensive it is to compute the bcrypt hash. By adding one to the cost factor, you increase the cost of bcrypt, by a factor of two. This is huge, because by adding something like 4, you've now made it cost 16 times more to calculate a password. Of course, if you've looking to log in to a website where you know a password, waiting 1/2 a second for your password to be hashed is ok, but if you're computing trillions of passwords, getting all trillion at 2 passwords per second is...impossible.

So bcrypt is intentionally slow, but there's other really cool stuff about it. Even though its slow, it only uses operations (xor and other bit operations) that computers are already really good at. This makes it really difficult to design a specific circuit that is even better at cracking bcrypt passwords than modern cpus (this attack has been done for other hashing algorithms, notably DES which used bitslicing, a relatively expensive operation for cpus). Another really interesting thing about is that it's very hard to parallelize. Every step of the hashing involves the inputs to be hashed by the other inputs, so each input to the next step is dependent on the previous step. In simpler terms, its impossible to parallelize something where the input to A depends on the result of B, which in turn depends on the result of C (and so forth, 2^10 times). This is important because much of the speedup in processing power and computation in the past few years has been because of advances in hardware and algorithms that allow us to better parallelize computation.

So, now we know that bcrypt is really cool, and hopefully if Twitter used bcrypt with a high enough cost factor, your password may be safe (if it was a good password with enough letters, numbers symbols, blah blah). The original paper is really cool, and if you're technically inclined, you should really check it out here.

One last thing (extra credit I suppose) are the salts. I introduced them in the beginning and they came up in the explanation of bcrypt. A salt is something that solves the problem of people not picking unique passwords, and that given enough time, you can pre-compute a vast number of passwords. Think about it this way: 'coolcat' will always hash to the same value when given the same inputs, so once 'coolcat' has been cracked as a password, you know that password pretty much forever. However, if we add some random element to your password 'coolcat', such as the number 1203 and then store that in plain text (not hashed), then we've added another level of difficulty to cracking your password (note: the random salt is added in by the program, not by the user). The attacker now needs to compute your password with the addition of '1203', but where it really hurts is when every user has a different random number attached to their password. Cracking 'coolcat1203' is completely separate from cracking 'coolcat285'. Now we can no longer pre-compute passwords (given a large enough random salt) because there are simply too many possible salts to use with your password. When someone gets your hashed, salted password, they need to crack that 'coolcat' password just for you. bcrypt uses a 128 bit salt, which means that the salt can be any number between 0 and 2^128.

Fifty Thoughts to the Dollar

Monday, February 18, 2013

Fun ways to turn down recruiters

Monday, February 4, 2013

Debt is Good

Saturday, February 2, 2013

Passwords and bcrypt