What is it and why should I care?
Note 1: I’ve actually wanted to finish this post for quite a while, but every time I tried, I would do some more research and find more rabbit holes to enter. At this point, I’m going to cut my losses and post what I have now. Unfortunately, any solution discussed on this topic inherently has weaknesses – it’s shades of gray even with good solutions.
Note 2: This post is about user passwords, not system account passwords. That is a different issue with different requirements, and therefore different solutions.
Password breaches have been unusually common over the last year or so, hitting many large and popular companies. What’s been particularly disheartening is how weakly the passwords were protected in most of these cases. Several have been protected with a simple MD5 or SHA-1 hash. We’ve been able to hash our password locally, compare it to the publicly available list, and see our hash right there in the list. That’s not a pleasant feeling for anyone, much less those of us in security. But … what would we tell them to do better?
I’ve been looking at this issue for some time, and talking with various folks about what solutions they generally recommend. I’ve heard a handful of different ideas, some better than others, but all with weaknesses. At the end of the day, the best solutions recognize they are not perfect, and compensate for that. That is the course of action I would recommend.
What should I do about it?
There are a lot of different ideas about how to store passwords securely (just check the references :>). That being said, most fall under a basic collection of ideas (hash, salt + hash, salt + hash + key stretching, adaptive hash). While there are various opinions about what is best at this point, there is a reasonable amount of agreement on what’s no longer acceptable. With that, let’s look at a few bad ideas:
Plaintext Password Only (BAD)
This is a poor choice for obvious reasons. If your password data is breached, the attacker has no work to do in order to gain access to the records.
Encrypted Password Only (BAD)
This is a poor choice for a couple reasons. If you have a breach, an attacker has only to find the decryption key to gain access to the records. This may be a difficult task for external threat actors, but does nothing to mitigate internal attackers. Additionally, this is a poor privacy practice because the password is reversible, and the system can always see the password for the user, opening up another avenue for internal attackers to exploit.
Hashed Password Only (BAD)
This is a poor choice because it’s not enough protection. Rainbow tables have made cracking unsalted hashed passwords trivial, which makes them effectively equivalent to plaintext from the attacker’s perspective.
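To make that concrete, here is a minimal sketch of why an unsalted hash buys so little. A plain dictionary stands in for a real rainbow table (which is a more compact precomputed structure), and the user names, password list, and helper name are all made up for illustration.

```python
import hashlib


def unsalted_hash(password: str) -> str:
    """Hash a password with no salt (the approach this section warns against)."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


# The attacker builds this once and reuses it against every breach.
# (Hypothetical list; a real precomputed table covers far more candidates.)
common_passwords = ["123456", "password", "letmein", "qwerty"]
lookup = {unsalted_hash(p): p for p in common_passwords}

# A made-up breached table of user -> unsalted hash.
breached = {
    "alice": unsalted_hash("letmein"),
    "bob": unsalted_hash("letmein"),
    "carol": unsalted_hash("correct horse battery staple"),
}

# Cracking is nothing more than a dictionary lookup per record.
cracked = {user: lookup[h] for user, h in breached.items() if h in lookup}
print(cracked)
```

Note that alice and bob share a hash because they share a password, so the attacker also learns which accounts use the same password without cracking anything.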
Salted and Hashed Password Only (BAD)
This is a poor choice also because it’s not enough protection. Though salting is a good practice, today’s (late 2012) hardware makes simple hash salting a sufficiently weak protection that it’s no longer considered viable. Brute forcing a salted hashed password is a practical option for even a poorly funded attacker today.
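For reference, a salted hash looks roughly like the sketch below. The function names, the 16-byte salt, and the choice of SHA-256 are my assumptions for illustration. The salt defeats precomputed tables because every record needs its own computation, but each guess still costs the attacker only a single fast hash, which is the weakness described above.

```python
import hashlib
import hmac
import os


def salted_hash(password: str, salt: bytes) -> bytes:
    """One fast hash over salt + password (illustrative, not recommended)."""
    return hashlib.sha256(salt + password.encode("utf-8")).digest()


def create_record(password: str) -> tuple[bytes, bytes]:
    """Generate a fresh random salt and return (salt, digest) for storage."""
    salt = os.urandom(16)
    return salt, salted_hash(password, salt)


def verify(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    return hmac.compare_digest(salted_hash(password, salt), expected)


# Same password, two users: the stored digests differ because the salts differ.
salt_a, digest_a = create_record("letmein")
salt_b, digest_b = create_record("letmein")
print(digest_a != digest_b)
print(verify("letmein", salt_a, digest_a))
```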
So, we’ve seen some bad options … what are the possible solutions? Here are the available proposals that I’m currently aware of that _may_ be considered reasonable depending upon your environment.
Salted and Hashed and Iterated Password
In this option, you essentially perform a salted hash, then hash that, then hash that, then hash that … repeatedly for some large number of iterations (50,000X, 100,000X, 500,000X). The goal here is to slow down your password verification process. By doing this, you slow down your code when the user is logging in, but the theory is that login is a fairly rare request for applications, so you’ll only run it, say, once per user per day, while the attacker has to run it for every guess. You make brute force ineffective again. While this theory is nice, verify whether the assumption is accurate for your environment. In this helpful spreadsheet, jOHN Steven points out that performing login on a sufficiently large site can certainly cost money in additional hardware, while it may not slow down an attacker as much as you think. This may or may not be a concern for your environment, but it’s certainly a consideration.
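Here is a rough sketch of the idea, mostly so you can time it on your own hardware. The hand-rolled loop, the function name, and the sample password are mine and are for illustration only. In practice you would reach for a vetted construction rather than write the loop yourself.

```python
import hashlib
import os
import time


def iterated_hash(password: str, salt: bytes, iterations: int) -> bytes:
    """Salted hash, then re-hash the result until `iterations` hashes are done."""
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    for _ in range(iterations - 1):
        digest = hashlib.sha256(digest).digest()
    return digest


salt = os.urandom(16)

# Time one verification at a few iteration counts. The same cost applies to
# every legitimate login and to every single guess the attacker makes.
for iterations in (1, 50_000, 100_000):
    start = time.perf_counter()
    iterated_hash("hunter2", salt, iterations)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{iterations:>7} iterations: {elapsed_ms:.2f} ms")
```

Multiply the per-login time by your peak login rate to get a feel for the hardware cost side of the trade-off, and remember the attacker may be running much faster hardware than your servers.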
Adaptive Hash Functions
This option is very similar to the salted/hashed/iterated password option externally, but with varied internal operations, and most implementations call the “iteration” bit a work factor. The currently popular implementations of these concepts are PBKDF2, bcrypt and scrypt. These options have the same considerations as above with regard to hardware cost and how much they actually slow an attacker.
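As one example, PBKDF2 ships in the Python standard library as hashlib.pbkdf2_hmac. The sketch below is a minimal illustration, not a recommendation of specific parameters: the record format, the function names, and the 100,000 iteration default are my assumptions. Storing the work factor next to the hash is what lets you raise it later.

```python
import hashlib
import hmac
import os

SCHEME = "pbkdf2_sha256"  # hypothetical label for this sketch's record format


def hash_password(password: str, iterations: int = 100_000) -> str:
    """Return a record of the form scheme$iterations$salt_hex$hash_hex."""
    salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"{SCHEME}${iterations}${salt.hex()}${derived.hex()}"


def verify_password(password: str, record: str) -> bool:
    """Re-derive with the stored salt and work factor, compare in constant time."""
    scheme, iterations, salt_hex, hash_hex = record.split("$")
    if scheme != SCHEME:
        return False
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(derived, bytes.fromhex(hash_hex))


def needs_rehash(record: str, target_iterations: int) -> bool:
    """True if the record was created with a lower work factor than the target."""
    return int(record.split("$")[1]) < target_iterations


record = hash_password("hunter2", iterations=50_000)
print(verify_password("hunter2", record))
print(needs_rehash(record, target_iterations=100_000))
```

When needs_rehash reports an old work factor, the natural time to upgrade is right after a successful login, since that is the only moment you have the plaintext password available to re-derive from.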
Encrypted Adaptive Hash Functions
This is one of the options proposed in the threat model for secure password storage that jOHN put together (a fantastic piece of work). It is an idea which has some interesting strengths, but has not been heavily discussed yet in the industry (at least not openly). It does solve some of the problems that other solutions have, but not all. Additionally, there is a measure of extra effort and complexity added as part of the solution.
So, which solution should you choose? Obviously pick a reasonable one, but the answer is – it depends. In my mind, however, the additional related tasks you should undertake are as important as, if not more important than, the specific encryption/hash solution you choose (assuming you choose a reasonable one).
You honestly need to consider your threat model and who you are protecting against and let the results of that inform the steps you take. If you’re protecting against a low-grade attacker, you have different concerns than if you are looking at a “hacking” collective, a disgruntled internal employee, or a nation-state. Each of these scenarios will have different requirements and will inform your protection scheme.
Know That You Will Lose
This is critical. Go into your planning with the assumed certainty of a breach (though I promise I hope that never happens to you). What will you do when you see your name on CNN under the heading “120M User Accounts Stolen”? What will your story be? What will your process be?
You need to have a plan to deal with a compromise of your data. You should have a way to 1) protect your users whose accounts have been breached and 2) roll out the necessary updates to your system. All of this is a lot simpler if you’ve planned (and practiced) it beforehand. See the “workflow under attack” section of the threat model for more info.
Consider Multi-Factor Authentication
While multi-factor authentication doesn’t excuse you from appropriately handling all of the steps discussed above, it can significantly lower the likelihood that your customers’ accounts are compromised. If it is possible in your situation, it’s an option to seriously consider.
In conclusion, while password storage is clearly no easy feat, and no solution is perfect, hopefully you see there are much better ways to handle it than the current common scenario. There are technical solutions for the specific storage mechanism, but that is just one part of the puzzle. You also need to take into account how you will deal with issues like your specific threat environment, planning for what to do when you do lose your password data, and additional protections like multi-factor authentication. There’s a lot to account for, but if you take each step into account, it’s possible to significantly improve the password storage in your applications.
Secure Password Storage Threat Model
jOHN Steven PSM code / docs