A while ago, I was in contact with the Django security team about a minor security issue: by increasing the password hashing work factor, they had opened up old accounts to username enumeration.
Now, changing the work factor has never (as far as I know) been considered a security risk. In fact, it is recommended to increase the work factor over time, as computers get more and more powerful.
Now one might wonder: why does something that security advisers recommend actually add an attack vector?
In the beginning, software stored passwords in plain text, and on login the entered password would be compared with the stored password. This worked, but had a major downside: administrators would know all the passwords on that system.
This was improved by hashing the password. Think of it as a one-way obfuscation step, or better: as a long series of non-reversible steps. Given a password, I can obfuscate it with the same hashing algorithm and compare the result with what is stored in the database, to check whether you entered the correct password.
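As a minimal sketch of that idea (using a plain, unsalted SHA-256 purely for illustration, not as a recommendation):

import hashlib

def hash_password(password):
    # One-way step: the stored value cannot be turned back into the password.
    return hashlib.sha256(password.encode()).hexdigest()

def check_login(stored_hash, entered_password):
    # Re-apply the same algorithm and compare the results.
    return hash_password(entered_password) == stored_hash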
This opens up the system to brute-forcing. If I have the hashed password and know the algorithm used, I can run that same algorithm on a lot of different inputs until I find a match. Previously popular algorithms (such as MD5) were troublesome in this regard, because they were fast to compute. This allowed attackers to check a lot of different inputs in a short amount of time.
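To make that concrete, here is a small illustrative brute-force loop (the candidate list is made up; the target is the well-known MD5 hash of "password"):

import hashlib

def crack(target_hash, candidates):
    # A fast algorithm such as MD5 lets an attacker test an enormous
    # number of guesses per second on commodity hardware.
    for candidate in candidates:
        if hashlib.md5(candidate.encode()).hexdigest() == target_hash:
            return candidate
    return None

print(crack("5f4dcc3b5aa765d61d8327deb882cf99", ["123456", "qwerty", "password"]))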
Newer algorithms, such as PBKDF2 and bcrypt, acknowledge that computers get faster and faster by adding an extra parameter to the algorithm: the work factor. As the work factor increases, checking whether a password matches takes more time.
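You can see the effect of the work factor with Python's built-in hashlib.pbkdf2_hmac; the salt and the iteration counts below are just example values:

import hashlib
import time

for iterations in (12000, 30000):
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"secret", b"somesalt", iterations)
    elapsed = time.perf_counter() - start
    print(iterations, "iterations took", elapsed, "seconds")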
[I intentionally left out salting in this description, although it is something you should do!]

\section{The change}

It is often advised to increase the work factor over the years, as computers get more powerful. Django does this as well, by raising the default work factor over time.
It all started when I thought about the implications of changing the work factor. Most implications are good. One of them is sort of, kind of, not that great.
The change itself is actually quite trivial, and has been made quite often.
For example, on July 12th, 2014, in commit 67325669....
- iterations = 12000
+ iterations = 20000
And then on January 17th, 2015, in commit c5125888....
- iterations = 20000
+ iterations = 24000
And again on September 20th, 2015, in commit 593c9eb6....
- iterations = 24000
+ iterations = 30000
Seeing how frequently this change is made, it seems quite innocent. And it is. I would not normally worry about changes like this; they are a good thing. But what about the accounts that are 'left behind', the accounts that were created before the change?
There are several things that can happen when the number of iterations used for hashing the password is increased. In the worst case, the number of iterations is not stored anywhere, which means that users can no longer log in. Obviously, that is not the case here; that would be an unacceptable outcome, so we know something has been done to mitigate it.
Another option would be to always keep the same hash. So if I registered when the number of iterations was 12000, my password hash would keep that many iterations, at least until I changed my password. This would be bad, because my password would not be stored as securely as everybody else's.
So, to mitigate this, a third option is often implemented: as soon as an account has a successful login, the system briefly knows the plain-text password, and can use it to create a new hash with the new iteration count, storing your password just as securely as the passwords of all the other users.
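In pseudo-Python, the idea looks something like this (verify, make_hash, and the user object are hypothetical stand-ins, not Django's actual API):

CURRENT_ITERATIONS = 30000  # the new, stronger default

def login(user, entered_password):
    # Verify against the hash as it was stored, with its original iteration count.
    if not verify(entered_password, user.password_hash, user.iterations):
        return False
    # Successful login: the plain-text password is available right now,
    # so it can be re-hashed with the current iteration count.
    if user.iterations < CURRENT_ITERATIONS:
        user.password_hash = make_hash(entered_password, CURRENT_ITERATIONS)
        user.iterations = CURRENT_ITERATIONS
        user.save()
    return True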
This is a very good coping strategy that makes sure all the passwords are stored as securely as possible. A big win.
So where is the catch? The timing differences. When an account has not had a successful login since the change in the iteration count, verifying its password takes less time. Now, any experienced security person will think "username enumeration", or at least I'd hope so.
What happens is that for a user account whose last login was before July 2014, verifying the password requires only 12000 iterations, while for an account with a very recent last login, 30000 iterations are needed. If (as Django did) you always run exactly the number of iterations stored with the hash, verifying a password for a new account takes 18000 extra iterations. That difference is easily measurable.
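As an illustration of how one could measure this (the login URL and request details are made-up assumptions, not my actual test setup):

import time
import requests

def shortest_login_time(username, attempts=50):
    # Take the minimum over many attempts to filter out noise from
    # the network and the machine itself.
    best = float("inf")
    for _ in range(attempts):
        start = time.perf_counter()
        requests.post("https://example.com/login",
                      data={"username": username, "password": "wrong"})
        best = min(best, time.perf_counter() - start)
    return best

for name in ("test12000", "test30000", "notthere"):
    print(name, shortest_login_time(name))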
Here are some measurements of checking whether a user account exists, based on the time taken to check an (incorrect) password. Note that the test accounts are actual existing accounts, with the number in the name corresponding to the number of iterations used to create the hash.
account   | shortest time (s)
----------|------------------
test12000 | 114.7931
test20000 | 172.9259
test24000 | 197.2580
test30000 | 247.1929
notthere  | 202.9819
(It is interesting to note that 30000 iterations appears in the list because Django 1.10 will use that as its default number of iterations.)
But what can we see here? Quite clearly, the timing difference between the 'test12000' account and the 'notthere' account is far more noticeable than the difference between 'test24000' and 'notthere'.
Now, username enumeration is kind of a problem, but there are worse things. In general, unless the application deals with information that should be highly restricted (online banking), or is of an extremely sensitive nature (a dating site for married people, or the website of your local terrorist cell), I'm not convinced that username enumeration is that big of a deal.
However, if you're building a framework (instead of a website), it automatically becomes an issue. Because, one day, somewhere, somebody will use your framework for a purpose where username enumeration is a problem. Where security is not just a nice-to-have, but paramount. That is why writers of frameworks and libraries should let security take an important place in the process.
In Django 1.9.3, logic was added to harden the runtime of the password hashers. Consider, for example, a request to verify the password for account test12000, while the system is configured for 30000 iterations. The system first runs 12000 iterations to see if the password matches. If the password does not match, it runs another 18000 iterations, for a total of 30000 iterations.
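Sketched in Python, the idea is roughly this (not Django's literal code; verify and run_iterations are hypothetical helpers):

def check_password_hardened(entered_password, stored_hash, stored_iterations,
                            default_iterations=30000):
    # First verify against the hash exactly as it was stored.
    if verify(entered_password, stored_hash, stored_iterations):
        return True
    # Wrong password: burn the remaining iterations so that every failed
    # attempt costs roughly the same, regardless of how old the hash is.
    extra = default_iterations - stored_iterations
    if extra > 0:
        run_iterations(entered_password, extra)  # dummy work, result discarded
    return False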
After applying a provisional patch, I ran the measurements again. Even though the difference was still somewhat measurable, it was clear that the gap was closed a bit. After the release of Django 1.9.3, I decided to re-run the measurements.
account   | shortest time (s)
----------|------------------
test12000 | 186.0561
test20000 | 186.8310
test24000 | 180.5122
test30000 | 216.5878
notthere  | 209.9040
As you can see (besides the fact that my computer was less busy during this run), the timing results are now a lot closer. 'test12000' and 'test20000' are really close, since they both follow the same code path. 'test24000' and 'notthere' do show different results, because they take different paths through the code, but the gap is a lot smaller. We still have username enumeration, but it now takes more than one request per account.
One thing I saw in the preliminary patch is that the logic sits (in my opinion) at too high a level. The patch works quite well, and it is probably the best way to implement this without changing the libraries that provide the hashing functions.
If, however, we were allowed to change the implementation of the library that provides the hashing, the solution would be far simpler. Ideally, it should be possible to tell the library the desired work factor, which it would apply whenever the stored password uses a weaker one (a sketch follows below).
Even better, this has two uses:

- For incorrect logins, you harden the runtime, which is a huge benefit.
- For correct logins, you not only learn that the login was correct, but at a very low price you also get a new hash with the higher work factor.
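A sketch of what such a library-level API could look like (every helper here, iterations_of, check, make_hash, and run_iterations, is hypothetical):

def verify_and_upgrade(password, stored_hash, desired_iterations):
    # The iteration count is normally encoded in the stored hash string.
    stored_iterations = iterations_of(stored_hash)
    if check(password, stored_hash):
        new_hash = None
        if stored_iterations < desired_iterations:
            # Correct login: re-hash with the desired work factor. Inside the
            # library this could even continue PBKDF2's iteration chain, so the
            # upgrade would cost only the difference in iterations.
            new_hash = make_hash(password, desired_iterations)
        return True, new_hash
    # Incorrect login: pad with dummy work so every failure costs the same.
    if stored_iterations < desired_iterations:
        run_iterations(password, desired_iterations - stored_iterations)
    return False, None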
By having the logic for this in the password hashing library, you make sure that not all frameworks have to reinvent the wheel. Also, if changes are made to make the timing gap smaller in the library, all clients of that library benefit.
There are a few more things I'd like to say.
First of all, consider the following question: if you were doing black-box testing of a system deployed on a brand-new testing environment, would you have detected this? I for sure would not have; on a fresh environment, every account's hash would already use the current iteration count.
Second, both PBKDF2 and bcrypt are designed with a variable work factor, which you should increase along with the increase in computing power. This is sound advice, which I have heard several times. However, I have never heard anybody talk about username enumeration in this regard.
In the end, on the grand scale of things, I think this is a minor issue. Even for the Django applications that are/were affected, it most likely does not matter. It is more a theoretical than a practical vulnerability. On the other hand, frameworks and libraries (as opposed to websites and applications) should take theoretical vulnerabilities seriously, if only because they cannot tell whether a vulnerability matters to any of the websites written using the framework.
In case you are interested, you should read the security advisory and the patch.