How Password Hashing Works

A practical guide to bcrypt, SHA, Argon2, and the algorithms that protect your passwords

Why Hash Passwords?

Imagine you're running a web application with 10,000 users. Every one of them trusts you with their password. Now imagine an attacker gets access to your database. If you stored those passwords in plaintext, every single account is instantly compromised -- and since people reuse passwords, their email, bank, and social media accounts might be too.

This isn't hypothetical. It has happened over and over:

  • RockYou (2009) -- 32 million passwords stored in plaintext. The full list was leaked and became the de facto wordlist for password cracking tools.
  • LinkedIn (2012) -- 6.5 million password hashes leaked, using unsalted SHA-1. Most were cracked within days.
  • Adobe (2013) -- 153 million accounts exposed, using reversible encryption (3DES-ECB) instead of hashing. The identical ciphertexts revealed which users shared the same password.

The solution is hashing -- running the password through a one-way mathematical function that produces a fixed-length output. You store the hash, not the password. When a user logs in, you hash what they type and compare it to the stored hash. If they match, the password was correct.

The key property of a good hash function is that it's a one-way street: easy to compute the hash from a password, but computationally infeasible to recover the password from the hash.
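
A minimal sketch of the store-and-compare flow described above, using Python's standard hashlib module. The function names are illustrative, and plain unsalted SHA-256 appears here only to show the one-way, fixed-length behaviour; it is not a recommendation for real password storage.

```python
import hashlib
import hmac


def hash_password_demo(password: str) -> str:
    """Return a fixed-length hex digest (illustration only: unsalted and fast)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def check_password_demo(candidate: str, stored_hash: str) -> bool:
    """Hash the login attempt and compare it to the stored hash."""
    candidate_hash = hash_password_demo(candidate)
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(candidate_hash, stored_hash)


stored = hash_password_demo("correct horse battery staple")
print(len(hash_password_demo("a")), len(stored))  # same length for any input
print(check_password_demo("correct horse battery staple", stored))
print(check_password_demo("wrong guess", stored))
```

Only the digest is kept; the login check never needs the original password back, just a fresh hash of whatever the user typed.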

Hash vs. encryption -- a common confusion. Encryption is reversible: if you have the key, you can get the original data back. Hashing is not. There is no key and no way to "decrypt" a hash. That's exactly what makes it suitable for passwords -- an attacker who steals the hashes can't reverse them and is left guessing candidate passwords and hashing each one to look for a match.

The Rise and Fall of MD5

MD5 (Message Digest 5) was created by Ronald Rivest in 1991 and published as RFC 1321. It produces a 128-bit (16-byte) hash, typically displayed as a 32-character hex string:

$ echo -n "password123" | md5sum
482c811da5d5b4bc6d497ffa98491e38  -
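
For comparison, a short Python sketch that computes the same digest with the standard hashlib module and checks the sizes quoted above. The usedforsecurity flag assumes Python 3.9 or later.

```python
import hashlib

# usedforsecurity=False (Python 3.9+) marks this as a non-security use of MD5
digest = hashlib.md5(b"password123", usedforsecurity=False)

print(digest.digest_size * 8)   # digest size in bits
print(len(digest.hexdigest()))  # number of hex characters
print(digest.hexdigest())
```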

MD5 was fast, simple, and became the default choice for everything -- file checksums, digital signatures, password storage, data deduplication. For over a decade, it was the Swiss Army knife of hashing.

Then it fell apart.

In 2004, Chinese cryptographer Xiaoyun Wang and her team demonstrated practical collision attacks against MD5 -- they could produce two different inputs that generated the same hash. This wasn't just a theoretical weakness; it meant MD5 could no longer be trusted for any security purpose.

By 2008, researchers used MD5 collisions to forge a rogue SSL certificate, proving the attack had real-world consequences. The writing was on the wall.

Yet MD5 persists. You'll still find it in legacy systems, old config files, and tools that haven't been updated. For non-security uses like file checksums or deduplication, it's still functional -- it's fast and collisions in random data are rare. But for passwords, digital signatures, or anything security-critical, MD5 is broken and should never be used.

The SHA Family

The Secure Hash Algorithm (SHA) family was designed by the NSA and published by NIST. It's gone through several generations:

  • SHA-0 -- Published in 1993 and withdrawn almost immediately due to an undisclosed flaw. It never saw real adoption.
  • SHA-1 -- Produces a 160-bit hash and was the standard for years. In 2017, researchers from CWI Amsterdam and Google jointly demonstrated a practical collision (the "SHAttered" attack). SHA-1 is now deprecated for security use.
  • SHA-2 -- Includes SHA-256 (256-bit) and SHA-512 (512-bit). These remain cryptographically strong with no practical attacks known. SHA-256 is used extensively in TLS, Bitcoin, and file integrity verification.

On Linux systems, you'll see SHA-based password hashes in /etc/shadow with distinctive prefixes:

# SHA-256 hash (prefix $5$)
$5$rounds=5000$saltsalt$DLBf1mGMW3wMrGq...

# SHA-512 hash (prefix $6$)
$6$rounds=5000$saltsalt$U6Yv3E8aHvXFqA7...

The $5$ and $6$ prefixes tell the system which algorithm to use. They are followed by an optional rounds=N parameter, then the salt, then the hash itself, each separated by a $.
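
As a rough illustration of that layout, the sketch below splits one of the example strings into its fields. parse_sha_crypt is a hypothetical helper written for this article, not part of any library, and the hash portion stays truncated exactly as in the example.

```python
def parse_sha_crypt(entry: str) -> dict:
    """Split a $5$/$6$ crypt string into its fields (hypothetical helper)."""
    algorithms = {"5": "SHA-256", "6": "SHA-512"}
    fields = entry.split("$")[1:]  # drop the empty string before the first '$'
    algo_id, rest = fields[0], fields[1:]
    rounds = None  # stays None when no explicit rounds= field is present
    if rest and rest[0].startswith("rounds="):
        rounds = int(rest[0].split("=", 1)[1])
        rest = rest[1:]
    salt, digest = rest[0], rest[1]
    return {
        "algorithm": algorithms[algo_id],
        "rounds": rounds,
        "salt": salt,
        "hash": digest,
    }


# The hash portion is truncated ("...") just like the example in the text
print(parse_sha_crypt("$6$rounds=5000$saltsalt$U6Yv3E8aHvXFqA7..."))
```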

The problem: SHA is fast

For file checksums, speed is a feature. For passwords, it's a catastrophic flaw.

A modern GPU can compute billions of SHA-256 hashes per second. That means an attacker with a decent graphics card can try every possible 8-character alphanumeric password in a matter of hours. If the hashes are unsalted, a precomputed rainbow table (a compact structure that maps hashes back to passwords) makes it faster still: any password covered by the table is recovered almost instantly.
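
The "matter of hours" estimate can be checked with a few lines of arithmetic. The guess rate below is an assumption chosen to match "billions per second", not a measured benchmark.

```python
import string

alphabet = string.ascii_letters + string.digits  # 62 alphanumeric characters
length = 8
keyspace = len(alphabet) ** length

# Assumption: 10 billion SHA-256 guesses per second on a single GPU
assumed_rate = 10_000_000_000

seconds = keyspace / assumed_rate
print(f"{keyspace:,} candidate passwords")
print(f"{seconds / 3600:.1f} hours to try them all")
```

Halving or doubling the assumed rate only moves the answer between a few hours and half a day, which is why the speed of the hash function matters so much.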

This is the fundamental insight that led to the next generation of password hashing: for passwords, you want the hash function to be slow.

Enter bcrypt -- Designed to Be Slow

In 1999, Niels Provos and David Mazières published "A Future-Adaptable Password Scheme," introducing bcrypt. Based on the Blowfish cipher, bcrypt was designed from the ground up for one purpose: hashing passwords.

The key insight was the adaptive cost factor. Unlike SHA, which is designed to be as fast as possible, bcrypt lets you control how slow it is. As hardware gets faster, you increase the cost factor and the hash takes just as long to compute as it did on last decade's hardware.

How bcrypt works

At a high level, bcrypt uses a modified version of the Blowfish key schedule called Eksblowfish ("expensive key schedule"). The algorithm:

  1. Derives an encryption key from the password and a random salt
  2. Uses the expensive key schedule to set up the Blowfish cipher state -- this is the slow part, and it's repeated 2^cost times
  3. Encrypts a fixed string ("OrpheanBeholderScryDoubt") 64 times using the resulting cipher
  4. Outputs the salt and encrypted result as the hash

The cost factor explained

The cost factor (also called "work factor" or "rounds") is an exponent. A cost of 12 means 2^12 = 4,096 iterations of the key schedule. Each increment doubles the work:

Cost   Iterations   Approximate Time
10     1,024        ~50-80 ms
12     4,096        ~200-300 ms
14     16,384       ~800 ms-1.2 s
16     65,536       ~3-5 s

A cost of 12 is the common default as of 2026 -- fast enough that users don't notice the delay during login, slow enough that brute-force attacks become impractical.
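
The doubling is easy to verify. This sketch only computes iteration counts from the cost exponent; it does not run bcrypt, and the function name is made up for illustration.

```python
def key_schedule_iterations(cost: int) -> int:
    """bcrypt repeats its key schedule 2^cost times."""
    return 1 << cost


baseline = key_schedule_iterations(10)
for cost in (10, 12, 14, 16):
    iterations = key_schedule_iterations(cost)
    ratio = iterations // baseline
    print(f"cost {cost}: {iterations:>6,} iterations, {ratio}x the work of cost 10")
```

Because the cost is an exponent, going from 10 to 16 multiplies the work by 64, not by 1.6.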

Reading a bcrypt hash

A bcrypt hash string contains everything needed to verify a password:

$2b$12$WApznUPhDubN0oeveSXHp.TgT2Xx4hECKMw/3PstvsMgiAMEzNbKy
│  │  │                       │
│  │  │                       └─ hash (31 chars)
│  │  └─ salt (22 chars, Base64-encoded)
│  └─ cost factor (12 = 2^12 = 4096 rounds)
└─ algorithm version (2b = current standard)

Notice that the salt is embedded in the hash string. With a raw SHA digest, you have to generate and store salts yourself. With bcrypt, everything is self-contained -- one string to store, one string to verify against.
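
To make the layout concrete, here is a small sketch that pulls the example string apart. parse_bcrypt is a hypothetical helper for illustration; real code should hand the whole string to a bcrypt library rather than parse it by hand.

```python
def parse_bcrypt(hash_string: str) -> dict:
    """Split a bcrypt hash string into its parts (hypothetical helper)."""
    _, version, cost, salt_and_hash = hash_string.split("$")
    return {
        "version": version,
        "cost": int(cost),
        "salt": salt_and_hash[:22],  # first 22 characters
        "hash": salt_and_hash[22:],  # remaining 31 characters
    }


parts = parse_bcrypt("$2b$12$WApznUPhDubN0oeveSXHp.TgT2Xx4hECKMw/3PstvsMgiAMEzNbKy")
print(parts)
print(2 ** parts["cost"], "key schedule iterations")
```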

The 72-byte limit

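Bcrypt truncates passwords at 72 bytes: anything past that point does not affect the hash. In practice, this rarely matters -- a 72-byte password has more than enough entropy. But the limit is in bytes, not characters, so non-ASCII text reaches it sooner, and it's worth knowing if you're hashing long passphrases or pre-hashed values.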
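
The sketch below illustrates the limit without calling bcrypt itself: it shows how many bytes a password occupies in UTF-8 and which part of it falls inside the first 72 bytes. The helper name is invented for this example, and behaviour at the limit can vary by library (some versions may raise an error instead of truncating silently).

```python
BCRYPT_MAX_BYTES = 72  # number of input bytes bcrypt actually uses


def bytes_used_by_bcrypt(password: str) -> bytes:
    """Return the part of the UTF-8 encoded password within the 72-byte limit."""
    return password.encode("utf-8")[:BCRYPT_MAX_BYTES]


accented = "é" * 40  # 40 characters, 2 bytes each in UTF-8
print(len(accented), "characters,", len(accented.encode("utf-8")), "bytes")

# Two long passphrases that differ only after byte 72 look identical to bcrypt
a = "x" * 72 + "first ending"
b = "x" * 72 + "second ending"
print(bytes_used_by_bcrypt(a) == bytes_used_by_bcrypt(b))
```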

The New Generation: scrypt, Argon2, yescrypt

Bcrypt made password hashing slow, which was a major step forward. But attackers adapted. Bcrypt needs only about 4 KB of memory per hash, so specialized hardware -- GPUs with thousands of cores, FPGAs, and even custom ASICs -- can run many bcrypt computations in parallel. The next evolution was to make password hashing not just CPU-hard, but memory-hard.

The idea: if the algorithm requires a large amount of RAM to compute, you can't just throw more cores at the problem. Memory is expensive, and each parallel instance needs its own chunk of it.
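
A back-of-the-envelope sketch of why memory limits parallelism. The 24 GiB device is a hypothetical accelerator, the per-guess figures are illustrative (bcrypt's working state is roughly 4 KiB), and memory capacity is only one of several limits on real hardware.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Assumption: a hypothetical accelerator with 24 GiB of memory
device_memory = 24 * GIB

per_guess_memory = {
    "bcrypt (about 4 KiB of state)": 4 * KIB,
    "memory-hard hash at 19 MiB": 19 * MIB,
    "memory-hard hash at 64 MiB": 64 * MIB,
}

for name, needed in per_guess_memory.items():
    in_flight = device_memory // needed
    print(f"{name}: at most {in_flight:,} guesses in flight")
```

Raising the memory requirement from kilobytes to megabytes cuts the number of simultaneous guesses by several orders of magnitude on the same device.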

scrypt (2009)

Created by Colin Percival for the Tarsnap backup service, scrypt was the first widely known memory-hard password hashing function. It takes three parameters:

  • N -- CPU/memory cost (must be a power of 2)
  • r -- block size (controls memory usage per block)
  • p -- parallelization factor

scrypt gained visibility through its use in several cryptocurrency proof-of-work systems (Litecoin, Dogecoin), though those use deliberately low memory parameters for fast verification -- the opposite of what you want for password storage.
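
Python's standard library exposes scrypt through hashlib.scrypt (it needs an OpenSSL build with scrypt support). The sketch below shows how N, r, and p are passed and how N and r translate into memory; the parameter values are illustrative, not a tuning recommendation.

```python
import hashlib
import secrets

# Illustrative parameters, not a tuning recommendation
N, r, p = 2**14, 8, 1

# scrypt needs roughly 128 * N * r bytes of memory
approx_memory = 128 * N * r
print(f"approximate memory: {approx_memory / 2**20:.0f} MiB")

password = b"correct horse battery staple"
salt = secrets.token_bytes(16)  # unique random salt per password
derived = hashlib.scrypt(
    password,
    salt=salt,
    n=N,
    r=r,
    p=p,
    maxmem=64 * 2**20,  # let OpenSSL use up to 64 MiB for this call
    dklen=32,
)
print(derived.hex())
```

Doubling N doubles both the memory and the time, which is the lever that distinguishes password-storage settings from the low-memory settings used for proof-of-work.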

Argon2 (2015)

Argon2 won the Password Hashing Competition (PHC) in 2015, a multi-year open competition to find the best password hashing algorithm. It comes in three variants:

  • Argon2d -- data-dependent memory access, fastest, but vulnerable to side-channel attacks
  • Argon2i -- data-independent memory access, resistant to side-channel attacks, but slower
  • Argon2id -- hybrid of the two, recommended for most use cases

OWASP suggests these minimum parameters for Argon2id:

  • Memory: 19 MiB (19456 KiB)
  • Iterations: 2
  • Parallelism: 1

Argon2 is defined in RFC 9106 and is increasingly supported across programming languages and frameworks.

yescrypt

While Argon2 won the PHC, the Linux world went a different direction. yescrypt, designed by Solar Designer (who also created the John the Ripper password cracker), builds on scrypt with additional features designed specifically for system password hashing.

Since 2021, yescrypt has become the default for /etc/shadow on major Linux distributions:

  • Debian 11+ (Bullseye)
  • Ubuntu 22.04+ (Jammy)
  • Fedora 35+
  • Arch Linux

You can identify yescrypt hashes by the $y$ prefix in /etc/shadow:

# yescrypt hash
$y$j9T$F5Jx5fExrKuPp53xLKQ..1$X3DX6M94c7o.9agCG9G317fhZg9SqC.5i5rd.Yv6pfA

Why did Linux choose yescrypt over Argon2? Primarily compatibility and tuning flexibility. yescrypt handles the specific constraints of system password authentication well -- it needs to work during early boot, with limited memory, and across a wide variety of hardware.
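
Since each scheme announces itself with a prefix, telling them apart is a simple string check. The sketch below covers only the prefixes mentioned in this article; identify_hash is a hypothetical helper, and real systems recognise more schemes than these.

```python
# Prefixes mentioned in this article; real systems recognise more
PREFIXES = {
    "$y$": "yescrypt",
    "$6$": "SHA-512",
    "$5$": "SHA-256",
    "$2b$": "bcrypt",
}


def identify_hash(entry: str) -> str:
    """Guess the hashing scheme from the prefix of a crypt-style string."""
    for prefix, name in PREFIXES.items():
        if entry.startswith(prefix):
            return name
    return "unknown"


print(identify_hash("$y$j9T$F5Jx5fExrKuPp53xLKQ..1$X3DX6M94c7o.9agCG9G317fhZg9SqC.5i5rd.Yv6pfA"))
print(identify_hash("$6$rounds=5000$saltsalt$U6Yv3E8aHvXFqA7..."))
```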

Choosing the Right Algorithm

Scenario                        Recommended Algorithm   Notes
New web application             Argon2id                Best available. Fall back to bcrypt if your framework doesn't support it.
Linux system passwords          yescrypt                It's the default on modern distros. Just use it.
Legacy system you can't change  bcrypt (cost >= 12)     Battle-tested, widely supported, no known practical attacks.
File checksums / integrity      SHA-256                 Fast and strong for non-password use cases.
Anything security-critical      Never MD5 or SHA-1      Both are broken for security purposes.

Migrating from weak hashes

If you're stuck with MD5 or SHA-based password hashes, you don't need to force all users to reset their passwords. The standard approach is re-hash on login:

  1. User logs in with their password
  2. Verify against the old hash (MD5/SHA)
  3. If it matches, re-hash the password with bcrypt/Argon2 and store the new hash
  4. Delete the old hash

Over time, most active users get migrated automatically. For accounts that never log in again, the old hashes remain and stay as weak as ever, so consider expiring them after a cutoff date and requiring a password reset.
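
The re-hash-on-login steps can be sketched as follows. To keep the example runnable with only the standard library, scrypt stands in for bcrypt/Argon2; the in-memory users table, the function names, and the "scrypt$salt$hash" storage format are all made up for illustration.

```python
import hashlib
import hmac
import secrets

SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, maxmem=64 * 2**20)

# Hypothetical user table: username -> stored hash (alice has a legacy MD5 hash)
users = {"alice": hashlib.md5(b"hunter2", usedforsecurity=False).hexdigest()}


def new_hash(password: str) -> str:
    """Stand-in for bcrypt/Argon2: salted scrypt in a made-up storage format."""
    salt = secrets.token_bytes(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return f"scrypt${salt.hex()}${digest.hex()}"


def verify_new(password: str, stored: str) -> bool:
    _, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.scrypt(
        password.encode(), salt=bytes.fromhex(salt_hex), **SCRYPT_PARAMS
    )
    return hmac.compare_digest(digest.hex(), digest_hex)


def login(username: str, password: str) -> bool:
    stored = users[username]
    if stored.startswith("scrypt$"):
        return verify_new(password, stored)
    # Legacy path: verify against the old unsalted MD5 hash
    legacy = hashlib.md5(password.encode(), usedforsecurity=False).hexdigest()
    if not hmac.compare_digest(legacy, stored):
        return False
    # Password is known to be correct: store the new hash, replacing the old one
    users[username] = new_hash(password)
    return True


print(login("alice", "hunter2"), users["alice"].startswith("scrypt$"))
print(login("alice", "hunter2"))  # second login takes the new-hash path
```

The upgrade can only happen at login because that is the one moment the server sees the plaintext password.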

Common Mistakes

  • Using a hash without a salt -- Without a salt, identical passwords produce identical hashes. An attacker can precompute a rainbow table and crack thousands of passwords instantly. Always use a unique, random salt per password. (bcrypt and Argon2 handle this automatically.)
  • Using a fast hash for passwords -- MD5 and SHA were designed for speed. That's great for checksums, terrible for passwords. A modern GPU can compute billions of SHA-256 hashes per second.
  • Hardcoding or reusing salts -- A salt that's the same for every user is barely better than no salt at all. Each password needs its own unique, randomly-generated salt.
  • Setting the bcrypt cost too low -- A cost of 4 or 6 might have been reasonable in 2005. In 2026, it offers almost no protection. Use at least 12.
  • Storing the salt separately -- bcrypt and Argon2 embed the salt in the hash string. There's no need to store it in a separate database column -- and doing so just adds complexity and room for mistakes.
  • Rolling your own crypto -- Inventing your own hashing scheme (double-hashing, custom salt mixing, XOR games) almost always produces something weaker than established algorithms. Use a well-tested library.

Try It Yourself

The best way to understand password hashing is to see it in action. Head over to the Password Hash Generator and try these experiments:

Experiments

  1. SHA-256 without a salt is predictable. Type a password, select SHA-256, and leave the salt field empty. Note the hash. Clear the password, type it again. The hash is identical -- an attacker with a precomputed table could look this up instantly.
  2. Adding a salt changes everything. Now click Generate to add a random salt. The hash changes completely -- and it's now an HMAC-SHA-256, which can't be found in any rainbow table. Generate a different salt with the same password and you'll get an entirely different hash.
  3. bcrypt has a built-in salt. Switch to bcrypt and type the same password. Note the hash. Clear and type it again. The hash is different every time -- because bcrypt generates a new random salt for each hash. Both hashes are valid for the same password, but an attacker can't use precomputed tables.
  4. Feel the cost factor. With bcrypt selected, try cost 4 -- the hash appears instantly. Now try cost 14 or 16. You can feel the delay. That delay is your security -- multiply it by billions of guesses and you'll see why bcrypt works.
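
The first two experiments can also be reproduced offline with Python's standard library. This is a sketch under one assumption: that the salted mode uses the salt as the HMAC key, which may not match how the generator combines them internally.

```python
import hashlib
import hmac
import secrets

password = b"password123"

# Experiment 1: unsalted SHA-256 is deterministic
first = hashlib.sha256(password).hexdigest()
second = hashlib.sha256(password).hexdigest()
print("unsalted hashes match:", first == second)


# Experiment 2: a random salt changes the output completely
def salted_hash(pw: bytes, salt: bytes) -> str:
    """HMAC-SHA-256 with the salt as key (assumed construction)."""
    return hmac.new(salt, pw, hashlib.sha256).hexdigest()


salt_a = secrets.token_bytes(16)
salt_b = secrets.token_bytes(16)
print("different salts match:", salted_hash(password, salt_a) == salted_hash(password, salt_b))
print("salted equals unsalted:", salted_hash(password, salt_a) == first)
```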