As you probably know if you've worked on certain algorithms such
as hash tables, a **hash function**
takes pieces of data and **"randomly" maps them to integers**. Generally, the
resulting integers, called *hashes* or *hash codes*, are of some
specified fixed number of bits. For example, Java collections use 32-bit hash codes.

In cryptography applications, we often need a so-called **secure hash function**.
Secure hash functions are generaelly a subset of hash functions that
fulfil at least two extra criteria:

- it must be
**computationally impossible to reverse the mapping**, that is, go from a hash code to a message or piece of data that would have generated that hash code; - it must be
**infeasible for a**: that is, for two messages to be found that have the same hash code.*collision*to occur

In order to fulfil these criteria (or at least, as a by-product of needing to fulfil these criteria), secure hash functions generally have these characteristics:

- they are
**slower to compute**than the hash codes typically used to key hash maps; - they are
**wider**(i.e. have more bits) than weak hash codes.

Secure hash codes are typically 128 bits wide at the very least; compare that, for example,
to the 32-bit codes returned by Java's `hashCode()` method, or the 64-bit
hash codes recommended for key-less hash maps in *Numerical Recipes*.

Secure hash functions actually have various applications. A very
common case is verifying the **integrity** of data. When we send some data, we append
a hash of that data; on the receiving end, we re-hash the received data and check
that the computed hash equals that sent; if any of the data has changed then
(with overwhelming probability), the computed hash value will no longer match
the original. Another case is where we need to **authenticate** some data,
i.e. produce a kind of integrity check that only a party with a given private key
could produce. (In this case, the general solution is to combine a hash code with encryption.)

In other cases, a secure hash function is useful to **represent** a particular item of data.
For example, for the purpose of checking passwords, we need only store a *hash*
of that password. When somebody enters their password, if the computed hash of what
they entered matches the hash stored in the password file/database, we assume they
knew the password.

This scheme, sometimes called **compare by hash** (CBH) can be used to search
for duplicates of data on a hard drive or for synching data between multiple machines.
Similarly, another example are the databases that various law enforcement agencies keep
of known "disapproved" files that they want to search people's hard drives for.
In these applications, keeping a database of the actual file contents, and/or transmitting
and comparing those entire contents, would be impractical. Instead, only hashes
are stored and compared.

More broadly, secure hash functions are useful in a variety of cases where
we need a **trapdoor function** (i.e. one that cannot feasibly be reversed),
especially where we need one with a limited or fixed-size result.

On the next page, we look at two issues:

- how to compute a secure hash (message digest) in Java;
- which hash algorithm to choose: we give an overview of the different options (although in reality, we see that there aren't too many to choose from!).