Thursday, November 1, 2007
Face Recognition Algorithms
Face recognition is an important topic in computer science these days, and there is a lot of active research. Consider a computer finding a match between a driver's license photo and an ID picture from a database. This is one situation where image-matching software is pretty successful, since both types of ID photos are taken under controlled circumstances and with pretty good resolution. To simplify the description of how this software works, let's look at this "best case": head-on views with good lighting and good resolution. We'll save the sketchy stuff for later.
We start with a database of many photos, and a picture P of a "suspect". We want to find the photo in the database which best matches P.
First of all, we have to "digitize" both photos into a comparable format. Let's assume the database photos are digital images -- say 800 x 600 dots or "pixels". We'll want to scan our photo into the same digital format of 800 x 600 pixels. We have to decide whether to use color or greyscale for our pictures. Color often complicates things too much without adding many useful distinguishing features. (You'd surely recognize a familiar face whether the photo is colored or not.) And if either the database or the photo P is not in color, then clearly we cannot use color.
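Just to make the digitizing step concrete, here is a minimal sketch using the Pillow imaging library (the filename is hypothetical, and Pillow is just one choice among many):

```python
from PIL import Image  # the Pillow imaging library

# Load the photo P (filename is hypothetical), convert it to greyscale
# ("L" mode: one value from 0 to 255 per pixel), and resize it so it is
# directly comparable with the 800 x 600 database images.
photo = Image.open("suspect.jpg").convert("L").resize((800, 600))
pixels = list(photo.getdata())  # 480,000 greyscale values
```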
OK, we now have a database and a photo P, all digitized to 800 x 600 dots. For simplicity let's say we are using greyscale values from 0 (black) to 255 (pure white). We could naively compare a photo in the database with P, dot for dot, and take the variance: the sum, over all points, of the squares of the differences (between the two pictures) of the greyscale values at each point. (We square to make everything positive, which avoids "cancellation".) This variance is a measure of the total difference between the two pictures, but it is generally worthless, for two reasons.
1. Even if the two actual faces are virtually identical, if one is shifted over even a tiny bit, or slightly enlarged or distorted, lots of pixels won't match up. Small differences in contrast or shading or lighting can result in very large variances, even though the facial features (shapes) are nearly identical.
2. Checking through half a million pixels is slow.
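Before we abandon it, here is what that naive dot-for-dot comparison looks like in code -- a minimal sketch, assuming both pictures are stored as NumPy arrays of greyscale values:

```python
import numpy as np

def naive_variance(picture_a, picture_b):
    """Sum, over all 800 x 600 pixels, of the squared greyscale
    differences between the two pictures. Squaring makes every
    term positive, so differences cannot cancel."""
    diff = picture_a.astype(float) - picture_b.astype(float)
    return float(np.sum(diff ** 2))
```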
Now we have to use some cleverness and some neat math. By researching faces and perception, we can come up with a list of many qualities that distinguish a face. Here are some: distance between the eyes; distances from the left eye (or right eye) to other facial features such as the corners of the mouth and hinges of the jaw; overall shape of the head using various measures; angles relating to the eyes, nose and ears; etc. etc. Some of these features can be measured by processing the pictures in various ways: accentuating contrast or finding edges, corners or curves. These can all be described as mathematical operations, some using ideas from calculus such as gradients and convolutions.
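To give a flavor of that kind of processing, here is a sketch of simple edge detection by gradients, computed with convolutions. The Sobel kernels below are a standard textbook choice, not anything particular to face recognition:

```python
import numpy as np
from scipy.signal import convolve2d

# Sobel kernels: convolving an image with these approximates the
# horizontal and vertical components of its brightness gradient.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T

def edge_strength(image):
    """Gradient magnitude at each pixel; large values mark edges."""
    gx = convolve2d(image, sobel_x, mode="same", boundary="symm")
    gy = convolve2d(image, sobel_y, mode="same", boundary="symm")
    return np.hypot(gx, gy)
```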
Anyway, workers in the field have come up with lists of facial features that can be isolated and quantified (given a number). For the sake of argument, let's say there are 300 significant features. We can fix some ordering, and, for any particular image, list them:
<f1, f2, f3, ..., f300>
This list is sometimes called a face vector.
So, instead of half a million pixels, we have associated with each picture a single "vector" with just 300 numerical "components" or coordinates. If the choice of features is a good one, matching the face vector of our photo P with the face vectors of the images in our database ought to give us good results.
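In code, a face vector is nothing exotic: fix the ordering of features once and for all, and store each picture's 300 measurements in that order. A sketch (the feature names are made up for illustration):

```python
import numpy as np

# A fixed ordering of feature names -- just a few of the 300 shown,
# and the names themselves are hypothetical.
FEATURE_ORDER = ["eye_distance", "left_eye_to_mouth_corner",
                 "jaw_width", "nose_to_ear_angle"]

def face_vector(measurements):
    """Turn a dict of named measurements into a vector, always in the
    same fixed order, so that any two face vectors are comparable."""
    return np.array([measurements[name] for name in FEATURE_ORDER])
```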
But there is more. Some features are more important than others in distinguishing faces. It is hard to predict in advance what these features are, so we have algorithms for "training" or fine-tuning our comparisons of face vectors. This can be done ahead of time, since it just involves our database.
Start by looking at our collection of face vectors: one for each image in our database. We can take all of them and average them together. There are a number of ways of doing this, but the simplest is just to add them all up, component-by-component, and divide by the total number. As a simple example, suppose <a1, a2, ..., a300>, <b1, b2, ..., b300> and <c1, c2, ..., c300> are three face vectors. Then their numerical average would be the vector
<(a1+b1+c1)/3, (a2+b2+c2)/3, ..., (a300+b300+c300)/3>
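With the face vectors stacked into a NumPy array, this component-by-component average is one line (the tiny three-component vectors here are just for show):

```python
import numpy as np

# Three tiny face vectors (3 components instead of 300, for brevity).
face_vectors = np.array([[66.0, 120.0, 80.5],
                         [70.0, 118.0, 79.0],
                         [68.0, 121.0, 82.5]])

average_face = face_vectors.mean(axis=0)  # component-by-component average
```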
The average of all face vectors in our database is called the average face. We now search through all of the face vectors in the database to find the one which has the largest variance (sum of squares of differences in coordinates) from the average face. This is called the first eigenface. It somehow picks out those qualities in our descriptions of the faces in the database for which there is the most variation. We now exclude the first eigenface, look at the remaining face vectors, and repeat the process.
[Well, there are some technicalities here which I'm going to gloss over. The idea is to produce a collection of eigenfaces which are orthogonal or "perpendicular" to each other. This has a precise description in a field of mathematics called linear algebra. Suffice it to say that the procedure for producing this list of eigenfaces is a well-studied one, for which there are computer programs to do the job. By the way, the word eigen -- pronounced "eye'-ghen" -- is German for "characteristic"; eigenvalues and eigenvectors are concepts of great importance in linear algebra.]
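For the curious: the standard technique those programs use is called principal component analysis, and the eigenfaces drop out of an eigenvector computation (hence the name). Here is a sketch using NumPy's singular value decomposition, applied to the face vectors from above:

```python
import numpy as np

def compute_eigenfaces(face_vectors):
    """Principal component analysis of the database face vectors.

    Returns the average face and the eigenfaces, ordered from the
    direction of greatest variation in the database down to the
    least. The eigenfaces come out mutually orthogonal, as promised.
    """
    average_face = face_vectors.mean(axis=0)
    centered = face_vectors - average_face  # subtract the average face
    # The rows of vt are orthonormal eigenvectors of the covariance
    # matrix of the centered data, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return average_face, vt  # each row of vt is one eigenface
```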
So, we prepare in advance, from our database, a list of face vectors called eigenfaces, ordered from the one that differs most from the average face down to the one that differs least.
OK, here's the last piece. If we take any face vector, we can attempt to express it in terms of the eigenface vectors. This is called "projecting onto the eigenspace". Once again, finding the best way of doing this is a well-known technique in linear algebra. Writing a face vector in terms of eigenfaces, or projecting it, results in a sequence of numbers called that vector's eigen coordinates. There may be up to 300 of them, with the ones at the beginning the most important and the ones at the end less important (since they represent facial qualities which vary very little in the database). Sometimes only the most significant eigen coordinates are used.
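Because the eigenfaces are orthonormal, this projection is just a matter of dot products. A sketch, continuing with the arrays from the previous snippet:

```python
import numpy as np

def eigen_coordinates(face_vec, average_face, eigenfaces, keep=None):
    """Express a face vector in terms of the eigenfaces.

    Subtract the average face, then dot with each eigenface; since
    the eigenfaces are orthonormal, these dot products are exactly
    the coordinates of the projection. If `keep` is given, only the
    most significant coordinates are retained.
    """
    coords = eigenfaces @ (face_vec - average_face)
    return coords if keep is None else coords[:keep]
```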
When the database is prepared, or when new photos are added to it, a "training session" recalculates the average face vector, the collection of eigenfaces, and the assignment of eigen coordinates to each photo in the database.
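In this sketch, a training session would just chain the earlier pieces together (the function names are from the snippets above):

```python
import numpy as np

def train(face_vectors):
    """Recompute everything derived from the database: the average
    face, the eigenfaces, and each photo's eigen coordinates."""
    average_face, eigenfaces = compute_eigenfaces(face_vectors)
    database_coords = np.array([eigen_coordinates(v, average_face, eigenfaces)
                                for v in face_vectors])
    return average_face, eigenfaces, database_coords
```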
Here's how we do the matching. We take our unknown photo (already digitized) and find its face vector (by constructing the list of 300 quantified features). We subtract the average face from this face vector, then find the eigen coordinates of our photo. We now go through the list of eigen coordinates of database photos and try to find a close match with the eigen coordinates of our unknown photo. How this match is made varies from system to system. Sometimes it is a simple variance check; in others it is done using various statistical correlation techniques. There may be special weights put on certain coordinates, etc.
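Here is a minimal version of that matching step, using the simple variance check (a real system might weight the coordinates or use fancier statistics, as noted above):

```python
import numpy as np

def best_match(unknown_coords, database_coords):
    """Return the index of the database photo whose eigen coordinates
    are closest to the unknown photo's, by the simple variance check:
    the sum of squared coordinate differences."""
    diffs = database_coords - unknown_coords  # one row per database photo
    variances = np.sum(diffs ** 2, axis=1)
    return int(np.argmin(variances))
```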
There is no "standard" face recognition or matching algorithm but rather many different ones, some more effective for certain types of images or databases than others. There is a lot more interesting math here than I've been able to convey -- I don't work in this field -- but it's a subject being actively researched every day.