
人脸识别的简单算法 (A Simple Algorithm for Face Recognition)
Rowley-Baluja-Kanade Face Detector
Author: Scott Sanner

Contents
- Introduction
- Algorithm
- Data Preparation
- Training
- Image Scanning
- Testing
- Conclusion
- References
- Software

Introduction

The goal of this project is to implement and analyze the Rowley-Baluja-Kanade neural net face detector as described in [2], along with some enhancements for training and recognition proposed by Sung and Poggio as described in [3]. The basic goal underlying both approaches is to train a neural network or other recognition system on a labelled database of face and non-face images. This face classifier can then be used to scan over an image resolution pyramid to determine the locations and scaling of any faces (if present) and return them to the user.

Overall, the task of face recognition can be extremely difficult given the wide variety of faces to match, the presence of facial hair, variations in lighting and
shadowing, and the possibility of angular, scaling, and dimensional variances. Consequently, an ideal face detector should attempt to mitigate all of these problems while achieving a high detection rate and minimizing the number of false positives. As we will see with the latter requirement, there is a tradeoff between the positive detection rate and the false positive rate, and the balance between the two will need to be evaluated by the individual user and application domain.

Algorithm Overview

To achieve the above goals for face detection, we use a general algorithm that is a straightforward application of data preparation, training, and image scanning. This algorithm is outlined below:

Normalize Training Data:
- For each face and non-face image:
  - Subtract out an approximation of the shading plane to correct for single light source effects
  - Rescale the histogram so that every image has the same gray level range
- Aggregate the data into labelled data sets

Train Neural Net:
- Until the neural net reaches convergence (or performance decreases on the validation set):
  - Perform gradient descent error backpropagation on the neural net for the batch of all training data

Apply Face Detector to Image:
- Build a resolution pyramid of the image by successively decreasing the image resolution at each level of the pyramid, stopping at some default minimum resolution
- For each level of the pyramid:
  - Scan over the image, applying the trained neural net face detector to each rectangle within the image
  - If a positive face classification is found for a rectangle, scale this rectangle to the size appropriate for the original image and add it to the face bounding-box set
- Return the rectangles in the face bounding-box set

Data Preparation

In performing face detection with a neural net, a few face-specific and non-face-specific issues arise.

In the realm of face-specific issues, we do not want the background to become involved in face matching. Consequently, if person A appears in two different settings, we want to ensure that we perform as well as possible in detecting person
A's face despite the background variation. If we were to look only at rectangular candidate regions for a face, we would receive interference from the corners, which are more likely to consist of background than face pixels. Neural nets are especially susceptible to such errors, since any consistencies between data in the training set (no matter how plausible a predictor of face-hood in real life) will likely be detected and exploited. Thus, as [3] suggests, it is a good idea to mask an oval within the face rectangle to prune the pixels used in training the neural net. For true face images, this usually guarantees that only pixels from the face are used as input to the neural net. For our implementation, we use the oval mask which can be seen in figure 3; the bounding rectangle for this mask is 18 x 27 pixels.

Another face-specific issue is that of pose and glasses. We want to recognize a face regardless of whether a person is smiling, sad, wearing glasses, or not wearing glasses. Consequently, it is important to construct a set of training data which covers a broad range of human emotions, poses, and glasses/non-glasses faces. This ensures the greatest generalization when applying the face detector to faces which have not been seen before. For our dataset, we use 30 faces and their left-right flipped versions, with a variety of emotions and poses, as contained in the Yale Face Database [1]. It would be advantageous to have more faces and poses than this, but the time limits of
this project constrained the amount of time that could be devoted to photoediting (since the Yale Face Database is not in a directly usable format).

One non-face-specific issue is that of lighting direction. Neural nets are especially sensitive to pixel magnitude values, and the differences between images illuminated from the left or from the right may be enough to make them appear as two different classes from the perspective of the neural net. Consequently, there has to be some method for correcting for unidirectional lighting effects (even if only approximately). Additionally, not all images will have the same gray level distribution or range, and it is important to mitigate this as much as possible to avoid bias effects due to the gray level distribution.

For our dataset, we attempt to correct for unidirectional lighting effects as suggested by [2], by fitting a single linear plane to the image.
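This normalization step can be sketched in a few lines of code. The following is a hypothetical pure-Python illustration, not the Matlab code used in the project; the function names and the 0-255 gray range are assumptions. It fits the shading plane by least squares via the normal equations, subtracts it, and rescales the result.

```python
# Sketch of the lighting normalization described above (hypothetical
# pure-Python illustration; the project itself used Matlab).

def fit_shading_plane(img):
    """Least-squares fit of a plane z = a*x + b*y + c to a 2D gray image."""
    h, w = len(img), len(img[0])
    # Accumulate the normal equations A'A and A'z for A = [X Y 1].
    sxx = sxy = sx = syy = sy = n = 0.0
    sxz = syz = sz = 0.0
    for y in range(h):
        for x in range(w):
            z = img[y][x]
            sxx += x * x; sxy += x * y; sx += x
            syy += y * y; sy += y; n += 1
            sxz += x * z; syz += y * z; sz += z
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    b = [sxz, syz, sz]
    # Solve the 3x3 system with Gaussian elimination.
    for i in range(3):
        p = A[i][i]
        for j in range(i + 1, 3):
            f = A[j][i] / p
            for k in range(3):
                A[j][k] -= f * A[i][k]
            b[j] -= f * b[i]
    c = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        c[i] = (b[i] - sum(A[i][k] * c[k] for k in range(i + 1, 3))) / A[i][i]
    return c  # (slope in x, slope in y, constant offset)

def normalize(img):
    """Subtract the fitted shading plane, then rescale gray levels to 0..255."""
    a, b, c = fit_shading_plane(img)
    out = [[img[y][x] - (a * x + b * y + c) for x in range(len(img[0]))]
           for y in range(len(img))]
    lo = min(min(row) for row in out)
    hi = max(max(row) for row in out)
    span = (hi - lo) or 1.0
    return [[255.0 * (v - lo) / span for v in row] for row in out]
```

In the project itself the fit would be restricted to the masked oval region; the full-rectangle version shown here illustrates the same idea.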

This plane can be computed efficiently through simple linear projection, solving the equation [X Y 1] * C = Z (where X, Y, and Z are the vectors corresponding to their respective coordinate values, 1 is a vector of ones used to compute the constant offset, and C is a vector of three numbers defining the linear slopes in the X and Y directions and the constant offset). To compute C, we simply evaluate the least-squares solution C = ([X Y 1]' * [X Y 1])^-1 * [X Y 1]' * Z. These plane coefficients in C approximate the average gray level across the image under a linear constraint, and thus can be used to construct a shading plane that can
be subtracted out of the original image. Once the lighting direction is corrected for, the grayscale histogram can then be rescaled to span the minimum and maximum grayscale levels allowed by the representation.

This was done for our face (and non-face) training data; a subset of the original images is
shown in figure 1 below:

Figure 1: Initial Images.

From figure 1, we then approximate the shading plane as shown below. Note that the second and third images in figure 1 show heavy directional lighting effects, and that the shading plane in figure 2 accurately represents these effects.

Figure 2: Shading Approximations.

Now, given the images in figures 1 and 2, we can subtract figure 2 from figure 1 and rescale the gray levels to the minimum and maximum range for our representation. We can then apply a mask to this image to remove background interference. The result is shown below in figure 3. Note in the following figure that the unidirectional lighting effects present in the original second and third images (figure 1) have now been removed, and that, unlike figure 1, all images in figure 3 have approximately the same gray level distribution. This normalization is extremely important to the proper functioning of the neural network.

Figure 3: Normalized and Masked Images.

In addition to the face images, we also perform the same normalization on a set of non-face scenery images. Since we normalize all image rectangles during the face detection scanning process, it is important to train on normalized scenery images; an unnormalized training set would be unrepresentative of the rectangles seen during detection. A set of five of the 160 scenery images is shown below in figure 4. (Actually, only 40 scenery images were used, but their left-right flipped and upside-down versions were also added to the data set.)

Figure 4: Non-face Image Examples.

Once all of the training data images have been normalized, they are aggregated into labelled datasets and passed on to the training phase. Additionally, the normalization process occurs once more during the actual face detection process, i.e. all image rectangles are normalized before classifying them with the neural net.

Training

Given our mask size, we use a neural net (created and trained using Matlab's neural net toolbox) with approximately 400 input units, each connected directly to a corresponding pixel within the image mask, 20 hidden units, and 1 output unit used for prediction (yielding ideal training values of -0.9 for scenery and 0.9 for a face).

The neural net is trained for 500 epochs (or until error increases on an independent validation set chosen separately from the training set). The sum of squares error on the training set (blue) and the validation set (red)
are plotted below in Figure 5. Note that around epoch 50, the validation set error surpasses the training set error (as would be expected). However, the validation set error never increases from a previous time step, and therefore the network proceeds to approximate convergence. This indicates that in
some sense the training set is adequate enough to generalize to unseen instances.

Figure 5: Training Error vs. Epochs.

The final performance of the network on all of the face and non-face data is shown below in table 1. The network apparently performs much better at detecting non-faces, which is probably due to the bias toward non-face training images in the data set. However, this has the advantage of yielding a lower false positive rate than if the bias had been in favor of the face images instead.

                      Face Detection   Non-face Detection   Overall Classification
                      Rate             Rate                 Rate
Percentage Correct    86.7%            98.1%                97.7%
Training Set Size     60               160                  220

Table 1: Training Results.

Now that the neural net has been successfully trained, it can be used for classifying the candidate face rectangles passed to it during image scanning.
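To make the scanning procedure from the algorithm overview concrete, here is a hypothetical pure-Python sketch. The classifier is stubbed out, and the 2x2-averaging downsample and the minimum resolution of 36 pixels are illustrative assumptions rather than details from the project (which used Matlab); the 18 x 27 window matches the mask bounding rectangle described earlier.

```python
# Sketch of the resolution-pyramid scan from the algorithm overview
# (hypothetical illustration; the classifier is a stand-in).

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2*y][2*x] + img[2*y][2*x+1] +
              img[2*y+1][2*x] + img[2*y+1][2*x+1]) / 4.0
             for x in range(w)] for y in range(h)]

def detect_faces(img, classify, win_w=18, win_h=27, min_size=36):
    """Scan a resolution pyramid; return bounding boxes in original coords."""
    boxes = []
    scale = 1
    level = img
    while (len(level) >= max(win_h, min_size) and
           len(level[0]) >= max(win_w, min_size)):
        for y in range(len(level) - win_h + 1):
            for x in range(len(level[0]) - win_w + 1):
                window = [row[x:x + win_w] for row in level[y:y + win_h]]
                if classify(window):  # neural net stand-in
                    # Rescale the rectangle back to original-image coordinates.
                    boxes.append((x * scale, y * scale,
                                  win_w * scale, win_h * scale))
        level = downsample(level)
        scale *= 2
    return boxes
```

In a real detector, classify would apply the oval mask and the same normalization used in training before running the neural net on the window; successive pyramid levels let the fixed window cover progressively larger faces in the original image.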
