An Exploration of Hardware Architectures for Face Detection
Algorithm Impact—Hardware Face Detection using AdaBoost
In Section 83.3, we saw how an image-based approach is implemented in hardware. Although the neural network itself is well suited to hardware implementation, the algorithm is not the most efficient one, primarily because the detection stage has to look through the entire image for a face, which might occupy just a small area of the image. Consequently, considerable time and computational power are wasted searching areas that contain no faces. Additionally, the IPG is a computationally intensive process. Viola and Jones [7] have proposed a feature-based algorithm, which uses a chain of classifiers to determine whether a location within an image contains a face. The approach uses the concept of weak and strong classifiers; a strong classifier consists of a series of weak classifiers, and the outcomes of all the weak classifiers determine the outcome of the strong classifier. The approach scans the entire image to locate face candidates. Locations that fail to produce evidence that there might be a face are discarded and eliminated from further search. As the number of locations holding candidate faces decreases, the computation focuses only on locations that present strong evidence that a face exists. Eventually, if a location passes through the entire series of cascaded classifiers, it is said to contain a face. The classifiers used in the algorithm are features; a feature is represented as a set of black and white rectangles located at specific offsets within a search window. The value of a feature equals the sum of the pixels that lie under the black rectangles minus the sum of the pixels under the white rectangles. The feature size determines the size of the image window being searched.
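To make the weak/strong relationship concrete, here is a minimal C sketch of how a stage (strong classifier) could aggregate the votes of its weak classifiers; the type and function names (weak_t, eval_stage) are illustrative assumptions, not from the chapter:

```c
#include <stddef.h>

/* One weak classifier: a trained threshold and the two possible votes. */
typedef struct {
    double threshold;   /* per-feature threshold from training           */
    double pass_val;    /* vote when the feature value exceeds threshold */
    double fail_val;    /* vote otherwise                                */
} weak_t;

/* A strong classifier (stage) sums the votes of its weak classifiers
   and compares the total against the stage threshold.                  */
int eval_stage(const weak_t *weaks, const double *feature_vals,
               size_t n, double stage_threshold)
{
    double stage_sum = 0.0;
    for (size_t i = 0; i < n; i++)
        stage_sum += (feature_vals[i] > weaks[i].threshold)
                   ? weaks[i].pass_val : weaks[i].fail_val;
    return stage_sum > stage_threshold;  /* 1 = window survives the stage */
}
```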
As such, instead of rescaling the image to find faces that do not fit the search window, the feature itself is scaled up, and hence the search window size increases. While this might seem computationally intensive, the algorithm benefits from the use of the integral image. The integral image is simply a transformation of the original source image into an image where each pixel location holds the sum of all the pixels to the left of and above that location. Note that although the integral image is nominally one row and one column larger than the source image, the first row and the first column contain only zeroes, so the stored integral image can be the same size as the source. The integral image, as well as an example feature and sample rectangles, is shown in Figure 83.10.
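As an illustration, here is a minimal C sketch of the integral-image construction; for clarity it stores the zero first row and column explicitly, so the array is (w + 1) × (h + 1) entries (an implementation can fold them away as noted above), and 32-bit sums are assumed sufficient for the frame size:

```c
#include <stdint.h>

/* Build the integral image of an 8-bit grayscale frame in one raster pass.
   ii has (w + 1) * (h + 1) entries; row 0 and column 0 are zero, so the
   rectangle look-ups need no boundary special-casing.                    */
void integral_image(const uint8_t *src, uint32_t *ii, int w, int h)
{
    for (int x = 0; x <= w; x++)
        ii[x] = 0;                          /* zero first row             */
    for (int y = 1; y <= h; y++) {
        uint32_t row_sum = 0;               /* running sum of current row */
        ii[y * (w + 1)] = 0;                /* zero first column          */
        for (int x = 1; x <= w; x++) {
            row_sum += src[(y - 1) * w + (x - 1)];
            ii[y * (w + 1) + x] = row_sum + ii[(y - 1) * (w + 1) + x];
        }
    }
}
```

The squared integral image mentioned below is built the same way, accumulating squared pixel values (into 64-bit entries, since squared sums grow much faster).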
The main advantage of using the integral image is the ability to compute the sum of the pixels within a rectangle rapidly. Consider the source image shown in Figure 83.11 (left), and assume we are computing the sum of the pixels under rectangle D. As shown in Figure 83.11 (left), the computation reduces to additions and subtractions of the four corner values of the rectangle. Hence, regardless of the search window size, the amount of computation remains the same, as only four values per rectangle are needed to compute the value of each feature. Additionally, since each feature and its rectangles are fixed (determined during training), all that is needed are the offset coordinates from the starting corner of the current search window to retrieve the required integral image values; each offset is defined by the values dx and dy, as shown in Figure 83.11 (right). The offset coordinates are part of the training set, where each feature is associated with a list of its rectangles and the four (dx, dy) offset pairs required for each rectangle.
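In code, the rectangle sum then collapses to four look-ups combined with additions and subtractions; the following C sketch assumes the padded integral image above (stride = w + 1) and illustrative names:

```c
#include <stdint.h>

typedef struct { int dx, dy; } offset_t;

/* Sum of the pixels inside one rectangle of a feature, given the four
   corner offsets (from the training set) relative to the search-window
   origin (wx, wy). c[0]..c[3] = top-left, top-right, bottom-left,
   bottom-right; the combination is bottom-right + top-left
   - top-right - bottom-left.                                          */
int32_t rect_sum(const uint32_t *ii, int stride,
                 int wx, int wy, const offset_t c[4])
{
    uint32_t tl = ii[(wy + c[0].dy) * stride + (wx + c[0].dx)];
    uint32_t tr = ii[(wy + c[1].dy) * stride + (wx + c[1].dx)];
    uint32_t bl = ii[(wy + c[2].dy) * stride + (wx + c[2].dx)];
    uint32_t br = ii[(wy + c[3].dy) * stride + (wx + c[3].dx)];
    return (int32_t)(br + tl - tr - bl);
}
```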
The computation starts by first computing the integral image and the squared integral image. The squared integral image is constructed in the same way as the integral image, using the squared pixel values rather than the original values; it is needed to compute the variance and the standard deviation of the image. After both images are computed, we proceed to the feature computation. The features are computed in stages. Each feature is evaluated over a search window of the same size as the feature. In Viola and Jones' implementation, the starting feature size is 24 × 24 pixels; we will use this size as the reference starting feature size. The locations of the rectangles within each feature are predetermined from training; hence, to evaluate each rectangle we need its dx and dy offsets from the starting coordinate of the feature. The first search window is set over the origin of the image, and computation starts. Features are grouped into stages, with each stage acting as a strong classifier and each feature acting as a weak classifier. Each stage has a number of features. The sum of a feature is computed from the rectangle sums, multiplying each rectangle sum by a predetermined weight. Prior to evaluating each feature, the feature threshold is adjusted to take into account the standard deviation of the image, to compensate for brightness and contrast variations. The standard deviation can be computed from the image variance, as shown in Eq. (83.1). The variance is obtained by evaluating the integral and squared integral images over the search window, in the same manner that a feature rectangle is evaluated, using the four corners of the search window. This needs to be done only once per search window; all features evaluated over that window can then reuse the computed standard deviation value σ, which is the square root of the variance:

σ = √( (1/N)·Σx² − ((1/N)·Σx)² )    (83.1)

where N is the number of pixels in the search window, Σx is the window sum obtained from the integral image, and Σx² is the window sum obtained from the squared integral image.
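As a concrete illustration, the following C sketch evaluates Eq. (83.1) over one search window, assuming the zero-padded integral images from the earlier sketches (stride = w + 1); the function name and the guard value for degenerate windows are assumptions:

```c
#include <math.h>
#include <stdint.h>

/* Standard deviation of the pixels inside a win x win search window whose
   top-left corner is (wx, wy), computed from the integral image (ii) and
   the squared integral image (sqii) with four look-ups each.             */
double window_stddev(const uint32_t *ii, const uint64_t *sqii,
                     int stride, int wx, int wy, int win)
{
    int x0 = wx, y0 = wy;                  /* padded-image corner indices */
    int x1 = wx + win, y1 = wy + win;

    double sum = (double)ii[y1 * stride + x1] + ii[y0 * stride + x0]
               - ii[y0 * stride + x1] - ii[y1 * stride + x0];
    double sq  = (double)sqii[y1 * stride + x1] + sqii[y0 * stride + x0]
               - sqii[y0 * stride + x1] - sqii[y1 * stride + x0];

    double n    = (double)win * win;
    double mean = sum / n;                 /* (1/N) * sum(x)              */
    double var  = sq / n - mean * mean;    /* Eq. (83.1), inside the root */
    return var > 0.0 ? sqrt(var) : 1.0;    /* guard for flat windows      */
}
```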
The standard deviation is multiplied by the predetermined feature threshold to obtain the compensated threshold. The sum of all weighted feature rectangles determines the feature sum: if the weighted rectangle sum exceeds the compensated threshold, the feature sum is set to one predetermined value; otherwise, it is set to another predetermined value. All feature sums are accumulated to compute the stage sum. At the end of each stage, the stage sum is compared with a predetermined stage threshold; if it exceeds the threshold, the search window is marked as containing a face candidate. Otherwise, it is considered a nonface and omitted from the rest of the computation. When all search windows have been processed for the 24 × 24 features, the features are scaled by a predetermined factor, and the search is repeated over larger window sizes using new training values set for the higher scale. Using this approach, the benefits are twofold. First, if a search window does not contain a face, it will likely fail one of the first one or two stages, so computation stops early with a nonface output. Second, there is no need to perform image operations to scale the image, thus avoiding loss of image data; instead, the feature is scaled, and all that changes are the offsets to each rectangle corner. At the end of each scale, any search window that contains a face will have passed all stages and been marked as a face; otherwise, the search window will have failed a stage at some point and its computation will have ended. The entire computation is outlined in Figure 83.12.
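A C sketch of the overall scan loop follows, under the same conventions as the previous sketches; window_passes stands for the per-window stage-by-stage evaluation with early rejection (its signature is an assumption), and the 1.25 scale step is illustrative, as the actual factor comes from training:

```c
#include <stdint.h>

/* Helpers sketched earlier; window_passes walks the cascade stages for
   one window and returns 0 as soon as any stage fails (early rejection). */
double window_stddev(const uint32_t *ii, const uint64_t *sqii,
                     int stride, int wx, int wy, int win);
int window_passes(const uint32_t *ii, int stride,
                  int wx, int wy, int win, double sigma);

void scan_image(const uint32_t *ii, const uint64_t *sqii, int w, int h)
{
    int stride = w + 1;                    /* padded integral-image width */
    for (int win = 24; win <= w && win <= h;
         win = (int)(win * 1.25)) {        /* scale the feature, not the image */
        /* At each scale, the (dx, dy) rectangle offsets are taken from
           the training values prepared for that scale.                  */
        for (int wy = 0; wy + win <= h; wy++) {
            for (int wx = 0; wx + win <= w; wx++) {
                double sigma = window_stddev(ii, sqii, stride, wx, wy, win);
                if (window_passes(ii, stride, wx, wy, win, sigma)) {
                    /* (wx, wy, win) passed every stage: face candidate */
                }
            }
        }
    }
}
```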
The training set necessary for the algorithm consists of a list of stages, each stage with a list of features. Each stage is associated with a predetermined stage threshold. Each feature is associated with two, three, or four rectangles (some variations of the algorithm use a higher number of rectangles, but for the purposes of this chapter we assume that the maximum number of rectangles is four). Each rectangle's offset coordinates from the feature's upper left corner are also given in the training set, along with the rectangle's associated weight. Each feature is further associated with a threshold value, used together with the standard deviation of the image over the search window to compute the compensated threshold, and with two predetermined feature sums: one used when the computed rectangle sum exceeds the compensated threshold and one used when it does not.
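The following C structures illustrate one possible in-memory layout for this training data; all field names are assumptions, not the chapter's actual format:

```c
#include <stddef.h>

typedef struct {
    int    dx[4], dy[4];   /* offsets of the four corners from the
                              feature's upper-left corner              */
    double weight;         /* predetermined rectangle weight           */
} rect_t;

typedef struct {
    rect_t rects[4];       /* two to four rectangles per feature       */
    int    n_rects;
    double threshold;      /* multiplied by sigma to form the
                              compensated threshold at evaluation time */
    double pass_sum;       /* feature sum if the weighted rectangle
                              sum exceeds the compensated threshold    */
    double fail_sum;       /* feature sum otherwise                    */
} feature_t;

typedef struct {
    const feature_t *features;  /* the stage's weak classifiers        */
    size_t           n_features;
    double           threshold; /* predetermined stage threshold       */
} stage_t;

typedef struct {
    const stage_t *stages;      /* cascade = ordered list of stages    */
    size_t         n_stages;
} cascade_t;
```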
In the proposed architecture, the computation is performed over an array grid, where each node in the array holds the integral image and squared integral image data and serves as both a computation and a data transfer unit. The array provides the ability to perform all rectangle operations for the entire image, with each rectangle computed in parallel. The proposed architecture is explained in detail in the next section.