An Exploration of Hardware Architectures for Face Detection: Collection and Data Transfer Units
Collection and Data Transfer Units
Each collection and data transfer unit (CDTU) represents the upper-left starting corner of one search window in the image and holds data for that window, such as the image standard deviation and whether or not the window contains a face. The CDTUs are responsible for moving data through the system and for collecting and accumulating the image data used in the computation. Each unit is composed of an adder/subtractor, a local bus controller, and a small register file. The register file holds the image data needed for the computation as well as data in transit to the collection points. Each CDTU acts as a collection point, collecting and accumulating data for each feature rectangle.
The register file provides storage for the integral image value, the squared integral image value, the collected rectangle sums (up to four rectangles per feature), the accumulated stage sum, the standard deviation of the image for the search window represented by the CDTU, and temporary registers for data in transit and intermediate results. Additionally, the CDTU holds a flag bit (FB). The bit is set at the beginning of every computation (either a new image or a new feature size) and is reset by the MEUs at the end of a stage computation only when the search window represented by the CDTU does not contain a face. For data movement purposes, FB travels with the accumulated stage sum. A detailed block diagram of the CDTU is shown in Figure 83.14.
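As an illustration only, the register file contents listed above could be modeled in software roughly as follows. This is a minimal sketch, not the hardware description: the class name, the field names, and the number of temporary registers are assumptions made for the example; only the list of stored quantities and the four-rectangle limit come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MAX_RECTS = 4  # the text states up to four rectangles per feature


@dataclass
class CDTURegisters:
    """Software stand-in for one CDTU register file (names are hypothetical)."""
    integral: int = 0          # integral image value
    integral_sq: int = 0       # squared integral image value
    rect_sums: List[int] = field(default_factory=lambda: [0] * MAX_RECTS)
    stage_sum: int = 0         # accumulated stage sum
    std_dev: int = 0           # standard deviation for this search window
    temp: List[int] = field(default_factory=lambda: [0, 0])  # count is assumed
    flag_bit: bool = True      # FB: cleared when the window holds no face

    def start_computation(self) -> None:
        """Set FB at the start of a new image or a new feature size."""
        self.flag_bit = True

    def shift_out_stage(self) -> Tuple[int, bool]:
        """FB travels together with the accumulated stage sum."""
        return self.stage_sum, self.flag_bit
```

A window whose FB was cleared in a previous pass gets it set again by `start_computation()`, matching the described behavior at the beginning of every computation.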
The CDTUs are controlled by the controller units, which determine the action of each CDTU. The possible actions are: shifting in any of the four directions, adding and accumulating incoming pixel values and squared pixel values, adding and accumulating incoming rectangle points, and idling while waiting on the MEUs. The action is chosen by a state machine in each controller unit, which sends a global 4-bit opcode to all of its CDTUs so that they perform the appropriate action.
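The opcode-driven behavior can be pictured with a small decoder. The sketch below is hypothetical throughout: the source does not give the opcode encoding, so the numeric values, the `Op` names, and the dictionary-based register model are assumptions chosen only to show how a broadcast 4-bit opcode might select one of the listed actions.

```python
from enum import IntEnum


class Op(IntEnum):
    """Hypothetical 4-bit opcode values; the actual encoding is not given."""
    IDLE = 0x0
    SHIFT_LEFT = 0x1
    SHIFT_RIGHT = 0x2
    SHIFT_UP = 0x3
    SHIFT_DOWN = 0x4
    ACC_PIXEL = 0x5
    ACC_RECT = 0x6


SHIFTS = {Op.SHIFT_LEFT, Op.SHIFT_RIGHT, Op.SHIFT_UP, Op.SHIFT_DOWN}


def execute(op, regs, incoming=None, rect_index=0):
    """Apply one broadcast opcode to a single CDTU's registers (a dict)."""
    if op == Op.IDLE:
        return regs                          # waiting on the MEUs
    if op in SHIFTS:
        regs["temp"] = incoming              # latch the value arriving from a neighbor
    elif op == Op.ACC_PIXEL:
        pixel, pixel_sq = incoming           # pixel and its square arrive together
        regs["integral"] += pixel
        regs["integral_sq"] += pixel_sq
    elif op == Op.ACC_RECT:
        # A signed value stands in for the adder/subtractor here.
        regs["rect_sums"][rect_index] += incoming
    return regs
```

For example, `execute(Op.ACC_PIXEL, regs, (3, 9))` adds 3 to the running pixel sum and 9 to the running squared-pixel sum of that unit.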
Multiplication and Evaluation Unit
The MEUs are located at the far left of the array, one for every two rows, and each is equipped with a multiplier. The multiplier is the slowest component in the design, so it can be pipelined to increase the overall clock frequency. The MEUs receive data from the CDTUs in a fixed order: first the rectangle sums, then the standard deviation of the image, and lastly the accumulated stage sum. The rectangle sums are multiplied by the rectangle weights, and the standard deviation is multiplied by the feature threshold, to determine the feature sum that is added to the accumulated stage sum.
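A minimal sketch of this arithmetic is given below. It assumes the common Viola-Jones cascade convention in which the weighted rectangle sum is compared against the variance-scaled threshold and one of two trained values is then added to the stage sum; the two-value selection, the comparison direction, and all parameter names (`left_val`, `right_val`, and so on) are assumptions, since the text only states which quantities are multiplied.

```python
def evaluate_feature(rect_sums, weights, std_dev, threshold,
                     left_val, right_val, stage_sum):
    """Return the updated stage sum after one feature (assumed convention)."""
    # Rectangle sums multiplied by their rectangle weights.
    weighted = sum(w * r for w, r in zip(weights, rect_sums))
    # Standard deviation multiplied by the feature threshold.
    scaled_threshold = threshold * std_dev
    # Assumed: the comparison selects which trained value is accumulated.
    contribution = left_val if weighted < scaled_threshold else right_val
    return stage_sum + contribution
```

With `rect_sums=[100, 40]` and `weights=[-1, 3]` the weighted sum is 20; with `std_dev=10` and `threshold=3` the scaled threshold is 30, so `left_val` would be added under this convention.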
If the computed feature is the last of a stage, the accumulated stage sum is compared with the stage threshold, and the FB is reset if the stage fails. Otherwise, both the accumulated stage sum and the flag bit are shifted out onto the toroidal link and into the far-right CDTU. The MEU starts the computation when signaled by the CU; when the computation ends, it signals the CU to proceed with the next feature. In the meantime, the CU stalls shifting in the CDTUs while it waits for the MEU to complete. When a stage is evaluated, the MEU signals the CUs to decrement the face counter if the stage fails, or to leave it unchanged if the stage passes. The face counter is located in each of the CUs and is described in detail in the next section.
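The end-of-feature decision can be sketched as a small function. This is an assumption-laden illustration: the comparison direction (a stage fails when its sum is below the stage threshold) follows the usual cascade convention rather than anything stated in the text, and the function and parameter names are made up for the example.

```python
def finish_feature(stage_sum, flag_bit, last_in_stage, stage_threshold):
    """Return (stage_sum, flag_bit, stage_failed) after one feature.

    stage_failed models the signal the MEU sends to the CUs so that the
    face counter can be decremented.
    """
    stage_failed = False
    if last_in_stage and flag_bit and stage_sum < stage_threshold:
        flag_bit = False          # FB is reset: window cannot contain a face
        stage_failed = True
    # In every other case the pair is passed on unchanged, standing in for
    # the shift onto the toroidal link toward the far-right CDTU.
    return stage_sum, flag_bit, stage_failed
```

A window whose FB is already cleared is not reported as failing again in this sketch, so it would not be counted twice.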
Each of the MEUs interfaces via an 8-bit bus to an external memory, which holds the training data necessary for the feature computation; the MEU reads feature data from this memory in FIFO order. The MEU block diagram is shown in Figure 83.15. Each MEU is connected via a multiplexed bus to two CDTU rows. As such, the image is essentially searched on alternate rows rather than on every successive row. The CUs select which row is propagated to the MEUs, and the row that is not evaluated is simply propagated around the toroidal link to maintain the order of data flow. The search windows alternate every row, following the pattern used in the algorithm [7].
Control Units
The system is synchronized via 12 identical CUs, each of which generates control signals to drive the CDTUs and the MEUs. Although identical, the CUs are distributed across the system to reduce the size of each CU's control region. Each unit consists of a finite state machine, controlled by the training data and feedback signals from the MEUs, and two counters: a global counter and a face counter. The global counter controls the flow, starting from the computation of the integral image and the rectangle collection.
The face counter is also used to control the operation; it enables early termination when no search window in the image passes a stage. The counter is updated on every MEU evaluation at the end of each stage computation. Since the MEUs operate in parallel, and up to 240 MEUs can produce a stage outcome in a single cycle, a thermometer decoder with a priority encoder at each CU (receiving the 240 FBs from the MEUs) determines the number of search windows that pass a given stage. The thermometer decoder value is then used to update the face counter. When an entire stage has been computed, the face counter is checked: if it has reached zero, the CU simply outputs a no-face signal; otherwise, it proceeds with the computation of the next stage.
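The bookkeeping behind the face counter can be sketched in software as a simple count of flag bits that changed from set to cleared. This is only a functional stand-in: the thermometer decoder and priority encoder are replaced by a plain count, and the function names and list-based FB representation are assumptions for the example.

```python
def count_new_failures(fb_before, fb_after):
    """Count windows whose FB went from set to cleared in this evaluation."""
    return sum(1 for before, after in zip(fb_before, fb_after)
               if before and not after)


def end_of_stage(face_counter, fb_before, fb_after):
    """Update the face counter and report whether to raise no-face."""
    face_counter -= count_new_failures(fb_before, fb_after)
    no_face = face_counter == 0   # early termination: no window survived
    return face_counter, no_face
```

Starting with four candidate windows, `end_of_stage(4, [1, 1, 1, 1], [1, 0, 0, 1])` leaves two candidates, and a later stage that clears both remaining flags drives the counter to zero and signals no-face.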
Essentially, the CUs are responsible for ensuring that the CDTUs and the MEUs are in the correct computation state and synchronized with each other throughout the computation. The computation states for each operation are listed in Table 83.1. For each computation state, each of the units performs certain actions. The action is determined in the CUs, and the global 4-bit opcode is transmitted every cycle to all CDTUs and MEUs. Each unit decodes the opcode and performs the operation associated with that action. The computation is described in detail in the following section.