Hardware Implementations: Challenges and Issues
Designing a hardware face-detection system poses several challenges. Extensive research exists in the field, mostly in the software domain [1–4,7–9,11–18]. In recent years, however, a few hardware implementations that perform face detection on multiple FPGA boards and multiprocessor platforms using programmable hardware [2,3,19,20] have been proposed. Many of these hardware solutions, though, are not compact and are not suited to mobile environments. Additionally, these implementations impose specific interfacing requirements and do not follow the plug-and-play approach of modern hardware platforms. For example, the FPGA implementations utilize up to nine boards, or use a general-purpose processor assisted by a coprocessor that performs certain algorithmic computations [2,19]. Some of the attempted hardware implementations feature algorithms that are less effective than traditional software approaches, such as the competitive feature approach [1], which does not perform as well as a neural network or other state-of-the-art algorithms. An exception is the implementation on an embedded platform using neural networks in Ref. [16]. That implementation, though, is not purely in hardware; rather, it runs on a reconfigurable multiprocessor platform integrated with embedded software, and achieves 10 fps.
Other implementations utilizing embedded hardware have surfaced recently. In Ref. [20], the authors describe an architectural methodology for mapping algorithms to hardware, and using an embedded development platform (Xilinx ML-310 Board) they show that they can achieve a frame rate of 12 fps, which is acceptable for certain applications. The authors use a shape-based approach in which edge detection is used to find elliptic shapes, and template matching is then applied to confirm the presence of a face in the input image. Such embedded platforms can be built as add-on expansion cards to a general-purpose computer, or as individual components to perform detection. An embedded face-detection algorithm that achieves up to 4 fps is presented in Ref. [21]. The algorithm uses a neural network approach and runs on a cell-phone embedded processor. The application targets low-performance, ultra-low-power cameras, hence the small frame rate is acceptable. As evidenced by the existing work, the need for a stand-alone system that meets a real-time frame rate of 30 fps and can interface with existing video interfaces is clear. Such a system can be mapped onto an FPGA or implemented as an ASIC, and can be placed on a camera, used as a coprocessor, integrated into an embedded platform, or deployed as an entirely stand-alone system.
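To put these frame rates in perspective, a back-of-the-envelope budget shows how few clock cycles a hardware design has per pixel at the 30 fps real-time target. The frame size and clock frequency below are illustrative assumptions, not figures taken from any of the cited implementations:

```python
# Cycle budget for real-time detection (assumed figures, for illustration).
FRAME_W, FRAME_H = 640, 480      # assumed VGA input frame
FPS_TARGET = 30                  # real-time target from the text
CLK_HZ = 100e6                   # assumed 100 MHz hardware clock

pixels_per_frame = FRAME_W * FRAME_H
cycles_per_frame = CLK_HZ / FPS_TARGET
cycles_per_pixel = cycles_per_frame / pixels_per_frame

print(f"{cycles_per_frame:.0f} cycles per frame")   # ~3.3 million
print(f"{cycles_per_pixel:.1f} cycles per pixel")   # ~10.9
```

A budget of roughly ten cycles per pixel leaves little room for sequential processing, which is why the parallelism and memory-organization concerns discussed below dominate the design.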
Video frames from modern digital cameras are of high quality, and most cameras are equipped with lighting and image-enhancement (IE) features [22]. This reduces the role of the IE stage, as better training algorithms and higher image quality help alleviate variations in images. Still, certain parameters of the environment need to be taken into consideration, either through training or through image processing. Additionally, digital camera interfaces provide a framework for better generation of search-window images. The detection stage therefore becomes the point of emphasis in this chapter.
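Search-window generation is itself a substantial workload: even a modest frame yields a large number of candidate windows once it is scanned at multiple scales. The sketch below counts them; the frame size, window size, scale factor, and stride are all assumed values chosen for illustration, not parameters from the text:

```python
# Sketch: on-chip memory for one frame, and how many candidate
# search-window positions a multi-scale scan produces.
FRAME_W, FRAME_H = 320, 240   # assumed QVGA frame, 8-bit grayscale
WIN = 20                      # assumed 20x20 search window
SCALE = 1.25                  # assumed downscaling factor per level
STRIDE = 1                    # assumed window step in pixels

frame_bytes = FRAME_W * FRAME_H        # one byte per pixel
windows = 0
w, h = FRAME_W, FRAME_H
while w >= WIN and h >= WIN:
    windows += ((w - WIN) // STRIDE + 1) * ((h - WIN) // STRIDE + 1)
    w, h = int(w / SCALE), int(h / SCALE)  # next (smaller) scale level

print(frame_bytes, "bytes per frame;", windows, "candidate windows")
```

With these assumptions a single QVGA frame already produces over 160,000 candidate windows, which is why the memory system must feed many windows to the detection stage in parallel.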
The challenges in designing a hardware face-detection system are several. In addition to meeting a real-time frame rate, the system must be energy efficient and reliable. First, the hardware boundaries need to be explored. Ideally, the system should interface to a standard video input interface and export the detected faces either as part of the video or as image coordinates. The video interface may or may not be part of the system; if it is not, then the system has to be designed with the interface in mind. For example, the number of received pixels as well as the frequency of the video signal are issues that need to be taken into consideration. Next, depending on the input video frame size, the system memory has to be able to hold an entire image frame to maximize parallel searches. In algorithms where the IPG is used to generate search-window images, the system memory must be designed in such a way as to maximize the flow of data and consequently increase the parallelism of the system. Last but not least, the computation requirements vary with the chosen algorithm, and algorithms with high accuracy in software do not necessarily perform as accurately in hardware at real-time frame rates, owing to the complexity of the computations. For example, an algorithm that relies on floating-point operations such as divisions and square roots might detect faces at a high rate in software, but when mapped to hardware it will require extensive resources to maintain both high accuracy and a fast frame rate. Similarly, overflow in floating- and fixed-point computations results in loss of accuracy. As such, the choice of an appropriate algorithm is crucial. As in other image-processing applications, data movement and memory accesses dominate the computation, so the hardware design needs to be optimized for parallel data movement and intelligent memory partitioning. The choice of algorithm typically impacts the design in the detection stage; image manipulation is necessary regardless of the chosen algorithm, and memory partitioning depends on the algorithmic data flow.
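The fixed-point accuracy issue can be made concrete. The sketch below emulates a signed 16-bit Q8.8 format (an assumed width, chosen purely for illustration) and shows both quantization loss for values below the format's resolution and wrap-around overflow for values outside its representable range:

```python
# Sketch: precision loss and overflow in an emulated Q8.8 fixed-point
# format (16-bit signed, 8 fractional bits -- an assumed width).
FRAC_BITS = 8
SCALE = 1 << FRAC_BITS          # 2**8 = 256, so resolution is 1/256

def to_q88(x):
    """Quantize to 16-bit signed Q8.8, wrapping silently on overflow."""
    raw = int(round(x * SCALE)) & 0xFFFF           # keep only 16 bits
    return raw - 0x10000 if raw & 0x8000 else raw  # sign-extend

def from_q88(raw):
    return raw / SCALE

# Precision loss: 0.004 is below the Q8.8 resolution of 1/256
print(from_q88(to_q88(0.004)))   # quantized to 0.00390625, not 0.004

# Overflow: 200.0 exceeds the representable range [-128, 128)
print(from_q88(to_q88(200.0)))   # wraps around to -56.0
```

In hardware such wrap-around happens silently unless saturation logic is added, which is one reason an algorithm's numerical range must be analyzed before choosing word widths.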
Another issue is the handling of training data: whether the system is trained in hardware or uses predetermined training data, the data need to be stored and accessed in a way that does not delay the computation. The choice of algorithm obviously has a lot to do with the organization of the system components, as the access pattern of the image data changes with the algorithm. In image-based computations, image data can be accessed in any order, and hence an order that exploits parallelism can be designed. In feature-based algorithms, however, data access depends on the features used, and the order of operations is determined by the training data.
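The contrast between the two access patterns can be sketched briefly. An image-based method may scan pixels in any convenient order, whereas a feature-based detector such as AdaBoost fetches data in the order its trained features dictate; the standard integral-image trick reduces each rectangular feature sum to four memory reads. The feature rectangles below are made-up placeholders, not real training data:

```python
# Sketch: feature-driven memory access via an integral image.

def integral_image(img):
    """ii[y][x] = sum of img over the rectangle [0..y) x [0..x)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):                 # row-major scan: any order works
        row = 0
        for x in range(w):
            row += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum over a w-by-h rectangle at (x, y): exactly 4 memory reads."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

img = [[1] * 8 for _ in range(8)]          # toy 8x8 all-ones image
ii = integral_image(img)
# A trained classifier evaluates features in training-data order;
# these rectangles are placeholders for illustration only.
features = [(0, 0, 4, 4), (2, 2, 3, 3)]
sums = [rect_sum(ii, *f) for f in features]
print(sums)                                # rectangle areas: [16, 9]
```

The integral image is built once per frame with a predictable row-major scan, but the `rect_sum` lookups that follow jump around memory in whatever order the training data prescribes, which constrains how the memory can be partitioned.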
In this chapter, we explore both types of algorithms. First, we look into the implementation of neural network-based face detection in hardware, which is an image-based approach. Then we approach the problem of face detection using a feature-based method, AdaBoost. Both methods offer certain advantages and disadvantages for hardware platforms, which we explore in detail in this chapter. We start by presenting neural network-based face detection, where the targeted implementation is both an FPGA and an ASIC.