Power-Aware Architectural Synthesis: Low-Power System Synthesis

Low-Power System Synthesis

System synthesis has its roots in hardware–software cosynthesis. Early hardware–software cosynthesis algorithms took, as input, a high-level description of the application’s required functionality, descriptions of available hardware, e.g., instruction processors and application-specific integrated circuits (ASICs), as well as performance and power requirements. The hardware–software cosynthesis algorithm automatically produced a design for the desired application, often consisting of application-specific and general-purpose processors mounted on a printed circuit board. The main focus of most hardware–software cosynthesis algorithms is partitioning applications between instruction processors and application-specific cores/ICs. SoC synthesis algorithms target hardware–software systems implemented on single ICs. Although their functionality overlaps with hardware–software cosynthesis algorithms, SoC synthesis algorithms also place great weight on synthesizing (heterogeneous) communication buses or networks. In addition, some consider the interaction between architectural and physical design to solve the entire SoC synthesis problem better.

Figure 17.7 illustrates a system synthesis optimization flow. Although this flow is representative, some flows, e.g., those using constructive algorithms, may differ. Initially, a description of the algorithm to be implemented is provided in a high-level language such as MATLAB, C, or SystemC. This description is then translated into a graph representation by a compiler front-end. Note that these first stages may be omitted if a graph-based specification is available. One such graph format, shown in Figure 17.8, is a task set composed of multiple directed acyclic graphs in which nodes represent tasks and edges represent data dependencies. Timing constraints may be expressed as deadlines (DL) on nodes. Different tasks may be invoked periodically with different periods.
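The task-set format described above can be sketched as a small data structure. This is only an illustration: the class name `TaskGraph`, its fields, the task names, and the millisecond units are assumptions made for the example, not part of any tool discussed in this chapter.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class TaskGraph:
    """One directed acyclic graph of a task set (hypothetical representation)."""
    period_ms: float                               # invocation period of this graph
    deps: dict = field(default_factory=dict)       # task -> set of predecessor tasks
    deadlines: dict = field(default_factory=dict)  # task -> deadline (DL) in ms

    def add_task(self, name, preds=(), deadline=None):
        # An edge pred -> name models a data dependency.
        self.deps[name] = set(preds)
        if deadline is not None:
            self.deadlines[name] = deadline

    def topological_order(self):
        # Raises graphlib.CycleError if the graph is not acyclic.
        return list(TopologicalSorter(self.deps).static_order())


# A three-task chain invoked every 20 ms, with a deadline on the last task.
g = TaskGraph(period_ms=20.0)
g.add_task("sample")
g.add_task("filter", preds=["sample"])
g.add_task("encode", preds=["filter"], deadline=15.0)
order = g.topological_order()
```

A full task set would simply be a list of such graphs, each with its own period.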

In addition to the required functionality, a database containing price, power consumption, execution time, and other characteristics of processing elements and communication resources is also provided. A portion of one such database is shown in Table 17.3.

Potential architectures consisting of processing element allocations, assignments of tasks to processing elements, and a schedule of all tasks and communication events are then optimized. Costs such as price, power, and execution time are then evaluated. The process repeats until acceptable solutions are produced. The resulting architectures are then completed by using behavioral synthesis to generate application-specific cores or FPGA configurations for the hardware-implemented tasks and using a compiler to generate executable code for the software-implemented tasks. Note that many existing system synthesis algorithms only solve subsets of the entire system synthesis problem.
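As a rough illustration of this evaluate-and-repeat loop, the sketch below scores candidate task assignments for price, energy, and finish time, then keeps the lowest-energy candidate that meets a deadline. Everything in it is assumed for the example: the database values, the task names, and the use of exhaustive enumeration in place of the optimization engines used by real synthesis tools. Communication costs are ignored.

```python
from itertools import product

# Hypothetical resource database: PE type -> price and, per supported task,
# (execution time in ms, power in mW). Values are invented for illustration.
PE_DB = {
    "cpu":  {"price": 10.0, "tasks": {"sample": (2.0, 100.0),
                                      "filter": (8.0, 100.0),
                                      "encode": (6.0, 100.0)}},
    "asic": {"price": 25.0, "tasks": {"filter": (1.0, 40.0),
                                      "encode": (2.0, 50.0)}},
}
DEPS = {"sample": [], "filter": ["sample"], "encode": ["filter"]}  # task -> predecessors
ORDER = ["sample", "filter", "encode"]  # a valid topological order of DEPS


def evaluate(assignment):
    """Return (price, energy_uJ, finish_ms) for a task -> PE assignment."""
    pe_free = {}   # PE -> time at which it becomes idle
    finish = {}    # task -> finish time
    energy = 0.0
    for task in ORDER:
        pe = assignment[task]
        time_ms, power_mw = PE_DB[pe]["tasks"][task]
        ready = max((finish[p] for p in DEPS[task]), default=0.0)
        start = max(ready, pe_free.get(pe, 0.0))  # each PE runs one task at a time
        finish[task] = start + time_ms
        pe_free[pe] = finish[task]
        energy += time_ms * power_mw              # ms * mW = microjoules
    price = sum(PE_DB[pe]["price"] for pe in set(assignment.values()))
    return price, energy, max(finish.values())


def explore(deadline_ms):
    """Try every assignment; return the feasible one with the least energy."""
    best = None
    for choice in product(PE_DB, repeat=len(ORDER)):
        assignment = dict(zip(ORDER, choice))
        if any(t not in PE_DB[pe]["tasks"] for t, pe in assignment.items()):
            continue  # this PE type cannot execute the task
        costs = evaluate(assignment)
        if costs[2] <= deadline_ms and (best is None or costs[1] < best[1][1]):
            best = (assignment, costs)
    return best


best_assignment, best_costs = explore(deadline_ms=15.0)
```

With these invented numbers, the all-software solution is cheapest but misses the 15 ms deadline, so the search settles on moving `filter` and `encode` to the hypothetical ASIC at a higher price.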


Low-Power Hardware–Software Cosynthesis Algorithms

Low-power cosynthesis algorithms form the basis for later work on low-power SoC synthesis. They build upon power-aware allocation, assignment, and scheduling optimization engines and further improve power consumption with point techniques such as multiple voltage levels and domain-specific scheduling algorithms. Dick and Jha [47] developed a synthesis algorithm for low-power distributed systems that simultaneously optimizes power consumption and price while honoring hard real-time deadlines. Dave et al. [48] developed a constructive algorithm to solve the low-power multirate distributed system cosynthesis problem. Shang and Jha [49] presented a method of synthesizing low-power systems containing dynamically reconfigurable FPGAs.

Much of the early work in low-power hardware–software cosynthesis was based on the assumption that processing elements are off-the-shelf parts with strict constraints on operating voltages. Later work relaxed this assumption, considering multiple operating voltages and DVFS (described in Section 17.2). Gruian and Kuchcinski [50] developed a dual-voltage task-scheduling algorithm for reducing power consumption. Kirovski and Potkonjak [51] developed an integrated DVFS and system synthesis algorithm for independent tasks mapped to a bus-based multiprocessor. Schmitz and Al-Hashimi [52] developed a genetic algorithm to incorporate DVFS into an energy minimization technique for distributed embedded systems. It takes the power variations of tasks into account while performing DVFS. An offline voltage-scaling heuristic is proposed that is fast enough for use in system synthesis, starting from real-time periodic task graphs. Yan et al. [53] proposed a scheduling algorithm that uses DVFS and adaptive body biasing to optimize both dynamic and leakage power consumption. Analytical solutions are derived to determine the optimal supply voltage and bias voltage. Then, the optimal energy consumption is determined under real-time constraints.
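To make the benefit of multiple operating voltages concrete, the following sketch picks the lowest voltage level that still meets a task deadline and compares relative dynamic energy. It relies on the common first-order model in which dynamic energy per cycle is proportional to the square of the supply voltage. The voltage and frequency pairs, the cycle count, and the function names are assumptions for illustration, and the sketch does not reproduce any of the cited algorithms.

```python
def pick_level(cycles, deadline_s, levels):
    """Return the lowest-voltage (voltage_V, freq_Hz) level meeting the deadline, or None."""
    feasible = [(v, f) for v, f in levels if cycles / f <= deadline_s]
    if not feasible:
        return None
    return min(feasible, key=lambda level: level[0])


def relative_dynamic_energy(cycles, voltage):
    # First-order model: energy is proportional to cycles * V^2.
    # The effective switched capacitance is set to 1, so units are arbitrary.
    return cycles * voltage ** 2


# Hypothetical operating points of one processing element.
LEVELS = [(1.2, 600e6), (1.0, 400e6), (0.8, 200e6)]
CYCLES = 3e6        # worst-case cycle count of the task (assumed)
DEADLINE_S = 10e-3  # 10 ms deadline (assumed)

chosen = pick_level(CYCLES, DEADLINE_S, LEVELS)
saving_ratio = (relative_dynamic_energy(CYCLES, chosen[0])
                / relative_dynamic_energy(CYCLES, LEVELS[0][0]))
```

Here the 0.8 V level would take 15 ms and miss the deadline, so the 1.0 V level is chosen, and the task uses roughly 69% of the dynamic energy it would need at 1.2 V.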

DVS can also be applied to communication links. Naturally, performing simultaneous DVS in the processors and communication links in a distributed system can yield greater power savings than performing DVS in the processors alone. Luo et al. [54] presented such a method. In addition to honoring real-time constraints, their scheduling algorithm also efficiently distributes timing slack among tasks and multihop communication events.
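The idea of distributing slack over both tasks and communication events can be illustrated with a deliberately naive baseline: stretch every event on a chain by the same factor until the chain just meets its deadline. This uniform stretching is an assumption chosen for simplicity and is not the algorithm of Luo et al. [54]; the durations and the function name are hypothetical.

```python
def stretch_chain(durations_ms, deadline_ms):
    """Scale every event on a chain by one common factor so it ends at the deadline."""
    total = sum(durations_ms)
    if total > deadline_ms:
        raise ValueError("chain misses its deadline even at full speed")
    factor = deadline_ms / total
    return [d * factor for d in durations_ms]


# Full-speed durations of a task, a message on a link, and a second task (assumed).
chain_ms = [4.0, 1.0, 3.0]
stretched_ms = stretch_chain(chain_ms, deadline_ms=16.0)
```

Each stretched duration could then be mapped to the slowest voltage level of the corresponding processor or link that fits within it. A real scheduler would weight the slack by how much energy each event saves per unit of added time rather than scaling uniformly.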

Quality of service (QoS) is an important consideration in designing systems for real-time multimedia and wireless communication applications. Qu and Potkonjak [55] proposed a technique for partitioning a set of applications among multiple processors and determining a DVFS schedule to minimize energy consumption under constraints on QoS. The applications are assumed to be independent, to have the same arrival times, and to have no deadline constraints.

Low-Power System-on-Chip Synthesis Algorithms

The low-power SoC problem combines elements of hardware–software cosynthesis and behavioral synthesis. As in hardware–software cosynthesis, tasks may be implemented with general-purpose instruction processors or application-specific hardware accelerators. However, the synthesis algorithm potentially has greater control over the details of hardware implementation, opening new options for power optimization.

Methods of estimating SoC power consumption are essential to enable design exploration and synthesis. Bergamaschi et al. [56] developed an SoC analysis tool that estimates power and may be used within a system synthesis flow. Lajolo et al. [57] described a number of ASIC and instruction processor power estimation techniques that may be used in system synthesis. Based on these power estimation algorithms, synthesis algorithms may select and optimize SoC designs.

Power estimation techniques can be used to guide the search for high-quality solutions during the synthesis of low-power or low-temperature SoCs. Givargis et al. [58] developed a method of pruning the set of SoC candidate architectures to efficiently arrive at low-power designs. They determine which elements of the solution are independent from each other, thereby decomposing the problem into small, independent problems. Fei and Jha [59] describe a functional partitioning method for synthesizing low-power real-time distributed embedded systems whose constituent nodes are SoCs. The input specification, given as a set of task graphs, is partitioned and each portion is implemented as an SoC. Hung et al. [6] give a method of using voltage islands and thermal analysis within SoC synthesis to minimize peak temperature. Hong et al. [60] presented an algorithm to select a processor core and instruction/data cache configuration to best enable DVFS.

Communication networks have a large impact on the power consumption, performance, and feasibility of SoC designs. As a result, a number of researchers have worked on low-power, communication-centric SoC synthesis. Dick and Jha [61] developed a low-power SoC synthesis algorithm that optimizes power consumption, performance, and area. It uses floorplanning block placement to estimate communication delay, power consumption, and wire congestion. Lyonnard et al. [62] developed a low-power SoC synthesis algorithm that gives great attention to communication network synthesis. Instead of estimating physical characteristics via floorplanning, this work focuses on logical bus structure and communication protocol modeling. Hu et al. [63] optimize SoC bus bit-width under a fixed processing element allocation, task assignment, and schedule. Results for a seven-core H.263 encoder are presented. Thepayasuwan et al. [64] used simulated annealing to design bus topologies and demonstrated results for a JPEG SoC design. They did parasitic extraction for performance estimation and reduced power consumption by minimizing bus length. They proposed using the algorithm as a synthesis postprocessing step. Hu et al. [65] presented a method of using voltage islands in SoC designs that minimizes power consumption, area overhead, and number of voltage islands. Pasricha et al. [66] developed an algorithm for floorplan-aware synthesis of bus topologies that meet combinational delay constraints imposed by bus cycle times. This work assumes a fixed IP core allocation and task assignment. It minimizes bus count and bus width under explicit communication throughput constraints.

Conventional SoC designs typically contain a limited number of modules connected by on-chip buses or point-to-point links. However, as the number of on-chip modules grows in the coming years, bus or point-to-point link communication will face serious problems due to increasing global wire delay. To address these issues, buses in SoC designs are gradually being replaced by more sophisticated on-chip communication networks [67]. An on-chip network may consume a significant portion of an SoC power budget [68]. Therefore, power and power-related design problems, such as thermal problems [69], are of great concern in network-on-chip designs. The design and synthesis of on-chip networks supporting multihop routing has grown into an active and broad research area. Readers may refer to Marculescu’s Chapter 16 in this handbook for a detailed treatment of this area.
