Logic Synthesis for Field Programmable Gate Array (FPGA) Technology:Two-Step Approaches

Two-Step Approaches

As with Chortle, the two-step methods start with an optimized network in which the number of literals is minimized. The network is decomposed to be feasible in the first step; then the number of nodes is reduced in the second step. If the given network is already feasible, the first step is skipped.

First Step: Decomposition

For a given FPGA device, with a k-input TLU, all nodes of the network with more than k inputs must be decomposed. Different methods decompose the network in different ways.

MIS-pga 1

MIS-pga 1 was developed at Berkeley for FPGA synthesis, as an extension of MIS-II. It uses two algo- rithms, kernel decomposition and Roth-Karp decomposition, to decompose the infeasible nodes separately; then it selects the better result.

Kernel decomposition decomposes an infeasible node ni by extracting a kernel function, ki, and splitting ni based on ki and its residue, ri . The residue ri, of a kernel ki, of a function F, is the expression for F with a new variable substituted for all occurrences of ki in F; for example, if F = x1x2 + x1x3, then ki = x2 + x3, and ri = x1ki. As there may be more than one kernel function that exists for a node, a cost function is associated with each kernel: cost(ki) = |sup(ki) Ç sup(ri)|. The kernel with minimum cost is chosen.

A kernel decomposition is illustrated in Figure 67.16.

Splitting infeasible nodes by kernel functions minimizes the number of new edges generated. Therefore, the considerations of wiring resources and logic area are integrated together. This procedure is applied recursively until all nodes are feasible. If no kernels can be extracted for a node, an AND–OR decomposition is applied.

Roth-Karp decomposition is based on the classical decomposition of Ashenhurst and Curtis [13]. Instead of building a decomposition chart whose size grows exponentially, as it does with the original method, a compact cover representation of the on-set and the off-set of the function is used. The Roth- Karp algorithm avoids the expensive computation of the best solution by accepting the first bound set. As with kernel decomposition, the AND/OR decomposition is used as a last resort.

Hydra Decomposition

The Hydra algorithm, developed at Stanford University, is designed specifically for two-output TLU FPGAs [14]. Decomposition in Hydra is performed in three stages. The first and third stages are AND–OR decompositions, while the second stage is a simple-disjoint decomposition, which is defined as the following:

Given a function, F, and its support, S, with F = G(H(Sa), Sb), where Sa, Sb S and Sa Sb = S; If Sa Sb = 0, then G is a disjoint decomposition of F.

The first stage is executed only if the number of inputs to the nodes in the given network is larger than a given threshold. Without performing the first stage, the efficiency of the second stage would be reduced. The last stage is applied only if the resulting network is still infeasible.

In the second stage, the algorithm searches for all the function pairs that have common variables. It then applies the simple-disjoint decomposition on those function pairs. As a result, two CLBs with the same fan-ins can be merged into one two-output CLB. The rationale is illustrated in Figure 67.17.

A weighted graph G(V, E, W) that represents the shared-variable relationship is constructed based on the given Boolean network. In G(V, E, W), V is the node set corresponding to that of the Boolean network;

edge, eij E, exists for any pair of nodes, {vi, vj} ⊂ V, if they share variables; and weight, wij W, is the number of variables shared correspondingly. Edges are first sorted by weight and then traversed in

Logic Synthesis for Field Programmable Gate Array (FPGA) Technology-0298

decreasing order to check for simple-disjoint decomposition. A cost function, which is the linear com- bination of the number of the shared inputs and the total number of variables in the extracted functions, is computed to decide whether or not to accept a certain simple decomposition.

Xmap Decomposition

The Xmap decomposes the infeasible network by converting the SOP form from MIS-II into an if-then-else DAG representation [15]. The terms of the SOP network are collected in a set, T; then, variables are sorted in decreasing order of the frequency of their appearance in T; finally, the if-then-else DAG is formed by the following recursive function:

• Let V be the most frequently used variable in the current set, T.

• Sort the terms in T into subsets T(Vd), T(V1), according to VT(Vd) is the subset in which V does not appear, T(V1) is the onset of V, and T(V0) is the offset of V.

• Delete V from all terms in T; then apply the same procedure recursively to the three subsets until all variables are tested.

The resulting if-then-else DAG after the first iteration is given in Figure 67.18. A circuit that has been mapped to an if-then-else DAG is immediately suited for use with multiplexer-based CLBs [16]. Additional steps are used to optimize the DAG for use with TLU functions.

Second Step: Node Elimination

Three approaches have been proposed for node elimination: local elimination, covering, and merging.

Local Elimination

The operation used for local elimination is collapsing, which merges node ni into node nj whenever ni is a fan-in node to nj and the new node obtained is feasible. The Hydra algorithm accepts local eliminations as soon as they are found. MIS-pga 1, however, first orders all possible local eliminations as a function of the increase in the number of interconnections resulting from each elimination, and then greedily selects the best local eliminations.

The number of nodes can be reduced by local elimination, but its myopic view of the network causes local elimination to miss better solutions. Additionally, the new node created by merging multifan-out nodes may substantially increase the number of connections among TLUs and hence make the wiring problem more difficult. This problem is more severe in Hydra than in MIS-pga 1.

Covering

The covering operation takes a global view of the network by identifying clusters of nodes that could be combined into a single TLU. The operation is a procedure of finding and selecting supernodes.

Logic Synthesis for Field Programmable Gate Array (FPGA) Technology-0299

A supernode, Si, of a node ni, is a cluster of nodes consisting of ni and some other nodes in the transitive fan-in of ni such that the maximum number of inputs to Si is k. Obviously, more than one supernode may exist for a node.

In MIS-pga 1, the covering operation is performed in two stages. In the first stage, the supernodes are found by repeatedly applying the maxflow algorithm at each node. In the second stage, an optimal subset of the supernodes that can cover the whole network using a minimum number of supernodes is selected by solving a binate covering problem whose constrains are: first, all intermediate nodes should be included in at least one supernode; second, if a supernode Si is selected, some supernodes that supply the inputs of Si must be selected (the ordinary [unate], covering problem just has the first constraint).

Hydra examines the nodes of the network in order of decreasing number of inputs. An unassigned node with the maximal number of inputs is chosen first. A second node is then chosen such that the two nodes can be merged into the same TLU and the cost function (same cost function as was used in decomposition step) is maximized. This greedy process stops when all unexamined nodes have been considered.

For Xmap, the logic blocks to be found are sub-DAGs of the if-then-else DAG for the entire circuit. The algorithm traverses the if-then-else DAG from inputs to outputs and keeps a log of inputs in the paths (called signals set) that can be used to compute the function of the node under consideration. Nodes in the signals set could be a marked node or a clean node. A marked node isolates its inputs to the current node, while a clean node exposes all its fan-ins. An overflow node, a node whose signals set is larger than k (the number of inputs of the TLU), a marking procedure is executed to reduce the fan-in. Xmap first marks the high-fan-out descendants of the node, and then marks the children of the node in decreasing order of the size of their signals set. The more inputs Xmap can isolate from the node under consideration, the better. The marking process cuts the if-then-else into pieces, each of which can be mapped into one CLB.

Merging

The purpose of the merging step is to combine nodes that share some inputs to exploit some of the particular features of FPGA architecture. For example, each CLB in the Xilinx XC4000 device has two four-input TLUs and a third TLU combining them with the ninth input (Section 67.3). In the three approaches discussed above, a postprocessing step is performed to merge pairs of nodes after the covering operation. The problem is formulated as a maximum cardinality matching problem.

MIS-pga 2: A Framework for TLU-Logic Optimization

MIS-pga 2 is an improved version of MIS-pga 1. It combines the advantageous features of Chortle-crf, MIS-pga 1, Xmap, and Hydra. In each step, Mis-pga 2 tries different algorithms and chooses the best [17].

Four decomposition algorithms are executed in the decomposition step:

1. Bin-packing: the algorithm is similar to that of Chortle-crf, except the heuristic of MIS-pga 2 is the Best-Fit Decreasing.

2. Cofactoring decomposition: it decomposes a node based on computing its Shannon cofactor (f = f1 f2 + ff3). The nodes in the resulting network have, at most, three inputs. This approach is particularly effective for functions in which cubes share many variables.

3. AND/OR decomposition: it can always find a feasible network, but is usually not a good network for the node-elimination step. Therefore, it is used as the last resort.

4. Disjoint decomposition: unlike Hydra, this method is used on a node-by-node basis. When it is used as a preprocessing stage for the bin-packing approach, a locally optimal decomposition can be found.

MIS-pga 2 interweaves some operations of the two-step methods. For example, the local elimination operation is applied to the original infeasible network as well as to the decomposed, feasible network. This same operation is referred to as partial collapse when applied before decomposition. Unlike MIS-pga 1, which separates the covering and the merging operations, these two operations are combined together to solve a single, binate covering problem.

Because MIS-pga 2 does a more exhaustive decomposition phase, and because the combined covering/ merging phase has a more global view of the circuit, MIS-pga 2 results are almost always superior to those of Chortle-crf, MIS-pga 2’s results are almost always superior to those of Chortle-crf, MIS-pga 1, Hydra, and Xmap. For the same reason, MIS-pga 2 is relatively slow, as compared to the other algorithms.

Comments

Popular posts from this blog

SRAM:Decoder and Word-Line Decoding Circuit [10–13].

ASIC and Custom IC Cell Information Representation:GDS2

Timing Description Languages:SDF