CAD DFT and Test Architectures: Testing Systems with Networks on Chip

Testing Systems with Networks on Chip

As mentioned in the previous section, an SoC integrates several cores in a single chip for use in the communications field, multimedia, and general electronics. According to the International Technology Roadmap for Semiconductors, by the end of the decade, SoCs using 50 nm transistors and operating below 1 V will grow to 4 billion transistors running at 10 GHz. A major challenge in such systems is to provide reliable means for component interaction, guarantee global synchronization of the chip, and tackle the growing wire delays that become the dominant factor in the overall delay.

The interconnects of traditional SoC architectures reviewed in the previous section are either dedicated channels or shared buses. Dedicated channels offer the best communication performance, but have poor reusability and are thus undesirable as the number of cores in the SoC grows. A shared bus is reusable, but supports only one communication at a time (time-division multiplexing). Its bandwidth is shared among all the cores in the system and its operating frequency decreases as the system grows. As SoCs grow in size, crosstalk effects between wires, electromagnetic interference, and even radiation-induced charge injection [21] render the internal communication between the cores unreliable.

A network on chip (NoC) solves such interconnect issues in SoC architectures. NoCs decouple the communication tasks from the computation tasks and offer well-defined protocols to eliminate contention in the channels. Ideas from computer networks have been borrowed to provide interconnections and communication among on-chip cores. A study presented in Ref. [22] demonstrates that NoC interconnects offer better communication performance than traditional bus interconnects for chips with as few as eight cores when the communication load is heavy. For lighter communication loads, the central bus architecture performs better than an NoC for chips of up to 16 cores. Clearly, as the number of cores per chip and the communication workload increase, NoCs become an appealing solution. In nanometer-scale technology, the performance of the communication in an NoC is more predictable than in bus-based SoCs because the geometry is regular, submicron effects are better contained, and, finally, communication and computation tasks are separated. NoCs can operate in a globally asynchronous, locally synchronous manner.

One important feature of an NoC is its reconfigurability and adaptability. Reconfigurability is ensured by the reuse of the communication network and the reuse of the design, simulation, and prototype environment. The communication network can easily be reused and its channels can provide significantly better quality-of-service guarantees than an SoC bus. Reuse of the design, simulation, and prototype environment makes it possible for many products to be based on the same NoC platform and reduces the expenses associated with the design and simulation of an NoC. Adding a new resource to a shared-bus system has a profound effect on the performance of the rest of the system. In contrast, an NoC is adaptable, and neither its performance nor the system's performance degrades when new resources are added to the SoC.

Another important characteristic of systems with an NoC is that little or no additional hardware is needed to test the embedded cores, the on–chip micronetwork, and the network interface. Bus–based SoC architectures use TAMs such as in Refs. [13,23] to ensure accessibility and controllability of each core during testing.

In a system with an NoC, the test patterns are applied via the existing networking infrastructure. In that respect, NoCs are an effective mechanism to reduce the overall DFT overhead of the SoC. All existing methods for testing the embedded cores of an NoC apply the test patterns in a bus-like manner [24–28], using a data transmission mode that is recommended only when jitter-free transmission is required. The test data for a core are delivered along the same route, and test application is guided by scheduling formulations that resemble those for bus-based SoCs [24,25].

This is a simple and effective approach. However, it does not exploit significant features of the NoC infrastructure. The remainder of this section overviews existing on-chip micronetwork architectures and transmission-related characteristics that can be used to eliminate contention in the channels, reduce the test application time, and increase the quality of testing.

NoCs were introduced in Refs. [21,29]. The early work focused on new interconnect architectures and on data exchange between cores in the form of structured packets. In Ref. [30], a hybrid routing algorithm for on-chip communication is given that combines the advantages of both deterministic and adaptive routing schemes. Results demonstrate that it significantly reduces the communication delay under heavy traffic.
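To make the idea concrete, the sketch below shows one generic way to blend deterministic and adaptive routing on a two-dimensional mesh. It is not the algorithm of Ref. [30]; the function name, direction encoding, and congestion predicate are assumptions made purely for illustration.

```python
def hybrid_route(cur, dst, congested):
    """Illustrative hybrid routing step on a 2-D mesh (not the exact
    algorithm of Ref. [30]): prefer deterministic XY routing, but deviate
    to the other productive direction when the preferred link is congested.

    cur, dst  -- (x, y) coordinates of the current and destination routers
    congested -- predicate telling whether the link in a given direction is busy
    """
    (cx, cy), (dx, dy) = cur, dst
    x_dir = "E" if dx > cx else "W" if dx < cx else None
    y_dir = "N" if dy > cy else "S" if dy < cy else None

    if x_dir is None and y_dir is None:
        return "LOCAL"                      # arrived: deliver to the local core
    preferred = x_dir or y_dir              # deterministic XY order: X first
    alternate = y_dir if preferred == x_dir else x_dir
    # Adaptive escape: take the other productive direction under congestion.
    if congested(preferred) and alternate is not None and not congested(alternate):
        return alternate
    return preferred
```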

Several NoC architectures have been proposed. AEthereal is Philips’ NoC architecture [31,32,41]. Routers may be connected with a flexible topology and intermodule communication is done with real packet switching, called the best effort (BE) mode, where data between different source–destination pairs share channels. A guaranteed throughput (GT) mode is also provided. It is used only when jitter–free transmission is required since it is known that BE transmission is more efficient [33].

The NoC is composed of the routers, the interconnects, and the network interfaces (NIs). Each NI acts as a bridge between an embedded core and an adjacent router. Data are transmitted using packets. Packets are composed of flits, the minimum transmission unit. Each flit can carry up to three words of data. The header contains the amount of credit (used for flow control), the queue id (corresponding to the next router on the path), and the path to the destination (the sequence of routers). Each flit in the packet has an id field that specifies whether the data is BE or GT, a size field containing the number of data words in the flit, and an end-of-packet flag.
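As a concrete illustration of this packet format, the following sketch models the fields just described. The class and field names are our own assumptions for illustration, not an actual AEthereal data structure.

```python
from dataclasses import dataclass, field
from typing import List

MAX_WORDS_PER_FLIT = 3  # each flit carries at most three data words

@dataclass
class Flit:
    traffic_class: str          # "GT" or "BE" (the id field described above)
    size: int                   # number of data words in this flit (0..3)
    end_of_packet: bool         # set on the last flit of a packet
    words: List[int] = field(default_factory=list)

@dataclass
class PacketHeader:
    credit: int                 # credit returned for end-to-end flow control
    queue_id: int               # queue at the next router on the path
    path: List[int] = field(default_factory=list)  # source-routed hop list

@dataclass
class Packet:
    header: PacketHeader
    flits: List[Flit] = field(default_factory=list)
```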

Figure 68.7(a) shows the building blocks of the AEthereal router: the controller (with separate control units for BE and GT routing), the header parsing units, the queues, and the switch. The header parsing unit receives input flits from the NI attached to the router or from another router in the NoC. It parses the flits, sends them to the GT or BE queues, and notifies the controller of the arrivals. The controller is responsible for scheduling the flits. BE flits are scheduled in a round-robin fashion after any GT flits have been scheduled. The switch receives a signal from the controller and connects the appropriate queue to the appropriate port. For GT data, a circuit-switched path is reserved from the NI module of the source router. For BE data, end-to-end credit-based flow control is implemented to avoid network overflow and congestion. Each outgoing port in the AEthereal architecture has separate queues for the GT and BE data.
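The scheduling policy described above (GT traffic first, then BE traffic in round-robin order, with credit-based flow control for BE) can be sketched as follows. The class, its methods, and the initial credit budget are illustrative assumptions, not the actual AEthereal router implementation; flits are assumed to carry a `traffic_class` attribute as in the earlier sketch.

```python
from collections import deque

class RouterScheduler:
    """Illustrative scheduler mirroring the policy described above:
    GT flits are served first, then BE flits in round-robin order."""

    def __init__(self, num_ports: int):
        self.gt_queues = [deque() for _ in range(num_ports)]
        self.be_queues = [deque() for _ in range(num_ports)]
        self.be_rr_ptr = 0                      # round-robin pointer over BE queues
        self.be_credits = [4] * num_ports       # assumed per-queue credit budget

    def enqueue(self, port: int, flit):
        queues = self.gt_queues if flit.traffic_class == "GT" else self.be_queues
        queues[port].append(flit)

    def schedule(self):
        """Return the next flit to send through the switch, or None."""
        # 1. GT traffic uses the reserved circuit-switched path and always wins.
        for q in self.gt_queues:
            if q:
                return q.popleft()
        # 2. BE traffic is served round-robin, gated by end-to-end credits.
        n = len(self.be_queues)
        for i in range(n):
            port = (self.be_rr_ptr + i) % n
            if self.be_queues[port] and self.be_credits[port] > 0:
                self.be_credits[port] -= 1      # consume one credit per flit sent
                self.be_rr_ptr = (port + 1) % n
                return self.be_queues[port].popleft()
        return None

    def return_credit(self, port: int, amount: int = 1):
        # Credits are returned by the destination NI as data is consumed.
        self.be_credits[port] += amount
```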

The NI consists of the kernel and the shells (see also Figure 68.7[b]). The kernel receives a message from the core, packetizes it, schedules the packets to the routers, implements the end-to-end flow control, and performs clock domain conversion between the core and the router. It has one message queue for outgoing messages and another for incoming messages. The clock domain conversions are carried out at the queues. The routing path is configured at the NI and is added to the packet header. The shells implement narrowcast or multicast connections that involve the active NI port (which starts the connection) and one or more passive NI ports. In a narrowcast connection, the communication involves only one passive port at a time, while in a multicast connection, the request is duplicated and sent to every passive NI port simultaneously. Shells also provide conversions to other protocols.
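A minimal sketch of the narrowcast and multicast behavior of the shells is given below. The class and method names, the `send` call on a port, and the selector used to pick a passive port are hypothetical and only illustrate the connection semantics described above.

```python
class NetworkInterfaceShell:
    """Illustrative sketch of narrowcast vs. multicast connections,
    not the actual AEthereal NI shell interface."""

    def __init__(self, active_port, passive_ports):
        self.active_port = active_port        # the port that opens the connection
        self.passive_ports = passive_ports    # one or more passive NI ports

    def narrowcast(self, request, select):
        # Narrowcast: each request reaches exactly one passive port at a time,
        # chosen here by a caller-supplied selector (e.g., an address decoder).
        target = self.passive_ports[select(request)]
        target.send(request)

    def multicast(self, request):
        # Multicast: the request is duplicated and sent to every passive port.
        for port in self.passive_ports:
            port.send(request)
```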

Another NoC architecture is SoCBUS [34,35]. The interconnect is a two-dimensional mesh topology of switches. Each switch has five ports, four to connect to adjacent switches and the fifth to connect to the local core through a wrapper. The wrapper has a functionality similar to the NI of AEthereal.

In Ref. [36], a fat-tree interconnect architecture is proposed with the routers as the internal nodes and the cores as the leaves. This topology is cost-effective for VLSI realization [37].

In Ref. [38], the intercore communication traffic is classified into four kinds of service: signaling has the highest priority and is used for control signals and urgent messages; real-time service is used for delay-constrained data and can be achieved by enforcement circuits in the network; read/write (RD/WR) service is used for short memory and register data accesses; and block-transfer service is used for large data bursts. The service levels are implemented by means of a priority mechanism in which signaling has the highest priority and block transfer the lowest. Routers are connected in a two-dimensional mesh topology, each core is connected to a router via a standard interface, and each link's bandwidth can be adjusted to its expected load.

In Ref. [39], a honeycomb interconnect is proposed for NoCs: the cores are the nodes of a hexagon, and a switch at the center of the hexagon connects them.
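The strict-priority mechanism described for the service levels of Ref. [38] can be sketched as follows; the level names and the arbiter structure are our own illustrative assumptions rather than the reference's implementation.

```python
from collections import deque

# Service levels in decreasing priority order, as described above.
SERVICE_LEVELS = ["signaling", "real_time", "rd_wr", "block_transfer"]

class ServiceLevelArbiter:
    """Illustrative strict-priority arbiter over the four service levels."""

    def __init__(self):
        self.queues = {lvl: deque() for lvl in SERVICE_LEVELS}

    def enqueue(self, level: str, flit):
        self.queues[level].append(flit)

    def next_flit(self):
        # Always drain the highest-priority non-empty queue first:
        # signaling preempts real-time, which preempts RD/WR, which
        # preempts block transfer.
        for lvl in SERVICE_LEVELS:
            if self.queues[lvl]:
                return self.queues[lvl].popleft()
        return None
```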
