System Timing: Synchronous VLSI Systems

Introduction


approximately every 18 months, and this doubling in size has been accompanied by a similar exponential increase in circuit speed (or more precisely, clock frequency). These trends of steadily increasing circuit size and clock frequency are illustrated in Figure 50.1(a) and Figure 50.1(b), respectively. As a result of this revolution in semiconductor technology, it is not unusual for modern integrated circuits to contain hundreds of millions of switching elements (i.e., transistors) packed into a chip area as large as 500 mm² [3–6]. Such technological capability is due to advances in both design methodologies and physical manufacturing technologies. Research and experience demonstrate that this trend of exponentially increasing integrated circuit-based computational power will continue into the foreseeable future.

Integrated circuit performance is typically characterized [7] by the speed of operation, the available circuit functionality, and the power consumption, and there are multiple factors which directly affect these performance characteristics. While each of these factors is significant, on the technological side, increased circuit performance has been largely achieved by the following approaches:

• Reduction in feature size (technology scaling), that is, the capability of manufacturing physically smaller and faster device structures

• Increase in chip area, permitting a larger number of circuits and therefore greater on-chip functionality

• Advances in packaging technology, permitting the increasing volume of data traffic between an integrated circuit and its environment as well as the efficient removal of heat generated during circuit operation

The most complex integrated circuits are referred to as very large scale integration (VLSI) circuits. This term describes the complexity of modern integrated circuits consisting of hundreds of thousands to many millions of active transistor elements. Presently, the leading integrated circuit manufacturers have a technological capability for the mass production of VLSI circuits with feature sizes as small as 65 nm [8]. These technologies are identified with the terms nanometer or very deep submicrometer (VDSM) technologies.

As these dramatic advances in fabrication technologies take place, integrated circuit performance is often limited by effects closely related to the very reasons behind these advances, such as small geometry interconnect structures. Circuit performance becomes strongly dependent on, and limited by, electrical issues that are particularly significant in deep submicrometer integrated circuits. Signal delay and related waveform effects are among those phenomena that have great impact on high performance integrated circuit design methodologies and the resulting system implementation. In the case of fully synchronous VLSI systems, these effects have the potential to create catastrophic failures due to the limited time available for signal propagation between logic gates.

Specifically, in Section 50.2, general timing and operational properties of synchronous circuits are presented. In Section 50.3, the modeling of synchronous circuit components (suitable for computer manipulation) is presented. Also in Section 50.3, the impact of the clock distribution network on circuit timing is described. In Section 50.4, system timing is analyzed. First, system timing properties of edge-triggered and level-sensitive circuits are analyzed. Then, clock skew scheduling methodologies for both types of circuit structures are described. Last, the limitations to improvements in circuit performance achievable through clock skew scheduling are presented. The chapter concludes with an appendix containing a glossary of the terms used throughout.

Synchronous VLSI Systems

Owing to the relative simplicity of the design process, the analysis and optimization of VLSI circuits are generally based on logic components operating under a fully synchronous timing scheme. In the following sections, these design concepts are briefly reviewed and related fundamental properties are identified. The operational components of VLSI systems relevant to system timing are highlighted.

General Overview

Typically, a digital VLSI system implements a complex computational function, such as a fast Fourier transform or a RISC architecture microprocessor. Although modern VLSI systems contain a large number of components, these systems normally employ only a limited number of different kinds of logic elements or logic gates. Each logic element accepts certain input signals and computes an output signal for use by other logic elements. At the logic level of abstraction, a VLSI system is a network of hundreds of thousands or more logic gates whose terminals are interconnected by wires to implement a target algorithm.

The switching variables acting as inputs and outputs of a logic gate in a VLSI system are represented by tangible physical quantities, while a number of devices are interconnected to yield the desired function of each logic gate. The specific physical characteristics are collectively summarized with the term technology, encompassing details such as the type and behavior of the devices that can be built, the number and sequence of manufacturing steps, and the impedance of the different interconnect materials. Today, several technologies make the implementation of high-performance VLSI systems possible, best exemplified by CMOS, bipolar, BiCMOS, and gallium arsenide [9,10]. CMOS technology, in particular, exhibits many desirable performance characteristics, such as low power consumption, high density, ease of design, and moderate to high speed. Owing to these excellent performance characteristics, CMOS technology has become the dominant VLSI technology used today.

The design of a digital VLSI system requires a great deal of effort to consider a broad range of architectural and logical issues; that is, choosing the appropriate gates and interconnections among these gates to achieve the required circuit function. No design is complete, however, without considering the dynamic (or transient) characteristics of the signal propagation, or, alternatively, the changing behavior of signals with time. Every computation performed by a switching circuit involves multiple signal transitions between logic states and requires a finite amount of time to complete. The voltage at every circuit node must reach a specific value for the computation to be completed. State-of-the-art integrated circuit design is therefore largely centered around the difficult task of predicting and properly interpreting signal waveform shapes at various points in a circuit.


In a typical VLSI system, millions of signal transitions determine the individual gate delays and the overall speed of the system. Some of these signal transitions can be executed concurrently while others must be executed in a strict sequential order [11]. The sequential occurrence of the latter operations (or signal transition events) must be properly coordinated in time such that logically correct system operation is guaranteed and the results are reliable (in the sense that these results can be repeated). This coordination is known as synchronization and is critical to ensuring that any pair of logical operations in a circuit with a precedence relationship proceed in the proper order. In modern digital integrated circuits, synchronization is achieved at all stages of the system design process and operation by a variety of techniques, known as a timing discipline or timing scheme [9,12–14]. With some exceptions, these circuits are based on a fully synchronous timing scheme, specifically developed to cope with the finite speed at which physical signals propagate through the system.

An example of a fully synchronous system is shown in Figure 50.2(a). There are three recognizable components in this system. The first component, the logic gates, collectively referred to as the combinational logic, provides the range of operations that a system executes. The second component, the clocked storage elements or simply the registers, stores the results of the logical operations. Together, the combinational logic and registers constitute the computational portion of the synchronous system and are interconnected in a way that implements the required system function. The third component of the synchronous system, known as the clock distribution network, is a highly specialized circuit structure which does not perform a computational process but rather provides an important control capability. The clock generation and distribution network controls the overall synchronization of the circuit by generating a time reference and properly distributing this time reference to every register.

The normal operation of a synchronous system, such as the example finite-state machine shown in Figure 50.2(a), consists of the iterative execution of computations in the combinational logic followed by the storage of the processed results in the registers. The actual process of storage is temporally controlled by the clock signal and occurs once the signal transients in the logic gate outputs are completed and the outputs have settled to a valid state. At the beginning of each computational cycle, the inputs of the system together with the data stored in the registers initiate a new switching process. As time proceeds, the signals propagate through the logic, generating results at the logic output. By the end of the clock period, these results are stored in the registers. During the following clock cycle, the stored data values start propagating through the logic, progressing toward the system outputs.
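The iterative compute-then-store cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2-bit counter, the `enable` input, and the function names are hypothetical and do not come from Figure 50.2(a). Signal delays are not modeled, so the sketch shows only the ordering of events within each clock period.

```python
# Minimal sketch of a fully synchronous finite-state machine.
# The 2-bit counter and all names here are illustrative assumptions.

def combinational_logic(state: int, enable: int) -> int:
    """Compute the next state from the register contents and the system input."""
    return (state + 1) % 4 if enable else state


def run_clock_cycles(inputs, initial_state=0):
    """Run one computational cycle per input value and return the state trace."""
    register = initial_state  # data currently held by the registers
    trace = [register]
    for enable in inputs:
        # During the clock period: inputs and stored data propagate
        # through the combinational logic and settle to a valid result.
        next_state = combinational_logic(register, enable)
        # At the end of the clock period: the settled result is stored.
        register = next_state
        trace.append(register)
    return trace


if __name__ == "__main__":
    print(run_clock_cycles([1, 1, 0, 1, 1]))  # [0, 1, 2, 2, 3, 0]
```

Each loop iteration corresponds to one clock period: the register contents are read at the start, and the computed result replaces them only at the end.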

The operation of a digital system can therefore be thought of as the sequential execution of a large set of simple computations that occur concurrently in the combinational logic portion of the system. The concept of a local data path is a useful abstraction for each of these simple operations and is shown in Figure 50.2(b). The magnitude of the delay of the combinational logic is bounded by the requirement of storing data in a register within a clock period. The initial register Ri is the storage element at the beginning of the local data path and provides some or all of the input signals for the combinational logic at the beginning of the computational cycle (defined by the beginning of the clock period). The combinational path ends with the data successfully latching within the final register Rf where the results are stored at the end of the computational cycle. Registers act as sources and sinks for the data between the clock cycles.
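A local data path and its basic timing requirement can be represented as a small data structure. The sketch below is a deliberate simplification: it compares only the combinational logic delay against the clock period and ignores register overhead and clock skew. The class name, field names, and delay values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalDataPath:
    """A register-to-register path Ri -> combinational logic -> Rf."""
    initial_register: str   # Ri, source of the data
    final_register: str     # Rf, sink of the data
    logic_delay_ns: float   # assumed propagation delay of the logic


def latches_within_period(path: LocalDataPath, clock_period_ns: float) -> bool:
    """Simplified check: the logic delay must not exceed the clock period.

    Register overhead and clock skew are ignored in this sketch.
    """
    return path.logic_delay_ns <= clock_period_ns


if __name__ == "__main__":
    fast = LocalDataPath("R1", "R2", logic_delay_ns=3.0)
    slow = LocalDataPath("R2", "R3", logic_delay_ns=5.0)
    for p in (fast, slow):
        print(p.initial_register, "->", p.final_register,
              latches_within_period(p, clock_period_ns=4.0))
```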

Advantages and Drawbacks of Synchronous Systems

The behavior of a fully synchronous system is well defined and controllable as long as the time window provided by the clock period is sufficiently long to allow every signal in the circuit to propagate through the required logic gates and interconnect wires and successfully latch within the final register. In designing the system and choosing the proper clock period, however, two contradictory requirements must be satisfied. First, the smaller the clock period, the more computational cycles can be performed by the circuit in a given amount of time. Second, the time window defined by the clock period must be sufficiently long such that the slowest signals reach the destination registers before the current clock cycle is concluded and the following clock cycle is initiated.
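The tension between these two requirements can be made concrete with a short numerical sketch. The path delays and candidate clock periods below are invented for illustration, and the safety check is simplified to a comparison against the slowest path delay alone.

```python
def cycles_per_microsecond(clock_period_ns: float) -> float:
    """Number of computational cycles completed in one microsecond."""
    return 1000.0 / clock_period_ns


def period_is_safe(path_delays_ns, clock_period_ns: float) -> bool:
    """True if even the slowest signal arrives before the cycle ends."""
    return max(path_delays_ns) <= clock_period_ns


def evaluate_periods(path_delays_ns, candidate_periods_ns):
    """Return (period, cycles per microsecond, safe?) for each candidate."""
    return [
        (t, cycles_per_microsecond(t), period_is_safe(path_delays_ns, t))
        for t in candidate_periods_ns
    ]


if __name__ == "__main__":
    delays = [1.0, 2.5, 4.0]  # hypothetical path delays in ns
    for period, rate, safe in evaluate_periods(delays, [2.0, 4.0, 8.0]):
        print(f"T = {period} ns: {rate} cycles/us, safe = {safe}")
```

With these assumed numbers, the 2 ns period gives the highest cycle rate but violates the slowest path, the 8 ns period is safe but wastes time, and 4 ns is the smallest safe choice.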

Such an organization of computation has certain clear advantages that keep the fully synchronous timing scheme the primary choice for digital VLSI systems:

• The properties and variations are simple and well understood.

• The scheme eliminates the nondeterministic behavior of the propagation delay in the combinational logic (due to environmental and process fluctuations and unknown input signal patterns) such that the system exhibits a deterministic behavior corresponding to the implemented algorithm.

• The circuit design does not need to be concerned with glitches in the combinational logic outputs, so the only relevant dynamic characteristic of the logic is the propagation delay.

• The state of the system is completely defined within the storage elements—this fact greatly simplifies certain aspects of the design, debug, and test phases in developing a large system.

A synchronous paradigm, however, also has certain limitations that make the design of synchronous VLSI systems increasingly challenging:

• This synchronous approach has a serious drawback in that the timing scheme requires the overall circuit to operate as slowly as the slowest register-to-register path. Thus, the global speed of a fully synchronous system depends upon those paths in the combinational logic with the largest delays. These paths are also known as the worst-case or critical paths. In a typical VLSI system, the propagation delays in the combinational paths are distributed unevenly, so there may be many paths with delays much smaller than the clock period. Although these paths could operate correctly at a smaller clock period (a higher clock frequency), it is those paths with the largest delays that bound the clock period, thereby imposing a limit on the overall system speed. This imbalance in propagation delays is sometimes so dramatic that the system speed is dictated by only a handful of very slow paths.

• The clock signal has to be distributed to tens of thousands of storage registers scattered throughout the system. A significant portion of the system area and dissipated power is therefore devoted to the clock distribution network (reviewed in Section 50.3)—a circuit structure that does not perform any computational function.

• The reliable operation of the system depends upon the assumptions concerning the value of the propagation delays, which, if not satisfied, can lead to catastrophic timing violations and render the system unusable.
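The first drawback in the list above, that a few slow paths dictate the clock period, can be illustrated by computing the unused time (slack) on every path. The following is a minimal sketch with invented register names and delays. It takes the minimum clock period to be the largest path delay and ignores register overhead and clock skew.

```python
def minimum_clock_period(path_delays_ns: dict) -> float:
    """The clock period is bounded below by the largest path delay."""
    return max(path_delays_ns.values())


def critical_paths(path_delays_ns: dict) -> list:
    """Names of the paths whose delay equals the minimum clock period."""
    t_min = minimum_clock_period(path_delays_ns)
    return sorted(name for name, d in path_delays_ns.items() if d == t_min)


def slack_report(path_delays_ns: dict) -> dict:
    """Unused time per path when clocked at the minimum period."""
    t_min = minimum_clock_period(path_delays_ns)
    return {name: t_min - d for name, d in path_delays_ns.items()}


if __name__ == "__main__":
    # Hypothetical register-to-register delays in ns.
    delays = {"R1->R2": 1.0, "R2->R3": 1.5, "R3->R1": 6.0}
    print("minimum period:", minimum_clock_period(delays))
    print("critical paths:", critical_paths(delays))
    print("slack:", slack_report(delays))
```

In this assumed example a single 6 ns path sets the clock period, while the other two paths sit idle for most of each cycle.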
