Low-Power Memory Circuits: Read-Only Memory

Introduction

In recent years, rapid development in VLSI fabrication has led to decreased device geometries and increased transistor densities of integrated circuits (ICs), and circuits with high complexity and very high operating frequencies have started to emerge. Such circuits consume an excessive amount of power and generate an increased amount of heat. Circuits with excessive power dissipation are more susceptible to run-time failures and present serious reliability problems. Increased temperature from high-power processors tends to exacerbate several silicon failure mechanisms. Every 10°C increase in operating temperature approximately doubles a component’s failure rate. Increasingly expensive packaging and cooling strategies are required as chip power increases [1,2]. Owing to these concerns, circuit designers are realizing the importance of limiting power consumption and improving energy efficiency at all levels of the design. The second driving force behind the low-power design phenomenon is a growing class of personal computing devices, such as portable desktops, digital pens, audio- and video-based multimedia products, and wireless communications and imaging systems, such as personal digital assistants, personal communicators, and smart cards. These devices and systems demand high-speed, high-throughput computations, complex functionalities, and often real-time processing capabilities [3,4]. The performance of these devices is limited by the size, weight, and lifetime of batteries. Serious reliability problems, increased design costs, and battery-operated applications have prompted the IC design community to look more aggressively for new approaches and methodologies that produce more power-efficient designs, that is, designs with significantly lower power consumption for the same level of performance.

Memory circuits form an integral part of every system design as dynamic random access memories (RAMs), static RAMs, ferroelectric RAMs, MRAMs, ROMs, or flash memories, and they contribute significantly to the system-level power consumption. Two examples of recently presented reduced-power processors show that 43% and 50.3%, respectively, of the total system power consumption is attributed to memory circuits [5,6]. Therefore, reducing the power dissipation in memories can significantly improve the system power efficiency, performance, reliability, and overall cost.

In this section, all sources of power consumption in different types of memories will be identified, several low-power techniques will be presented, and the latest developments in low-power memories will be analyzed.

Read-Only Memory

Read-only memories (ROMs) are widely used in a variety of applications (permanent code storage for microprocessors or data look-up tables in multimedia processors) for fixed long-term data storage. The high area density and new submicron technologies with multiple metal layers increase the popularity of ROMs for low-voltage, low-power environments. In the following sections, the sources of power dissipation in ROMs and applicable efficient low-power techniques are examined.

Sources of Power Dissipation

A basic block diagram of a ROM architecture is presented in Figure 57.1 [7,8]. It consists of an address decoder, a memory controller, a column multiplexer/driver, and a cell array. Table 57.1 shows an example of the power dissipation in a 2K × 18 ROM designed in 0.6 µm CMOS technology at 3.3 V and clocked at 10 MHz [8]. The cell array dissipates 89% of the total ROM power, and the remaining 11% is dissipated in the decoder, control logic, and drivers. The majority of the power consumed in the cell array is due to the precharging of large capacitive bitlines (BLs). During the read and write cycles, more than 18 BLs are switched per access, because the wordline selects more BLs than necessary. The example in Figure 57.2 shows a 12-1 multiplexer and a BL with five transistors connected to it. This topology consumes excessive amounts of power, because four more BLs will switch instead of just one. The power dissipated in the decoder, control logic, and drivers is due to the switching activity during the read and precharge cycles and to generating control signals for the entire memory.
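
The weight of BL precharging can be illustrated with a rough count of how many BLs are switched per access. The sketch below is only an illustration: it assumes that every column behind the multiplexer is precharged on each access unless precharging is restricted to the addressed column, and the function names, the 1 pF BL capacitance, and the full-swing precharge are hypothetical choices, not data from Table 57.1.

```python
def bitlines_switched(word_bits, mux_ratio, selective=False):
    """Number of BLs precharged per access.

    Assumption: without selective precharge, all mux_ratio columns behind
    each output bit are precharged; with it, only the addressed column is.
    """
    return word_bits if selective else word_bits * mux_ratio


def precharge_energy(n_bitlines, c_bitline, vdd):
    """Supply energy to precharge n BLs through a full Vdd swing (assumed)."""
    return n_bitlines * c_bitline * vdd ** 2


# Hypothetical numbers: 18-bit word, 12-to-1 column mux, 1 pF per BL, 3.3 V.
all_columns = precharge_energy(bitlines_switched(18, 12), 1e-12, 3.3)
one_column = precharge_energy(bitlines_switched(18, 12, selective=True), 1e-12, 3.3)
ratio = all_columns / one_column
```

Under these assumptions the precharge energy scales directly with the number of BLs switched, which is why the techniques in the next section concentrate on the cell array.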

Low-Power ROMs

To significantly reduce the power consumption in ROMs, every part of the architecture has to be targeted and multiple techniques have to be applied. Several architectural improvements in the cell array that minimize energy waste and improve efficiency have been identified [8,13–16]. These techniques are:

• smaller cell arrays

• hierarchical wordline

• selective precharging

• minimization of nonzero terms

• inverted ROM core(s)

• row(s) inversion

• sign magnitude encoding

• sign magnitude and inverted block

• difference encoding

• three-dimensional decoding

• cascode sensing

• charge recycling and charge sharing

All of these methods result in a reduction of the capacitance and/or switching activity of bit- and rowlines. In applications where data of different bit sizes are needed, it is useful to implement smaller memory arrays. If all data are stored in a single memory array, its bit size is determined by the largest number. However, most of the bit positions in the smaller numbers are then occupied by nonzero values, which increase the bit- and rowline capacitance. Therefore, by grouping the data into smaller memory arrays according to their size, significant savings in power can be achieved.

A hierarchical wordline approach divides the memory into separate blocks and runs the block wordline in one layer and a global wordline in another layer. As a result, only the bit cells of the desired block are accessed. A selective precharging method addresses the problem of activating multiple BLs even though only a single memory location is being accessed. With this method, only those BLs that are being accessed are precharged. The hardware overhead for implementing this function is minimal.

A minimization of nonzero terms reduces the total capacitance of bit- and rowlines, because zero terms do not switch BLs. It also reduces the number of transistors in the memory core. An inverted ROM applies to a memory with a large number of ones. In this case, the entire ROM array can be inverted and the final data inverted back in the output driver circuitry. Consequently, the number of transistors and the capacitance of bit- and rowlines are reduced. An inverted row method also minimizes nonzero terms, but on a row-by-row basis. This type of encoding requires an extra bit (MSB), which indicates whether or not a particular row is encoded.

A sign and magnitude encoding is used to store negative numbers. This method also minimizes the number of ones in the memory. However, a two's-complement conversion is required when data are retrieved from the memory.

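
The row inversion idea can be sketched in a few lines. This is an illustrative model only: the function names and the 8-bit example words are hypothetical, and the flag is assumed to be stored as an extra MSB that itself counts as a one, so a row is inverted only when doing so still lowers the total number of ones.

```python
def encode_row(word, width):
    """Return the stored word: data bits plus an inversion flag as MSB."""
    mask = (1 << width) - 1
    ones = bin(word & mask).count("1")
    # Inverting leaves (width - ones) ones plus one flag bit.
    if ones > (width - ones) + 1:
        return (1 << width) | (~word & mask)
    return word & mask


def decode_row(stored, width):
    """Undo the inversion in the output path when the flag bit is set."""
    mask = (1 << width) - 1
    data = stored & mask
    inverted = (stored >> width) & 1
    return (~data & mask) if inverted else data


def count_ones(words):
    return sum(bin(w).count("1") for w in words)


rows = [0b11111110, 0b00000001, 0b11110000]  # hypothetical ROM contents
stored = [encode_row(r, 8) for r in rows]
recovered = [decode_row(s, 8) for s in stored]
```

In this example the total number of stored ones drops from 12 to 7, at the price of one extra column and an inverter stage in the output path.
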
A sign and magnitude and an inverted block is a combination of the two techniques described previously. A difference encoding can be used to reduce the size of the cell array. In applications where a ROM is accessed sequentially and the data read from one address do not differ significantly from the data at the following address, the memory core can store the difference between these two entries instead of the entire value. The disadvantage is the need for an additional adder circuit to calculate the original value.

Three-dimensional decoding significantly reduces the number of decoding stages, which results in a shorter delay as well as lower power consumption due to the reduced area [13]. The block diagram of the circuit is shown in Figure 57.3.

The concept is as follows. The address lines are divided into three parts: lines A6, A5, and A4 select the row number of the data core; lines A3, A2, and A1 determine which shared column to activate and lead to the upper decoder; line A0 is fed to the lower decoder. The upper and lower pass blocks are used to resolve nonnatural encoding problems. As a result, not only are the encoded ROM data in a natural order, but the design also reduces the number of transistors necessary for implementation. Compared with a traditional 2-D decoder structure, the 3-D concept improves the performance by almost 70% while maintaining the same level of power consumption.
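
The address partitioning described above can be written out as a small helper. This is a minimal sketch that covers only the splitting of a 7-bit address into its three fields; the function name is hypothetical, and the pass blocks and the decoders themselves are not modeled.

```python
def split_address(addr):
    """Split a 7-bit address A6..A0 into (row, column, lower) fields."""
    if not 0 <= addr < 128:
        raise ValueError("address must fit in 7 bits")
    row = (addr >> 4) & 0b111     # A6, A5, A4: row of the data core
    column = (addr >> 1) & 0b111  # A3, A2, A1: shared column (upper decoder)
    lower = addr & 0b1            # A0: lower decoder
    return row, column, lower


fields = split_address(0b1011011)
```

Each field is decoded separately, so the widest decoder sees three address bits instead of seven.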

To support low-voltage operation, a cascode sensing scheme can be implemented [14]. The speed of a read operation is improved by using a dummy sense amplifier to control the BL precharging period, despite the strong dependence of the BL capacitance on the programmed data. The diagram of the sense-amplifying circuit is shown in Figure 57.4. The circuit consists of a read cascode sense amplifier (RSA), a dummy sense amplifier (DSA), a bitline (BL) booster, and pull-down circuitry. Since the BL capacitance increases with the number of zero-programmed cells per BL, it is difficult to precharge the BL to the optimum voltage level for sensing. Therefore, the BL booster is used to charge the BL rapidly. The DSA monitors the BL level and stops charging through the BL booster as soon as the BL reaches the optimum level. This technique allows operation down to Vdd = 0.8 V in 0.25 µm CMOS technology at the cost of a small area penalty.

Charge sharing and charge recycling are very efficient techniques for power reduction in ROMs [15,16]; specifically, the charge recycling predecoder (CRPD), charge recycling wordline decoder (CRWD), charge recycling bitline (CRBL), and charge sharing bitline (CSBL). The concept of the CRPD is shown in Figure 57.5. In the CRPD, the newly selected predecoder line is charged only to Vdd/2 (Figure 57.5[b]) as opposed to a full Vdd (Figure 57.5[a]), owing to charge sharing with the previously selected predecoder line. Figure 57.5(c) shows an example of a 2-to-4 CRPD. Compared with a conventional decoder, each CRPD line needs a charge sharing driver, which consists of a D flip-flop (D-FF), six gates, and a buffer. The D-FF stores the previous status of the predecoder line, and the XOR gate detects whether or not the status of the line changes.
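
The effect of the charge sharing step on the supply energy can be estimated with an idealized model. The sketch below assumes equal line capacitances, lossless sharing that leaves both lines at Vdd/2, and a supply that then only provides the remaining half swing; it ignores the energy of the charge sharing driver itself. The function name and the access pattern are hypothetical, and the `previous`/`changed` variables only loosely mirror the roles of the D-FF and the XOR gate.

```python
def predecoder_supply_energy(selections, c_line=1.0, vdd=1.0, recycle=True):
    """Supply energy spent driving one-hot predecoder lines.

    selections: sequence of selected line indices, one per access.
    """
    energy = 0.0
    previous = None  # role of the D-FF: remembers the last selected line
    for selected in selections:
        changed = previous is not None and selected != previous  # role of the XOR
        if previous is None:
            energy += c_line * vdd * vdd  # first access: full swing from ground
        elif changed:
            start = vdd / 2 if recycle else 0.0  # level reached by charge sharing
            energy += c_line * vdd * (vdd - start)
        previous = selected
    return energy


accesses = [0, 1, 1, 3]  # hypothetical access pattern on a 2-to-4 predecoder
conventional = predecoder_supply_energy(accesses, recycle=False)
recycled = predecoder_supply_energy(accesses, recycle=True)
```

In this idealized model every line change costs half the conventional energy; the savings in a real design are lower because of the driver overhead.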

Figure 57.6 shows a charge recycling wordline circuit. Conceptually, it is very similar to the CRPD. The voltage swing on the wordline changes from ground to Vdd. However, it is a two-phase operation in which, during the first phase, half of the swing is recycled from the previously asserted wordline. The large capacitor Clarge is used to recycle this charge. Usually, the large capacitor is designed to be about 10 times larger than the wordline capacitance Cwordline. It then takes approximately 10 clock cycles for Clarge to reach the Vdd/2 level. The circuit consists of a charge sharing driver and a wordline decoder selected by the row address. Figure 57.7 shows the principle of this technique. Numbers 1–5 indicate the sequence of steps during the operation. The total power saved by this technique is 45%.
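
The build-up of charge on Clarge can be reproduced with a simple charge conservation model. The sketch below is an assumption about the sequence, not a transcription of Figure 57.7: in each cycle the previously asserted wordline (at Vdd) first shares its charge with Clarge, and then the newly selected wordline (at ground) shares charge with Clarge before the supply completes its swing. The function name and the normalized values are hypothetical.

```python
def simulate_crwd(c_wordline=1.0, c_large=10.0, vdd=1.0, cycles=10):
    """Return the voltage on C_large after each cycle, starting from 0 V."""
    total = c_wordline + c_large
    v_large = 0.0
    history = []
    for _ in range(cycles):
        # Phase 1: old wordline at Vdd dumps charge into C_large.
        v_shared = (c_wordline * vdd + c_large * v_large) / total
        # Phase 2: new wordline at 0 V takes charge back from C_large.
        v_large = c_large * v_shared / total
        history.append(v_large)
    return history


first_ten = simulate_crwd(cycles=10)
settled = simulate_crwd(cycles=200)[-1]
```

With Clarge equal to ten times Cwordline, this model settles slightly below Vdd/2 and covers most of that distance within about ten cycles, which is in line with the rule of thumb quoted above.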

By using three capacitors for each group, Ccolumn, CS0, and CS1, charge sharing BL (CSBL) reduces the BL voltage swing as seen in Figure 57.8. Ccolumn represents the total drain capacitance of all the column select transistors and the wiring capacitance. Capacitors CS0 and CS1 are used to generate and store a reference voltage VREF for the sense amplifier. They must be of the same capacitance to increase the noise margin. The minimum size of the capacitors is selected to be within the range where the reference voltage can overcome voltage variations due to internal and external noise and layout mismatches.

The technique works as follows. Ccolumn and CS0 are precharged. Ccolumn holds Vdd, whereas CS0 holds Vdd - Vt due to the threshold voltage degradation. CS1 and CBL are discharged to ground. Next, a column is selected, and Ccolumn, CS0, and CBL share their charge until the voltage settles at the level given by charge conservation:

$$V_{CS} = \frac{C_{column} V_{dd} + C_{S0}\,(V_{dd} - V_t)}{C_{column} + C_{S0} + C_{BL}}$$

Afterwards, a wordline is selected. If the ROM data is “1,” the BL remains at VCS; if it is “0,” the BL is discharged. The swing from VSS (GND) to VCS is small; therefore, a sense amplifier is needed to read the data correctly. Using CRPD, CRWD, and CSBL, the designer can expect to save on average 18, 28, and 36% of power, respectively.
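
The shared level VCS follows from charge conservation among Ccolumn, CS0, and CBL. The sketch below is a minimal illustration assuming ideal capacitors and switches; the function name and the capacitance and voltage values are hypothetical.

```python
def csbl_shared_voltage(c_column, c_s0, c_bl, vdd, vt):
    """Voltage after C_column (at Vdd), C_S0 (at Vdd - Vt) and C_BL (at 0 V) share charge."""
    charge = c_column * vdd + c_s0 * (vdd - vt)
    return charge / (c_column + c_s0 + c_bl)


# Hypothetical values: the BL capacitance dominates the two precharged capacitors.
v_cs = csbl_shared_voltage(c_column=1.0, c_s0=1.0, c_bl=8.0, vdd=2.0, vt=0.5)
```

Because CBL is much larger than the precharged capacitors in this example, the BL settles at a small fraction of Vdd, which is the reduced swing that the sense amplifier has to resolve.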

Lastly, on the system level, the CRBL technique can be implemented. The BL voltage swing is reduced by charge recycling between BLs. The concept is illustrated in Figures 57.9 and 57.10. When N BLs recycle their charges, the voltage swing and the BL power decrease to 1/N and 1/N², respectively.

BLs are grouped into pairs, each consisting of a BL and a complementary BL. The BLs within each pair are connected through a transistor switch controlled by the Equal signal. During the equalization mode, this signal is asserted, all programmed connections between BL pairs are disconnected, and the two BLs in each BL pair are connected. During this time, the connected BLs share their charges and the BL voltages equalize. When the ROM is in the evaluation phase, the BLs in each pair are disconnected and all BLs are connected to BLs in the neighboring BL pair by a transistor switch controlled by the Eval signal. Figure 57.9 shows N BL pairs. The top BL of the top BL pair is charged to Vdd, whereas the complementary BL in the same pair is at (N-1)/N Vdd. Subsequently, the charge is recycled to the (N-1)th BL pair. The bottom BL of the bottom BL pair is at GND level. Figure 57.11 demonstrates the CRBL in a ROM architecture. The power consumed by the BLs and the sense amplifiers is reduced by 72%.
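
The voltage levels of the stacked BL pairs and the resulting scaling can be tabulated with a short script. This is a sketch under the assumption that the N pairs divide Vdd into N equal steps, as the description of the top and bottom pairs suggests; the function names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def crbl_levels(n_pairs, vdd=Fraction(1)):
    """(BL, complementary BL) voltage for each pair, listed from top to bottom."""
    levels = []
    for i in range(n_pairs):
        upper = vdd * Fraction(n_pairs - i, n_pairs)
        lower = vdd * Fraction(n_pairs - i - 1, n_pairs)
        levels.append((upper, lower))
    return levels


def crbl_scaling(n_pairs):
    """Relative BL swing (1/N) and relative BL power (1/N^2)."""
    return Fraction(1, n_pairs), Fraction(1, n_pairs ** 2)


ladder = crbl_levels(4)
swing, power = crbl_scaling(4)
```

For four pairs, each BL swings over a quarter of Vdd, and the charge drawn from the supply is reused by the pairs below, which is where the quadratic reduction in BL power comes from.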

On a circuit level, powerful techniques for minimizing the power dissipation can be applied. The most common technique is reducing the power supply voltage to approximately Vdd ≈ 2Vt in conjunction with architecture-based scaling. In this region of operation, CMOS circuits achieve the maximum power efficiency [9,10]. This results in large power savings because the power supply voltage is a quadratic term in the well-known dynamic power equation. In addition, the static power and short-circuit power are also reduced. It is important that all the transistors in the decoder, control logic, and driver block be sized properly for low-power, low-voltage operation. Rabaey and Pedram [9] have shown that the ideal low-power sizing is when Cd = CL/2, where Cd is the total parasitic capacitance from the driving transistors and CL is the total load capacitance of a particular circuit node. By applying this method to every circuit node, maximum power efficiency can be achieved.
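
The quadratic dependence on the supply voltage can be made concrete with the dynamic power equation, P = a · C · Vdd² · f, where a is the switching activity. The sketch below compares two operating points; the threshold voltage of 0.7 V, the activity factor, and the capacitance are hypothetical, and the clock frequency is held constant, which ignores the slowdown that usually accompanies a lower supply.

```python
def dynamic_power(activity, capacitance, vdd, frequency):
    """Dynamic switching power: P = activity * C * Vdd^2 * f."""
    return activity * capacitance * vdd ** 2 * frequency


# Hypothetical operating points: a 3.3 V supply and a scaled supply
# of 2 * Vt with an assumed Vt of 0.7 V, both clocked at 10 MHz.
nominal = dynamic_power(0.5, 10e-12, 3.3, 10e6)
scaled = dynamic_power(0.5, 10e-12, 2 * 0.7, 10e6)
saving = 1.0 - scaled / nominal
```

Under these assumptions, lowering the supply from 3.3 V to 1.4 V removes roughly four fifths of the dynamic power, independent of the activity factor and capacitance chosen.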

Next, different logic styles should be explored for the implementation of the decoder, control logic, and drivers. Some alternative logic styles are superior to standard CMOS for low-power, low-voltage operation [11,12]. Fourth, by reducing the voltage swing of the BLs, a significant reduction in switching power can be obtained. One way of implementing this technique is to use NMOS precharge transistors; the BLs are then precharged to Vdd - Vt. A fifth method can be applied in cases where the same location is accessed repeatedly [8]. In this case, a circuit called a voltage keeper can be used to store past history and avoid transitions in the data bus and adder (if sign and magnitude encoding is implemented). A sixth method is to limit short-circuit dissipation during address decoding and in the control logic and drivers. This can be achieved by careful design of the individual logic circuits.
