Low-Power Memory Circuits: Dynamic Random-Access Memory

Dynamic Random-Access Memory

Like the other memory types discussed previously, DRAM has undergone remarkable development toward higher access speed, higher density, and lower power [51,79–82]. A variety of techniques targeting the different sources of DRAM power consumption have been reported. This section first discusses those sources and then describes several methods for reducing active and data-retention power in DRAMs.

Low-Power DRAM Circuits

Sources of DRAM Power

The total power dissipated in a DRAM has two components: the active power and the data-retention power. The major contributors to the active power are the row and column decoders, the memory array, the sense amplifiers, the circuits that dissipate DC current (refresh circuitry, substrate back-bias generator, boosted-level generator, voltage reference circuit, half-Vdd generator, and voltage down converter), and the remaining periphery circuits (main sense amplifier, I/O buffers, write circuitry, etc.). The total active power can be described as

$$P_{AC} = V_{dd}\left[\left(m C_D \Delta V_D + C_{PT} V_{INT}\right) f + I_{DCP}\right] \qquad (57.3)$$

where CD is the dataline capacitance, ∆VD the dataline voltage swing (0.5Vdd), m the number of cells connected to the activated wordline (each driving one dataline), CPT the capacitance of the periphery circuits, VINT the internal supply voltage, f the operating frequency, and IDCP the static current.

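The total data-retention power is given as

$$P_{DR} = V_{dd}\left[\left(m C_D \Delta V_D + C_{PT} V_{INT}\right)\frac{n}{t_{REF}} + I_{DCP}\right] \qquad (57.4)$$

where n is the number of words that require refresh within the refresh time tREF, so that n/tREF is the frequency of the refresh operation.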
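As a rough illustration of how the terms of the two power expressions compare, the sketch below evaluates them, assuming the form P_AC = Vdd[(m·CD·∆VD + CPT·VINT)·f + IDCP] for active operation (f is the cycle frequency) and the same expression with f replaced by n/tREF for data retention. The names DramPowerParams, active_power, and retention_power, and every parameter value, are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass
class DramPowerParams:
    vdd: float    # external supply voltage Vdd (V)
    c_d: float    # dataline capacitance C_D (F)
    dv_d: float   # dataline swing, about 0.5 * Vdd (V)
    m: int        # cells on the activated wordline, one dataline each
    c_pt: float   # total switched periphery capacitance C_PT (F)
    v_int: float  # internal supply voltage V_INT (V)
    i_dcp: float  # static (DC) current I_DCP (A)


def active_power(p: DramPowerParams, f: float) -> float:
    """Active power (W) at cycle frequency f (Hz)."""
    switched_charge = p.m * p.c_d * p.dv_d + p.c_pt * p.v_int  # charge per cycle (C)
    return p.vdd * (switched_charge * f + p.i_dcp)


def retention_power(p: DramPowerParams, n: int, t_ref: float) -> float:
    """Data-retention power (W): the cycle rate becomes n refreshes per t_ref."""
    return active_power(p, n / t_ref)


# Hypothetical values: Vdd = 2 V, so dv_d = 0.5 * Vdd = 1 V
params = DramPowerParams(vdd=2.0, c_d=200e-15, dv_d=1.0, m=1000,
                         c_pt=100e-12, v_int=2.0, i_dcp=1e-3)
print(f"active:    {active_power(params, 10e6) * 1e3:.2f} mW")        # 10.00 mW
print(f"retention: {retention_power(params, 4096, 64e-3) * 1e3:.2f} mW")  # 2.05 mW
```

With these made-up numbers nearly all of the retention power comes from the Vdd·IDCP term, because the refresh rate n/tREF is far below the active cycle rate; this is why DC current reduction and refresh-rate reduction are both listed among the techniques that follow.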

Techniques for Low-Power Operation

To reduce power consumption during both modes of DRAM operation, many circuit techniques can be applied:

• Capacitance reduction, especially of datalines, wordlines, and shared I/O, using partial activation of multidivided datalines and partial activation of multidivided wordlines.

• Lowering of external and internal voltages.

• DC power reduction of peripheral circuits during the active mode by using static CMOS decoders, pulse techniques, and address transition detection (ATD) circuits, as in SRAMs.

• Refresh power reduction. In addition to capacitance reduction and operating-voltage reduction, which also apply to the refresh mode, decreasing the refresh frequency or the number of words n that require refresh reduces the total refresh power.

• AC and DC power reduction of circuits such as a voltage down converter (VDC), a half-voltage generator (HVG), a boosted voltage generator (BVG) and a back-bias generator (BBG).

Capacitance Reduction

Charging and discharging the large dataline and wordline capacitances accounts for a large share of the power dissipated in a DRAM [51,82]. Minimizing the capacitance of these lines can therefore yield significant power savings. There are two fundamental methods for reducing capacitance in DRAMs: partial activation of multidivided datalines and partial activation of multidivided wordlines. The concepts of both techniques are shown in Figure 57.60 and Figure 57.61.

The idea behind partial activation of a multidivided dataline (Figure 57.60) is to reduce the number of memory cells connected to an active dataline, thus reducing its capacitance CD. The datalines are divided into small sections that share the I/O circuitry and a sense amplifier. Sharing these resources reduces CD further. Partial activation is performed by activating only one sense amplifier along the dataline. The principle of partial activation of a multidivided wordline (see Figure 57.61) is very similar to that used in SRAMs.

A single wordline is divided into several subwordlines by the subword-line (SWL) drivers. Each SWL is selected by the main wordline (MWL) together with the row select line signal (RX), so only a portion of the wordline is activated.
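Both partial-activation ideas can be sketched with a simple capacitance model. The sketch assumes CD is a fixed load (sense amplifier, I/O gate) plus a per-cell load, so dividing the dataline into sections cuts the per-cell part; it also treats the SWL drivers as a grid in which an SWL fires only when its MWL and its RX signal are both high. The function names and the femtofarad values are hypothetical.

```python
def dataline_capacitance(cells: int, c_cell: float, c_fixed: float) -> float:
    """C_D as a fixed load (sense amp, I/O gate) plus a per-cell load, in fF."""
    return c_fixed + cells * c_cell


def divided_dataline_capacitance(total_cells: int, sections: int,
                                 c_cell: float, c_fixed: float) -> float:
    """Only the one section being accessed is connected to its sense amplifier."""
    if total_cells % sections:
        raise ValueError("total_cells must divide evenly into sections")
    return dataline_capacitance(total_cells // sections, c_cell, c_fixed)


def active_subwordlines(mwl: list[bool], rx: list[bool]) -> list[tuple[int, int]]:
    """An SWL driver fires only when both its MWL and its RX signal are high."""
    return [(i, j) for i, m in enumerate(mwl) for j, r in enumerate(rx) if m and r]


print(dataline_capacitance(512, 1.0, 50.0))             # 562.0 fF, undivided
print(divided_dataline_capacitance(512, 4, 1.0, 50.0))  # 178.0 fF, one of four sections
print(active_subwordlines([False, True, False, False],
                          [False, False, True, False]))  # [(1, 2)]
```

Because the dataline swing energy is proportional to CD, the divided dataline in this toy model switches roughly a third of the capacitance of the undivided one, and only one SWL segment out of the grid is charged.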

A similar method, called a hierarchical decoding scheme with a dynamic CMOS series-logic predecoder, has been proposed for synchronous DRAMs (SDRAMs) [83,84]. It targets the power lost in the peripheral region of the memory, which is consumed by the large capacitive loading of the datalines, address lines, and predecoder lines. The scheme is shown in Figure 57.62. The hierarchical decoder uses predecoded signal lines, and the redundancy circuits are connected directly to the global lines. This reduces the capacitive loading and halves the number of bus lines (for both column and row decoders). The technique can be combined with a small-swing single address driver with a dynamic predecoder [83,84], which allows a reduction of 23 address lines; its schematic diagram is shown in Figure 57.63. The scheme achieves a small swing on the address lines by using a short-pulse-driven pull-up transistor with a level holder powered at half-VINT. The pull-up pulse is short, and its width is set so that the bus signal rises to approximately the small-swing voltage (VINTL).
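Why predecoding shrinks the routed bus can be seen with a generic sketch: each g-bit field of the address is one-hot encoded, and a final row gate ANDs one predecoded line from each group. This is a textbook predecoder written for illustration, not the dynamic CMOS series-logic circuit of [83,84]; the function names and the 8-bit address width are hypothetical.

```python
def predecode(address: int, bits: int, group: int) -> list[list[int]]:
    """Split `address` into `group`-bit fields and one-hot encode each field."""
    if bits % group:
        raise ValueError("bits must be a multiple of group")
    mask = (1 << group) - 1
    return [[1 if line == ((address >> (g * group)) & mask) else 0
             for line in range(1 << group)]
            for g in range(bits // group)]


def row_selected(groups: list[list[int]], row: int, group: int) -> bool:
    """Final row decoder gate: AND of one predecoded line from each group."""
    mask = (1 << group) - 1
    return all(lines[(row >> (g * group)) & mask] for g, lines in enumerate(groups))


def bus_lines(bits: int, group: int) -> int:
    """Number of predecoded lines that must be routed to the decoders."""
    return (bits // group) * (1 << group)


groups = predecode(180, bits=8, group=2)
print([row for row in range(256) if row_selected(groups, row, 2)])  # [180]
print(bus_lines(8, 2), bus_lines(8, 4), bus_lines(8, 8))            # 16 32 256
```

Fewer routed lines means less bus capacitance to charge per access; the 50% figure quoted above compares against the authors' own baseline, which this toy does not reproduce.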

DC Current Reduction

During the active mode, most of the DC power in DRAMs and SDRAMs is consumed by the periphery circuits and I/O lines. Decoding and pulsed-operation techniques based on an ATD circuit, similar to those used in SRAMs, can be applied. To minimize the power consumption of the I/O lines in SDRAMs, two circuit techniques have been proposed [86]. In the first technique, the extended small-swing read operation (∆VI/O = ±200 mV), the small-swing data paths (local I/O and global I/O) are extended up to the output buffer stages through the main I/O (MIO) lines (see Figure 57.63). Shared current sense amplifiers (I/O sense amplifiers) also reduce power consumption. The second technique, the single-I/O-line-driving write operation, halves the operating current of the long global I/O and MIO lines. Combining these two methods saves as much as 30% of the total peripheral power.
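A first-order energy model shows why both I/O techniques help. Raising a line of capacitance C by a swing ∆V draws charge C·∆V from the supply, so the energy taken from a supply at V is C·∆V·V. The sketch assumes the small-swing line is still charged from the VINT rail, and the line capacitance and VINT values are hypothetical.

```python
def line_energy(c_line: float, swing: float, v_supply: float) -> float:
    """Energy drawn from v_supply to raise a line of capacitance c_line by `swing`.

    The charge c_line * swing is taken from the supply at potential v_supply.
    """
    return c_line * swing * v_supply


C_IO = 2e-12   # hypothetical global I/O line capacitance (F)
V_INT = 2.5    # hypothetical internal supply (V)

full = line_energy(C_IO, V_INT, V_INT)   # full-swing transition
small = line_energy(C_IO, 0.2, V_INT)    # 200 mV swing, driven from V_INT
print(f"small/full swing energy: {small / full:.2f}")   # 0.08

# Write: driving a complementary pair vs. a single I/O line
pair_write = 2 * full
single_write = full
print(f"single/pair write energy: {single_write / pair_write:.2f}")  # 0.50
```

The read-path saving scales with ∆VI/O/VINT, while the single-line write simply charges half as many long lines per write.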

Another power-saving method for low-power SDRAMs is based on a new cell-operating concept [87]. When the operating voltage of the memory array is scaled to 1.8 V for 1-Gb SDRAMs, performance degrades significantly for two reasons. First, the sensing speed decreases because the threshold voltage of the source-floated transistors becomes significant relative to the supply. Second, a triple-pumping circuit may be required to generate the relatively high boosted wordline voltage (Vpp). In the proposed method, the BLs are precharged to ground (Vss) instead of to 1/2Vdd as in conventional schemes, and the wordline reset voltage is -0.5 V. The negative reset voltage prevents cell leakage current while allowing the threshold voltage of the pass transistors to be lowered, so the triple-pumping circuit for wordline boosting is no longer required.

Operating Voltages Reduction

Lowering the external and internal operating voltages is an important technique for achieving significant power savings. In both active and standby modes, voltages such as Vdd, VINT, and ∆VD, as described in Eq. (57.3) and Eq. (57.4), contribute strongly to the total power consumption. Over the last decade, the external power supply voltage Vdd of DRAMs has steadily decreased, from 12 V down to 3.3, 2.5, and 1.2 V [84,85,87,94,97]. An experimental circuit with Vdd as low as 1 V has recently been reported [95]. The lack of a universal standard external supply voltage has resulted in DRAMs with on-chip VDCs, which accept widely used supply voltages Vdd, such as 5 V or, more recently, 3.3 V, and lower the operating voltage of the memory core to gain power savings [50,51,91]. The VDC is one of the most important DRAM circuits for achieving operation at battery voltage levels. In power-limited applications, the VDC must have a standby current below 1 µA over a wide range of operating temperatures, process variations, and supply voltages, and it must also have a low output impedance.

There are additional on-chip voltage generators: the HVG for precharging the BLs; the BBG for reducing subthreshold current and junction capacitance, improving device isolation and latchup immunity, and protecting circuits against voltage undershoots of input signals; and the BVG for driving the wordlines [50,51].
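One consequence of using an on-chip VDC is worth making explicit. If the VDC is a series-pass (linear) regulator, which is an assumption here, the load current of the core flows straight from the external supply, so the saving from lowering the core voltage is linear in VINT/Vdd rather than quadratic; the voltage difference is dropped across the pass device. The function name and the switched capacitance, frequency, and voltages below are hypothetical.

```python
def power_from_external_supply(c_sw: float, f: float, v_core: float,
                               v_ext: float, i_quiescent: float = 0.0) -> float:
    """Power drawn from v_ext by a core switching c_sw at f through a series VDC.

    A series-pass regulator passes the load current c_sw * v_core * f straight
    from the external supply, so the power is v_ext times that current plus
    the regulator's own quiescent current.
    """
    return v_ext * (c_sw * v_core * f + i_quiescent)


C_SW, F = 100e-12, 100e6   # hypothetical switched capacitance (F) and rate (Hz)
direct = power_from_external_supply(C_SW, F, v_core=3.3, v_ext=3.3)
via_vdc = power_from_external_supply(C_SW, F, v_core=2.5, v_ext=3.3)
print(f"{direct * 1e3:.1f} mW -> {via_vdc * 1e3:.1f} mW")  # 108.9 mW -> 82.5 mW
```

The i_quiescent term is where the standby-current requirement on the VDC enters: in standby the switched term is small, so the regulator's own current can dominate.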

The HVG circuit has been used since the 1-Mb DRAM generation. It is an efficient technique that reduces the BL voltage swing from a full Vdd swing to a 1/2Vdd swing. During sensing, one BL switches from 1/2Vdd to Vdd and the other from 1/2Vdd to ground. As a result, the peak switching current is reduced and noise is suppressed. Recently, a technique that eliminates 1/2Vdd BL precharging was proposed [88]. This method, called nonprecharged BL sensing (NPBS), provides three features (see Figure 57.64): (1) the precharge operation time is reduced by 78%, because the BLs are not substantially precharged; (2) the sensing speed increases, because BLs that have not been precharged remain at a low or high level, which increases the VGS and VDS of the sense amplifier transistors; and (3) the power dissipation is reduced when the same data occur repeatedly on the BL. The power is reduced by about 43%.

To maintain or improve the speed and reliability of DRAM operation, the threshold voltage Vt has to scale along with the main power supply voltage. This, however, causes a rapid increase of leakage currents throughout the memory in both active and standby modes. Therefore, an internal BBG circuit, implemented as a charge pump, is needed to support low-voltage, low-power operation by reducing the subthreshold currents. Figure 57.65 presents a pumping circuit that avoids Vt losses [89]. When the clock (clk) is at logic low, the voltage of node A reaches |Vtp| - Vdd. The PMOS transistor p1 clamps node B to the ground level. The Vbb voltage settles at |Vtp| - Vdd - Vtn. When clk changes to logic high, node A changes to Vtp and node B is capacitively coupled to -Vdd. As a result, the Vbb voltage changes to -Vdd. This circuit requires triple-well technology to eliminate minority-carrier injection from the n1 transistor.

To limit the power consumption of this circuit during the DRAM standby mode, the frequency of the clk signal can be reduced. This can be implemented with a dedicated BBG ring oscillator controlled by the BBG enable signal.
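Why the pump clock can be slowed in standby follows from a first-order charge balance: each clock cycle moves roughly C_pump·∆V of charge, so the pump only needs a clock fast enough to supply the substrate current, and its drive power falls in proportion to the clock rate. This is a generic charge-pump model, not an analysis of the Figure 57.65 circuit; the function names, the ∆V-per-cycle abstraction, and all values are hypothetical.

```python
def min_pump_frequency(i_load: float, c_pump: float, dv_per_cycle: float) -> float:
    """Lowest clock rate at which a pump moving c_pump * dv_per_cycle per cycle
    still supplies the substrate current i_load."""
    return i_load / (c_pump * dv_per_cycle)


def pump_drive_power(c_pump: float, v_clk: float, f_clk: float) -> float:
    """First-order C*V^2*f power for driving the pumping capacitor."""
    return c_pump * v_clk ** 2 * f_clk


C_PUMP, DV, VDD = 10e-12, 1.0, 1.8                   # hypothetical values
f_active = min_pump_frequency(10e-6, C_PUMP, DV)     # 10 uA substrate current
f_standby = min_pump_frequency(0.1e-6, C_PUMP, DV)   # 0.1 uA in standby
print(f"{f_active:.3g} Hz -> {f_standby:.3g} Hz")    # 1e+06 Hz -> 1e+04 Hz
print(pump_drive_power(C_PUMP, VDD, f_standby) / pump_drive_power(C_PUMP, VDD, f_active))
```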

A BVG is used in DRAMs to generate a supply voltage higher than Vdd for driving the wordlines. This wordline voltage must exceed Vdd by at least the threshold voltage of the cell transistor so that a full Vdd level can be written into the cell. The boosted level cannot be applied directly to drive the load; an isolation transistor is needed to separate the switching boosted voltage from the load. One such arrangement is shown in Figure 57.66 [90]. This particular circuit generates an output of 2Vdd. Its performance is not affected by voltage scaling, so it is suitable for Vdd reduction down to sub-1-V levels.
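The wordline requirement can be checked numerically. The sketch assumes the standard long-channel body-effect model for the cell transistor: when writing a "1", its source sits at Vdd while the substrate sits at Vbb, which raises its threshold above the zero-bias value. The function names, the 2φF default of 0.7 V, and the Vt0 and γ values are hypothetical illustration choices.

```python
import math


def threshold_with_body_effect(vt0: float, gamma: float, two_phi_f: float,
                               vsb: float) -> float:
    """Long-channel body-effect model for the NMOS threshold voltage."""
    return vt0 + gamma * (math.sqrt(two_phi_f + vsb) - math.sqrt(two_phi_f))


def min_wordline_voltage(vdd: float, vt0: float, gamma: float = 0.0,
                         two_phi_f: float = 0.7, vbb: float = 0.0) -> float:
    """Wordline level needed to write a full vdd into the cell.

    The cell transistor's source sits at vdd while the substrate is at vbb,
    so its threshold is evaluated at vsb = vdd - vbb.
    """
    return vdd + threshold_with_body_effect(vt0, gamma, two_phi_f, vdd - vbb)


vdd = 1.0
needed = min_wordline_voltage(vdd, vt0=0.5)   # 1.5 V without body effect
with_body = min_wordline_voltage(vdd, vt0=0.5, gamma=0.4, vbb=-0.5)
print(needed, round(with_body, 3), "2*Vdd =", 2 * vdd)
```

In this hypothetical case both requirements stay below the 2Vdd output of a doubler, which is the margin a 2Vdd BVG is meant to provide.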

Leakage Current Reduction and Data-Retention Power

The key limitation in achieving a battery (1 V) or solar cell (0.5 V) operation will be the subthreshold power consumption that will dominate both active and standby DRAM modes. In this section, circuit techniques that drastically reduce leakage and data-retention power will be described.

Several methods have been proposed to address the exponential increase in subthreshold leakage that results from lowering the threshold voltage in rapidly scaled technologies. One such method, a well-driving scheme, obtains a dynamic Vt by driving the well (see Figure 57.67) [82,92], so that the threshold voltage is higher in the standby mode than in the active mode. The advantage of this method is fast operation in the active mode combined with leakage current suppression in the standby mode.
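The payoff of a well-driving scheme can be estimated from two standard relations: the body effect, which sets how much Vt rises when the well is driven to a deeper bias, and the subthreshold slope S, by which leakage drops one decade for every S volts of added Vt. The sketch assumes S = 100 mV/decade, a long-channel body-effect model, and hypothetical γ and bias values; it is not a model of the Figure 57.67 circuit.

```python
import math


def vt_shift(gamma: float, two_phi_f: float,
             vsb_active: float, vsb_standby: float) -> float:
    """Threshold increase from driving the well to a deeper bias in standby."""
    return gamma * (math.sqrt(two_phi_f + vsb_standby)
                    - math.sqrt(two_phi_f + vsb_active))


def leakage_reduction(delta_vt: float, swing: float = 0.1) -> float:
    """Subthreshold current falls one decade per `swing` volts of added Vt."""
    return 10 ** (delta_vt / swing)


dvt = vt_shift(gamma=0.4, two_phi_f=0.7, vsb_active=0.0, vsb_standby=1.5)
print(f"Vt shift {dvt:.3f} V, leakage reduced about {leakage_reduction(dvt):.0f}x")
```

The exponential dependence is the point: a few hundred millivolts of Vt shift buys orders of magnitude of standby leakage, while the active mode keeps the low Vt.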

To reduce the subthreshold currents in various DRAM voltage generators, a self-off-time detector circuit could be used [93]. It automatically evaluates the optimal off-time interval and controls the dynamic ON/OFF switching ratio of power-dissipation circuits such as level detectors. This method is directly applicable to any on-chip voltage generators and self-refresh circuits. The block diagram of this architecture is shown in Figure 57.68.
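The effect of controlling the ON/OFF ratio can be illustrated with a duty-cycle average. The off-time criterion used here, the longest interval for which the output capacitor alone can hold the generator output within a droop limit, is a hypothetical stand-in for whatever the self-off-time detector of [93] actually evaluates; the function names and values are likewise illustrative.

```python
def average_current(i_on: float, t_on: float, t_off: float,
                    i_off: float = 0.0) -> float:
    """Time-averaged current of a circuit on for t_on out of every t_on + t_off."""
    return (i_on * t_on + i_off * t_off) / (t_on + t_off)


def max_off_time(c_out: float, dv_allowed: float, i_load: float) -> float:
    """Longest off interval before the generator output droops by dv_allowed.

    Hypothetical criterion: the output capacitor alone supplies i_load while off.
    """
    return c_out * dv_allowed / i_load


t_off = max_off_time(c_out=1e-9, dv_allowed=0.05, i_load=1e-6)   # about 50 us
print(t_off, average_current(i_on=10e-6, t_on=1e-6, t_off=t_off))
```

A level detector that draws 10 µA while on but is enabled only about 2% of the time averages roughly 0.2 µA in this example, which is the kind of reduction duty-cycled control targets.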

A charge-transfer presensing scheme (CTPS) with 1/2Vcc BL precharge, combined with a nonreset row block control (NRBC) scheme, reduces the data-retention current by 75% [94]. The principle of the CTPS technique is shown in Figure 57.69. The SA and the BL are separated by the transfer gate (TG). The BL is precharged to 1/2VccA (VccA is the power supply voltage for the array), and the sense amplifier node is precharged to a voltage higher than VccA. While TG is at a low level, the WL is activated and the data from the memory cell (MC) are transferred to the BL, producing a small voltage change on the BL pair. Then the TG voltage is set to the charge-transfer (CT) condition, and the charge of the SA node is transferred to the BL. The transfer is complete when the BL voltage reaches VTG - Vtn. After that, a large readout voltage difference appears on the SA pair.
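The amplification in CTPS can be sketched with an idealized charge balance. After charge sharing, the two BLs differ by a small ∆V; both are then pulled up to VTG - Vtn, so the lower BL needs CB·∆V more charge, and that extra charge is drawn from an SA node whose capacitance CSA is much smaller, giving a signal of about ∆V·CB/CSA. The sketch assumes complete transfer and enough SA-node headroom (which the above-VccA SA precharge provides); the function names and capacitances are hypothetical.

```python
def charge_sharing_signal(v_cell: float, v_pre: float, c_s: float, c_b: float) -> float:
    """BL voltage change when a cell of capacitance c_s is connected to a BL of c_b."""
    return (v_cell - v_pre) * c_s / (c_s + c_b)


def ctps_sa_signal(dv_bl: float, c_b: float, c_sa: float) -> float:
    """Idealized charge-transfer gain: both BLs are pulled to VTG - Vtn, so the
    BL charge difference c_b * dv_bl is drawn from the small SA nodes."""
    return dv_bl * c_b / c_sa


VCCA = 2.0
C_S, C_B, C_SA = 30e-15, 300e-15, 30e-15   # hypothetical capacitances (F)
dv_bl = charge_sharing_signal(VCCA, VCCA / 2, C_S, C_B)
print(f"BL signal {dv_bl * 1e3:.0f} mV, "
      f"SA signal {ctps_sa_signal(dv_bl, C_B, C_SA) * 1e3:.0f} mV")
```

The larger presensed signal is what lets the array tolerate a smaller stored charge, which is consistent with the longer data-retention time claimed for CTPS.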

The CTPS technique reduces the active array current and prolongs the data-retention time. The data-retention power can be reduced further by the NRBC scheme, which reduces the number of charge/discharge cycles of the row block control circuits to 1/128 of that in the conventional method. The NRBC architecture is shown in Figure 57.70. NRBC uses a divided-wordline (DWL) structure in which one SWL in the selected row block is activated when one MWL and one of the four subdecode signals (SD0–SD3) in that row block are activated. The transfer gates TG_L and TG_R on both sides of this row block are also activated. After the data-retention mode is set, the SD and TG signals do not swing fully in every cycle; while the same row block is being activated, they swing only once every 128 cycles. As a result, the row control current is reduced by 70% compared with the conventional scheme.

Another effective method for leakage current reduction is the subthreshold leakage current suppression system (SCSS) shown in Figure 57.71 [96]. The method uses low-Vt transistors with high drivability (Ids). Its principle is to reduce the active-mode leakage current by body-bias control and the standby-mode current by both body bias and switched-source impedance. PMOS transistors use the boosted wordline voltage as their body bias, whereas NMOS transistors use the memory cell substrate voltage.
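The 1/128 figure for NRBC can be reproduced by counting control-line swings. The sketch assumes a hypothetical refresh order in which the address counter steps through all 128 rows of a block before moving to the next block; conventional control sets and resets the block lines every cycle, while the nonreset version swings them only when the selected block changes. It counts swings only; the 70% current figure above also includes parts of the row control that this toy ignores.

```python
def block_activations(row_sequence, rows_per_block: int, nonreset: bool) -> int:
    """Count full set/reset swings of the row-block control lines (SD, TG).

    Conventional control resets the lines after every refresh cycle. In the
    nonreset sketch the lines stay asserted while the same block is refreshed.
    """
    count = 0
    active_block = None
    for row in row_sequence:
        block = row // rows_per_block
        if not nonreset or block != active_block:
            count += 1
            active_block = block
    return count


rows = range(1024)   # hypothetical sequential refresh order, 128 rows per block
conventional = block_activations(rows, 128, nonreset=False)
nrbc = block_activations(rows, 128, nonreset=True)
print(conventional, nrbc, nrbc / conventional)   # 1024 8 0.0078125
```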

In addition to leakage suppression techniques, extending the refresh time can also significantly reduce power consumption in the standby mode, as Eq. (57.4) shows [85,98,99]. The refresh time is determined by how long the charge stored in the memory cell keeps enough margin against leakage at high temperature. To achieve long refresh times for low-voltage operation, a negative wordline method can be applied [85]. Figure 57.72 shows the concept of this method. A negative gate-source voltage Vgs is applied, which decreases the subthreshold current of the MC transistor and provides noise-free dynamic refresh. It also allows a shallow back-bias voltage Vbb, which reduces the electric field between the storage node and the p-well region under the memory cell and thus yields a small junction leakage current. This achieves a longer static refresh time. Figure 57.73 shows an example of a negative-voltage wordline driver.

The dual-period self-refresh (DPS-refresh) scheme can extend the refresh time by 4–6 times [98]. Its principle is shown in Figure 57.74, and the corresponding timing diagram in Figure 57.75. The key concept is to use two different internal self-refresh periods. All wordlines are separated into two groups according to retention test data stored in a PROM mode register implemented in the chip periphery. The short period t1 corresponds to a conventional self-refresh period, determined by the minimum retention time in the chip. The long period t2 is set to the optimum refresh value. If all memory cells connected to a specific wordline have a retention time longer than t2, they are called long-period wordline cells (LPWL) and are refreshed with the long period t2. Otherwise, they are called short-period wordline cells (SPWL), and the wordline is refreshed with the short period t1. DPS-refresh operation is then achieved by periodically skipping refresh cycles for LPWLs. The operation consists of T1 periods repeated (p - 1) times followed by one T2 period. For a refresh cycle during a T1 period, inhibit_k (k = 0 to 3) goes low if the wordline selected in array block k is an LPWL, disabling all AND-gated MSi signals, so the refresh operation is not executed. During the T2 period, however, the inhibit_k signals are driven high by the T2 clock signal. This signal is generated by dividing the most significant refresh address bit A11 by p with a programmable divide-by-p counter. Because the period of A11 equals the short refresh period t1, LPWLs are refreshed every p × t1. The advantage of DPS-refresh operation is that wordlines with the same refresh address but located in different array blocks are individually controlled by the inhibit_k signals, which helps prolong the refresh time. Using this method, one half of the self-refresh current is saved compared with the conventional self-refresh technique.
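The skipping pattern of DPS-refresh can be simulated by counting refresh operations per wordline. The sketch assumes each group of p short periods is (p - 1) T1 periods followed by one T2 period, that inhibit blocks LPWLs during T1, and that every wordline is refreshed during T2; the function name and the LPWL/SPWL mix are hypothetical.

```python
def dps_refresh_counts(is_long: list[bool], periods: int, p: int) -> list[int]:
    """Refresh operations per wordline over `periods` short periods t1.

    Each group of p periods is (p - 1) T1 periods followed by one T2 period.
    In T1 the inhibit signal blocks LPWLs; in T2 every wordline is refreshed.
    """
    counts = [0] * len(is_long)
    for k in range(periods):
        t2_period = (k + 1) % p == 0
        for i, long_period in enumerate(is_long):
            if t2_period or not long_period:
                counts[i] += 1
    return counts


wordlines = [False, True, True, True]   # hypothetical: one SPWL, three LPWLs
counts = dps_refresh_counts(wordlines, periods=16, p=4)
print(counts, sum(counts), "vs", 16 * len(wordlines))   # [16, 4, 4, 4] 28 vs 64
```

As the fraction of LPWLs grows, the refresh count approaches 1/p of the conventional count; the measured 50% self-refresh current saving reported in [98] also reflects circuits this sketch does not model.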
