Content-Addressable Memory: A Low-Power Precomputation-Based CAM Design

A Low-Power Precomputation-Based CAM Design

In the conventional CAM design, the word match circuit adopts dynamic operation to improve search speed and reduce hardware cost. However, a dynamic circuit raises design issues [17] such as clock skew, low noise margin, and charge sharing. To avoid these drawbacks, a static CAM design called the precomputation-based CAM (PB-CAM) [19] is one of the best approaches for high-speed, low-power BCAM applications. The following sections describe the low-power PB-CAM design.

PB-CAM Architecture

The functional block diagram of the PB-CAM architecture, shown in Figure 56.10, consists of the data memory with a parameter field, the address decoder, the PB-CAM word match circuit, and the address priority encoder. Compared with the conventional CAM architecture shown in Figure 56.1, the PB-CAM architecture does not use a CLK signal. A dynamic circuit requires that signal to perform the data search operation, whereas the PB-CAM word match circuit adopts a static pseudo-NMOS logic structure to realize the data match operation. In addition, the PB-CAM architecture uses the parameter field to replace the valid bit field of the conventional CAM design. The parameter field serves both the precomputation technique and the valid bit function.

This section introduces the design concept behind the low-power PB-CAM architecture. The memory organization of the PB-CAM architecture, shown in Figure 56.11, is composed of the data memory, the parameter memory, and the parameter extractor. In the write operation, the parameter extractor extracts the parameter of the input data, and the input data and its parameter are then stored in the data memory and the parameter memory, respectively. A parameter in the PB-CAM architecture is defined by the following property: if data A is the same as data B, then the parameter of data A is the same as the parameter of data B. It follows that if the parameter of data A mismatches the parameter of data B, then data A mismatches data B (by contraposition: P ⇒ Q is equivalent to ~Q ⇒ ~P). Parameter comparison therefore identifies most of the mismatched data in advance, which eliminates most of the data comparison operations.

To reduce the number of comparison operations during a search, the operation of the PB-CAM is separated into two comparison processes. In the first comparison process, the parameter extractor extracts the parameter of the search data, and the parameter comparison circuits compare it with all parameters stored in the parameter memory in parallel. By the parameter property described above, if a stored parameter mismatches the parameter of the search data, then the data associated with that stored parameter also mismatch the search data. Otherwise, the data associated with that stored parameter remain unidentified. In the second comparison process, the search data is compared only with the unidentified data to find any match. If most of the stored parameters mismatch the parameter of the search data, most of the comparisons in the second comparison process are avoided. The parameter comparison process thus acts as a filter: it removes most of the mismatched data in the first comparison process so that few comparisons remain for the second. In the PB-CAM design, the parameter comparison process is also known as the precomputation process. Although the data search operation uses two comparison processes to identify a match, both processes are performed in parallel to improve the search speed.
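The two comparison processes can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class and method names (`PBCAMModel`, `write`, `search`) are hypothetical, the 1's count is assumed as the parameter function, and the model runs the two processes one after the other in software, whereas the hardware described above performs them in parallel.

```python
def ones_count(word: int) -> int:
    """Assumed parameter function: number of 1 bits in the word."""
    return bin(word).count("1")


class PBCAMModel:
    """Behavioral sketch of PB-CAM write and search (hypothetical names)."""

    def __init__(self) -> None:
        self.data = []    # data memory, one word per address
        self.params = []  # parameter memory, one parameter per address

    def write(self, word: int) -> int:
        """Store the word and its extracted parameter; return the address."""
        self.data.append(word)
        self.params.append(ones_count(word))
        return len(self.data) - 1

    def search(self, key: int):
        """Return (matching addresses, number of full data comparisons)."""
        key_param = ones_count(key)
        # First comparison process: keep only entries whose parameter matches.
        unidentified = [a for a, p in enumerate(self.params) if p == key_param]
        # Second comparison process: full compare on the unidentified entries.
        matches = [a for a in unidentified if self.data[a] == key]
        return matches, len(unidentified)


cam = PBCAMModel()
for w in (0b0000, 0b0011, 0b0101, 0b1111, 0b0110):
    cam.write(w)

matches, compared = cam.search(0b0101)
print(matches, compared)  # [2] 3
```

In this example only three of the five stored words reach the full data comparison, because the other two are already rejected by their parameters.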

Parameter Extraction Circuit

In the PB-CAM architecture, the parameter extractor largely determines the comparison power, since this circuit decides how many data remain unidentified after the parameter comparison process. In addition, both the parameter memory and the parameter comparison circuit require extra hardware and dissipate extra power compared with the conventional CAM architecture. Therefore, the design goal of the parameter extractor is to filter out as many mismatched data as possible in the parameter comparison process with the shortest possible parameter bit length. Several functions can realize the parameter extraction, such as the 1's count function, the parity function, and the remainder function. The PB-CAM design adopts the 1's count function, because it not only filters out a large amount of mismatched data with a short bit length, but also reduces the transistor count of the PB-CAM cell to seven. With an n-bit data length, there are n + 1 possible 1's counts (from zero 1's to n 1's). Furthermore, one extra value is needed to indicate whether the stored data is valid. The minimal bit length of the parameter is therefore ⌈log₂(n + 2)⌉. The required bit length of the parameter is shown in Figure 56.11. In a PB-CAM with m words by n bits, the average number of data comparisons in the second comparison process is m/(n + 1), assuming the stored words are spread evenly over the count values, since there are n + 1 possible 1's counts and only one of them (the one that matches the 1's count of the search data) remains unidentified after the parameter comparison process. For example, with a CAM size of 128 words by 30 bits, if the search data is 01234567 in hexadecimal, then the parameter of the search data is 12. Therefore, any stored data whose parameter is not 12 mismatches the search data. Since the parameter value ranges from 0 to 30 and only the value 12 remains unidentified, the average number of data comparisons in the second comparison process is 128/31 ≈ 4.
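The figures in this example can be reproduced with a few lines of Python. This is a minimal sketch; the function names are hypothetical and serve only to check the arithmetic in the text.

```python
def ones_count(word: int) -> int:
    """1's count parameter: number of 1 bits in the word."""
    return bin(word).count("1")


def parameter_bits(n: int) -> int:
    """ceil(log2(n + 2)): n + 1 count values plus one code for invalid data."""
    return (n + 1).bit_length()


def average_second_stage(m: int, n: int) -> float:
    """Average number of full comparisons, m / (n + 1), per the text."""
    return m / (n + 1)


print(ones_count(0x01234567))                   # 12
print(parameter_bits(30))                       # 5
print(round(average_second_stage(128, 30), 2))  # 4.13
```

For the 30-bit example, the 31 count values plus the validity code give 32 parameter values, which fit exactly into 5 bits.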

PB-CAM Word Circuit

In the conventional CAM architecture, the CAM word structure uses a dynamic CMOS circuit to improve overall system performance and reduce hardware cost. However, implementing the CAM word function with a dynamic circuit has several drawbacks. (1) The dynamic circuit needs an extra precharge time for each data search operation. (2) The dynamic circuit suffers from charge sharing and noise problems. (3) A clock signal is necessary to control the circuit operation. (4) The noise margin of the dynamic circuit is less than Vtn.

To eliminate these drawbacks of the conventional CAM word structure, the static pseudo-NMOS word structure shown in Figure 56.12 is one of the best structures for realizing the word match function. However, the main problem with the pseudo-NMOS circuit is its static power dissipation, which occurs whenever the pull-down transistor M3 is turned on. In the static pseudo-NMOS word circuit, static power is dissipated when the stored data mismatch the search data, because at least one pull-down transistor M3 is then turned on. In general, with a CAM size of m words, (m - 1) stored data mismatch the search data in each data search operation. For this reason, static power dissipation becomes one of the critical issues in the static pseudo-NMOS word match circuit.

To reduce the static power dissipation in the static pseudo-NMOS CAM word circuit, a novel static pseudo-NMOS CAM word circuit based on the PB-CAM architecture is shown in Figure 56.13. In the PB-CAM word circuit, the parameter comparison circuit controls the pull-up transistor M1. Recall that, with a CAM size of m words by n bits, the average number of data comparisons in the second comparison process equals m/(n + 1). With the proposed precomputation technique, only m/(n + 1) static pseudo-NMOS PB-CAM word circuits have their pull-up transistor M1 turned on by the parameter comparison circuit. In addition, one of those m/(n + 1) word circuits holds the stored data that matches the search data. Therefore, the number of PB-CAM word circuits that consume static power is reduced to (m/(n + 1)) - 1. For example, with a CAM size of 128 words by 30 bits, the average number of PB-CAM word circuits consuming static power is (128/31) - 1 ≈ 3, compared with 127 in the plain static pseudo-NMOS design, so the static pseudo-NMOS PB-CAM word circuits greatly reduce static power dissipation.
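The effect on static power can be illustrated by counting, for a given search, how many word circuits would conduct static current in each design. The sketch below is a behavioral approximation with hypothetical names: it assumes a plain pseudo-NMOS word draws static current whenever its data mismatch, and a PB-CAM word does so only when its parameter matches (M1 on) while its data mismatch.

```python
def ones_count(word: int) -> int:
    """1's count parameter: number of 1 bits in the word."""
    return bin(word).count("1")


def static_current_words(stored, key):
    """Return (plain pseudo-NMOS count, PB-CAM count) of words drawing static current."""
    # Plain pseudo-NMOS: every mismatching word has a pull-down path turned on.
    plain = sum(1 for w in stored if w != key)
    # PB-CAM: the pull-up M1 is on only when the parameters match.
    pbcam = sum(
        1 for w in stored
        if ones_count(w) == ones_count(key) and w != key
    )
    return plain, pbcam


stored = [0b0000, 0b0011, 0b0101, 0b1111, 0b0110, 0b1000]
print(static_current_words(stored, 0b0101))  # (5, 2)
```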

PB-CAM Cell

In previous BCAM circuit designs, the BCAM cell is typically built as a nine-transistor structure, as shown in Figure 56.2(a). The conventional BCAM cell circuit has several drawbacks.

(1) The BCAM cell requires nine transistors, which is a large hardware cost. (2) The BCAM cell uses a PTL-type XOR gate in the bit comparison circuit, so the operating voltage of the BCAM cell circuit cannot be reduced efficiently. (3) The BCAM cell uses an XOR gate to perform the bit comparison, and an XOR gate consumes more power than other standard gates such as NAND and NOR gates.

Unlike the conventional BCAM cell design, the PB-CAM cell is a seven-transistor structure, as shown in Figure 56.14. The cell combines a standard five-transistor D-latch, which stores the input bit, with a NAND-type bit comparison circuit of two transistors, M2 and M3, which drives the word match line ML. To achieve low-voltage operation, the feedback inverter (INV 2) is a weak-driving design, which allows the input data (BL) to be written into the D-latch easily. In the conventional BCAM cell design shown in Figure 56.2, if the search data BL mismatches the stored data Q, the word match line is at VSS; otherwise the word match line is floating. In the PB-CAM cell design, however, the data comparison function differs from that of the conventional BCAM cell. The truth tables of the conventional BCAM cell and the PB-CAM cell are shown in Table 56.1 and Table 56.4, respectively. According to these truth tables, the comparison result of the PB-CAM cell differs from that of the conventional BCAM cell when the search data BL is 0 and the stored data Q is 1. In this situation, the search data BL mismatches the stored data Q, yet the PB-CAM cell reports a match (since the match line ML is floating).

Although the PB-CAM cell has one input condition that gives a different result from the conventional BCAM cell in the search operation, this condition can be ignored because of the PB-CAM word structure shown in Figure 56.15. The PB-CAM word circuit has three cases for the search operation. In the first case, the search data BL equals the stored data Q. Since BLi = Qi for all i, the output of the PB-CAM cell equals the output of the conventional BCAM cell. In the second case, the parameter of the search data is not equal to the parameter of the stored data (V = 1). Since signal V turns off the pull-up transistor M1 and turns on the pull-down transistor M2, the match line ML is at VSS regardless of the comparison results of the PB-CAM cells. In the last case, the search data BL is not equal to the stored data Q, but the parameter of the search data equals the parameter of the stored data (V = 0). Because BL ≠ Q and V = 0, the two words contain the same number of 1's but differ, so there exist at least two bit positions i and j, where 0 ≤ i, j ≤ n - 1, such that (BLi , Qi) = (1, 0) and (BLj , Qj) = (0, 1), respectively. According to Table 56.4, the mismatched pattern (BLi , Qi) = (1, 0) is detected by the PB-CAM cell, and the comparison result is a mismatch. For this reason, the other mismatched pattern (BLj , Qj) = (0, 1) (the input condition for which the PB-CAM cell and the conventional BCAM cell give different results) can be ignored. In summary, the data comparison result of the PB-CAM word circuit with the seven-transistor PB-CAM cell is the same as that of the conventional BCAM word circuit with the BCAM cell. In addition, the bit comparison circuit in the proposed PB-CAM cell uses a CMOS-type NAND gate in place of the conventional PTL-type XOR gate. Therefore, the PB-CAM word circuit not only simplifies the hardware design, but also reduces the operating voltage and power dissipation.
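The three-case argument can be checked exhaustively for a small word length. The following sketch is a logic-level model with hypothetical function names, not a circuit simulation: it assumes the 1's count as the parameter and a cell that signals a mismatch only for (BLi , Qi) = (1, 0), and compares the resulting word match against plain equality.

```python
from itertools import product


def ones_count(word: int) -> int:
    """1's count parameter: number of 1 bits in the word."""
    return bin(word).count("1")


def cells_report_match(bl: int, q: int, n: int) -> bool:
    """PB-CAM cells alone: a mismatch is seen only where (BL_i, Q_i) = (1, 0)."""
    return all(
        not (((bl >> i) & 1) == 1 and ((q >> i) & 1) == 0)
        for i in range(n)
    )


def pbcam_word_match(bl: int, q: int, n: int) -> bool:
    """Word-level result: parameter comparison (V) combined with the cells."""
    if ones_count(bl) != ones_count(q):
        return False  # V = 1: match line held at VSS
    return cells_report_match(bl, q, n)


def designs_agree(n: int) -> bool:
    """Compare the PB-CAM word result with plain equality for all n-bit pairs."""
    return all(
        pbcam_word_match(bl, q, n) == (bl == q)
        for bl, q in product(range(1 << n), repeat=2)
    )


print(cells_report_match(0b01, 0b11, 2))  # True: the cells alone miss (0, 1)
print(pbcam_word_match(0b01, 0b11, 2))    # False: the parameters differ
print(designs_agree(6))                   # True
```

The first two lines show the single input condition where the cell alone is fooled and how the parameter comparison covers it; the last line confirms agreement for every pair of 6-bit words.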
