Low Power Microarchitecture Techniques: Power Consumption

By Ahmed Farahat - September 14, 2015

Introduction

Few domains have grown as fast as computer architecture. In the last 25 years, integrated microprocessors have evolved from small oddities into virtually the sole source of computing power. They now form the core of almost every electronic device, from small, battery-operated gadgets to massively parallel computer systems. Their size, performance, and price make them suitable for an extremely wide range of applications.

Two decades ago, microprocessors did exactly what the Instruction Set Architecture (ISA) specified: read one instruction, decode and execute it, and then repeat the same steps for the next one. Each instruction required several clock cycles, and the execution time for the entire program was equal to the sum of the execution times of its instructions. Speeding up the processor meant reducing the cycle time and, in some cases, reducing the number of cycles per instruction as well. As the number of transistors that could be crammed onto a single chip increased, the available transistor budget began to exceed what such a simple processor needed. This led to the idea of executing more than one instruction at a time. Designers started using temporal parallelism, overlapping instructions in different phases of execution. Such pipelined processors achieved a much higher throughput than their predecessors, close to one instruction per clock cycle.
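The throughput gain from pipelining can be illustrated with a small calculation. The sketch below is only an idealized model, not a description of any particular processor: it assumes every pipeline stage takes one cycle, that an unpipelined instruction takes as many cycles as there are stages, and that no stalls or hazards occur. The function names and the workload figures are hypothetical.

```python
def unpipelined_cycles(num_instructions: int, cycles_per_instruction: int) -> int:
    """Total cycles when each instruction runs to completion before the next starts."""
    return num_instructions * cycles_per_instruction


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Total cycles for an ideal pipeline: one cycle per stage, no stalls or hazards."""
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles to pass through the pipe;
    # every later instruction completes one cycle after its predecessor.
    return num_stages + (num_instructions - 1)


if __name__ == "__main__":
    n, stages = 1000, 5  # hypothetical workload size and pipeline depth
    seq = unpipelined_cycles(n, stages)
    pipe = pipelined_cycles(n, stages)
    print(f"unpipelined: {seq} cycles, {n / seq:.2f} instructions/cycle")
    print(f"pipelined:   {pipe} cycles, {n / pipe:.2f} instructions/cycle")
```

Under these assumptions the pipelined throughput approaches, but never quite reaches, one instruction per cycle as the program gets longer; real pipelines fall further short because of stalls.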

The next step in microarchitecture evolution was the superscalar processing paradigm, essentially consisting of multiple pipelines connected together. These processors fetch two or more adjacent instructions, decode them, and then try to execute them in parallel if no data dependency is detected. Finally, out-of-order capabilities were developed, giving the processor more freedom in picking the instructions to be executed in parallel. Such processors can achieve a throughput significantly higher than one instruction per cycle while still operating at very high clock speeds.

The current state-of-the-art high-end microprocessors are based on this superscalar, out-of-order microarchitecture. These processors can fetch multiple instructions during each clock cycle, dynamically predicting which will be needed next, well in advance of the actual execution. Instructions are decoded in parallel, and data dependencies are checked. To eliminate false dependencies, the registers named by the ISA are renamed onto a register file that is much larger than the one the ISA specifies. Instructions are then reordered according to their data dependencies to maximize throughput and, in the end, executed by a parallel execution core.
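Register renaming, as described above, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the mechanism of any specific processor: the `Instr` type, the `rename` function, and the register names are hypothetical, and the sketch never returns physical registers to the free list, which real hardware does once an instruction retires.

```python
from typing import Dict, List, NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction


def rename(program: List[Instr], arch_regs: List[str], num_physical: int) -> List[Instr]:
    """Map ISA (architectural) registers onto a larger physical register file."""
    # Initially, each architectural register lives in its own physical register.
    mapping: Dict[str, str] = {r: f"p{i}" for i, r in enumerate(arch_regs)}
    free = [f"p{i}" for i in range(len(arch_regs), num_physical)]
    renamed = []
    for ins in program:
        # Sources read whichever physical register currently holds the value,
        # which preserves true (read-after-write) dependencies.
        srcs = tuple(mapping[s] for s in ins.srcs)
        if not free:
            raise RuntimeError("out of physical registers (no reclamation in this sketch)")
        # Every write gets a fresh physical register, so later writes to the
        # same ISA register no longer conflict with earlier reads or writes.
        new_dest = free.pop(0)
        mapping[ins.dest] = new_dest
        renamed.append(Instr(new_dest, srcs))
    return renamed


if __name__ == "__main__":
    regs = [f"r{i}" for i in range(1, 8)]   # r1..r7, hypothetical ISA registers
    program = [
        Instr("r1", ("r2", "r3")),
        Instr("r4", ("r1", "r5")),   # truly depends on the first instruction
        Instr("r1", ("r6", "r7")),   # reuses r1: a false dependency only
    ]
    for before, after in zip(program, rename(program, regs, num_physical=16)):
        print(before, "->", after)
```

After renaming, the first and third instructions write different physical registers, so the third can execute without waiting for the second to read the old value of r1, while the second still reads the result produced by the first.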

However, in spite of the growing internal complexity, microprocessors must maintain their relative ease of use. They can still be treated as simple “black boxes” that sequentially execute the instructions of a program, and, in most cases, they can still execute programs written two decades ago. All these new mechanisms and capabilities are encapsulated inside a layer that maintains backward compatibility with applications developed for much simpler microprocessors. As long as programmers obey the conventions of the ISA, they do not need to know any of the implementation details and can choose among a multitude of compatible processors. While this compatibility layer often complicates the microarchitecture even further, it also allows designers to push the envelope and include features that are not part of the original ISA.

Power Consumption

The evolution of the microprocessor architecture over the last couple of decades has been facilitated by tremendous improvements in silicon process technologies. As predicted in 1965 by Intel’s cofounder Gordon Moore, the number of transistors placed on a single chip has indeed doubled roughly every couple of years. Each new process technology brings smaller and faster devices, allowing designers to create more complex architectures working at faster clock speeds.

As transistors shrink, each one consumes less power with every new process technology. Implementing an older microarchitecture in state-of-the-art technology is therefore a simple way to lower the power consumption of a processor, and it is sometimes used in mid-life product updates. However, the performance level achieved by the resulting processor would be unacceptable in most situations. Most of the time, a new process technology is accompanied by either a completely new microarchitecture or an updated one that is optimized for higher clock speeds.

While the size of a single transistor is typically halved with each new process technology, die sizes have remained fairly constant over time and are dictated primarily by economic factors. Today’s microarchitectures exploit the inexpensive silicon real estate by emphasizing execution parallelism, both temporal and spatial. This translates into more and more transistors switching during every clock cycle. While 15 years ago Intel’s i486 processor had approximately 1.2M transistors, the latest dual-core Pentium 4 uses about 230M transistors (Figure 19.1).

In addition, clock frequencies have gone up as well, driven by both longer instruction pipelines and advances in process technology. Intel’s i486 was released in 1989 at 33 MHz, but the Pentium 4 today reaches almost 4 GHz. Since switching power is, to a first approximation, directly proportional to both the clock frequency and the number of devices that switch, it is easy to understand why power consumption has evolved into a major problem.
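A back-of-the-envelope calculation shows how quickly these two factors compound. The sketch below takes the proportionality stated above at face value and plugs in the transistor counts and clock frequencies quoted in the text; the function name is hypothetical. It deliberately ignores everything else in the usual first-order CMOS switching-power relation P = a · C · V² · f (activity factor a, switched capacitance C, supply voltage V), so the result is an upper-bound style illustration, not an estimate of real chip power.

```python
def naive_power_scaling(transistors_old: float, transistors_new: float,
                        freq_old_hz: float, freq_new_hz: float) -> float:
    """Growth factor if power scaled only with device count and clock frequency.

    Assumes supply voltage, per-device capacitance, and switching activity
    all stay constant, which is NOT true across process generations.
    """
    return (transistors_new / transistors_old) * (freq_new_hz / freq_old_hz)


if __name__ == "__main__":
    # Figures quoted in the text: i486 (~1.2M transistors, 33 MHz) versus
    # dual-core Pentium 4 (~230M transistors, almost 4 GHz).
    factor = naive_power_scaling(1.2e6, 230e6, 33e6, 4e9)
    print(f"naive power growth factor: about {factor:,.0f}x")
```

The naive factor comes out on the order of tens of thousands, far more than the actual growth in chip power. The gap is covered by the per-transistor savings of each new process (smaller capacitance and lower supply voltage), but the calculation makes clear why those savings alone have not kept power in check.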

The latest Pentium 4 processor dissipates more than 100 W from a silicon area of about 1 cm² [1], which is starting to expose the limits of our cooling and power delivery capabilities.
