Architecture: Industry Trends
The microprocessor industry is one of the fastest-moving industries today. Strong demand from the marketplace has stimulated intense competition, which in turn has produced major technical innovations.
Computer Microprocessor Trends
Recent trends in computer microprocessors include deep pipelining, high clock frequency, wide instruction issue, speculative and out-of-order execution, predicated execution, multimedia data types, large on-chip caches, floating-point capabilities, and multiprocessor support. In the area of pipelining, the Intel Pentium 4 processor is pipelined approximately twice as deeply as its predecessor, the Pentium 3. The deeper pipeline allowed the Pentium 4 to run at a much higher clock frequency than the Pentium 3. This trend was, however, reversed in 2005 because of power budget limitations: the pipeline depth of the Intel IA32 microprocessors that succeeded the Pentium 4 has been reduced toward that of the Pentium 3.
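The link between pipeline depth and clock frequency can be illustrated with a simple first-order model. The model is an assumption, not something taken from this text. It divides a fixed amount of logic delay evenly among the stages and adds a fixed latch overhead to every stage. All delay values below are hypothetical.

```python
# First-order pipeline timing model (an illustrative assumption):
#   clock period = total_logic_delay / stages + per_stage_latch_overhead


def cycle_time_ns(total_logic_ns: float, stages: int, latch_overhead_ns: float) -> float:
    """Clock period of a pipeline whose logic is split into `stages` equal parts."""
    return total_logic_ns / stages + latch_overhead_ns


def clock_ghz(total_logic_ns: float, stages: int, latch_overhead_ns: float) -> float:
    """Clock frequency in GHz, i.e. 1 / (period in ns)."""
    return 1.0 / cycle_time_ns(total_logic_ns, stages, latch_overhead_ns)


if __name__ == "__main__":
    LOGIC_NS = 10.0  # hypothetical total logic delay per instruction
    LATCH_NS = 0.1   # hypothetical latch and clock-skew overhead per stage
    for depth in (10, 20, 40):
        print(f"{depth:2d} stages -> {clock_ghz(LOGIC_NS, depth, LATCH_NS):.2f} GHz")
```

With these made-up numbers, going from 10 to 20 stages raises the clock from about 0.91 GHz to about 1.67 GHz. Doubling again to 40 stages reaches only about 2.86 GHz, less than a 2x gain, because the per-stage latch overhead does not shrink. Each extra stage also adds latches that switch on every cycle at the higher frequency. That added power is one reason deeper pipelines ran into power budgets.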
In the area of wide instruction issue, the Pentium 4 processor can decode and issue up to three X86 instructions per clock cycle, compared with the two-instruction issue bandwidth of the Pentium. More recently, the Intel Itanium and Itanium 2 processors can issue up to six instructions per clock cycle. Wide instruction issue, however, requires multiported register files and multiple cache access ports, which can significantly increase power consumption. As a result, future microprocessors will likely maintain or reduce the issue width compared with their recent predecessors.
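A rough port count shows why wide issue is expensive. This sketch makes two stated assumptions, neither of which comes from the text. First, each issued instruction reads two registers and writes one. Second, every register-file port adds one wordline and one bitline to each storage cell, so cell area grows about quadratically with the port count.

```python
from __future__ import annotations

# Rough register-file port count and cell-area scaling for a wide-issue core.
# Assumptions (illustrative only): 2 register reads + 1 write per instruction,
# and cell area proportional to the square of the total port count.


def register_file_ports(issue_width: int, reads_per_insn: int = 2,
                        writes_per_insn: int = 1) -> tuple[int, int]:
    """Return (read_ports, write_ports) needed to sustain `issue_width` per cycle."""
    return issue_width * reads_per_insn, issue_width * writes_per_insn


def relative_cell_area(ports: int, baseline_ports: int) -> float:
    """Cell area relative to a baseline, under the quadratic-growth assumption."""
    return (ports / baseline_ports) ** 2


if __name__ == "__main__":
    base_r, base_w = register_file_ports(2)  # two-wide issue as the baseline
    for width in (2, 3, 6):
        r, w = register_file_ports(width)
        area = relative_cell_area(r + w, base_r + base_w)
        print(f"{width}-wide: {r} read + {w} write ports, ~{area:.2f}x cell area")
```

Under these assumptions, moving from two-wide to six-wide issue triples the port count (6 to 18). It also makes each register cell roughly nine times larger, and switched capacitance, and therefore power, grows along with it.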
The Pentium 4 dedicates a very significant amount of chip area to the branch history table, branch target buffer, reservation stations, load-store queue, and reorder buffer that support speculative and out-of-order execution. Together, these structures allow the Pentium 4 processor to maintain a large instruction window within which it performs aggressive speculative and out-of-order execution. All of these structures, however, consume a great deal of power. As a result, the trend toward larger instruction windows has also slowed because of power budget limitations.
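The reorder buffer is the structure that lets results complete out of order while the machine still commits them in program order. The toy model below shows that contract. It is a teaching sketch, not the Pentium 4's design. The class and method names are hypothetical, and it omits renaming, exceptions, and memory ordering.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    seq: int                      # program-order sequence number
    dest: str                     # architectural destination register
    value: Optional[int] = None   # result, filled in when execution completes
    done: bool = False


class ReorderBuffer:
    """Toy reorder buffer: entries are allocated in program order, may complete
    out of order, and are retired (committed) strictly in program order."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: deque[RobEntry] = deque()
        self.next_seq = 0

    def allocate(self, dest: str) -> int:
        """Reserve an entry at dispatch; a full buffer stalls the front end."""
        if len(self.entries) == self.capacity:
            raise RuntimeError("reorder buffer full: dispatch must stall")
        seq = self.next_seq
        self.next_seq += 1
        self.entries.append(RobEntry(seq, dest))
        return seq

    def complete(self, seq: int, value: int) -> None:
        """Record a result produced by an execution unit, in any order."""
        for entry in self.entries:
            if entry.seq == seq:
                entry.value, entry.done = value, True
                return
        raise KeyError(seq)

    def retire(self, regfile: dict[str, int]) -> list[int]:
        """Commit finished entries from the head; stop at the oldest unfinished one."""
        retired = []
        while self.entries and self.entries[0].done:
            entry = self.entries.popleft()
            regfile[entry.dest] = entry.value
            retired.append(entry.seq)
        return retired

    def squash_younger_than(self, seq: int) -> None:
        """Discard speculative entries after `seq`, e.g. on a branch mispredict."""
        while self.entries and self.entries[-1].seq > seq:
            self.entries.pop()


if __name__ == "__main__":
    rob = ReorderBuffer(capacity=4)
    regs = {}
    a = rob.allocate("r1")
    b = rob.allocate("r2")
    rob.complete(b, 7)       # the younger instruction finishes first
    print(rob.retire(regs))  # nothing retires: r1 is still pending
    rob.complete(a, 3)
    print(rob.retire(regs))  # both retire, in program order
    print(regs)
```

The capacity bounds the instruction window. Each entry holds state that is read and written every cycle, so a larger window costs power in proportion.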
One important trend in computer microprocessors in general is the slowdown in the growth of the complexity, size, and clock frequency of processor cores. Instead, the industry is moving toward incorporating multiple processor cores on the same chip. If all the cores can be productively used, such a model can achieve much higher performance than a single core given the same power and chip area budget. This, however, places a much heavier burden on the programmer, compiler, and operating system than traditional single-core models do.
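The condition "if all the cores can be productively used" carries much of the argument. Amdahl's law is a standard model that this text does not itself cite, and it makes the condition concrete. It assumes part of the work parallelizes perfectly while the rest stays serial, and it ignores communication and synchronization overhead.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup on `cores` cores when `parallel_fraction` of the single-core
    run time parallelizes perfectly and the rest stays serial (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    for p in (1.0, 0.95, 0.9, 0.5):
        print(f"p = {p:.2f}: 8 cores -> {amdahl_speedup(p, 8):.2f}x")
```

With eight cores, a fully parallel program approaches 8x. With only 10% serial work, the speedup drops to about 4.7x, and with half the work serial it is below 1.8x. This is why the burden of exposing parallelism falls on programmers, compilers, and operating systems.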
In the area of predicated execution, the Pentium 4 supports a conditional move instruction that was not available in the Pentium. This trend is carried further by the next-generation IA-64 architecture, in which all instructions can be conditionally executed under the control of predicate registers. Because compilers can then replace hard-to-predict branches with predicated instructions, this ability will allow future microprocessors to execute control-intensive programs much more efficiently than their predecessors.
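Predicated execution enables a transformation called if-conversion: instead of branching, the code computes both candidate results and lets a predicate select one. The Python sketch below only imitates that data flow. It does not show what any particular compiler emits for x86 or IA-64, and the function names are hypothetical.

```python
def branchy_abs_diff(a: int, b: int) -> int:
    # Control flow: the hardware must predict which way this branch goes.
    if a > b:
        return a - b
    return b - a


def select(cond: bool, if_true: int, if_false: int) -> int:
    """Branch-free select in the spirit of a conditional move: the mask is all
    one bits (-1 in two's complement) when cond is true and 0 otherwise."""
    mask = -int(cond)
    return (if_true & mask) | (if_false & ~mask)


def predicated_abs_diff(a: int, b: int) -> int:
    # If-conversion: compute both candidate results unconditionally, then let
    # the predicate choose one, so no branch has to be predicted.
    p = a > b
    when_true = a - b    # would execute under predicate p
    when_false = b - a   # would execute under predicate (not p)
    return select(p, when_true, when_false)


if __name__ == "__main__":
    print(branchy_abs_diff(3, 10), predicated_abs_diff(3, 10))
```

The trade-off is that both paths always execute. Predication therefore tends to pay off when the branch is hard to predict and both paths are short.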
In the area of data types, the multimedia instructions from Intel and AMD have become a standard feature of all X86 microprocessors. These instructions take advantage of the fact that multimedia data items are typically represented with fewer bits (8–16 bits) than the width of today's integer data paths (32–64 bits). Based on the observation that multimedia applications often repeat the same operation on all data items, the architects of multimedia instructions specified that each such instruction performs the same operation on several multimedia data items packed into one register word. Intel first introduced the MMX instructions, which process several integer data items simultaneously to achieve significant speedup in targeted applications. In 1998, AMD introduced the 3DNow! instructions to address the performance needs of 3-D graphics applications. The 3DNow! instructions are based on the observation that 3-D graphics data items are often represented in single-precision floating-point format and do not require the sophisticated rounding and exception handling capabilities specified in the IEEE standard. Thus, two single-precision graphics values can be packed into one 64-bit register for more efficient floating-point processing of graphics applications. Note that MMX and 3DNow! apply the same concept to the integer and floating-point domains, respectively. More recently, Intel introduced the SSE instructions to compete with AMD's 3DNow! instructions.
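The packed-data idea can be imitated in ordinary integer code, a technique sometimes called "SIMD within a register." This is not MMX code. It reproduces in software the effect of a packed byte add such as MMX's PADDB, which the hardware performs in a single instruction. Four 8-bit lanes in one 32-bit word are added, and each lane wraps around independently. The helper names are hypothetical.

```python
from __future__ import annotations

LANE_BITS = 8
LANES = 4                 # four 8-bit items packed into one 32-bit word
HIGH_BITS = 0x80808080    # top bit of every lane
LOW_BITS = 0x7F7F7F7F     # remaining seven bits of every lane


def pack(values: list[int]) -> int:
    """Pack up to four bytes into one 32-bit word, lane 0 in the low byte."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (LANE_BITS * i)
    return word


def unpack(word: int) -> list[int]:
    """Split a 32-bit word back into its four byte lanes."""
    return [(word >> (LANE_BITS * i)) & 0xFF for i in range(LANES)]


def packed_add_u8(x: int, y: int) -> int:
    """Add four byte lanes at once, wrapping modulo 256 in each lane.
    The low seven bits of each lane are added with the top bits masked off,
    so no carry can cross into the neighbouring lane; the top bit of each
    lane is then fixed up with XOR (sum bit = a ^ b ^ carry_in)."""
    return ((x & LOW_BITS) + (y & LOW_BITS)) ^ ((x ^ y) & HIGH_BITS)


if __name__ == "__main__":
    result = packed_add_u8(pack([1, 2, 3, 250]), pack([10, 20, 30, 10]))
    print(unpack(result))
```

The last lane shows the per-lane wraparound: 250 + 10 gives 4, and the carry does not spill into a neighbouring lane. Multimedia instruction sets also offer saturating variants, such as MMX's PADDUSB, that clamp to 255 instead of wrapping.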
In the area of large on-chip caches, the popular strategies used in computer microprocessors are either to enlarge the first-level caches or to incorporate second-level, and sometimes third-level, caches on chip. For example, the AMD K7 microprocessor has a 64-kB first-level instruction cache and a 64-kB first-level data cache, which are significantly larger than the first-level caches of previous generations. As another example, the Intel Celeron microprocessor has a 128-kB second-level combined instruction and data cache. These large caches are enabled by increased chip density, which allows many more transistors on the chip. The Compaq Alpha 21364 microprocessor combines both strategies: it has a 64-kB first-level instruction cache, a 64-kB first-level data cache, and a 1.5-MB second-level combined cache. The recent Intel Itanium processors have up to 9 MB of third-level combined cache on chip.
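A cache's size and organization determine how an address is split into tag, set index, and byte offset. The text gives only the capacities. This sketch therefore assumes a 64-kB cache with 64-byte lines and two-way set associativity, and both the line size and the associativity are illustrative assumptions. The function names are hypothetical.

```python
from __future__ import annotations


def cache_geometry(size_bytes: int, line_bytes: int, ways: int) -> tuple[int, int, int]:
    """Return (number_of_sets, offset_bits, index_bits) for a set-associative cache.
    All three parameters must be powers of two."""
    for name, v in (("size_bytes", size_bytes), ("line_bytes", line_bytes), ("ways", ways)):
        if v <= 0 or v & (v - 1):
            raise ValueError(f"{name} must be a power of two")
    sets = size_bytes // (line_bytes * ways)
    if sets < 1:
        raise ValueError("cache too small for this line size and associativity")
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


def split_address(addr: int, size_bytes: int, line_bytes: int, ways: int) -> tuple[int, int, int]:
    """Split an address into (tag, set_index, byte_offset)."""
    sets, offset_bits, index_bits = cache_geometry(size_bytes, line_bytes, ways)
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    # Hypothetical organization for a 64-kB first-level cache.
    print(cache_geometry(64 * 1024, 64, 2))
    print(split_address(0x12345678, 64 * 1024, 64, 2))
```

With these parameters the cache has 512 sets, so an address uses 6 offset bits and 9 index bits. Doubling the capacity at the same line size and associativity adds one index bit and removes one tag bit.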
In the area of floating-point capabilities, computer microprocessors in general have much stronger floating-point performance than their predecessors. For example, the Intel Pentium 4 processor achieves several times the floating-point performance of the Pentium processor. As another example, most RISC and EPIC microprocessors now have floating-point performance that rivals that of supercomputer CPUs built just a few years ago.
Owing to the increasing demand for multiprocessor enterprise computing servers, many computer microprocessors now seamlessly support cache coherence protocols. For example, the AMD K7 microprocessor provides direct support for seamless multiprocessor operation when multiple K7 microprocessors are connected to a system bus. This capability was not available in its predecessor, the AMD K6. The more recent AMD Opteron processors further support the HyperTransport protocol, which gives each processor in a multiprocessor system much higher communication bandwidth than a traditional memory controller can support.
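A cache coherence protocol is a small state machine kept for every cache line. The sketch below models the textbook three-state MSI snooping protocol as an illustration. The text does not say which protocol the K7 or Opteron uses, so this should not be read as theirs. The sketch also simplifies a write to a Shared line into a full read-exclusive request rather than a separate upgrade message. The request names are conventional textbook labels, used here as assumptions.

```python
from enum import Enum
from typing import Optional, Tuple


class State(Enum):
    MODIFIED = "M"   # only valid copy, dirty with respect to memory
    SHARED = "S"     # clean copy that other caches may also hold
    INVALID = "I"    # no usable copy


def on_processor(state: State, op: str) -> Tuple[State, Optional[str]]:
    """Local 'read' or 'write' by this cache's processor.
    Returns (next_state, bus_request, or None if the access hits)."""
    if op == "read":
        if state is State.INVALID:
            return State.SHARED, "BusRd"      # miss: fetch a shared copy
        return state, None                    # M or S: read hit
    if op == "write":
        if state is State.MODIFIED:
            return state, None                # already exclusive and dirty
        return State.MODIFIED, "BusRdX"       # obtain exclusive ownership
    raise ValueError(f"unknown processor op: {op}")


def on_snoop(state: State, bus_request: str) -> Tuple[State, bool]:
    """Another cache's request observed on the shared bus.
    Returns (next_state, must_write_back_dirty_data)."""
    if bus_request == "BusRd":
        if state is State.MODIFIED:
            return State.SHARED, True         # supply dirty data, keep a clean copy
        return state, False
    if bus_request == "BusRdX":
        return State.INVALID, state is State.MODIFIED  # our copy is invalidated
    raise ValueError(f"unknown bus request: {bus_request}")


if __name__ == "__main__":
    cpu0, cpu1 = State.INVALID, State.INVALID
    cpu0, req = on_processor(cpu0, "write")   # cpu0 -> M, issues BusRdX
    cpu1, _ = on_snoop(cpu1, req)             # cpu1 stays I
    cpu1, req = on_processor(cpu1, "read")    # cpu1 -> S, issues BusRd
    cpu0, flushed = on_snoop(cpu0, req)       # cpu0 writes back, M -> S
    print(cpu0, cpu1, flushed)
```

Every processor must observe, or be told about, every other processor's requests. That is why the interconnect bandwidth mentioned above matters as processor counts grow.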
Embedded Microprocessor Trends
There are three clear trends in embedded microprocessors. The first trend is to integrate a DSP core with an embedded CPU/controller core. Embedded applications increasingly require DSP functionality, such as data encoding in disk drives and signal equalization for wireless communications, which enhances the quality of service of the end consumer products. At the 1999 Embedded Microprocessor Forum, ARM, Hitachi, and Siemens all announced products combining DSP and embedded microprocessor cores [12].
Three approaches exist for integrating DSP and embedded CPUs. One approach is simply to place two separate units on a single chip. The advantage of this approach is that it simplifies the development of the microprocessor: the two units are usually taken from existing designs, and the software development tools can be taken directly from each unit's respective software support environment. The disadvantage is that the application developer must deal with two independent hardware units and two software development environments, which usually complicates software development and verification.
A second approach is to add the DSP as a coprocessor of the CPU. The CPU fetches all instructions and forwards the DSP instructions to the coprocessor. The hardware design is more complicated than in the first approach because the two units must be interfaced more closely, especially for memory accesses. The software development environment also needs to be modified to support the coprocessor interaction model. The advantage is that software developers now deal with a much more coherent environment.
The third approach to integrating DSP and embedded CPUs is to add DSP instructions to a CPU instruction set architecture. This usually requires a brand-new design to implement the fully integrated instruction set architecture. The benefit is that software developers need to deal with just one development environment.
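Adding DSP instructions to a CPU instruction set often means adding a saturating fractional multiply-accumulate (MAC), which many DSP instruction sets provide as a single instruction. This sketch models one such operation in software. The Q15/Q31 fixed-point formats and the function names are assumptions chosen for illustration, not the definition of any specific vendor's instruction.

```python
from __future__ import annotations

# Q15: 16-bit signed fraction, value = raw / 2**15, range [-1, 1).
# Q31: 32-bit signed fraction, value = raw / 2**31.
Q31_MAX = 2**31 - 1
Q31_MIN = -(2**31)


def saturate_q31(x: int) -> int:
    """Clamp to the 32-bit accumulator range instead of wrapping around."""
    return max(Q31_MIN, min(Q31_MAX, x))


def mac_q15(acc: int, a: int, b: int) -> int:
    """acc + a*b for Q15 inputs and a Q31 accumulator, with saturation.
    The raw product a*b is in Q30, so it is shifted left once to align with Q31."""
    return saturate_q31(acc + ((a * b) << 1))


def fir_q15(samples: list[int], coeffs: list[int]) -> int:
    """One output of a FIR filter: a dot product built from repeated MACs,
    returned in Q15 by dropping the low 16 bits of the Q31 accumulator."""
    acc = 0
    for x, c in zip(samples, coeffs):
        acc = mac_q15(acc, x, c)
    return acc >> 16


if __name__ == "__main__":
    # 0.5*0.5 + 0.5*0.5 = 0.5, i.e. 0x4000 in Q15
    print(hex(fir_q15([0x4000, 0x4000], [0x4000, 0x4000])))
```

The saturation matters. In Q15, -1.0 times -1.0 equals +1.0, which is not representable, and a plain 32-bit register would wrap it to -1.0. Clamping to the largest positive value is what signal-processing code usually wants, and a CPU without such an instruction would need several instructions to get it.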
The second trend in embedded microprocessors is to support the development of single-chip solutions for large-volume markets. Many embedded microprocessor vendors offer designs that can be licensed and incorporated into a larger chip design that includes the desired input/output peripheral devices along with application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) logic. This paradigm is referred to as system-on-a-chip design. A microprocessor that is designed to function in such a system is often referred to as a licensable core.
The third major trend in embedded microprocessors is the aggressive adoption of high-performance techniques. Traditionally, embedded microprocessors have been slow to adopt high-performance architecture and implementation techniques. They also tend to reuse software development tools, such as compilers, from the computer microprocessor domain. However, owing to the rapid increase in required performance in embedded markets, embedded microprocessor vendors are now moving quickly to adopt high-performance techniques. This trend is especially clear in DSP microprocessors. Texas Instruments, Motorola/Lucent, and Analog Devices have all been shipping aggressive EPIC/VLIW DSP microprocessors and associated compilers.
Microprocessor Market Trends
Readers who are interested in market trends for microprocessors are referred to Microprocessor Report, a periodic publication by MicroDesign Resources (www.MDRonline.com). Every issue includes a summary of the microarchitecture features, physical characteristics, availability, and pricing of microprocessors.