Vol 21, Issue 9

March 26, 2007
By Max Baron
At any point in time, processors can be divided into three categories. The first comprises processors whose performance, power, and price specifications are good enough for an application whose demands are bounded (more performance doesn't make the app better). The second comprises processors whose progress is unbounded as they reach for applications that have never been conquered before. The third comprises processors too exotic or passé to make it into the system designer's tool chest.
In category one, applications that have found matching processors may be limited by the physical performance of the system under control. Consider, for example, processing the next page for the rotating drum of a copier: completing the workload faster won't make the printing faster. Alternatively, the application may be limited by an industry standard, as with an MP3 player or an MPEG2 decoder.
Category two's open-ended applications are driving innovation at all levels through continued demand for better performance and efficiency. Speech recognition (more precisely, speech comprehension) still struggles with the variety of human language and awaits large-scale improvements in processing. Today's cameras can separate a face from the background, but they can't recognize grandma in order to remove a few age lines from her face or accentuate her blue eyes. And we're still waiting for the ideal camera, one that makes no compromises and takes still images and high-definition video equally well. When such a camera arrives at a certain price point, its processor will move to the good-enough category.
As we prepare for this year's Microprocessor Forum (May 21-23 in San Jose, California), we draw conclusions about these two categories, their vendor strategies, and plans for the future.
Life for the app-adequate processor in the first category isn't easy, but its roadmap is clear. It may be the first engine to have conquered the application, but soon it won't be the only one. Others will catch up. Prices will drop as, together, these processors and systems try to make up in volume what they are losing in revenue. To reduce cost, designers will mostly push for higher integration and less silicon real estate. There is less opportunity for innovations in architecture but a lot of motivation to excel in cost-reduction engineering.
At the other end of the spectrum, the second category's targets are set by performance and efficiency. At power consumption equal to that of the previous technology, high-k gate dielectrics and other leakage-reduction methods can help improve frequency by a factor of two at best. The search for performance has therefore shifted to multiple processor cores and parallel processing, which can raise throughput at lower power and lower temperature.
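Why parallelism can buy performance at lower power follows from the first-order CMOS dynamic-power model, P = αCV²f, where α is the switching activity, C the switched capacitance, V the supply voltage, and f the clock frequency. As a sketch, assume (hypothetically) that halving a core's clock lets its supply voltage drop to 0.8 of nominal, and that the workload splits perfectly across two such cores:

```latex
\begin{aligned}
P_{\text{1 core}}  &= \alpha C V^{2} f \\
P_{\text{2 cores}} &= 2\,\alpha C\,(0.8V)^{2}\,\frac{f}{2} = 0.64\,\alpha C V^{2} f
\end{aligned}
```

Under those assumptions the two slower cores deliver the same aggregate throughput (2 × f/2 = f) for roughly 36% less dynamic power. The model ignores leakage and any serial portion of the workload, both of which narrow the gap in practice.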
Multicore designs have become the preferred configuration for the mighty desktop and server processors. Semiconductor vendors are introducing them in dual- and quad-core configurations, and soon perhaps in octal-core ones.
The massively parallel processor is now reborn, offering extremely high performance. It employs hundreds of simple processors that make up in numbers for the performance they don't deliver in frequency. The parallel processor can't yet go into general-purpose computing. It can be employed in scientific computing, imaging, audio, video, and other applications whose data-intensive workloads are parallelizable. The massively parallel processor can process many streams of multimedia. But, until a year ago, it seemed that the parallel processor's only place was in low-volume network and communications infrastructure equipment whose characteristic workloads were a good match for the capabilities of this engine.
The abstracts of papers proposed for this year's Microprocessor Forum reflect the three strategies employed by intellectual property (IP) owners and processor manufacturers: one, sell more silicon and more components; two, find additional applications; three, push for performance.
For several years, vendors of processor chips and processor cores have offered more than the processor itself. Auxiliary processors, peripherals, buses, mixed-signal devices, and even RF are being offered now. Application-oriented improvements in instruction-set architectures and accelerators have brought the SoC close to being the system itself. Software has followed. Licensed from ISVs, codecs and system software are ported to complete the platforms. These platforms offer quick-turnaround support for communication and consumer systems with life cycles shorter than one year. Platform-level products and tools are now offered by virtually all IP and semiconductor vendors. By offering more silicon and more components, the vendors' strategy allows them to compete with one another on a higher level and obtain more revenue in a market that keeps reducing the price of silicon.
High performance obtained from dual or quad cores is a different kind of performance. Unless the workload can take full advantage of threads or data-parallel functions, the gain in perceived speed can differ markedly from the gain delivered by a single processor running at, say, twice the frequency. Perhaps one of the best examples seen at the Consumer Electronics Show in January was a battle game that added warriors as each additional core was powered up. The machine was a Shuttle game system employing an Intel quad-core processor and two NVIDIA graphics accelerator boards. The Shuttle is also offered with AMD ATI graphics accelerators.
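The gap between "more cores" and "faster clock" can be quantified with Amdahl's law, which bounds the speedup of N cores by the fraction of the work that can actually run in parallel. The Python sketch below compares a quad-core chip against an idealized single core at twice the clock; the function name and the assumption that doubling the clock yields a full 2x speedup (memory and I/O keeping pace) are illustrative, not taken from any vendor data.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over a single core, per Amdahl's law.

    parallel_fraction: share of the run time that can be split across cores.
    cores: number of identical cores running at the original clock.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Idealized single core at twice the clock: everything runs 2x faster,
# assuming (hypothetically) that memory and I/O keep pace.
DOUBLE_CLOCK_SPEEDUP = 2.0

if __name__ == "__main__":
    for p in (0.25, 0.50, 0.90):
        quad = amdahl_speedup(p, 4)
        print(f"parallel fraction {p:.2f}: quad-core {quad:.2f}x, "
              f"2x-clock single core {DOUBLE_CLOCK_SPEEDUP:.2f}x")
```

With a quarter of the work parallelizable, the quad core delivers only about 1.23x; it matches the doubled clock when two-thirds of the work is parallel (1 / (1/3 + 1/6) = 2) and reaches about 3.08x at 90%. A game that spends extra cores on more warriors, rather than on finishing the same frame faster, sidesteps this limit by growing the parallel part of the workload.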
The massively parallel engine is better suited to data-parallel workloads. It may or may not use one of the cores provided by leading IP vendors, because the instruction-set architecture seems less important than how the engine's resources are employed. Among the least understood parameters in parallel processing are interprocessor communication and local memory, and how both depend on the workload and on the software-programming tools. Software continues to be a challenge.
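To make "data-parallel workload" concrete, here is a minimal Python sketch of the decomposition such an engine relies on: the data is split into contiguous chunks, each standing in for one simple processor's local memory, and the same kernel runs on every chunk independently. The names `split`, `brighten`, and `parallel_map` are hypothetical, and `ThreadPoolExecutor` is only a stand-in for an array of hardware processors; in CPython the global interpreter lock keeps these threads from executing pure-Python code truly in parallel, so the sketch shows the structure, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def brighten(pixels: list[int], gain: int) -> list[int]:
    """Element-wise kernel: add gain to each 8-bit pixel, clamping at 255."""
    return [min(255, p + gain) for p in pixels]


def split(data: list[int], parts: int) -> list[list[int]]:
    """Divide data into `parts` contiguous chunks of near-equal size.

    Each chunk models the slice held in one processor's local memory.
    """
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_map(data: list[int], kernel, workers: int = 8) -> list[int]:
    """Run `kernel` on every chunk and reassemble the results in order."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(kernel, chunks)  # map preserves chunk order
    out: list[int] = []
    for chunk_result in results:
        out.extend(chunk_result)
    return out


if __name__ == "__main__":
    image = [250, 10, 0, 255, 128, 64]
    print(parallel_map(image, partial(brighten, gain=10), workers=3))
```

Because `brighten` touches each pixel independently, the chunks never exchange data. A neighborhood operation such as a blur would need pixels from adjacent chunks at every boundary, and that is where interprocessor communication and local-memory sizing begin to dominate the design.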
Massively parallel processors are looking for more applications than the low volumes and tight budgets of infrastructure equipment can offer. Security can be a good target, but new automotive applications can be even better, because they include information processing, communications, navigation, entertainment, maneuverability, and passenger safety. Many of these applications require high performance on highly parallelizable workloads. The Scotiabank Group estimates worldwide car sales for 2006 at about 49 million, with a slightly lower forecast for 2007. At IEEE's DAC 2007, the theme is automotive electronics. We will have a special session on automotive systems at Microprocessor Forum in San Jose, and our event in Tokyo is dedicated mainly to automotive systems. The performance-efficient parallel engine may have found its first real opportunity in the automotive environment.
The strategy of dual- and quad-core vendors is to look for additional thread-rich applications. Vendors of massively parallel processors are looking for data-intensive parallelizable applications.
The ultimate game, however, continues to be performance. And it's more challenging than ever before, because we're looking for efficient execution, not just faster execution. The push for efficient performance involves architecture, semiconductor technology such as high-k dielectrics, and design for the best clock distribution. Power management can be so tight, so close to the boundaries of correct operation, that execution can cross over, by design, into previously uncharted territory where errors occur and must be corrected. Wireless data communication has always had to rely on error correction, so it should not be difficult to accept a processor designed to correct the errors it is itself expected to produce (as opposed to errors caused by alpha particles).
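As an illustration of the general principle borrowed from communications, that a known error model lets hardware repair errors rather than merely avoid them, here is a classic Hamming(7,4) code in Python. It corrects any single flipped bit in a 7-bit codeword. This is a textbook example only: the article does not say which correction scheme a margin-tight processor would use, and designs that deliberately run close to timing limits may rely on other techniques, such as detecting a late result and replaying the operation.

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword laid out p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword: list[int]) -> tuple[list[int], int]:
    """Correct up to one flipped bit and return (data bits, syndrome).

    The syndrome is the 1-based position of the flipped bit, or 0 if none.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    sent = hamming74_encode([1, 0, 1, 1])
    received = list(sent)
    received[4] ^= 1  # simulate a single-bit error
    print(hamming74_decode(received))
```

The cost is three extra bits per four data bits; the payoff is that a single error becomes a routine, correctable event rather than a failure, which is the trade-off the paragraph above asks processor designers to accept.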
It's difficult to remember a time when semiconductor technology, design, architecture, and software were more in flux and more exciting than right now.