Purchase Microprocessor Report
Articles Online
Weekly collections of Microprocessor Report articles
are now available for purchase and download online. Price: $50.
Issue #164 -- 07/28/2003
Editor: Tom R. Halfhill
In this issue:
PicoChip Preaches Parallelism
Ready or Not, 64-Bit Computing Is Here
Equator Revs Media Processor
Elixent Expands SoCs
Wintegra Beefs Up NPU Line
IBM Proliferates 750 Family
Motorola Attacks ASICs
TigerSHARC Swallows DRAM
PicoChip Preaches Parallelism
Tom R. Halfhill - Senior Editor {07/28/2003}
Among the most unusual microprocessors unveiled at Embedded
Processor Forum 2003 was picoChip Design’s new PC101, a massively parallel device
that integrates 430 16-bit processors on a single die. Indeed, the PC101’s resources
are so abundant that, to some degree, they are expendable—the chip’s internal
bus fabric can bypass a few processors ruined by manufacturing defects.
Designed for cellular-telephony and wireless-network base stations, the PC101
is the first implementation of picoChip’s picoArray architecture. It’s based on
a three-field long instruction word (LIW), but it has far greater execution resources
than other LIW or VLIW processors. PicoChip believes that massive parallelism
is the best approach for the compute-intensive tasks of wireless communications,
because it can deliver high performance at low clock speeds, thereby saving power.
In addition, dividing a complex application into many parallel tasks is well suited
to large-team software development projects.
PicoChip, a fabless semiconductor company based in the U.K., will use TSMC to
manufacture the PC101 in a 0.13-micron, eight-layer-metal, digital CMOS process.
The chip is relatively large and packaged in a 528-pin BGA. Samples are available
now, with volume production scheduled to begin in 1Q04.
Perhaps the biggest challenge for a small startup company like picoChip lies in
standing out from the crowd. The onrush of communications processors for cellular
and wireless-data networks is reminiscent of the flood of network processors for
switches and routers a few years ago. When dozens of vendors vie for the attention
of relatively few customers, getting a foot in the door can be the biggest step
of all.
Microprocessor Report readers can access the full story here:
www.mdronline.com/mpr/h/2003/0728/173002.html. To find out more about Microprocessor
Report, please visit: www.mdronline.com.
Ready or Not, 64-Bit Computing Is Here
Peter Glaskowsky - Editor-in-Chief {07/28/2003}
Very soon now, you’ll have a choice of three different
64-bit architectures for your desktop computer. The AMD Athlon 64 processor will
be available in systems running Linux and Windows, and IBM’s PowerPC 970 will
ship in Apple’s new Power Mac G5. The third option is perhaps the least well known,
though it’s been available for some time—Sun’s UltraSparc III, shipping in Sun
Blade 150 workstations priced as low as $1,395.
Some of these segment associations are a little vague. The Sun Blade 150 is a
workstation in name only, having low-performance processor, graphics, and mass-storage
components. The G5 is a workstation masquerading as a desktop, offering premium
features at a premium price. AMD would probably prefer that its workstation OEMs
use Opteron chips, but Athlon 64 will be widely used in both desktop- and workstation-class
systems.
Whatever you call it, a 64-bit desktop has capabilities not found in commodity
32-bit machines. Though we’ve long had desktops with performance to match the
best supercomputers of a decade ago, we haven’t been able to run the same software
on them. Huge databases and complex scientific simulations can now be hosted on
inexpensive machines that support 64-bit addressing in both hardware and software.
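To put the addressing difference in numbers, the sketch below compares flat 32-bit and 64-bit byte address spaces. The 100-byte record size is an illustrative assumption, not a figure from the article. Shipping processors typically implement fewer than 64 address bits, so the 64-bit result is an architectural ceiling rather than an amount of memory anyone can install.

```python
# Size of a flat, byte-addressed memory space for a given pointer width.
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses an address_bits-wide pointer can name."""
    return 2 ** address_bits


GIB = 2 ** 30  # bytes in one gibibyte

limit_32 = addressable_bytes(32)
limit_64 = addressable_bytes(64)

print(f"32-bit address space: {limit_32 // GIB} GiB")
print(f"64-bit address space: {limit_64 // GIB:,} GiB")

# Hypothetical billion-record table; the record size is an assumption.
records = 1_000_000_000
record_bytes = 100
table_bytes = records * record_bytes

# True: the table cannot be mapped into a single 32-bit address space.
print("Table exceeds 32-bit space:", table_bytes > limit_32)
```

Under these assumptions the table is far larger than anything a 32-bit pointer can reach directly, which is the practical argument for 64-bit addressing in large database and simulation workloads.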
These sophisticated programs, however, probably won’t create much end-user demand
for 64-bit systems. Most of us will continue to visit www.weather.com rather than
attempting to predict tomorrow’s weather ourselves. Similarly, we’ll continue
to use conventional client-server database programs rather than hosting billion-record
databases directly on our desktops. Over time, some of these applications may
migrate to the desktop, but with new purposes. Programs created to help NASA scientists
visualize the surface of Mars will turn into 3D games, and, as hard disks get
bigger and bigger, 64-bit database engines will be used to manage our local file
systems.
Unfortunately, this migration won’t be fast enough to help AMD, IBM, and Sun sell
their desktop processors this year—or next. Some customers will be savvy enough
to realize that if they expect to be running 64-bit desktop software in 2005,
they should start buying 64-bit systems now, but that’s a tough sell for most
customers. Until 64-bit software is widely available, price and performance will
continue to be the most important selling points.
Accordingly, we can expect to see 64-bit computing presented as a performance
advantage with immediate relevance. In some small ways, it is. The ability of
these 64-bit platforms to support large amounts of DRAM can speed up memory-hungry
programs, and the larger register sets associated with 64-bit CPUs can improve
performance on recompiled 32-bit applications—but these are not very dramatic
effects. Most performance benefits related to 64-bit processing have already been
realized in 32-bit systems by means of instruction-set extensions such as Intel’s
SSE, AMD’s 3DNow!, IBM/Motorola’s AltiVec, and Sun’s pioneering VIS.
All these extensions implemented 64-bit (or wider) datapaths and register sets
for special-purpose instructions. AMD and Apple, in particular, will find ways
for general-purpose code to benefit from 64-bit processing, but these benefits
will, at best, be incremental over the capabilities of 3DNow! and AltiVec.
If 64-bit addressing is of little immediate importance, and 64-bit processing
offers only a slight performance advantage, what will AMD and Apple use to sell
their new hardware? Both companies have a simple, strong performance argument,
but it has nothing to do with 64-bit capabilities. This argument will certainly
be the centerpiece of the Athlon 64 and G5 marketing efforts, but both companies
are clearly compelled to do something with their 64-bit story, however weak it
is.
There’s an old lawyer joke: When the law is on your side, plead the law. When
the facts are on your side, plead the facts. When neither the facts nor the law
are on your side, plead loudly. We can expect to hear some loud pleading in the
months to come.
Equator Revs Media Processor
Tom R. Halfhill - Senior Editor {07/28/2003}
Equator Technologies unveiled a new member of its media-processor
family at Embedded Processor Forum 2003, claiming the chip will deliver more signal-processing
performance than any other VLIW architecture. The new processor, the BSP-16, is
scheduled for production in 3Q04.
Designed for digital TV, digital video recorders, video conferencing, and other
media-oriented applications, the BSP-16 follows Equator’s BSP-15 and MAP-CA. The
latter chip won the MPR award for Best Media Processor of 2000. (See MPR 1/29/01-04,
“Media Processors Gain Ground” and MPR 3/13/00-04, “MAP-CA Ready for Prime Time.”)
The BSP-16 is similar to the BSP-15 but isn’t pin compatible with it, due in part
to new interfaces for DDR-SDRAM, IDE, and a UART.
Despite the new I/O features and a boost in clock frequency to 350–500MHz (vs.
300–405MHz for the BSP-15), the new chip will consume about half as much power:
1–2W typical. A shrink from TSMC’s 0.15-micron process to the 0.13-micron “G”
process makes the difference. Core voltage drops to 1.0V, with 3.3V I/O. PCI is
5V tolerant. The integrated memory controller is compatible with 32/64-bit DDR-SDRAM
(166MHz at 2.5V I/O). Equator says a BSP-16 running flat out at 500MHz will consume
only 2W while encoding or decoding an MPEG-2 video stream.
Microprocessor Report readers can access the full story here:
www.mdronline.com/mpr/h/2003/0728/173004.html.
Elixent Expands SoCs
Tom R. Halfhill - Senior Editor {07/21/2003}
Seeking a soft spot between a rock and a hard place,
U.K.-based Elixent is introducing a massively parallel processor core that strives
to combine the programmability of a general-purpose processor with the performance
of a hard-wired ASIC. The goal: a more flexible system-on-chip (SoC) processor
that consumes less power and adapts quickly to different tasks, amortizing the
development costs of an SoC over multiple projects.
Elixent, a three-year-old spinoff from Hewlett-Packard Labs, described its new
D-Fabrix architecture at Embedded Processor Forum 2003. The D-Fabrix is based
on a concept that Elixent calls reconfigurable array processing (RAP). It’s similar
to the reconfigurable compute fabric (RCF) in the MRC6011 processor that Motorola
also presented at Embedded Processor Forum. (See MPR 7/14/03-01, “Motorola Attacks
ASICs.”) A key difference is implementation: Elixent licenses the D-Fabrix as
a hard macro for SoC integration, whereas Motorola offers the MRC6011 as a standard
part.
Microprocessor Report prefers to call these processors reprogrammable or run-time-programmable
rather than reconfigurable, because their logic is static at run time. (See the
sidebar, “Defining Reconfigurable Processing” in MPR 7/14/03-01, “Motorola Attacks
ASICs.”)
Nevertheless, the Elixent and Motorola processors offer a different model of execution
than do general-purpose microprocessors and programmable DSPs. Conventional processors
execute a common stream of instructions fetched over a bus from on- or off-chip
memory that is global to all the function units. In addition, function units of
the same data type share a global register file. The Elixent and Motorola processors
can simultaneously execute independent streams of instructions fetched from on-chip
memory that’s local to each function unit, and each unit has its own local register
file.
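The difference between the two execution models can be sketched with a toy simulator. This is only an illustration of the general idea described above, not Elixent's or Motorola's actual design: the instruction format, the `LocalUnit` class, and the register names are all hypothetical.

```python
from dataclasses import dataclass, field

# Toy instruction format (hypothetical): (op, dest, src_a, src_b).
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def execute(instr, regs):
    """Apply one instruction to the given register file."""
    op, dest, a, b = instr
    regs[dest] = OPS[op](regs[a], regs[b])


def run_shared(program, regs):
    """Conventional model: one instruction stream, one global register file."""
    for instr in program:
        execute(instr, regs)
    return regs


@dataclass
class LocalUnit:
    """Array model: a unit with its own instruction memory and registers."""
    program: list
    regs: dict = field(default_factory=dict)
    pc: int = 0

    def step(self) -> bool:
        """Run one local instruction; return False once the program is done."""
        if self.pc >= len(self.program):
            return False
        execute(self.program[self.pc], self.regs)
        self.pc += 1
        return True


def run_array(units):
    """Advance every unit by one instruction per cycle until all finish."""
    cycles = 0
    while True:
        progressed = [u.step() for u in units]  # a list, so every unit steps
        if not any(progressed):
            return cycles
        cycles += 1


shared = run_shared(
    [("add", "r2", "r0", "r1"), ("mul", "r3", "r2", "r2")],
    {"r0": 2, "r1": 3},
)
units = [
    LocalUnit([("add", "acc", "x", "y")], {"x": 1, "y": 2}),
    LocalUnit([("mul", "acc", "x", "y"), ("add", "acc", "acc", "x")], {"x": 4, "y": 5}),
]
cycles = run_array(units)
print(shared, [u.regs["acc"] for u in units], cycles)
```

In the shared model every instruction passes through one fetch path and one register file. In the array model each unit works from its own program and registers, so the two unrelated computations proceed in the same cycles without competing for a common instruction stream.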
At the other end of the spectrum, a custom ASIC should have little trouble outperforming
a D-Fabrix or RISC-based SoC if flexibility isn’t an issue. Well-designed ASICs
can exploit parallelism, too, and their hard-wired functions don’t need reprogramming
and aren’t burdened with instruction fetching. An ASIC should be able to outrun
a D-Fabrix SoC, whose complex architecture severely limits the maximum clock frequency.
In other words, it’s the old programmability-vs.-performance debate that stretches
as far back as the 1940s. The D-Fabrix architecture’s highly parallel local-execution
model, with an array of independently programmed processing units, offers an interesting
middle ground.
Microprocessor Report readers can access the full story here:
www.mdronline.com/mpr/h/2003/0721/172901.html.
Wintegra Beefs Up NPU Line
Peter Glaskowsky - Editor-in-Chief {07/21/2003}
At Embedded Processor Forum 2003, Wintegra, a network-processor
vendor focused on the network-access market, introduced two new devices that offer
superior performance and integration over previous Wintegra offerings. The new
WIN780 and WIN787 target wireless communications systems, digital subscriber-line
(DSL) installations, voice-over-packet applications, and similar systems on the
edge of large networks such as the Internet. The new chips begin sampling in 4Q03;
pricing and production dates have not been announced.
The WIN787 combines a MIPS 5KC RISC processor with the WinComm packet-processing
subsystem, on-chip peripherals, and off-chip interfaces. The WIN780 omits the
MIPS core but is otherwise identical. Packet processing is handled by four proprietary
cores in the WinComm subsystem. Wintegra provides all the software needed for
packet processing on layers 2 and 3 of the protocol stack; customers can focus
on the higher-layer functions that provide more-valuable product differentiation.
All WinComm software is written in a high-level language, a subset of C defined
by Wintegra.
Wintegra isn’t the largest of the network-processor companies, but it has proved
its ability to innovate and to adapt to rapidly changing market conditions. Its
new WIN780 and WIN787 network processors reflect a clear understanding of customer
needs and provide a solid foundation for commercially successful products.
Microprocessor Report readers can access the full story here:
www.mdronline.com/mpr/h/2003/0721/172902.html.
IBM Proliferates 750 Family
Markus Levy - Senior Editor {07/21/2003}
IBM has announced details of the newest member of the
750 PowerPC family, the 750GX. Although the basic microarchitecture has remained
the same, the product family has undergone numerous circuit modifications, process
shifts, and architectural enhancements. The most notable enhancements applied
to the 750GX include a 1MB four-way set-associative level 2 cache; additional
L1 and L2 cache buffers; and the capability for up to 200MHz operation of the
60x system-bus interface.
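As a rough illustration of what "1MB four-way set-associative" implies, the sketch below derives the number of sets and the set index for an address. The 64-byte line size is an assumption made for the example; the article does not state the 750GX's L2 line size, and the function names are hypothetical.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes size_bytes, ways, and line_bytes are all powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2(line_bytes)
    index_bits = sets.bit_length() - 1         # log2(sets)
    return sets, offset_bits, index_bits


def set_index(address: int, sets: int, line_bytes: int) -> int:
    """Set that a byte address maps to: drop the line offset, wrap by sets."""
    return (address // line_bytes) % sets


L2_SIZE = 1 << 20   # 1MB, from the article
L2_WAYS = 4         # four-way, from the article
LINE_BYTES = 64     # ASSUMPTION: not given in the article

sets, offset_bits, index_bits = cache_geometry(L2_SIZE, L2_WAYS, LINE_BYTES)
print(sets, offset_bits, index_bits)

# Addresses that differ by sets * line_bytes land in the same set and
# compete for its four ways.
stride = sets * LINE_BYTES
print(set_index(0x12345678, sets, LINE_BYTES),
      set_index(0x12345678 + stride, sets, LINE_BYTES))
```

With a different line size the set count changes proportionally, but the mapping works the same way: four lines whose addresses share a set index can reside in the cache at once before one must be evicted.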
The large L2 cache should significantly increase the price of the 750GX over that
of the 750FX, but it will provide a huge performance boost. Furthermore, the 1MB
L2 makes it easier to eliminate the need for an external L3 and helps the 750GX
deal with the lack of DDR memory support. The 750GX is manufactured in IBM’s 0.13-micron
copper process with silicon-on-insulator technology and will be available in frequencies
ranging from 733MHz to 1.1GHz. At an operating voltage of 1.45V, the 750GX consumes
an estimated 8W at 1GHz.
Microprocessor Report readers can access the full story here:
www.mdronline.com/mpr/h/2003/0721/172904.html.
Motorola Attacks ASICs
Tom R. Halfhill - Senior Editor {07/14/2003}
More frightening than any Halloween mask is the over–$1
million price tag on a deep-submicron mask set. No wonder everyone is looking
for ways to exorcise the demon. Motorola’s latest weapon is the MRC6011, a new
chip that has a programmable RISC controller, internal peripherals, and six DSP
cores, each with 16 function units. Designed primarily for wireless infrastructures,
the MRC6011 is an off-the-shelf alternative to a costly ASIC project or a conventional
DSP.
Motorola disclosed architectural details of the MRC6011 last month at Embedded
Processor Forum 2003. The chip is suitable for many compute-intensive applications,
but the instruction set, microarchitecture, and DSP cores make it particularly
useful for baseband processing in 3G-cellular and wireless-LAN base stations.
In that role, the MRC6011 can replace a fixed-function ASIC or programmable DSP
while maintaining high performance. Because it’s fully programmable, field upgrades
are easier than with systems based on custom ASICs. The ability to deploy soft
upgrades—perhaps remotely over the network—is a valuable feature when communications
protocols and industry standards are rapidly evolving.
By far the most interesting feature of the MRC6011 is what Motorola calls a reconfigurable
compute fabric (RCF), which will also appear in future chips in this series. However,
Motorola’s use of the term “reconfigurable” doesn’t mean the chip has reprogrammable
gates, as with an FPGA. Instead, the reconfigurable elements are small DSP cores
that have their own local registers, memories, and arrays of function units. Once
programmed, these self-contained cores can independently execute all or part of
an algorithm locally, without fetching instructions from off-core or off-chip
memory, so all their I/O bandwidth is available for data throughput.
Microprocessor Report readers can access the full story here:
www.mdronline.com/mpr/h/2003/0714/172801.html.
TigerSHARC Swallows DRAM
Markus Levy - Senior Editor {07/14/2003}
Embedded DRAM was the focus of the Analog Devices announcement
of its TigerSHARC TS201S at Embedded Processor Forum 2003: the TS201S contains
3MB of eDRAM. TigerSHARC executes as many as four instructions per cycle with
its interlocking eight-stage pipeline and dual computation blocks. Each block
contains a multiplier, an ALU, and a 64-bit shifter and can perform one 32- ×
32-bit or four 16- × 16-bit multiply-accumulates (MACs) per cycle.
The TS201S has four dedicated data buses associated with the core’s functional
blocks. These buses are 128 bits wide and can transfer 9.6GB of data per second.
The processor’s 24Mb (3MB) of eDRAM connects to the four internal buses through
a crossbar.
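The figures quoted above can be cross-checked with a little arithmetic. The sketch assumes the 9.6GB/s figure applies to each 128-bit bus and that a bus moves one full-width word per clock; the resulting clock rate is therefore inferred, not quoted from the article.

```python
BITS_PER_BYTE = 8

# Bus bandwidth -> implied clock (ASSUMPTION: 9.6GB/s is per bus, and each
# bus transfers one 128-bit word per clock cycle).
bus_width_bits = 128
bus_bandwidth_bytes_per_s = 9.6e9
bytes_per_transfer = bus_width_bits // BITS_PER_BYTE
implied_clock_hz = bus_bandwidth_bytes_per_s / bytes_per_transfer

# eDRAM capacity: the article quotes both 24Mb and 3MB.
edram_megabits = 24
edram_megabytes = edram_megabits / BITS_PER_BYTE

# Peak MAC throughput: two computation blocks, each doing four 16-bit
# (or one 32-bit) multiply-accumulates per cycle.
compute_blocks = 2
macs16_per_block = 4
macs32_per_block = 1
peak_macs16_per_cycle = compute_blocks * macs16_per_block
peak_macs32_per_cycle = compute_blocks * macs32_per_block
peak_macs16_per_s = peak_macs16_per_cycle * implied_clock_hz

print(f"Implied clock: {implied_clock_hz / 1e6:.0f} MHz")
print(f"eDRAM: {edram_megabytes:.0f} MB")
print(f"Peak 16-bit MACs: {peak_macs16_per_cycle}/cycle, "
      f"{peak_macs16_per_s / 1e9:.1f} GMAC/s at the implied clock")
```

The two capacity figures agree, and under the stated assumptions the bus bandwidth corresponds to a 600MHz clock, at which the two computation blocks would peak at eight 16-bit MACs per cycle.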
When eDRAM is compared with external DRAM, the benefits of eDRAM are obvious.
Those benefits include reduced latency, dramatically increased bandwidth to main
memory, reduced power consumption, reduced board space, and multiple independent
address/data streams. The TS201S’s eDRAM operates at half the core speed, which
restricts the performance of the chip’s high-speed buses. However, the chip’s
eDRAM is split into six memory blocks supported by 16K caches and 1K prefetch
buffers that potentially eliminate the latency of the chip’s eDRAM.
As with any DRAM, the TS201’s eDRAM requires periodic refresh that automatically
occurs every 32ms per subarray, although the chip supports the programming of
higher-frequency refresh rates. The good news is that refresh-associated stalls
have a minimal negative effect on performance, thanks to the integrated cache.
The better news is that eDRAM cells have virtually zero leakage current, especially
compared with SRAM.
From a performance perspective, ADI claims the TS201 supports an “all software
solution” for processing both the chip-rate and symbol-rate functions of a 3G base
station. That claim stands in contrast to the approach being promoted by Texas
Instruments, ADI’s biggest competitor.
Microprocessor Report readers can access the full story here:
www.mdronline.com/mpr/h/2003/0714/172802.html.