October 11, 2004
Editor: Tom R. Halfhill
In this issue:
IBM Makes Designer Genes
Double Your Opterons; Double Your Fun
Centaur CN Is Super(scalar) 64 Bits
Cavium Branches Out
IBM Makes Designer Genes
Tom R. Halfhill - Senior Editor {10/11/2004}
Designing the world’s fastest supercomputer by drawing
inspiration from embedded processors seems like imitating a Vespa when building
a Formula 1 racer. Aren’t lowly embedded chips supposed to be on the receiving
end of hand-me-down technology? As we’ve seen in the past few years, however,
embedded processors are blazing the trail for multicore designs, hardware multithreading,
massively parallel processor arrays, high-speed on-chip interconnects, and other
advanced design strategies.
So perhaps it’s no surprise that IBM Microelectronics would pattern a new supercomputer
processor after an embedded system-on-chip (SoC), even to the point of recycling
a five-year-old processor core previously found only in embedded parts. Moreover,
IBM readily acknowledges the new processor’s ancestry, taking pride in a design
that combines performance with parsimony.
The new dual-core supercomputer processor springing forth from the embedded gene
pool is called BlueGene/L. At the recent Fall Processor Forum (FPF) in San Jose,
California, IBM revealed new details about this fascinating chip. It’s destined
for an awesome supercomputer of the same name, which will harness the power of
65,536 processor chips (containing 131,072 PowerPC processor cores) and 32 terabytes
(TB) of main memory. When the first BlueGene/L supercomputer is finished next
year, IBM expects it to deliver peak performance of 360 trillion floating-point
operations per second (teraflops). And it will run at only 700MHz, an embedded-realm
clock frequency that would provoke snickers from PC users.
On September 29, IBM announced that a BlueGene/L prototype sustained 36.01 teraflops
during internal testing with the Linpack benchmark at IBM’s lab in Rochester,
Minnesota. That unofficial benchmark edges out NEC’s first-place Earth Simulator
supercomputer, which occupies 100 times as much space and consumes 28 times as
much power as IBM’s prototype.
Note that IBM’s BlueGene/L prototype has 8,192 dual-core processor chips, only
one-eighth (12.5%) as many chips as envisioned. When IBM builds out the machine to its full complement
of 65,536 chips with 131,072 processor cores, its theoretical peak performance
will surpass 360 teraflops. Sustained Linpack performance won’t reach that stratosphere;
nevertheless, BlueGene/L will be a significant leap forward in computing power.
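As a rough sanity check, the figures quoted above can be combined into a few per-chip and per-core numbers. The sketch below uses only values stated in this article; the function and constant names are illustrative, and treating 32TB as binary terabytes is an assumption.

```python
# Back-of-the-envelope arithmetic using only figures quoted in the article.
CHIPS_FULL = 65_536         # processor chips in the full machine
CORES_FULL = 131_072        # PowerPC cores in the full machine
CHIPS_PROTOTYPE = 8_192     # chips in the prototype system
PEAK_FLOPS = 360e12         # expected peak: 360 teraflops
CLOCK_HZ = 700e6            # 700MHz core clock
MEMORY_BYTES = 32 * 2**40   # 32TB of main memory (assumed binary terabytes)


def bluegene_summary():
    """Derive per-chip and per-core figures from the quoted totals."""
    return {
        "cores_per_chip": CORES_FULL // CHIPS_FULL,
        "prototype_fraction": CHIPS_PROTOTYPE / CHIPS_FULL,
        "peak_gflops_per_chip": PEAK_FLOPS / CHIPS_FULL / 1e9,
        "flops_per_core_per_cycle": PEAK_FLOPS / (CORES_FULL * CLOCK_HZ),
        "memory_mib_per_chip": MEMORY_BYTES / CHIPS_FULL / 2**20,
    }


if __name__ == "__main__":
    for name, value in bluegene_summary().items():
        print(f"{name}: {value:.3f}")
```

On these numbers, each chip contributes roughly 5.5 gigaflops of peak performance and 512MB of memory, and each core works out to a little under four floating-point operations per cycle. Because 360 teraflops is a rounded figure, that last value is only approximate.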
Microprocessor Report readers can access the full story (5 pages, 4 figures) here:
www.mdronline.com/mpr/h/2004/1011/184101.html. To find out more about Microprocessor
Report, please visit: www.mdronline.com.
Double Your Opterons; Double Your Fun
Kevin Krewell - Senior Editor {10/11/2004}
At Fall Processor Forum 2004, AMD Fellow Kevin McGrath
revealed additional details of the dual-core AMD Opteron processors. The new processors
will fit within the same thermal envelope and 940-pin socket specified for the
upcoming 90nm Opteron processors, making system upgrades much simpler. AMD also
expects to be first to market with x86 dual-core processors, in mid-2005.
The first parts AMD plans to introduce are dual-core Opteron processors for the
one- to eight-socket server and workstation market. Later in 2H05, AMD plans to
release dual-core client processors under the Athlon 64 brand. The client processor
will likely be a straightforward derivative of the Opteron version.
Like most builders of first-generation dual-core processors, AMD largely built
two complete processors on one die, including separate 1MB L2 caches. (See MPR
7/6/04-01, “AMD vs. Intel in Dual-Core Duel.”) In addition, AMD provided shared
north-bridge logic with three HyperTransport links and a dual-channel
(128-bit) DDR memory interface.
Microprocessor Report readers can access the full story (3 pages, 2 figures) here:
www.mdronline.com/mpr/h/2004/1011/184102.html.
Centaur CN Is Super(scalar) 64 Bits
Kevin Krewell - Senior Editor {10/05/2004}
For many years at Microprocessor Forum, we’ve heard
Glenn Henry, iconoclastic founder and president of VIA’s Centaur Division, talking
about his simple, small x86 processors. This year is different, however, for two
reasons. First, Microprocessor Forum has been renamed Fall Processor Forum; second,
Henry revealed that VIA/Centaur’s next-generation CN architecture is superscalar,
capable of decoding up to three x86 instructions per cycle. The other key attribute
of the CN architecture is that it has been completely redesigned for 64-bit processing.
For years, Henry has stuck to a simpler, scalar design on the theory that a simpler
processor is smaller and cheaper to manufacture but can nevertheless handle the
majority of basic Internet access and productivity applications. But newer process
geometries have made it possible to put additional logic into the core while keeping
it small. In addition, PC workloads now carry a greater emphasis on multimedia
data. VIA resisted the move to multicore and multithreading, as plenty of instruction-level
parallelism had been untapped by VIA’s earlier architecture.
The CN design increases media performance with fast floating-point units and improved
system bandwidth. The media unit has been beefed up with 128-bit-wide datapaths
supporting four single-precision or two double-precision values. The floating-point
adder and multiplier have two-cycle latency for single-precision adds and multiplies.
Double-precision adds also have two-cycle latency, while double-precision multiplies
take three cycles. In addition, the CN has an improved nonblocking L2 cache with
more ports.
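To make the 128-bit datapath figure concrete, the sketch below models how one 128-bit value divides into four 32-bit single-precision lanes or two 64-bit double-precision lanes, and performs a lane-by-lane add. It illustrates the data layout only and does not model Centaur's hardware or any particular instruction; the function names are hypothetical.

```python
import struct

DATAPATH_BITS = 128  # media-unit datapath width quoted in the article


def lanes(element_bits):
    """Number of elements of the given width that fit in one datapath word."""
    return DATAPATH_BITS // element_bits


def packed_add(a, b, fmt):
    """Add two packed 128-bit values lane by lane.

    fmt is a struct format such as '<4f' (four single-precision lanes)
    or '<2d' (two double-precision lanes).
    """
    xs = struct.unpack(fmt, a)
    ys = struct.unpack(fmt, b)
    return struct.pack(fmt, *(x + y for x, y in zip(xs, ys)))


if __name__ == "__main__":
    a = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
    b = struct.pack("<4f", 5.0, 6.0, 7.0, 8.0)
    print(lanes(32), lanes(64), len(a) * 8)
    print(struct.unpack("<4f", packed_add(a, b, "<4f")))
```

Both formats occupy the same 16 bytes, which is why one datapath can serve either precision: the operand width stays fixed and only the number of lanes changes.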
The CN architecture supports the AMD64 (and EM64T) extensions to the x86 instruction-set
architecture.
In addition to the new CN architecture, Henry updated the status on the current
parts. VIA has silicon back on the C5J processor. The die size is 31.7mm², built
in IBM’s 90nm SOI process. All previous C3 processors were built by TSMC, and
VIA will continue to build derivatives of the C5P. The official VIA brand name
for the C5J processor will be the C7, a significant bump up from the present C3.
The mobile version will be the C7-M.
Microprocessor Report readers can access the full story (3 pages, 3 figures) here:
www.mdronline.com/mpr/h/2004/1005/184002.html.
Cavium Branches Out
Tom R. Halfhill - Senior Editor {10/05/2004}
Already a well-regarded vendor of security processors,
Cavium Networks is moving in a bold new direction. The company’s new Octeon family
of networking processors integrates three important functions in a single chip:
packet processing, content filtering, and security. To provide enough muscle for
all that heavy lifting, at line rates up to 10Gb/s, Octeon chips will have as
many as 16 MIPS-compatible 64-bit processor cores, augmented by numerous coprocessors.
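To get a feel for why so many cores are needed, consider a rough packet-rate budget. The sketch below takes the 10Gb/s line rate and 16-core maximum from the text. Everything else is an assumption for illustration: standard Ethernet framing (64-byte minimum frame plus 8 bytes of preamble and a 12-byte interframe gap) and perfectly even distribution of packets across cores.

```python
LINE_RATE_BPS = 10e9   # 10Gb/s line rate, from the article
MAX_CORES = 16         # maximum core count, from the article

# Assumptions (not from the article): standard Ethernet framing overheads.
MIN_FRAME_BYTES = 64
PREAMBLE_BYTES = 8
INTERFRAME_GAP_BYTES = 12


def packets_per_second(frame_bytes=MIN_FRAME_BYTES):
    """Packet rate at line rate for back-to-back frames of the given size."""
    wire_bits = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return LINE_RATE_BPS / wire_bits


def ns_per_packet_per_core(frame_bytes=MIN_FRAME_BYTES, cores=MAX_CORES):
    """Time each core may spend on one packet if load is spread evenly."""
    return cores / packets_per_second(frame_bytes) * 1e9


if __name__ == "__main__":
    print(f"{packets_per_second():,.0f} packets/s at minimum frame size")
    print(f"{ns_per_packet_per_core():.1f} ns per packet per core")
```

Under these assumptions, minimum-size frames arrive at nearly 15 million per second, leaving each of 16 cores about a microsecond per packet. Larger frames relax that budget considerably, and coprocessors would offload part of the work.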
Cavium refers to the Octeon chips as “network services processors”—a new data-plane
category that absorbs the functions of separate packet processors, security processors,
and content-filtering accelerators. Cavium believes all those Layer 3–7 data-plane
functions are becoming so essential that it’s time to integrate them in a single chip.
Higher-level integration can dramatically shorten the packet datapaths, eliminate
redundant packet processing, simplify board designs, and cut costs. It’s a grand
strategy, and Octeon chips have ample resources to carry it out.
Target applications are secure network-interface cards, routers, and wireless-LAN
switches; server load-balancers, web-service appliances, router blades, and content-filtering
switches; and host bus adapters and switches for network storage subsystems. Currently,
these applications require multiple chips—often including ASICs and FPGAs—to handle
the packet processing, filtering, and security functions consolidated in Octeon.
Cavium claims Octeon will typically cut costs to one-fifth those of existing solutions
and similarly reduce board area and power consumption.
The Octeon family continues Cavium’s tradition of designing high-performance processors
for vital networking tasks. Last year, Cavium’s Nitrox Plus CN1340p won our Microprocessor
Report Analysts’ Choice Award for Best Security Processor of 2002. (See MPR 12/18/03,
“Security By Design.”) Cavium also sells a line of chips known as Golden Gate
Bridge processors—small I/O bridge chips for SPI-3, SPI-4.2, PCI, and PCI-X. Cavium
announced the Octeon family on September 13 and took the stage at Fall Processor
Forum on October 5 to reveal the first technical details about the new processors,
which are scheduled to begin sampling in 1Q05 and enter production in 2H05.
Microprocessor Report readers can access the full story (5 pages, 2 figures) here:
www.mdronline.com/mpr/h/2004/1005/184001.html.