September 13, 2004
Editor: Tom R. Halfhill
In this issue:
Sony’s PSP: Maximum Technology
Sun’s Niagara Pours on the Cores
Hot Chips 16 Goes to Mars and Back
ARM Extends Its Reach
Another Tale of Two Instructions
Sony’s PSP: Maximum Technology
Max Baron - Principal Analyst {09/13/2004}
Handheld game aficionados in the United States must
be disappointed by Sony’s February announcement that the company’s much-anticipated
PlayStation Portable (PSP) will not be available in the U.S. until spring 2005
but will be delivered to stores in Japan before the end of this year. The official
reason given for the delay was that some of the games planned for the
product’s introduction were not yet ready.
Sony Computer Entertainment president and CEO Ken Kutaragi called the much-anticipated
toy “the Walkman of the twenty-first century,” implying perhaps that the new handheld
device is expected to play an interesting role in playing MP3
files, operating digital cameras, and viewing movies. With Sony’s existing
capabilities in LAN, Bluetooth, Wi-Fi, and cellular telephony, the variety of
product possibilities must have justified the chip’s architects in exploring
new technologies in the process space; special codec-optimized hardware; 3D graphics;
a reconfigurable processor; and, to keep the data- and instruction-hungry processors fed,
a generously sized embedded DRAM.
Microprocessor Report readers can access the full story (4 pages, 4 figures) here:
www.mdronline.com/mpr/h/2004/0913/183701.html. To find out more about Microprocessor
Report, please visit: www.mdronline.com.
Sun’s Niagara Pours on the Cores
Kevin Krewell - Senior Editor {09/13/2004}
One of the most anticipated server processors had its
first public unveiling at Hot Chips 16—Sun’s Niagara processor with eight multithreaded
cores on one die. The talk focused on the processor cores, memory, and threading
but didn’t cover other system-integration issues, such as I/O expansion bus and
security integration. The combination of eight cores and four threads per core
creates a chip that can support 32 threads.
The Niagara processor consists of eight single-issue in-order cores with modest
clock speeds and no speculative execution. The eight SPARC cores connect to a
3MB L2 cache through a nonblocking crossbar switch. Each processor core has a
16KB L1 instruction cache and an 8KB L1 data cache. The L1 caches are quite small,
but with multiple threads available, L1 misses become less critical, as threading
can hide the L2 cache latency.
The SPARC cores in Niagara are actually quite simple. Each core has a six-stage
pipeline and implements the SPARC V9 architecture. Each core has support for four
threads. To be as power-efficient as possible, the team left out speculative
execution, which wastes energy on work that is thrown away when the speculation
turns out to be incorrect. Because the pipeline is short and there are multiple
threads per core, branch prediction becomes unnecessary and was also jettisoned.
The core can hide the time required to fetch the new instruction stream on a taken
branch by switching to the other threads during the clock delay.
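The latency-hiding argument above can be illustrated with a toy model. The sketch below is not Sun's thread-select logic, which this summary does not describe; it assumes a simple round-robin policy on a single-issue core. The names core_utilization, run_length, and stall_cycles, and the three-cycle stall used in the example, are hypothetical choices for illustration.

```python
CORES = 8              # from the text
THREADS_PER_CORE = 4   # from the text
TOTAL_THREADS = CORES * THREADS_PER_CORE


def core_utilization(num_threads, run_length, stall_cycles, total_cycles=10_000):
    """Fraction of cycles in which a single-issue core issues an instruction.

    Each thread issues run_length instructions and then stalls for
    stall_cycles cycles (a stand-in for a cache miss or a taken-branch
    refetch). Each cycle, the core issues from the next ready thread in
    round-robin order, or idles if no thread is ready.
    """
    ready_at = [0] * num_threads        # first cycle each thread may issue again
    issued_in_run = [0] * num_threads   # instructions issued since the last stall
    issued = 0
    next_thread = 0
    for cycle in range(total_cycles):
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                issued += 1
                issued_in_run[t] += 1
                if issued_in_run[t] == run_length:
                    # The thread now waits out its stall; others may fill the gap.
                    issued_in_run[t] = 0
                    ready_at[t] = cycle + 1 + stall_cycles
                next_thread = (t + 1) % num_threads
                break
    return issued / total_cycles


if __name__ == "__main__":
    # Assumed workload: every instruction is followed by a three-cycle stall.
    for n in range(1, THREADS_PER_CORE + 1):
        print(n, "thread(s):", core_utilization(n, run_length=1, stall_cycles=3))
```

Under these assumptions, a single thread keeps the core busy only a quarter of the time, while four threads keep it issuing every cycle. Real workloads stall far less regularly, so the numbers only show the trend.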
At present, no server-processor vendor has anything like the highly threaded Niagara
processor. In many regards, Niagara looks more like an embedded network processor
than a traditional server processor. The key to unlocking the potential of Niagara
will be in Solaris’s ability to manage the highly threaded environment. Here,
Sun claims it has a significant lead on the open-source Linux operating system.
Niagara pushes the envelope of highly threaded yet power-efficient server systems.
Microprocessor Report readers can access the full story (3 pages, 4 figures) here:
www.mdronline.com/mpr/h/2004/0913/183702.html.
Hot Chips 16 Goes to Mars and Back
Kevin Krewell - Senior Editor {09/13/2004}
Hot Chips 16 had some very “hot” presentations, along
with a few that were obviously just warmed over. The presentations, made at the
Stanford University campus this year, were an eclectic collection of technologies
ranging from instruction simulators to the first one billion plus–transistor
microprocessor—from Intel, of course.
There was an interesting keynote on the Mars rovers Opportunity and Spirit that
showed what can be done with technology well over a decade old. It explored the
problems of using a 10-year-old operating system (DOS) mixed with an embedded
design that had limited resources (in this case DRAM memory) and an off-the-shelf
embedded flash file-system software that continued to grow uncontrolled in a mission-critical
application. Although it proved obviously difficult to debug a system from more
than 35 million miles away, especially when it’s in a near-fatal constantly rebooting
state, the NASA and JPL teams managed to find the software problem, resolve it,
and get the rover back to a healthy and productive state.
The pictures and scientific data from Mars are truly amazing. In considering the
time delay from Mars to Earth, NASA and JPL placed enough intelligence into the
rover to make it autonomous, but at a very slow pace. The rover can traverse the
planet, taking measurements and avoiding obstacles autonomously.
Among the chips discussed at Hot Chips was Intel’s PXA27x family of low-power
ARM-based applications processors. In addition, Nvidia presented the SC-10, a
graphics chip for cellphones, and gave an overview of its latest
DirectX 9 GPU, the 6800. The SC-10 is a companion chip for cellphones that can
accelerate image, video, and 2D and 3D graphics; it is derived from work that
was started at GigaPixel before its merger with 3dfx and Nvidia’s subsequent purchase
of the 3dfx assets and intellectual property. The chip combines almost 1.3MB of
wide SRAM with a separate MPEG 4 encoder, MPEG 4 decoder, JPEG encoder, JPEG decoder,
2D engine, 3D engine, and various I/O. The use of separate blocks allows concurrent
operation and fully optimized and modularized logic. The chip is being built in
UMC’s 0.15-micron process and uses 6.8 million transistors.
As cellphones increasingly become multifunction devices, there are plenty of opportunities
for innovation in architecture and integration. Another processor along those
lines was the SH-Mobile3 chip from Hitachi/Renesas/SuperH. This SoC design combined
a SuperH CPU core, 256KB of user RAM, an MPEG 4 unit, a 3D graphics engine, and
a hardware Java accelerator. The chip designers put considerable effort into
reducing leakage currents, even putting power switches on chip to cut the Vss
paths. Special attention was paid to a low-leakage data-retention memory that
could turn off unneeded sense amps when the chip was in data-retention (sleep)
mode. The data-retention mode could save about 95% of the typical leakage current
consumed by memories of this size. The core is a seven-stage pipeline with a dual-issue
tightly coupled Java byte-code accelerator. The designers expect power consumption
of 0.57mW/MHz at a core voltage of 1.2V, using a 130nm process with five copper-interconnect
layers.
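As a rough worked example of what these figures imply: the sketch below assumes that core power scales linearly with clock frequency at the quoted 1.2V, which is what a mW/MHz figure suggests. The 200MHz clock and the 100-unit leakage value are hypothetical inputs, not numbers from the presentation.

```python
MW_PER_MHZ = 0.57             # from the text, at a 1.2V core voltage
RETENTION_SAVINGS = 0.95      # from the text: about 95% of typical leakage saved


def core_power_mw(freq_mhz):
    """Estimated core power in mW, assuming linear scaling with frequency."""
    return MW_PER_MHZ * freq_mhz


def retention_leakage(typical_leakage):
    """Memory leakage remaining in data-retention mode, in the caller's units."""
    return typical_leakage * (1.0 - RETENTION_SAVINGS)


if __name__ == "__main__":
    # Hypothetical 200MHz clock: 0.57 mW/MHz * 200 MHz is about 114 mW.
    print(round(core_power_mw(200), 1), "mW")
    # Hypothetical leakage of 100 units drops to about 5 units in sleep mode.
    print(round(retention_leakage(100.0), 1), "units")
```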
AMD talked about low-power design, a new design focus for the company. Designing
power-efficient processors requires a combination of process design, circuit
design, and architecture. AMD’s 90nm process will
use SOI wafers but will add dual gate-oxide thicknesses and longer nominal
channel lengths to reduce leakage and improve reliability. These conservative
process characteristics will be combined with three internal threshold voltages.
The use of three thresholds allows further fine-tuning of speed paths for lower
power and leakage. The L2 cache RAM consists almost completely of high-Vt transistors,
as speed is less critical.
AMD will also add more voltage controls (AltVID) to lower the core voltage. On
the design side, the processor has an improved halt state that can reduce power
by disabling instruction retirement in the reorder buffer. This action causes
the microcode engine to stall; therefore, the register file, reservation stations,
and microcode ROM also stall and can power down.
Another short presentation, related to AMD’s Opteron server processor, came from
Newisys. The company has developed an ASIC that can glue clusters of four Opteron
processors into larger arrays with excellent scalability. The ASIC, called Horus,
has taped out and is being fabricated by TSMC. The Horus chip connects to the
quad Opteron clusters through coherent HyperTransport but extends coherency through
a directory-based scheme with a programmable protocol engine. Details were thin,
but the company claims very linear scaling, up to 16 processors. Horus will also
add more capabilities to Opteron servers, such as partitioning, blade support,
machine check features, and other improvements for RAS and manageability.
Microprocessor Report readers can access the full story (2 pages) here:
www.mdronline.com/mpr/h/2004/0913/183703.html.
ARM Extends Its Reach
Tom R. Halfhill - Senior Editor {09/07/2004}
Business analysts and investors are still debating whether
ARM’s whopping $913 million acquisition of Artisan Components makes financial
sense, but from a technology standpoint, it launches ARM into a whole new realm.
Among other things, it erases all doubt that ARM is becoming a start-to-finish
provider of semiconductor intellectual property (IP), not just a vendor of embedded
microprocessor cores.
In truth, ARM has been reaching beyond processors for years. In the late 1990s,
ARM’s AMBA on-chip bus standard, PrimeCell soft IP, and PrimeXsys design platform
signaled a strategic move toward a more holistic approach to system-on-chip (SoC)
integration. In June of this year, ARM expanded its PrimeCell portfolio with new
AXI system-level components, including a configurable on-chip interconnect fabric
and memory controllers. ARM’s OptimoDE configurable data engine, also announced
this summer, is aimed squarely at embedded applications that need more processing
power than an ARM core alone can deliver. (See MPR 6/7/04-01, “ARM’s Configurable
OptimoDE.”) And on August 16, ARM announced its acquisition of Axys Design Automation,
a vendor of system-level design tools and models.
ARM’s acquisition of Artisan, announced August 23, is a much bigger step. In fact,
it looks like a bet-the-company proposition. Financially, the acquisition is so
large it’s practically a merger. It vastly expands the scope of ARM’s business
by adding a wealth of physical library IP, including embedded memories, peripheral
cores, system-interface physical-layer (PHY) components, and standard-cell libraries
for digital, analog, and mixed-signal ICs. Artisan has more than 1,200 customers
and partners, including some of ARM’s competitors, such as ARC International,
MIPS Technologies, Sonics, SuperH, Tensilica, and TriMedia. (ARM says the acquisition
won’t alter those relationships.) When the deal is complete, ARM will have at
least 1,200 employees worldwide.
Microprocessor Report readers can access the full story (2 pages) here:
www.mdronline.com/mpr/h/2004/0907/183601.html.
Another Tale of Two Instructions
Tom R. Halfhill - Senior Editor {09/07/2004}
Digging into the past of the x86 architecture is like
archaeology: You can never be sure what you’ll find, but it’s often surprising.
So it goes with the LAHF and SAHF instructions, which AMD originally dropped from
the 64-bit AMD64 architecture, then restored after discovering some software still
needs them. (See MPR 7/19/04-01, “A Tale of Two Instructions.”)
On the basis of our initial research, we reported that Intel first introduced
LAHF and SAHF in the 16-bit 286 processor of 1982, mainly to speed up context
switching for operating systems. (LAHF loads five x86 condition flags into the
AH register, and SAHF stores them from that register back into the flags.) Engineers from AMD
and Intel reviewed an early draft of our article for technical accuracy and didn’t
notice anything amiss. Likewise our internal reviewers, including the x86 experts
on our analyst staff and editorial board. We published the article with confidence.
So imagine our surprise when a sharp-eyed reader from Germany took issue with
our version of the historical record. Dr. Reinhard Kirchner, a computer science
lecturer at the University of Kaiserslautern, sent an email message saying that
LAHF and SAHF have been part of the x86 architecture from the very beginning—all
the way back to the Intel 8086 processor of 1978. Furthermore, Kirchner wrote,
the two instructions were not intended for context switching. Instead, Intel included
them to make it easier for programmers to port software to the x86 from the even
earlier 8080 processor.
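For reference, the behavior of the two instructions can be sketched in a few lines. This is a minimal model of the architectural definition, not code from the article: LAHF copies the low byte of the flags register into AH (SF, ZF, AF, PF, and CF in bits 7, 6, 4, 2, and 0, with bit 1 read as 1 and bits 3 and 5 as 0), and SAHF writes those five flags back from AH. The function names lahf and sahf and the constant STATUS_MASK are illustrative. The same byte layout is used by the 8080's flag byte, which fits the porting explanation above.

```python
# Bit positions of the five status flags in the low byte of the flags register.
CF, PF, AF, ZF, SF = 0, 2, 4, 6, 7
STATUS_MASK = (1 << CF) | (1 << PF) | (1 << AF) | (1 << ZF) | (1 << SF)  # 0xD5


def lahf(flags):
    """Return the value LAHF places in AH for the given flags register."""
    # Bit 1 always reads as 1; bits 3 and 5 always read as 0.
    return (flags & STATUS_MASK) | 0x02


def sahf(flags, ah):
    """Return the flags register after SAHF loads the five status flags from AH."""
    # Only SF, ZF, AF, PF, and CF change; all other flag bits are preserved.
    return (flags & ~STATUS_MASK) | (ah & STATUS_MASK)


if __name__ == "__main__":
    flags = 0x0843                     # OF, ZF, and CF set, plus reserved bit 1
    ah = lahf(flags)                   # capture the status flags
    flags = sahf(flags, 0x80)          # overwrite them: only SF set
    print(hex(ah), hex(flags))
    print(hex(sahf(flags, ah)))        # restore the captured flags
```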
Microprocessor Report readers can access the full story (2 pages) here:
www.mdronline.com/mpr/h/2004/0907/183602.html.