Vol 20, Issue 9 | February 27, 2006
By Kevin Krewell
I’ve lost count of how many times I’ve been asked when
Sun will give up on the SPARC processor line. It has been a very common question,
and my answer is that I hope Sun (and Fujitsu) never give
up on SPARC development. It’s not that the SPARC architecture is somehow
inherently superior to the other remaining RISC server architecture, POWER, or
to the venerable (read, old) x86 instruction set, or to that of the upstart EPIC
(Itanium). The time is long past when a particular processor instruction set or
the original RISC microarchitecture design aspects, such as register windowing and
delay slots, represent a significant differentiator in servers. Some design
aspects, such as a large number of general-purpose registers or SIMD
extensions like AltiVec, can give an architecture certain advantages, but none
has ever proved to be a decisive advantage. Even when an instruction-set
architecture dies, the innovative design ideas can often live on in another
processor (much as some Alpha system concepts now survive in AMD’s Hammer
system architecture). And for all the supposed advantages EPIC has by being the
newest architecture, it hasn’t come close to meeting the lofty expectations
originally set for it.
It’s more that I believe there should be no shortage of new
ideas and new designs. By having control over both the hardware (SPARC) and the
software (Solaris), Sun can push interesting design ideas into systems, as it has
with CoolThreads and the Niagara (UltraSPARC T1) processor.
At the recent Sun Analyst Summit in San Francisco, I had the
opportunity to talk with Dr. Marc Tremblay, lead architect of Sun’s next big
server processor, code-named Rock. Although Tremblay wouldn’t reveal the
secrets of Sun’s next-generation high-end server processor, it was clear that
there are enough new ideas in Rock to push server performance in new ways.
While Intel throws more and more transistors at the Itanium
and Xeon processors, with ever more incredible cache sizes, and IBM resurrects
the clock frequency race with the POWER6, Sun is looking at improving core
execution efficiency by innovating around concepts of threading and by keeping
the core busy with techniques such as scouting ahead of the execution
path to prefill the caches and execution pipelines.
No End to Ever More Threads?
Sun’s future Niagara 2 processor is going from Niagara’s 32
threads to 64 and will be able to support a two-CPU coherent system with 128
total threads; that many threads will be a challenge for an operating system to
manage. The question on many designers’ minds is this: When does thread scaling
slow and reach the point of diminishing returns? A very important part of
next-generation multicore microprocessor design will be the internal and
external bandwidth requirements necessary to support all these threads. The
goal of increasing core efficiency will put new stress on bandwidth, even if a
heavily threaded design can tolerate greater latencies.
There will still be issues with locks and synchronization
points that can limit the scaling of any one application. Running multiple
instances of the application may help, but it is generally less efficient than
having the application manage many threads. Server virtualization could support
many simultaneous instances of virtual server environments on one machine, but
that can still be an administrative nightmare with potentially tens or hundreds
of concurrent operating systems to manage.
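One common way to reason about the scaling limit described above is Amdahl's law, which is not cited in this column but fits the argument about locks and synchronization points. The sketch below assumes a hypothetical application in which 5% of the work is serialized; that fraction and the function name are illustrative assumptions, not measurements of any real workload.

```python
def amdahl_speedup(threads: int, serial_fraction: float) -> float:
    """Ideal speedup over one thread when `serial_fraction` of the
    work cannot run in parallel (for example, time spent holding a lock)."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


if __name__ == "__main__":
    # Thread counts taken from the text: 32 (Niagara), 64 (Niagara 2),
    # and 128 (a two-CPU Niagara 2 system).
    # The 5% serial fraction is an assumed, illustrative value.
    for threads in (32, 64, 128):
        speedup = amdahl_speedup(threads, serial_fraction=0.05)
        print(f"{threads:4d} threads -> ideal speedup {speedup:.1f}x")
```

Under that assumption, each doubling of the thread count buys a progressively smaller gain, which is the diminishing-returns question in concrete form.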
While Sun is embracing the multithreading model, IBM, the
company that shipped the first dual-core multithreaded server processor, has
decided not to follow the same path. Rather, it looks as if IBM has decided to
stick to only two cores for the POWER6 but has tightened up the design to be
able to run at faster clock speeds. The POWER6 is now looking to be in the
4–5GHz clock-speed range, and we have to wonder whether IBM is signaling that
the pendulum is now swinging back to faster processors, just as we’re becoming
focused on more cores and threads per CPU.
When given a clean slate to start from (with no installed
software base), as it had with the Cell and Xbox 360 processors, IBM chose to build
a core with less complexity and fast execution. While the POWER6 processor
doesn’t have the same luxury and needs high performance when executing legacy code,
indications are that IBM has reduced the core complexity from the POWER4/5 to
the POWER6. The POWER4/5 microarchitecture is an eight-issue core, whereas we
hear persistent rumors that the POWER6 is a four-issue core. The decrease in
peak instructions per cycle for the POWER6 core will be more than offset by
roughly doubling the clock speed, which will increase sustained performance,
assuming the system bandwidths have been increased sufficiently.
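The trade-off between issue width and clock speed can be put in rough numbers. The sketch below takes the issue widths from the text (eight versus the rumored four) and assumes hypothetical clock speeds of 2.25GHz and 4.5GHz to represent "roughly doubling." The utilization figures are purely illustrative assumptions about how much of the peak each core sustains; they are not IBM data.

```python
def peak_gips(issue_width: int, clock_ghz: float) -> float:
    """Peak throughput in billions of instructions per second:
    instructions issued per cycle times cycles per nanosecond."""
    return issue_width * clock_ghz


def sustained_gips(issue_width: int, clock_ghz: float, utilization: float) -> float:
    """Sustained throughput, where `utilization` is the assumed fraction
    of the peak issue rate that real code actually achieves."""
    return peak_gips(issue_width, clock_ghz) * utilization


if __name__ == "__main__":
    # Issue widths come from the text; clocks and utilizations are assumptions.
    wide = {"issue_width": 8, "clock_ghz": 2.25, "utilization": 0.15}
    narrow = {"issue_width": 4, "clock_ghz": 4.5, "utilization": 0.30}

    for name, core in (("eight-issue", wide), ("four-issue", narrow)):
        peak = peak_gips(core["issue_width"], core["clock_ghz"])
        sustained = sustained_gips(**core)
        print(f"{name}: peak {peak:.1f}, sustained {sustained:.1f} (billions/s)")
```

With these assumed clocks, halving the issue width while doubling the frequency leaves peak throughput unchanged. Any gain therefore has to come from the narrower core keeping a larger share of its issue slots busy, which is where the bandwidth caveat in the text matters.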
With AMD and Intel locked in a race to ship quad-core
versions of familiar processor architectures with relatively modest clock
speeds (compared with the latest from IBM), the really interesting design action
appears to be moving back to the big-iron guys.