Vol 18, Issue 39 | September 27, 2004
By Kevin Krewell
Before I get to the point of this editorial, I want to give
one last plug for Fall Processor Forum. As readers
of Microprocessor Report, you benefit from the forums
with detailed stories on new announcements. This year we have
two sessions that particularly stand out: the high-performance
embedded processors session and the DSP session. Both are
loaded with state-of-the-art processors from key industry
vendors. We hope you will also attend the event, where you
can meet the presenters and question them.
We recently added a new panel discussion on Wednesday afternoon
that I am particularly looking forward to. The topic is embedded
x86 benchmarking, and we have a panel of high-profile industry
executives to discuss the opportunities and pitfalls of measuring
microprocessor performance. The panel was inspired by a controversy
during the writing of Tom Halfhill's recent benchmarking story.
(See MPR 8/30/04-01,
"Benchmarking the Benchmarks.") "Benchmark" and "controversy"
are two words that frequently appear together in the same
sentence. We believe that benchmarking controversies (there
they are again) are often the result of manipulation by companies'
marketing departments, a practice that wags are fond
of calling "benchmarketing." But even presumably rational engineers
can disagree over appropriate benchmarks and benchmarking
techniques.
The specific quagmire Halfhill stepped into concerned
embedded x86 benchmarking, but the same issues apply to most
other processor markets as well. In this particular case,
an independent company called Synchromesh Computing had developed,
in conjunction with processor vendor AMD, the Embedded Processor
Rating System (EPRS). Synchromesh Computing was founded by,
and is owned by, Alan Weiss, who also heads the EEMBC Certification
Labs (ECL). And, in the interest of full disclosure, we should
note that Alan and EEMBC president Markus Levy are members
of the Microprocessor Report editorial board. Without
any outside review, AMD's Geode marketing team used the EPRS
to rationalize a processor numbering scheme that positioned
various Geode processors against various VIA C3 processors.
We believe AMD used the benchmarks as a blatant market-positioning
tool.
AMD's argument was that using frequency to compare Geode
processors with VIA processors wasn't valid. True enough.
This argument has only grown sounder since
Keith Diefendorff wrote the editorial "Benchmarks are Bunk"
(MPR 6/26/00-01)
in 2000, proclaiming that PC benchmarks were less useful than
core clock frequency in determining processor performance.
Some processor microarchitectures (e.g., Intel's Pentium
4 NetBurst architecture), with wildly exaggerated pipeline
lengths, have since proved that clock frequency can be just as misleading.
(The 2.0GHz Pentium M has a better SPECint2000_base score
than the 3.6GHz Pentium 4!) We agree with the sentiment with
regard to EPRS, but we have trouble with the execution.
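The point about clock frequency can be made concrete: execution time depends on both the clock rate and how much work the core retires per cycle (instructions per cycle, or IPC). The Python sketch below uses made-up numbers to show how a lower-clocked core with better IPC can finish first. The function name, workload size, and all IPC and clock values are hypothetical and are not measurements of any real processor; real IPC varies widely with the workload.

```python
# Hypothetical sketch of the classic performance equation:
#   execution time = instruction count / (IPC * clock frequency)
# The IPC and clock values are invented for illustration; they are not
# measurements of any real processor.

def execution_time_s(instructions: float, ipc: float, clock_hz: float) -> float:
    """Seconds needed to retire `instructions` at a sustained IPC and clock rate."""
    cycles = instructions / ipc
    return cycles / clock_hz

WORKLOAD = 1e9  # hypothetical instruction count for one benchmark run

# Hypothetical deep-pipeline core: high clock, low IPC.
deep = execution_time_s(WORKLOAD, ipc=0.8, clock_hz=3.0e9)
# Hypothetical short-pipeline core: lower clock, higher IPC.
short = execution_time_s(WORKLOAD, ipc=1.5, clock_hz=2.0e9)

print(f"deep pipeline:  {deep:.3f} s")
print(f"short pipeline: {short:.3f} s")
print(f"short-pipeline core is {deep / short:.2f}x faster despite a 33% lower clock")
```

Comparing clock rates alone would rank these two hypothetical cores backward, which is exactly why frequency makes a poor cross-architecture metric.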
Tom's story covered the issues but could not bring the debate to a conclusion.
We believed there was a need to get the interested parties
together, in a neutral venue, to establish a dialog. And so
we added the panel to the FPF program. The discussion is open
to embedded x86 benchmarking as a whole, not just the AMD–VIA
controversy. We expect a lively discussion and believe this
is another step in the evolution of microprocessor benchmarking.
What Makes Benchmarking So Tough?
Part of what makes benchmarking so tough is the constant
argument over applications-based benchmarks versus synthetic
benchmarks. Applications-based benchmarks will provide an
indication of the way a processor and its platform will perform
under specific (often unrealistic) conditions. Of course,
if that application is exactly the one you need to run, then
the benchmark can give you some guidance. Often, though, the application
is bundled with other applications, and the individual scores are then
rolled into a single overall performance figure.
When this happens, you need to evaluate the applicability
of the application suite and the relative weighting of each
component application. Often, it's impossible to isolate specific
components from the overall number.
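To see why the weighting matters, consider a composite built as a weighted geometric mean of per-application speed ratios. SPEC's CPU composites use an unweighted geometric mean of per-benchmark ratios; the weights here are a hypothetical generalization for illustration. In this Python sketch, the chip names, applications, ratios, and weighting schemes are all invented: the two chips tie under equal weights, yet the ranking flips as soon as the weights change, and the single composite number says nothing about chip A's weak media score.

```python
import math
from typing import Dict

def weighted_geomean(ratios: Dict[str, float], weights: Dict[str, float]) -> float:
    """Composite score: weighted geometric mean of per-application speed ratios."""
    total_weight = sum(weights.values())
    log_sum = sum(weights[app] * math.log(ratio) for app, ratio in ratios.items())
    return math.exp(log_sum / total_weight)

# Hypothetical per-application ratios (speed relative to a reference machine).
chip_a = {"office": 2.0, "media": 0.5, "compile": 1.0}
chip_b = {"office": 1.0, "media": 1.0, "compile": 1.0}

# Three hypothetical weighting schemes for the same three-application suite.
schemes = {
    "equal": {"office": 1, "media": 1, "compile": 1},
    "media-heavy": {"office": 1, "media": 2, "compile": 1},
    "office-heavy": {"office": 2, "media": 1, "compile": 1},
}

for name, weights in schemes.items():
    a = weighted_geomean(chip_a, weights)
    b = weighted_geomean(chip_b, weights)
    print(f"{name:>12}: chip A = {a:.3f}, chip B = {b:.3f}")
```

Anyone handed only the composite has no way to work back to the component ratios, which is the isolation problem described above.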
What applications-based benchmarks won't tell you is how
the processor or platform performs in specific areas of interest,
like memory bandwidth or floating-point performance, or on
customer-specific code. Furthermore, application programs
are not always available for cross-platform testing. Synthetic
benchmarks suffer from problems such as maintenance costs;
compiler and silicon optimization "tricks"; and disagreements
over code choices and relevance.
The worst example of synthetic benchmark abuse is Dhrystone
MIPS. Silicon and compiler optimizations aimed at Dhrystone
have rendered it virtually useless, and no one has made the
effort to update it. In addition, Dhrystone was written in
the days before on-die caches; its small code and data footprint
fits easily in today's caches, making it an incredibly poor measure
of system performance.
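For reference, a Dhrystone MIPS figure is not a count of instructions executed. It is the number of Dhrystone loop iterations completed per second, divided by 1757, the VAX 11/780's score, which by convention defines 1 DMIPS. The Python sketch below shows that arithmetic; the function names and the run numbers are hypothetical. Because the only input is iterations per second, anything that makes the loop finish sooner, including a compiler tuned for the benchmark, raises the score directly.

```python
# By convention, 1 Dhrystone MIPS (DMIPS) equals the VAX 11/780's
# score of 1757 Dhrystone iterations per second.
VAX_11_780_DHRYSTONES_PER_S = 1757

def dmips(iterations: int, elapsed_s: float) -> float:
    """Convert a timed Dhrystone run into DMIPS (hypothetical helper)."""
    dhrystones_per_s = iterations / elapsed_s
    return dhrystones_per_s / VAX_11_780_DHRYSTONES_PER_S

def dmips_per_mhz(score: float, clock_mhz: float) -> float:
    """Normalize a DMIPS score by clock frequency, as vendors often quote it."""
    return score / clock_mhz

# Hypothetical run: 50 million iterations in 10 seconds on a 1,000MHz core.
score = dmips(iterations=50_000_000, elapsed_s=10.0)
print(f"{score:.1f} DMIPS, {dmips_per_mhz(score, 1000.0):.2f} DMIPS/MHz")
```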
Another area of contention is the process of creating and
validating benchmarks. The simplest model is to use a code
segment or binary that can run and produce some figure of
merit (e.g., Dhrystone). If an organization continues to maintain
the code and enhance it over time to combat obsolescence and
optimizations, the benchmark can prove useful (although at
the cost of historic comparisons). But maintaining the code
takes time and people. The cost of this development can be borne
by a consortium (such as EEMBC or SPEC) or by a private company
(like Futuremark or Synchromesh Computing). Both models have
advantages and disadvantages. Consortiums can, in theory,
be more representative of industry opinion on performance.
They can also be very slow in developing new benchmarks, because
the process of achieving compromises between competing companies
can take time and patience.
One problem I have with a particular EEMBC bylaw is
that members are prohibited from publishing benchmarks of
other members' processors. This restriction gives companies
an incentive to join the "club," even if they have no plans
to use the benchmark, because membership prevents unwanted comparison.
Adding new members may help fill EEMBC's coffers, but it doesn't
necessarily promote the EEMBC product.
The problem with private companies is that they are not
beholden to anyone and can therefore proceed with a benchmarking
process that doesn't represent industry opinion. Private companies
can also be influenced by the aggressive participation of
one or a very few companies. (And, to be fair, so can consortiums.)
So, do we give up? Do we throw up our hands (literally and
figuratively) in frustration? No. I believe the answer is
greater participation by the engineering community and by
customers. I also believe we need more open dialog on benchmarking
in general. I hope our FPF panel will become part of a semiregular
feature. At Spring Processor Forum, in May '05, I'd
like to extend the discussion to PC and notebook processors
and to the various model and processor numbering systems.
But that's a quagmire for another day.