Vol 18, Issue 39
September 27, 2004

Can Benchmarking Be Rational?

By Kevin Krewell


Before I get to the point of this editorial, I want to give one last plug for Fall Processor Forum. As readers of Microprocessor Report, you benefit from the forums with detailed stories on new announcements. This year we have two sessions that particularly stand out—the high-performance embedded processors session and the DSP session. Both are loaded with state-of-the-art processors from key industry vendors. We hope you will also attend the event where you can meet the presenters and question them.

We recently added a new panel discussion on Wednesday afternoon that I am particularly looking forward to. The topic is embedded x86 benchmarking, and we have a panel of high-profile industry executives to discuss the opportunities and pitfalls of measuring microprocessor performance. The panel was inspired by a controversy during the writing of Tom Halfhill's recent benchmarking story. (See MPR 8/30/04-01, "Benchmarking the Benchmarks.") "Benchmark" and "controversy" are two words that frequently appear together in the same sentence. We believe that benchmarking controversies (there they are again) are often the result of manipulation by companies' marketing departments, a practice that wags are fond of calling "benchmarketing." But even presumably rational engineers can disagree over appropriate benchmarks and benchmarking techniques.

The specific quagmire Halfhill stepped into concerned embedded x86 benchmarking, but the same issues apply to most other processor markets as well. In this particular case, an independent company called Synchromesh Computing had developed, in conjunction with processor vendor AMD, the Embedded Processor Rating System (EPRS). Synchromesh Computing was founded by, and is owned by, Alan Weiss, who also heads the EEMBC Certification Labs (ECL). And, in the interest of full disclosure, we should note that Alan and EEMBC president Markus Levy are members of the Microprocessor Report editorial board. Without any outside review, AMD's Geode marketing team used the EPRS to rationalize a processor numbering scheme that positioned various Geode processors against various VIA C3 processors. We believe AMD used the benchmarks as a blatant market-positioning tool.

AMD's argument was that using frequency to compare Geode processors with VIA processors wasn't valid. True enough. The soundness of this argument has increased since 2000, when Keith Diefendorff wrote the editorial "Benchmarks are Bunk" (MPR 6/26/00-01), proclaiming that PC benchmarks were less useful in determining processor performance than core clock frequency was. Since then, some processor microarchitectures (e.g., Intel's Pentium 4 NetBurst architecture), with wildly exaggerated pipeline lengths, have proved that clock frequency can be just as misleading. (The 2.0GHz Pentium M has a better SPECint2000_base score than the 3.6GHz Pentium 4!) We agree with the sentiment with regard to EPRS, but we have trouble with the execution.
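To put a number on the Pentium M comparison, here is a minimal Python sketch. It assumes the simplest possible model, in which a benchmark score is work done per clock multiplied by clock frequency; that model and the function name are illustrative assumptions, not anything published by SPEC, Intel, or AMD. Only the two clock frequencies come from the text above.

```python
def min_per_clock_advantage(slow_clock_ghz: float, fast_clock_ghz: float) -> float:
    """Work-per-clock ratio at which the lower-clocked part ties the
    higher-clocked one, under the assumed model score = work_per_clock * clock."""
    if slow_clock_ghz <= 0 or fast_clock_ghz <= 0:
        raise ValueError("clock frequencies must be positive")
    return fast_clock_ghz / slow_clock_ghz


# Clock frequencies quoted in the editorial: 2.0GHz Pentium M, 3.6GHz Pentium 4.
break_even = min_per_clock_advantage(2.0, 3.6)
print(f"Break-even per-clock advantage: {break_even:.2f}x")
```

Under that assumption, a 2.0GHz part can outscore a 3.6GHz part only if it does more than 1.8 times as much work per clock, which is why frequency alone says so little across microarchitectures.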

Tom's story covered the issues but couldn't finish the process. We believed there was a need to get the interested parties together, in a neutral venue, to establish a dialog. And so we added the panel to the FPF program. The discussion is open to embedded x86 benchmarking as a whole, not just the AMD–VIA controversy. We expect a lively discussion and believe this is another step in the evolution of microprocessor benchmarking.

What Makes Benchmarking So Tough?

Part of what makes benchmarking so tough is the constant argument over applications-based benchmarks versus synthetic benchmarks. Applications-based benchmarks will provide an indication of the way a processor and its platform will perform under specific (often unrealistic) conditions. Of course, if that application is exactly the one you need to run, then the benchmark can give you some guidance. Often the application is bundled with other applications and the scores are then rolled into a single overall performance figure. When this happens, you need to evaluate the applicability of the application suite and the relative weighting of each component application. Often, it's impossible to isolate specific components from the overall number.
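The weighting problem is easy to see in a small Python sketch. Everything in it is hypothetical: the two chips, their per-application scores, and the weights are made-up numbers, and the weighted geometric mean is just one common way to roll scores together, not the method of any suite named in this editorial.

```python
import math


def composite_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted geometric mean of per-application scores."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same applications")
    if any(s <= 0 for s in scores.values()):
        raise ValueError("scores must be positive")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    log_sum = sum(w * math.log(scores[name]) for name, w in weights.items())
    return math.exp(log_sum / total_weight)


# Hypothetical relative scores (higher is better); not measurements of real parts.
chip_a = {"media": 2.0, "office": 1.0}
chip_b = {"media": 1.0, "office": 1.5}

# Two hypothetical weightings of the same two-application suite.
weightings = {
    "media-heavy": {"media": 3.0, "office": 1.0},
    "office-heavy": {"media": 1.0, "office": 3.0},
}

for label, weights in weightings.items():
    a = composite_score(chip_a, weights)
    b = composite_score(chip_b, weights)
    winner = "chip A" if a > b else "chip B"
    print(f"{label}: A={a:.3f} B={b:.3f} -> {winner} leads")
```

With these invented numbers the media-heavy weighting puts chip A ahead and the office-heavy weighting puts chip B ahead, even though the underlying scores never change. The single figure also hides which application produced the lead.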

What applications-based benchmarks won't tell you is how the processor or platform performs in specific areas of interest, like memory bandwidth or floating-point performance, or on customer-specific code. Furthermore, application programs are not always available for cross-platform testing. Synthetic benchmarks suffer from problems such as maintenance costs; compiler and silicon optimization "tricks"; and disagreements over code choices and relevance.

The worst example of synthetic benchmark abuse is Dhrystone MIPS. Silicon and compiler optimizations of Dhrystone MIPS have rendered it virtually useless, and no one has made the effort to update it. In addition, Dhrystone was written in the days before on-die caches, so its small size makes it an incredibly poor measure of system performance.

Another area of contention is the process of creating and validating benchmarks. The simplest model is to use a code segment or binary that can run and produce some figure of merit (e.g., Dhrystone). If an organization continues to maintain the code and enhance it over time to combat obsolescence and optimizations, the benchmark can prove useful (although at the cost of historic comparisons). But maintaining the code takes time and people. The cost of this development can be borne by a consortium (such as EEMBC or SPEC) or by a private company (like Futuremark or Synchromesh Computing). Both models have advantages and disadvantages. Consortiums can, in theory, be more representative of industry opinion of performance. They can also be very slow in developing new benchmarks, because the process of achieving compromises between competing companies can take time and patience.

One problem I have is with a particular EEMBC bylaw: members are prohibited from publishing benchmarks of other members' processors. This restriction gives companies an incentive to join the "club," even if they have no plans to use the benchmark, because membership prevents unwanted comparison. Adding new members may help fill EEMBC's coffers, but it doesn't necessarily promote the EEMBC product.

The problem with private companies is that they are not beholden to anyone and can therefore proceed with a benchmarking process that doesn't represent industry opinion. Private companies can also be influenced by the aggressive participation of one or a very few companies. (And, to be fair, so can consortiums.)

So, do we give up? Do we throw up our hands (literally and figuratively) in frustration? No. I believe the answer is greater participation by the engineering community and by customers. I also believe we need more open dialog on benchmarking in general. I hope our FPF panel will become part of a semiregular feature. At Spring Processor Forum, in May '05, I'd like to extend the discussion to PC and notebook processors and to the various model and processor numbering systems. But that's a quagmire for another day.


Copyright © 2003 In-Stat/MDR
A Unit of Reed Business Information, A Division of Reed Elsevier, Inc.