Vol 13, Issue 3
March 8, 1999

What's the Best Way to Benchmark?

As EEMBC Wrestles With Testing Conditions, Philosophical Issues Arise

A benchmark is like sex. Everybody wants it, everybody is sure they know how to do it, but nobody can agree on how to compare performance.

Part of the problem lies in the fact that microprocessor performance is not a single number. Microprocessor drag racing is all very nice, but the average embedded designer is looking to balance the often-contradictory demands of power consumption, performance, code density, price, interrupt response, and probably other factors. A combination that's good for one application may be unusable for another.

Benchmarking embedded chips is tough, no doubt about it. That's why we have not progressed beyond Dhrystone, the accepted lowest common denominator that any microprocessor can run. Unfortunately, Dhrystone tells us very little about what a microprocessor is good at. One could argue that Dhrystone scores say more about the marketing efforts behind a chip than about its technical features.

By now you've probably heard about the embedded benchmarking work under way at EEMBC (see MPR 4/20/98, p. 13). EEMBC's laudable goal is to eradicate the scourge of Dhrystone in our lifetime. EEMBC (www.eembc.org) counts 24 CPU makers, large and small, among its members. For such a diverse group, they've made amazing progress toward standardizing embedded benchmarks. But there may still be some crumbs between the sheets.

Because no single metric can hope to capture the many aspects of a chip's performance envelope, the EEMBC benchmark suite consists of dozens of smaller benchmarks. Each test contains a core algorithm taken from real-world code. There are tests for automotive-engine control, codecs, pixel manipulation, task switching, and lots of others. All the tests have been written in ANSI C for architecturally neutral portability.

EEMBC is following a path somewhat similar to that taken by SPEC (www.specbench.org), which is a good thing in my opinion. Specifically, EEMBC will allow its members to report two scores for every benchmark: the "out-of-the-box" score and the flat-out, fully tweaked, downhill-with-a-tailwind score. The two scores allow potential users of these chips both to evaluate competing processors under controlled conditions (the basic scores) and to see what each chip is fully capable of, given some care and attention.

Nobody disputes the need for controlled, nonnegotiable, standardized testing. But I expect some controversy over how best to handle the "tweaked" scores. Exactly how much tweaking is allowed, or desirable? Should testers be allowed to alter the source code of the benchmark? Can they rewrite key algorithms? Can they take shortcuts, like hard-coding lookup tables or--dare we say it--the predetermined results of complex calculations?

The question boils down to deciding what is important to test and what is extraneous. To borrow loosely from the Heisenberg Uncertainty Principle, the less you want to know, the more accurately you can know it. If your goal is to pin down a given microprocessor's abilities in real-world situations, make sure that's what you're measuring. I believe there should be (almost) no holds barred. Any optimization, from rewriting all the C code, to creating shortcuts, to using unusual chip-specific features or instructions, is fair game in my book. This approach encourages creative and unusual solutions, which are representative of the real world of creative and unusual embedded programmers. As long as the benchmark delivers the correct answers in a reliable and repeatable manner, the details of generating the results shouldn't matter.

It's that "reliable and repeatable" part that makes people nervous. Obviously, simply hard-coding the answers to the benchmark after a few NOPs isn't meaningful. And here's where EEMBC's sister organization, the EEMBC Certification Labs (ECL; www.embedded-benchmarks.com), comes in. ECL must first approve every EEMBC member's benchmark scores before those scores can be published. "Approval" in this case means duplicating the same scores in ECL's own facilities. Part of ECL's role is to prove that "tweaked" scores aren't arrived at by nefarious means. That proof includes pumping alternative data sets into the chip under test to be sure that it's really executing the correct algorithm and not just regurgitating prearranged answers.

To me, it seems that a benchmark should test the abilities of a chip, not the skill of EEMBC's programmers. Wide-open testing promotes creativity and allows vendors to exploit the unusual features of their processors. As long as the chip returns the correct result under all conditions, I don't believe that what's inside the black box matters.

Forcing a particular coding convention onto dozens of different microprocessors only discourages programmer innovation and reduces everything to the lowest common denominator. And then we'd be right back to Dhrystone.

Editorial by Jim Turley
jturley@mdr.cahners.com

Copyright © 2000 MicroDesign Resources