One look at today’s newspapers and you’ll find that, contrary to what some people tell you, real-estate prices don’t always go up. They fluctuate. They can also go down. They depend on too many parameters to enumerate, among them the size of the area under consideration.
Luckily, your home may grow larger with additions, but it never gets smaller. But imagine how worried you would be if, by some mysterious magic, your house began to shrink and lose value, even though its scale seemed unchanged from your viewpoint. And in the area your home once occupied, new buildings begin rising from the ground. Some of them are beautiful and a good fit for the neighborhood, while others, built quickly, are cheap, uninspired architecture.
The value of silicon real-estate mostly goes down. Imagine how computer architects and chip vendors feel about valuing their brainchild by the die area it occupies. To top it off, consider the less wonderful aspect of advanced semiconductor processes and efficient fabrication. The price of the complete die will eventually decrease, the relative silicon area will be further reduced, and so will the percentage of intellectual property (IP) revenue that providers and chip makers will obtain for their designs. As if that were not enough, IP values are being further eroded by the upsurge in competition coming from new and established IP providers.
In self-defense, business and engineering people have devised ingenious ways to preserve their revenue, and perhaps even increase it. Most of these strategies involve selling more intellectual property (IP) or a more complete package of IP and application software.
The more obvious ways to license more silicon real-estate fall into a few categories: added functions, efficient speed increases, hardware bundling, and bundling of hardware and software together to create platforms.
Adding functions is one of the most common strategies employed by owners of established instruction-set architectures (ISAs). The strategy preserves the investment in software development tools and helps keep most of the owner’s existing customers, because processors continue to run legacy software. A general-purpose processor (GPP) ISA owner may add instruction extensions or a DSP engine to the existing GPP architecture. Conversely, owners of DSP IP can add GPP capabilities by adding appropriate instructions. Often, the existing memory hierarchy and the internal configuration of the processor will not provide optimum performance for the new functions, in the same way that adding a room to one’s home may not integrate the room optimally into the original structure. But by adding the room, the owner occupies more real-estate, preventing a competing builder from using it. Functions added to the ISA are best designed in-house or obtained via acquisitions (graphics: AMD, ARM). Licensed additions don’t become part of the ISA (graphics: OMAP2-3 by TI, XScale by Intel and now Marvell).
Assuming that the very best logic and physical designs have already been incorporated to reduce power consumption, multiprocessing or parallel processing can help reduce the clock frequency, the voltage required to drive it, and the resulting power consumption. Multiple processors or processing stages require a larger portion of the die, which improves performance efficiency while increasing the value that the contributed silicon adds to the whole chip. The use of multiple cores or pipe stages has been embraced by all ISA vendors. Owners of established ISAs can keep their software base while thinking about how best to use multiple processors. Newcomer ISA owners, too numerous to mention, find a level playing field for architectures optimized to work in parallel. Do-it-yourself parallelism also belongs to this category. Designers can choose among different offerings from companies like Altera, ARC, ARM (OptimoDE), M2000, MIPS Technologies, Stretch, Tensilica, and Xilinx.
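The power argument can be made concrete with the usual first-order model of CMOS dynamic power, in which power is proportional to switched capacitance times voltage squared times clock frequency. The C++ sketch below is only an illustration under stated assumptions, not a figure from any vendor: the function names are hypothetical, switched capacitance is assumed to grow linearly with the number of cores, the workload is assumed to parallelize perfectly, and leakage power is ignored.

```cpp
#include <cassert>

// First-order CMOS dynamic power model: P is proportional to C * V^2 * f.
// All arguments are ratios against a single-core baseline (1.0 = unchanged).
// Assumption: switched capacitance grows linearly with the number of cores.
double relative_dynamic_power(double cores, double voltage_ratio,
                              double frequency_ratio) {
    assert(cores > 0 && voltage_ratio > 0 && frequency_ratio > 0);
    return cores * voltage_ratio * voltage_ratio * frequency_ratio;
}

// Relative throughput, assuming the workload parallelizes perfectly
// across the cores (an idealization that real code rarely achieves).
double relative_throughput(double cores, double frequency_ratio) {
    assert(cores > 0 && frequency_ratio > 0);
    return cores * frequency_ratio;
}
```

As a usage example with invented numbers, `relative_dynamic_power(2.0, 0.8, 0.5)` models two cores running at half the clock and 80% of the voltage. Throughput stays at the baseline (`relative_throughput(2.0, 0.5)` is 1.0), while dynamic power falls to roughly 0.64 of the single-core figure, at the cost of about twice the die area for the cores.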
Going from the purely technical methods of increasing revenue to strategies that mix technology and business, we find companies that offer hardware bundles. Bundles include the core processor plus some or all of the following: configurable caches, memory management, coprocessor or slave-processor DSPs, accelerators, and peripherals. Some bundles are created through alliances with implementation technologies. Examples are eASIC’s recent announcement offering Tensilica’s Diamond cores at no extra cost to designers during design or mass production, MIPS’ acquisition of Chipidea, and ARM’s processor cores optimized for FPGAs.
Moving to enrich their offerings even more via business alliances, IP vendors have been offering bundles of hardware and software optimized for specific workloads. These are the platforms, the most popular of which target audio and video applications. Two of the companies joining the competition are ARC International, which offers a line of products from its ARC Video Subsystem family, including codecs, and Tensilica, with its Diamond Standard 388VDO Video Engine, a preconfigured video IP core consisting of two interconnected Tensilica Xtensa LX processor cores. Practically all the major embedded IP and chip providers have introduced platforms at different levels of completeness. They aim to obtain as much revenue as possible from the combination of software codecs and the increased silicon die area required to execute the workload.
Having described a few ways in which more IP can be offered to increase a vendor’s share of die area, we note that IP providers can’t increase the required real-estate without good reason, just as independent software vendors can’t license inefficient code. SoC designers will insist on licensing the smallest processor cores available, but will pay for additional, minimally sized cores, accelerators, and peripherals, knowing that the end user will pay for better performance and more functions.
SoC functions, however, are not so easy to define. Frequently, the SoC designer doesn’t learn about end users’ preferences directly from the end users. Chip (ASIC) designers will often create configurations dictated by important customers whose marketing teams are closer to the end user (the consumer, for instance). The advantage to the semiconductor vendor of providing all or most of the silicon real-estate is offset by the risk of being disconnected from consumers and their rapidly changing likes and dislikes.
Risk levels will vary. An OEM creating digital cameras can research its potential buyers to help make accurate predictions of what will sell near-term, so the SoC specifications communicated to the ASIC designer will carry less risk. Not so, however, with cellphones. The service provider is closest to the consumer, the OEM is next, and the chip designer/semiconductor vendor is in third place on the totem pole.
Operating from this position of higher risk, the chip designer is challenged to balance the cost and efficiency provided by accelerators and narrow-application engines against the flexibility afforded by high-performance, fully programmable chips that can be quickly programmed to provide consumers with iPhone look-alikes or any other fad that may arise.
This leaves IP companies in the even riskier fourth place on the totem pole, but working diligently to grow their figurative house at a faster pace than technology and competition can shrink it.
Because they receive only indirect information about end users’ needs, many IP providers will need to follow multi-pronged strategies to ensure success. Several pre-configured platforms, ranging from low-cost, application-specific designs to highly programmable, high-performance, general-purpose ones, will be useful but not enough. Pre-configured platforms are less flexible. Some, even the highly programmable ones, may not be a good fit for rapidly changing customer requirements. An additional strategy is to make available rich libraries of separate IP components that allow the SoC designer to configure the chip, and to take the risk. Finally, to better ensure long-term success, IP providers are expanding their presence in technologies such as FPGAs, structured ASICs, and microcontrollers.
To find out more about Microprocessor Report, please visit: www.mdronline.com
Parallel Processing For the x86
Tom R. Halfhill - Senior Editor
{11/26/2007}
RapidMind Ports Its Multicore Development Platform to x86 CPUs
The Holy Grail in computer science is a high-level compiler that automatically extracts hidden parallelism from existing source code and efficiently distributes the workloads on the latest multicore processors. Ideally, programmers need not rewrite any code, and the compiler transparently targets microprocessors with any number of cores.
Dream on. Conventional serial code doesn’t surrender its hidden parallelism (if, indeed, any exists) without a fight. True, a good vectorizing compiler can find some small-scale data parallelism, assuming the processor has vector-math instructions. Optimizing compilers can find some instruction-level parallelism when targeting processors with superscalar dispatching and other fancy features. And microprocessors with dynamic branch prediction, speculative execution, and out-of-order execution can find a little more parallelism at run time. But none of these techniques fully exploits the rapidly expanding resources of the latest multicore designs.
Due to those limitations, programmers must rewrite at least some of their source code to explicitly expose parallelism to the compiler or the processor. It’s not the ideal solution. But for now—and possibly forever—it’s the best way to keep multiple processors busy.
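As a minimal sketch of what explicitly exposing parallelism can look like, the C++ below contrasts a serial loop with a version in which the programmer partitions the data into independent chunks and assigns each chunk to a thread. It uses only the C++ standard library (`std::thread`) and is not taken from any vendor's platform; the function names `scale_serial` and `scale_parallel` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Serial version: a single loop walks the whole array.
void scale_serial(std::vector<float>& data, float factor) {
    for (float& x : data) x *= factor;
}

// Explicitly parallel version: the programmer splits the array into
// independent chunks and hands each chunk to its own thread.
void scale_parallel(std::vector<float>& data, float factor,
                    unsigned num_threads) {
    assert(num_threads > 0);
    // Chunk size rounded up so every element is covered.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        if (begin == end) break;  // no work left for further threads
        // Chunks do not overlap, so the threads need no locking.
        workers.emplace_back([&data, factor, begin, end] {
            for (std::size_t i = begin; i < end; ++i) data[i] *= factor;
        });
    }
    for (std::thread& w : workers) w.join();
}
```

Calling `scale_parallel(values, 2.0f, 4)` produces the same result as `scale_serial(values, 2.0f)`. The difference is that the independence of the chunks is now stated in the code rather than left for a compiler to discover. Even in this trivial case the rewrite adds partitioning and thread management that the serial loop never needed.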
RapidMind is one of several parties entering the market for parallel processing. Founded in 2004, RapidMind is a privately funded company based in Ontario, Canada. The RapidMind Multicore Development Platform does require programmers to rewrite the data-intensive portions of their code, and it also requires the target system to run a hardware-abstraction layer between the application program and the microprocessor. In return for those compromises, RapidMind claims big benefits. Some tasks run five to ten times faster, and, in some cases, performance can scale faster than the rising number of processors. In addition, the parallel code is highly portable—programmers needn’t rewrite it for each new multicore processor or multiprocessor system.
Previously, RapidMind’s platform worked only with IBM’s Cell Broadband Engine (Cell BE) and the graphics processors from AMD/ATI and Nvidia. On November 5, RapidMind announced Multicore Development Platform v3.0, which targets the popular multicore x86 processors from AMD and Intel—a big step. This move opens up new opportunities for RapidMind in general-purpose computing, such as desktop publishing. Until now, RapidMind focused mainly on high-performance computing: financial modeling, image processing, data mining, scientific analysis, simulations, broadcast-quality multimedia generation, 3D visualization, transactional databases, and so forth.
Mainstream PC software has comparatively little inherent parallelism, so RapidMind’s platform is less useful for general productivity applications. But RapidMind’s embrace of the x86 is a boon for heavy-duty number crunching on commodity hardware.
Graphics with this article:
Figure 1. RapidMind’s Multicore Development Platform v3.0.
Figure 2. Parallel processing with RapidMind’s platform.
Figure 3. Comparison of C++ code before and after rewriting for RapidMind’s Multicore Development Platform.
Figure 4. Performance comparison of serial C++ code vs. RapidMind C++ code.
Figure 5. Performance comparison of serial C++ code vs. RapidMind C++ code on x86 processors and an Nvidia GeForce 8800 GTX graphics card.
Sidebar: “RapidMind Wins HPCwire Awards at SC07 Conference”