Wednesday, January 24, 2007

Multicores not as productive as you expected?


For a while, I have been intrigued by how small the performance pop is from multicore processor designs. I've written about this and, finally, I think I can begin to quantify it. I'll use AMD's processors for the sole reason that the company has for years posted a performance measure for each of its processors. (Originally, this data was a move to counter Intel's now-abandoned fascination with high clock speeds.)

This chart shows the performance as published by AMD and the corresponding clock speeds for most of its recent processors. I have broken the figures out for the three branches in the Athlon processor family (which is AMD's desktop chip).

There are several interesting aspects to this chart, but the one I want to focus on is the performance of the rightmost entry. The dual-core Athlon 64 X2 with a rating of 5200 has a clock speed of 2600 MHz. Now, notice the Athlon XP with a rating of 2600 (10th entry from the left): it has a clock speed of 2133 MHz.

In theory, since the AMD ratings are linear, a dual-core processor should give you nearly, but not quite, the performance of two single-core chips. So two 2600-rated chips should give you roughly the performance of a 5200-rated dual-core chip. Using the chart, we would expect two 2.133GHz cores to give us the 5200 performance figure. In reality, though, it takes two 2.6GHz cores to do this--far more than we would expect. The gap is actually even wider than that, because the dual-core chips have faster buses and larger caches than the Athlon XP processors we're comparing them to, so they can make far better use of each clock cycle.
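To put a rough number on that gap, here is a minimal back-of-the-envelope sketch in Python. It uses only the four figures quoted above (ratings of 2600 and 5200, clocks of 2133 MHz and 2600 MHz) and assumes, as the argument does, that ratings scale linearly with clock speed and core count. The variable and function names are mine, purely for illustration.

```python
# Figures quoted in the post (AMD model ratings and clock speeds in MHz).
XP_RATING, XP_CLOCK_MHZ = 2600, 2133   # single-core Athlon XP
X2_RATING, X2_CLOCK_MHZ = 5200, 2600   # dual-core Athlon 64 X2
X2_CORES = 2


def rating_per_core_mhz(rating, clock_mhz, cores=1):
    """Rating points delivered per MHz of clock, per core."""
    return rating / (clock_mhz * cores)


# Assumption: ratings are linear, so each core of a 5200-rated dual-core
# chip should only need the clock of a 2600-rated single-core chip.
expected_clock_mhz = XP_CLOCK_MHZ * (X2_RATING / X2_CORES) / XP_RATING

# How much extra clock each X2 core actually needs, as a fraction.
extra_clock_fraction = X2_CLOCK_MHZ / expected_clock_mhz - 1

xp_efficiency = rating_per_core_mhz(XP_RATING, XP_CLOCK_MHZ)
x2_efficiency = rating_per_core_mhz(X2_RATING, X2_CLOCK_MHZ, X2_CORES)

print(f"Expected clock per core: {expected_clock_mhz:.0f} MHz")
print(f"Actual clock per core:   {X2_CLOCK_MHZ} MHz")
print(f"Extra clock needed:      {extra_clock_fraction:.1%}")
print(f"Rating per core-MHz:     XP {xp_efficiency:.2f}, X2 {x2_efficiency:.2f}")
```

By this crude measure, each X2 core needs roughly 22 percent more clock than the linear assumption predicts, and that is before accounting for the X2's faster buses and larger caches.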

So, why does it take so much more than twice the clock speed to deliver dual-core performance? The original 2600-rated Athlon XP had a memory manager built into the chip. On the X2 chip, however, the two cores do not have dedicated memory managers--instead, they share a single on-chip memory controller. This adds overhead. The cores also share interfaces to the rest of the system and so again must work through resource contention to get the attention they need.

Don't be fooled into thinking this is an AMD-specific issue. (As I said earlier, I used AMD only because they are kind enough to publish clock speed and performance data for their chips.) Intel is in exactly the same boat--what is shared between cores is plenty expensive. Expect, as time passes, to see chip vendors trying to limit these shared resources.
