Tradeoffs at the low end: Cores or Cache?

I’m looking at building myself a new PC for the first time in years. That’s a little bit of a misnomer though. Today, building a PC can mean bolting as few as two components into a case and connecting four cables. Building PCs in the 1990s was a lot more difficult. I remember in 1994, during one of my first builds, someone walking past in the hall, looking at the mess of cards and cables, and asking, “How do you know which one goes where?”

Today, the assembly is pretty easy. Figuring out what to buy is harder. In 1994, the differences between the various flavors of 386 and 486 chips available were confusing, but they all fit on an index card. Mainly the difference came down to the amount of memory the chip could address (386) and whether it had a math coprocessor (486). Beyond that all you really had to worry about was clock speed. Back then the research took 30 minutes and the system took hours to build.

Today there are two chip manufacturers (down from four) but they both have half a dozen product lines. And nobody really talks about clock speed anymore. That’s fine because clock speed was a crude measure of performance, but is throwing numbers like 560 or 840 or 965 on the chips really any better? Today the research takes hours (if not days) and the system goes together in about 5 minutes. Shake the bag right and it could just come out of the bag fully assembled.

I’m really only looking at two chips: the AMD Phenom II x2 560 and the AMD Phenom II x4 840. I can get either chip bundled with a decent motherboard for under $99. Both are intended as lower mid-range CPUs. Better than a Celeron, but not intended to satisfy people who want the highest gaming benchmarks or people who are trying to run high-end scientific or engineering applications.

When you choose between the two, you’re trading cores for cache. The 560 has 6 MB of L3 cache and two CPU cores, while the 840 has four cores and no L3 cache. The 560 also runs 100 MHz faster, but in the 3 GHz neighborhood, a difference of 100 MHz is going to be tough to notice.

Last summer I spent a couple of days with an AMD 640, which despite the name is basically the same CPU as the 840 but clocked 200 MHz slower, at 3 GHz. Using it didn’t make me feel deprived. Not in the least.

Reading the user reviews at the usual places, it’s hard to find anyone with a legitimate complaint about either the 560 or 840. You find a small number of DOAs and of course those people will be upset, but that’s a chance you take with any chip. You find some complaints with the bundled heatsink/fan, but you expect that too. Beyond that, the complaints you find are about the name being confusing or not being able to unlock cores or overclock.

I dismiss the latter case. If you wanted a $150 CPU, don’t buy a $100 CPU and gripe when that’s all you get. Sometimes the cores are disabled because they weren’t usable. It’s better than throwing the chips away.

I asked around a bit, and Gatermann pointed me to a benchmark that tried to isolate the difference L3 makes. It found the difference was anywhere from 5 to 20 percent. And for the kind of work that I would want the higher performance, like audio and video encoding, there’s not enough difference to be noticeable. Theoretically, at clock rates over 3 GHz the difference is larger. But still, not a lot.

Unfortunately I wasn’t able to find a similar benchmark that showed two otherwise identical AMD CPUs with differing numbers of cores for comparison. You’d think someone would have done it: all they have to do is benchmark a Phenom II x2 500-series chip and its equivalent Phenom II x4 900-series chip. Based on the benchmarks I could find, sometimes the difference between a 2-core and a 4-core was nearly 100 percent and sometimes it was more on the order of 25-30 percent, but frequently it could also be less than 10 percent. Part of the difficulty is finding software that takes advantage of the extra cores, and there’s still a lot of software out there that doesn’t. The more heavily you multitask, the more benefit you’re going to see.
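One way to make sense of that spread is Amdahl's law, which none of the benchmarks above invoke; it is brought in here only as a rough model. The sketch below assumes a workload where some fraction of the work scales perfectly across cores and the rest runs on one core, and it ignores cache, clock speed, and memory effects entirely. The function names and the sample fractions are made up for illustration.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over a single core under Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def gain_from_more_cores(parallel_fraction: float, small: int = 2, large: int = 4) -> float:
    """Fractional improvement of `large` cores over `small` cores (0.2 means 20% faster)."""
    return amdahl_speedup(parallel_fraction, large) / amdahl_speedup(parallel_fraction, small) - 1.0


if __name__ == "__main__":
    # Hypothetical workloads: fully parallel, half parallel, mostly serial.
    for fraction in (1.0, 0.5, 0.2):
        gain = gain_from_more_cores(fraction)
        print(f"{fraction:.0%} parallel: 4 cores are {gain:.1%} faster than 2")
```

Under this model a fully parallel job doubles in speed going from two cores to four, a half-parallel job gains 20 percent, and a mostly serial job gains well under 10 percent, which is at least consistent with the range of results described above.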

I suppose if there were a big difference in performance and it were easy to demonstrate, there would be a comparable difference in price. There’s not a big difference, so that’s why a 2-core CPU with L3 cache sells for $90 and a 4-core CPU without L3 cache sells for $100.

This is pure speculation on my part, but I would think in the future, there will be more software taking advantage of the extra cores. So the system with more cores might be a little bit more future-proof. But I don’t think I’m going to fret over the difference. I’m planning to make a purchase this weekend. Thanks to back-to-school promotions, prices will be a bit lower, but selection will be a bit more limited. I’ll buy what I can get, because either one will be a big improvement over what I have now.

Judging from the always-limited selection of the Phenom II x4 965, which has both L3 cache and 4 cores for about $40 more, it seems a lot of people are willing to pay a little extra to get both and hedge their bets. You’re paying roughly a 40% premium for perhaps 30% greater performance–it has a higher clock rate too–which goes against my general tendencies, but I can understand the logic. If you don’t intend to upgrade CPUs later and can stretch another 3-6 months out of the system, you might get $40 in additional value out of it.
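The premium-versus-performance arithmetic can be checked with a few lines of Python. This is only a sketch: the $100 and $140 prices are the approximate figures from this post, the 30 percent performance figure is a guess rather than a benchmark result, and the function names are invented for the example.

```python
def price_premium(base_price: float, higher_price: float) -> float:
    """Fractional extra cost of the pricier chip (0.4 means 40% more)."""
    return higher_price / base_price - 1.0


def performance_per_dollar(relative_performance: float, price: float) -> float:
    """Relative performance bought per dollar spent."""
    return relative_performance / price


# Approximate prices from the post: the x4 840 at $100, the x4 965 about $40 more.
BASE_PRICE = 100.0
UPGRADE_PRICE = 140.0

# Relative performance: 1.0 for the 840, 1.3 for the 965 (a guess, not a measurement).
BASE_PERFORMANCE = 1.0
UPGRADE_PERFORMANCE = 1.3

premium = price_premium(BASE_PRICE, UPGRADE_PRICE)
value_ratio = (
    performance_per_dollar(UPGRADE_PERFORMANCE, UPGRADE_PRICE)
    / performance_per_dollar(BASE_PERFORMANCE, BASE_PRICE)
)

print(f"Price premium: {premium:.0%}")
print(f"Performance per dollar relative to the cheaper chip: {value_ratio:.2f}")
```

With those inputs the pricier chip delivers about 7 percent less performance per dollar, so the case for it rests on keeping the system in service longer rather than on value at purchase time.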

But for the time being, I’m content to hang out in the low end.

One thought on “Tradeoffs at the low end: Cores or Cache?”

  • August 7, 2011 at 8:33 pm

    “And for the kind of work that I would want the higher performance, like audio and video encoding…”

    Short answer: Quad Core.

    With caveats, of course, but video encoding tips the scale solidly in favor of cores over cache, and typically over clock speed as well, IF (here come the ifs) the encoder can take advantage of more cores. Some do and some don’t. Using Handbrake, H.264 encodes fully support multiple cores, for example, while the same program encoding DivX does not. In the latter case there is still a benefit to the encoder having a core to itself while audio tracks and subtitles are handled in separate threads, but my experience with Handbrake and H.264 to date is clear: the more cores the better, with four a near-mandatory minimum to even experiment with HD.

    As a side note, it is a shame how much power general-purpose processors consume when encoding video, given that dedicated silicon can do the job faster in a handful of watts. Patent encumbrances, ever-evolving platforms, and a relatively small market for such specific functionality stand in the way. In lieu of this we will increasingly leverage programmable graphics processors (CUDA), thus obviating the need for higher core counts on that basis alone. That said, I’ve no idea at the moment what the power budget will be, given that high-end GPUs can eat far more energy than CPUs. The question thus revolves around how much GPU is needed in dollars and watts. In contrast to all this, my ultra-compact digital camera can record high-quality H.264 video with AAC audio on the fly until its matchbook-size battery runs out (about an hour).

    Ok, back to the here and now.

    Tom’s Hardware did a comparison between two Athlon IIs, one with L3 cache and one without. At the end of the day it found that adding L3 improved performance by 8.8% on average. On a few tests the L3 was a slight (2%) hindrance or gave no improvement, with benefits as high as 20% in one specific scenario. Alternatively, Passmark scores suggest approximately 16% improvement within the family.

    To my thinking this is all in the margins and of lesser concern, given that differences in compiler optimizations often obviate it all even when it might matter. For example, my current main machine is an old 2.8 GHz Athlon 64 dual core with 2×1 MB L2, built at 90 nm, with a Passmark score of 1541. Compared to a 3.3 GHz Phenom X2 with 2×512 KB L2 + 6 MB L3 at 45 nm, with a Passmark score of 2054, the difference isn’t worth my time or money to upgrade. Not on that basis alone at any rate. Power consumption is another matter.

    Which raises the energy question when pondering L3 cache as a trade. Tom’s Hardware mentions the AMD 6 MB L3 taking a third of the die. We also see differences in TDP between, say, the Athlon II 260 and the Phenom 560, at 65 and 80 watts respectively, the primary difference being the inclusion of L3 in the latter. Or look at the Phenom II X4 840 at 95 W compared to the Phenom II X4 955 at 125 W, and then across the range of clock frequencies, only to find that L3 does not scale linearly as represented by TDP. But then we must realize that memory speed is decoupled from core clock, and that TDP is a poor indicator of energy use. It is an engineering measure of how much heat must be extracted from the core that just so happens to be denoted in watts (thermal), so it is only loosely coupled to power consumption and generally indicates maximums rather than typical use.

    So we don’t know with any certainty at the moment what portion of the power budget an L3 cache consumes, beyond a potential third (by die area) as a worst case, although in all probability it is less. But there is value in factoring it into the greater equation when balancing purchase cost, operating cost, and real-world performance benefits that can bounce between 20% and zero, or worse, given the possibility of pipeline poisoning or a cache miss that costs time to flush and reload, reducing the cache’s efficiency gains in turn.

    This is not to denigrate L3 cache so much as to mull over why it isn’t necessarily a must-have in a bang-for-the-buck processor, while acknowledging the general performance improvement it brings. If L3 cache power consumption numbers are available somewhere out in the wilds (I haven’t seriously looked), they would be of considerable help in forming an informed opinion and tilting the decision between processors with the same core count, where the choice comes down to margins of clock speed and price. They would not help much, for example, in choosing between dual core and quad core in typical scenarios at the budget end, especially when near-term future-proofing is part of the consideration. L3 cache may give a slightly higher performance ceiling, but when the cores saturate we hit the wall, and there is something to be said for even single-threaded applications having a whole core to themselves and so competing less for CPU time.

    Another point needing clarification is idle power draw between architecturally similar dual cores and quads. Once again Tom’s Hardware reports a difference of a mere watt on Intel silicon, but so far I’ve come up empty regarding any current AMD comparison. I would expect similar, but we certainly cannot assume that. All of these bits and pieces factor into the mix. If the same holds true for AMD, then the only reason to opt for a higher-performance dual core at the budget low end (dollar for dollar) would be an overriding need to support a particular single-threaded application with whatever the money might buy, either raw clock speed or known application-specific benefits of L3 cache. Otherwise it’s pretty much a slam dunk for a quad.

