According to AMD, the company has a roadmap to deliver a 25x improvement in performance-per-watt over the next six years, with the finished platform arriving by 2020. That's the pitch that arrived in my inbox today, and I'll admit my first response was to blink, read it again, and fire back a polite “No you won’t” at my hapless PR contact. This kicked off a bit of email back-and-forth and led to a conversation with Senior AMD Fellow Sam Naffziger. Could AMD actually pull this off?
Why 25x is hard to believe
I was dubious about AMD's 25x claim because Dennard scaling stopped functioning back in 2005, and Moore's law no longer provides the cost scaling it once offered. Even if we're generous with timelines, 2020 is only two full nodes away. Nobody has ever delivered a 25x improvement in that short a span of time, not even during the golden age of CPU scaling. Furthermore, as we've previously discussed, supercomputer vendors are concerned about the exascale ramp precisely because CPU and even GPU power efficiency aren't improving anywhere near as quickly as they need to for exascale by 2020.
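A quick back-of-envelope calculation shows why the number is so striking. The Python sketch below assumes a generous 2x performance-per-watt gain per full process node. That figure is our illustrative assumption, not an AMD or foundry number. The sketch then asks how much of the 25x would have to come from somewhere other than process shrinks.

```python
# Back-of-envelope check on a 25x perf-per-watt gain between 2014 and 2020.
# ASSUMPTION: a generous 2x perf-per-watt gain per full process node;
# this is an illustrative figure, not an AMD or foundry number.
TARGET_GAIN = 25.0    # AMD's claimed improvement
YEARS = 6             # 2014 through 2020
FULL_NODES = 2        # "only two full nodes away"
GAIN_PER_NODE = 2.0   # hypothetical, see above

annual_gain = TARGET_GAIN ** (1 / YEARS)   # compound gain needed every year
node_gain = GAIN_PER_NODE ** FULL_NODES    # what lithography alone might give
other_gain = TARGET_GAIN / node_gain       # what must come from everything else

print(f"Required improvement per year:  {annual_gain:.2f}x")
print(f"From process shrinks alone:     {node_gain:.1f}x")
print(f"Left for design and power mgmt: {other_gain:.2f}x")
```

Even with that optimistic node assumption, roughly 6x of the target would have to come from architecture, power management, and specialized hardware rather than from the fab.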
Naffziger agreed on both of these points, and the company put together a few slides that illustrate the problem. The orange line above shows where we would be today if manufacturing had allowed Dennard scaling to continue functioning; the turquoise line shows where we actually are based on 2000-2009 improvements. AMD, of course, wants to improve the situation and put us on that light green line, skyrocketing even faster than any other platform.
Difficult to swallow? You bet. But AMD doesn’t want to rely on smaller process nodes — they’ve got other things in mind.
How AMD intends to pull this off
First things first: AMD isn't trying to reduce power consumption by 25x in a no-holds-barred scenario, and it's not trying to cut idle power by 25x either. Its goal is to reduce the power consumption of "typical" workloads by a factor of 25, which is somewhat more achievable. According to AMD, GPU acceleration and HSA are just the beginning. The performance of HSA should also improve in future generations as graphics cores become more powerful and the links between CPU and GPU get faster. Longer term, the company wants to explore new specialized cores for particular workloads, along with continued improvements to the hardware video decode blocks to reduce power consumption in those workloads as well.
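The distinction between peak and "typical" workloads matters more than it might seem. Here's a minimal Python sketch, using an invented workload mix, of how per-task gains combine into one overall figure. The logic mirrors Amdahl's law: the overall improvement is dominated by whatever share of the energy budget doesn't get more efficient. The task names, energy shares, and per-task gains are all hypothetical.

```python
# Amdahl-style view of a "typical workload" efficiency target.
# ASSUMPTION: the workload mix and per-task gains below are invented
# for illustration; they are not AMD figures.

def overall_gain(mix):
    """mix maps task -> (share of baseline energy, efficiency gain for that task).

    The new energy is the sum of each share divided by its gain; the overall
    perf/W improvement is the reciprocal of that sum.
    """
    total_share = sum(share for share, _ in mix.values())
    assert abs(total_share - 1.0) < 1e-9, "energy shares must sum to 1"
    new_energy = sum(share / gain for share, gain in mix.values())
    return 1.0 / new_energy

mix = {
    "video playback":  (0.5, 50.0),  # dedicated decode block
    "web browsing":    (0.3, 20.0),  # GPU offload plus aggressive idling
    "everything else": (0.2, 5.0),   # general-purpose CPU work
}
gain = overall_gain(mix)
print(f"Overall gain: {gain:.1f}x")

# If the last 20% of energy saw no improvement at all, the whole mix could
# never exceed 1 / 0.2 = 5x, no matter how good the other blocks get.
mix_stuck = {**mix, "everything else": (0.2, 1.0)}
print(f"With one unimproved task: {overall_gain(mix_stuck):.1f}x")
```

With these made-up numbers, two tasks improve by 20x or more, yet the mix as a whole lands around 15x. That's why a headline figure tied to "typical" workloads depends so heavily on how the typical workload is defined.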
One point Naffziger made in our discussion is that the operating system still takes a very coarse-grained approach to power management. AMD has already built a 32-bit controller into its modern APUs to better manage power and frequency according to the needs of the software running on-chip, and the company plans to expand this effort. It's all part of the race to idle: the faster a chip finishes its work and drops into a low-power state, the less energy it burns overall, and the better its real-world performance-per-watt.
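Here's a minimal Python sketch of the race-to-idle arithmetic, using made-up power numbers. Whether racing actually wins depends entirely on those numbers. The point is the shape of the trade-off, not the specific values.

```python
# Race-to-idle, in joules. All power figures are hypothetical.

def window_energy(active_w, active_s, idle_w, window_s):
    """Energy over a fixed time window: run the task, then idle for the rest."""
    assert active_s <= window_s
    return active_w * active_s + idle_w * (window_s - active_s)

WINDOW_S = 10.0  # the user needs the result sometime in the next 10 s
IDLE_W = 0.5     # ASSUMPTION: deep-idle power for the whole platform

# Sprint: high clocks, finish in 1 s, then sleep.
sprint = window_energy(active_w=6.0, active_s=1.0, idle_w=IDLE_W, window_s=WINDOW_S)

# Crawl: a quarter of the speed, but power only halves, because uncore,
# memory, and other platform power stay on for the whole time the task runs.
crawl = window_energy(active_w=3.0, active_s=4.0, idle_w=IDLE_W, window_s=WINDOW_S)

print(f"sprint: {sprint:.1f} J, crawl: {crawl:.1f} J")
```

Flip the assumptions and the answer flips too. If crawling cut active power to 1 W, it would use 7 J and beat the sprint. Racing pays off when fixed platform power is large compared with what the lower clocks save, and that's also why finer-grained power management matters.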
Of course, improving idle and active power consumption through lower leakage current is still important, as this research paper from Tirias shows.
As process nodes shrink, the slope of the yellow area gets steeper every generation. Even idle power comes up as the voltage gradient changes. AMD’s central argument is that a combination of smart scheduling, intelligent throttling, fine-grained power management, and specialized heterogeneous cores can deliver an efficiency improvement that’s far larger than what we’ll see from smaller process nodes over the same time frame. It was also interesting to note Intel’s announcement yesterday of its own move into heterogeneous chips, with an FPGA integrated into a Xeon CPU. We are seeing a fairly definite shift towards specialized hardware blocks for specialized workloads.
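The scheduling half of that argument can be sketched in a few lines of Python. This is a toy under stated assumptions, not how AMD's hardware or any OS scheduler actually works. The engine names and energy costs are hypothetical, and real schedulers also have to weigh latency, data movement, and wake-up costs.

```python
# Toy heterogeneous dispatcher: send each task to whichever engine that
# supports it can finish the job for the fewest joules.
# ASSUMPTION: engine names and energy costs are hypothetical.

ENERGY_PER_TASK_J = {
    # engine: {task: joules per unit of work}
    "cpu":         {"video_decode": 4.0, "image_filter": 2.0, "parse_json": 0.5},
    "gpu":         {"video_decode": 1.5, "image_filter": 0.4},
    "video_block": {"video_decode": 0.1},
}

def pick_engine(task):
    """Return (engine, joules) for the cheapest engine that can run `task`."""
    candidates = [
        (costs[task], engine)
        for engine, costs in ENERGY_PER_TASK_J.items()
        if task in costs
    ]
    if not candidates:
        raise ValueError(f"no engine supports {task!r}")
    joules, engine = min(candidates)
    return engine, joules

for task in ("video_decode", "image_filter", "parse_json"):
    engine, joules = pick_engine(task)
    print(f"{task:>12} -> {engine:<11} ({joules} J)")
```

In this toy, the general-purpose CPU still handles the work nobody built a specialized block for. That's the part of the workload that limits the overall gain.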
An excellent starting point (for an ironic reason)
When Intel launched its first Core 2-based Xeons, it circulated some PDFs claiming truly enormous efficiency gains. Anyone familiar with the previous generation of Xeon hardware, which relied on Prescott-based dual cores, would scarcely have found this surprising. Intel's previous chips had been so terribly inefficient that the next generation allowed it to claim enormous improvements right off the bat.
AMD's current APUs are in a similar position, if not quite to the same degree. Kaveri has improved notably on this front, particularly in its mobile form factor, but no one would argue that AMD couldn't achieve much more with a more efficient processor design. We already know that the company is working on new ARM and x86 cores, with the first iterations due in 2016. AMD's current position, in other words, gives it a heck of a boost off the starting block.
But that boost will only take AMD so far. Past that point, it'll need an enormous level of expertise to keep pushing toward that 25x goal. What AMD wants to do is give end users dramatic battery-life improvements when using a system for everyday tasks and in the modes where it spends most of its time. If the company can keep pushing its heterogeneous processors and roll out a new, dramatically more efficient CPU core, it'll be well on the way to pulling this off.
Comments
Asdacap Cap: The first thing that surprised me is that 2020 is 6 years away. I just realized that…
MisterBlat: And next year, Doc Brown and Marty are supposed to arrive in the flying DeLorean with Mr. Fusion. And the Cubs will win the World Series. I'll put my money on Mr. Fusion.
Mayoo: I laugh when I see these kinds of plans "in the next 6 years". Just look at where cell phones were 6 years ago. Nobody could have predicted that. So many variables will change!
But still, hurray to power efficiency!
john: Chip design begins about 6 years before launch… kinda fits if you ask me. I think there is a dose of wishful thinking here on AMD's side, but they probably just started work on a radically new design to launch in 2020 and have laid down the basics and the achievable goals for it… The same was true for Bulldozer… it turned out a flop… this might too… or not… we'll see.
Anuraag Aithal: By the way, I had heard of a 20-core Opteron, and I don't see any news of it. Were the X1150 and X1250 the last so far?
pTmd: The very first 20-core Opteron MP was canned in early 2012. Eww, a long, long time ago. So far there is no sign of new die designs for the Opteron MP line. The upcoming x86 Opteron is the Berlin APU, which is derived from Kaveri, and then Toronto after Berlin in 2015.
pelov lov: There was a paper floating around referencing a 16-core single-die Opteron, but that was likely canned as well.
pelov lov: A lot of AMD tales with very little AMD substance. Perhaps I'm being too skeptical, but I'm finding it incredibly difficult to believe any of their marketing. The skepticism isn't undeserved, either. Kaveri certainly bumped up perf-per-watt, but that assumes one is dumb enough to ignore that CPU and GPU performance (admittedly mostly CPU) has barely gone anywhere in the year and a half since Trinity; and let's not forget that we're short ~25% performance.
Is this more PCMark8 crap from AMD? Future benchmarks for future workloads? Future processors that exist only in theory that may or may not ever see the light of day? I’m sure that with the addition of a fixed-function unit they can claim a 25x efficiency improvement today depending on the workload and how it’s measured.
People don’t want lofty claims that AMD has no hope in hell of living up to without copious amounts of trickery and PR sleight of hand. I think most people just want some sort of a long term road map and a vision that’s not going to be dramatically altered in a year’s time. Therefore, I’m finding it rather difficult to believe they’ll bring about any efficiency improvements at all given that I’m not even sure AMD knows what they’re doing in a year from now.
Joel Hruska: You're not the only one. That's why my first response to the PR guy was basically: "Nuh-uh."
But going forward, I think you’re going to see Intel and AMD engaging in a lot more of this. We’ve already got the ground work — DVFS scaling, crude OS support for power management, and clock/frequency monitoring in silicon. Kaveri adjusts CPU and GPU frequencies somewhat dynamically and gives the GPU more headroom in 3D workloads. A flaw in that system is what causes problems in games like Metro Last Light, as I covered this spring, but the building blocks are there.
There are some key areas where I think AMD can definitely improve itself. Specifically:
1). Time to spin work off to the GPU.
2). Cache/memory latencies (longer latencies = more power burned).
3). More support for OpenCL / HSA, leading to better utilization of these capabilities, leading to better overall power consumption.
4). A better chip to START with. Intel leads AMD by nearly 2x in some performance-per-watt measurements.
5). The incremental improvements offered by Moore’s Law.
6). Better support in hardware for specific workloads. Compare 4K playback power consumption now to where it will be in 6 years. (If you go back and look at how much we improved 1080p power consumption from 2006 (in software) to 2014 (in hardware), it's instructive on this point.)
This is always going to be workload dependent, but I think there’s a case to be made for the *idea* of dramatic improvements still to come.
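The first item on that list, the time it takes to spin work off to the GPU, is easiest to see as a break-even calculation. Here's a minimal Python sketch with hypothetical timings, ignoring any overlap between CPU and GPU work. Offloading only wins once the batch is large enough to amortize the fixed dispatch cost. Lowering that cost is part of what HSA's shared-memory model is meant to do, and a lower cost widens the set of tasks worth offloading.

```python
# When is it worth spinning work off to the GPU? A break-even sketch.
# ASSUMPTION: all timings are hypothetical, and the model ignores
# overlap between CPU and GPU work.

def offload_break_even(overhead_ns, cpu_ns_per_item, gpu_ns_per_item):
    """Batch size at which CPU time equals dispatch overhead plus GPU time.

    Solves n * cpu = overhead + n * gpu for n.
    """
    if gpu_ns_per_item >= cpu_ns_per_item:
        return float("inf")  # the GPU never catches up
    return overhead_ns / (cpu_ns_per_item - gpu_ns_per_item)

# 90 us to dispatch the job and sync results, 10 ns/item on the CPU,
# 1 ns/item on the GPU.
n = offload_break_even(overhead_ns=90_000, cpu_ns_per_item=10, gpu_ns_per_item=1)
print(f"Offload only pays off above ~{n:,.0f} items")

# Halve the dispatch overhead and the break-even point halves too.
print(offload_break_even(45_000, 10, 1))
```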
Korios (http://www.korioi.net/): I wonder if this bold projection of theirs will involve a hybrid ARM/x86 design, and whether they can successfully pair two entirely different ISAs. Can this be pulled off? By which OS?
Joel Hruska: http://www.extremetech.com/computing/182790-amds-next-big-gamble-arm-and-x86-cores-working-side-by-side-on-the-same-chip
I have theorized it might but that’s just a theory.
cpuexecution: Joel… I think AMD can accomplish this… if they turn the CPU clocks down to 1.4GHz with a turbo boost to 2.0GHz, and with two die shrinks and a new CPU architecture… I have faith in Jim Keller.
pelov lov: Die shrinks are yielding far less than they used to. To further complicate matters, they're also becoming significantly more expensive.
cpuexecution: Very true….
pelov lov: I think you hit the nail on the head with some of those. HSA/OpenCL and GPGPU have been talked about for the last couple of years, and while AMD has made some strides here (Adobe's software suite), this endeavor is going to take years. For example, CUDA has been largely ignored by developers recently in consumer-facing software (HPC is a different story, to be sure), yet there's still a very large amount of code and number of applications that can utilize it. GPU offloading isn't going to arrive tomorrow or in the next 2-3 years, but closer to the next 5. Talking up HSA and OpenCL performance and expecting a breakthrough is akin to waiting for AVX2 to change the world: don't hold your breath. Developers don't care much about the x86/desktop space anymore.
And, yea, they desperately need a new x86 architecture. Kaveri has a bad case of Bulldozer in the sense that AMD actually went backwards with CPU performance… again. At some point, one has to ask why they even bother making high-performance x86 chips at all, especially when their cat family is so close and is so much smaller and the competition is so far ahead. Either gain appreciable ground or scrap it early and save the money for something worthwhile. It’s not surprising that the CPU/APU division is bleeding money.
What’s most annoying about this press release is that they’re not hesitant to declare lofty goals yet we haven’t seen a single AMD roadmap that stretches ~2 years. Carrizo is coming, apparently, but no idea when nor what it’s going to bring outside of the same socket (maybe?) and a lower TDP. There is no release date. There are no details on specific features nor architecture (CPU is supposed to be excavator? what’s the GPU?). The AM1 socket is a black hole that looks to be dead already, and the AM3+/server line doesn’t even exist. But hey, at least they’re going to increase efficiency by 25x in an unknown workload, right?
Joel Hruska: Here's what we know to date:
Carrizo is 28nm. It’s the same socket. I do not know if motherboards will be compatible. Excavator is looking like a minor point update.
GPU core is GCN, no major feature changes.
Carrizo will attempt to put AMD in a better position for die size and cost, and allow it to fit into a more diverse set of systems. I don't expect any great changes in performance over Kaveri; I expect they'll have to give back whatever IPC they gain through clock speed decreases at the top end.
Just as Kaveri really shines as a mobile improvement to Richland, I expect Carrizo will further try to improve on Kaveri — bringing minimum clocks up, lowering overall power consumption, etc.
pelov lov: So Carrizo is essentially the Beema/Mullins update to Kaveri? Any idea when they'll actually ease the bandwidth limitations on their GPUs and make an APU that's worthwhile at 1080p gaming? M.2 support? DDR4? I'd peer at the roadmap to see if they have anything else planned, but that doesn't seem to exist.
Joel Hruska: Discussion is mixed on this point. Roadmaps leaked last year showed Carrizo as a DDR3 part. Later roadmaps have indicated it will have a full southbridge onboard and support both DDR3 and DDR4, but will disable the integrated I/O if dropped into a desktop. PCIe is supposed to be onboard but limited to 8 lanes for graphics; the other eight are split between connectivity and cross-communication.
Carrizo uses HDL, which emphasizes power and die space savings over frequency. I suspect this is AMD’s Broadwell (in the limited ways the two can be compared).
Regarding your question about a higher-end APU, I think it’s important to remember that the AMD of today is not taking chances with its vital-but-low-margin desktop business.
I would guess that Carrizo will hold the line on DDR3 frequency but will support DDR4-2700. That’ll give it a 1.26x bandwidth increase in 12 months compared to Kaveri.
Then, in 2016, it moves to DDR4-3200 — a further 1.19x boost in memory frequency.
All current tests show the on-die GPU is memory-bandwidth limited at almost every frequency. It's reasonable to think that a DDR4-3200 bus would deliver most of the clock improvement it offers on paper, and that means AMD can scale its GPU performance up 25-40% without touching the implementation.
I think there *will* be a new GPU core the next time AMD does an SoC refresh, but I suspect they'll skip 20nm on the APU and push straight for 16nm/14nm.
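The DDR3/DDR4 step-ups quoted in the comment above are simple ratios of transfer rates, which translate directly into peak bandwidth. Here's a minimal Python sketch. It assumes dual-channel memory with a 64-bit bus per channel, and it takes DDR3-2133 as the Kaveri baseline. Those are our assumptions, and real sustained bandwidth will be lower than these theoretical peaks.

```python
# Peak theoretical memory bandwidth for the DDR speeds mentioned above.
# ASSUMPTIONS: dual-channel memory with a 64-bit (8-byte) bus per channel,
# and DDR3-2133 as the Kaveri baseline.

BYTES_PER_TRANSFER = 8  # one 64-bit channel
CHANNELS = 2

def peak_gb_s(mt_per_s):
    """Peak bandwidth in GB/s for a given transfer rate in MT/s."""
    return mt_per_s * BYTES_PER_TRANSFER * CHANNELS / 1000

speeds = {"DDR3-2133": 2133, "DDR4-2700": 2700, "DDR4-3200": 3200}
baseline = peak_gb_s(speeds["DDR3-2133"])
for name, rate in speeds.items():
    bw = peak_gb_s(rate)
    print(f"{name}: {bw:5.1f} GB/s ({bw / baseline:.3f}x vs DDR3-2133)")
```

The ratios roughly match the 1.26x and 1.19x steps in the comment, and they compound to about 1.5x peak bandwidth over DDR3-2133.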
pelov lov: So taking into account the HDL and decreased TDPs, we can expect Carrizo to roughly compete with the current Kaveri?
“Regarding your question about a higher-end APU, I think it’s important to remember that the AMD of today is not taking chances with its vital-but-low-margin desktop business.”
This is the same division that’s trending downward and hindering their comeback. I don’t think they have a choice but to take chances, otherwise their last bastion of superiority, the GPU, might disappear once Broadwell hits.
And what roadmaps? Have they even released an official roadmap? From my perspective, AMD has been purposely avoiding releasing a long-term roadmap in order to alleviate pressure regarding issues with execution and constant delays. The Kaveri release sums it up perfectly: it was originally meant to be released in early 2013 but was replaced by a stop-gap solution in Richland, then AMD said Q4 2013 launch (turns out they meant paper launch), and it finally hit the market in 2014. Worse yet, the interesting models still aren't available. Because AMD never actually committed to a date nor specified just what "release" means, it was never technically late.
Joel Hruska: http://www.extremetech.com/computing/180207-perils-of-a-paper-launch-amds-a8-7600-pushed-back-to-late-2014
I promise you, that headline made folks grumpy. :P
You have to remember something — despite the downturn, AMD made about 80% of its money in desktop and mobile last year. It currently projects breaking 50/50 in traditional vs. new markets by end of 2015 — which means these spaces are still vital in the long run.
What I’m about to tell you is my own speculation.
We know GF’s 28nm bulk silicon process doesn’t scale like 32nm SOI. The power curve kicks up sharply after ~3.5GHz.
We know Carrizo is a 28nm chip.
We’re fairly certain both the CPU and GPU will use HDL, which AMD has previously said bought them the benefits of a die shrink and a substantial power consumption drop *without* moving to a new architecture.
Theory: Carrizo will have an even sharper power curve, topping out at 3.5 – 3.6GHz instead of 4GHz. Let’s say AMD does a “stretch” desktop part at 3.8GHz, a 65W chip at 3.5GHz, and a 45W chip at 3.4GHz.
Its 35W chips move to a 3GHz base (up from 2.7GHz) with a top Turbo of 3.7GHz (which basically only kicks in on single-core loads). Its 19W chips move to a base clock of 2.5GHz, up from 1.9-2.1GHz.
That’s not enough to give AMD a great position against Intel’s Core i5 / Core i7 divisions, but even assuming Intel’s Broadwell Core i3 comes in at 2GHz in the 15W category, that Core i3 doesn’t have Turbo.
So our 19W AMD A10 has Turbo to 3GHz, 2.5GHz base, going up against a dual-core Core i3 clocked at 2GHz. 2x the core count + 25% additional clockspeed = competitive AMD.
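The "2x the core count + 25% clock" line is naive throughput arithmetic, and it leaves out per-clock performance. Here's a minimal Python sketch that assumes a four-core A10 against a two-core Core i3, as "2x the core count" implies, and ignores Hyper-Threading. The 0.6x relative-IPC figure is a placeholder chosen for illustration, not a measured value.

```python
# Naive throughput comparison behind "2x the core count + 25% clock".
# ASSUMPTION: the relative per-core IPC figure is a placeholder, not a
# measured value; Hyper-Threading on the Core i3 is ignored.

def naive_throughput(cores, base_ghz, relative_ipc=1.0):
    """Cores x clock x per-clock performance, in arbitrary units."""
    return cores * base_ghz * relative_ipc

intel_i3 = naive_throughput(cores=2, base_ghz=2.0)  # hypothetical 15W Broadwell i3
amd_a10 = naive_throughput(cores=4, base_ghz=2.5)   # hypothetical 19W Carrizo A10

print(f"Clock ratio: {2.5 / 2.0:.2f}x")
print(f"Raw multithreaded ratio: {amd_a10 / intel_i3:.2f}x")

# If each AMD core does only, say, 60% of the work per clock of an Intel
# core, the multithreaded lead shrinks considerably.
amd_a10_ipc = naive_throughput(cores=4, base_ghz=2.5, relative_ipc=0.6)
print(f"With a 0.6x IPC handicap: {amd_a10_ipc / intel_i3:.2f}x")
```

Single-threaded work is a different story. There the Turbo clock and per-core IPC matter, and core count doesn't.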
john: What about the Samsung/GloFo/IBM/Mentor Graphics 14nm process? I suspect it will gain traction quite fast relative to previous nodes because of the standard libraries provided by Mentor Graphics and the cross-fab standard process. I expect a 14nm process to counter Intel's in H2 2015 or even sooner… And I also suspect (according to previous trends) that it will be much denser than what Intel will be churning out… So I kinda feel the clock is ticking for Intel and they might get left behind, as the global market seems to be turning its back on them… I guess we'll see if it's just political or if Intel will be left alone for real…
Joel Hruska: You have to remember the lag time between when a foundry node is available and when silicon actually ships. H2 2015 is extremely optimistic for 14nm, and let's be frank: GloFo doesn't have a good track record at this point.
AMD has talked about Carrizo on 28nm. That’s the end of the BD line. There’s no replacement for any of their Piledriver equipment on the roadmap through 2015. That suggests they won’t be making a move until 2016, which would put them on their new ARM and possibly x86 architectures.
john: Well, you do remember that AMD taped out 14nm in winter… By now they should have fairly good silicon, even if it's still not ready for ramp. I suspect the ramp will start at the end of the year, with delivery 4-6 months later, but we'll see. I kinda feel there's a waiting game going on to see who yields first… I suspect the process on either side of the fence is more expensive than they're willing to pay for… so each side waits for the other to make a move, buying more time to mature the process and find ways to drive down the cost.
christianh: I'm always amazed at how people act like there are 50 other x86 makers and AMD is at the bottom… No, Intel cheated everyone else out and only AMD is left…
I tell people I use Intel at work and AMD at home and my home machine does much better with load than Intel ever has…
And Intel is still buying market share and people still buy them..
Can you all say SUCKERS…?
Joel Hruska: Christianh, while I agree with your categorization, as a CPU reviewer of some 13 years, I disagree with your statements on Intel / AMD positioning.
AMD’s A10-7850K would compete very well against the Core i3-4330 if both cost $140. With the A10-7850K priced at $180, the Intel chip is absolutely a better deal.
In point of fact, the FX-8350 (also ~$180) is a better deal than the A10-7850K — but if I need multithreaded performance I’d be taking a very close look at Intel’s quad-cores. Intel tends to win such matches, even without HT.
christianh: Of course they purposely dropped the bottom out of the market with E6300… By the time they released Atom they were hemorrhaging money… That set AMD back by YEARS and BILLIONS…
I REFUSE to support them… The Free Market is Intel’s antithesis…
Not to mention the backroom deals…
MEH…
john: Yeah, but you would need an extra dGPU to compensate for the weak GPU on Intel's low-to-mid-range chips… So… I bought a $160 Kaveri when it came out (it was a promotion of some sort; I'm sure you can find similar deals now if you're looking!) together with a BF4 license worth what… $30-40 at launch? So I paid $120-130 for the chip… And I don't need an extra dGPU (even though I planned for one: I have an extreme overclocking board and a huge power supply). I found it played all the games I was playing in HD perfectly smoothly (some even in Full HD)… nothing more I can ask of a chip right now, really… So I decided against buying a dGPU for the moment.
Lately I've been playing with overclocking the chip, and it overclocks quite nicely on the GPU side (900MHz-1GHz achievable on the stock cooler). I'll try overclocking the GPU while at the same time downclocking the CPU, or disabling cores (one per module) and increasing frequency, since the CPU isn't doing much under Mantle in BF4, for example. I'll buy Thief to test Mantle with that too, maybe even Sniper Elite.
Joel Hruska: You should know that the APU gets very little boost out of Mantle. It's too bandwidth limited. Faster DDR3 helps much more, though I wouldn't go over DDR3-2133 for cost reasons.
http://www.extremetech.com/gaming/177677-dual-graphics-dud-intel-clobbers-amds-apus-in-budget-gaming/2
john: I have DDR3 at 2.4GHz, and I see a 10-20% improvement from OC-ing the GPU. I also see about a 10-20% improvement from Mantle alone; with OC, the results are nothing short of amazing considering we're talking about an iGPU… I still don't see a reason to buy the 290X I had planned on.
Joel Hruska: Well…you and I have very different ideas about what constitutes a good experience in gaming, then. :) But hey, that's fine. I'm glad you're happy with it.
FYI: You should test your game performance with FRAPS to confirm that DDR3-2400 is actually helping you. When I reviewed the A10-7850K I had some DDR3-2400 here, but it was programmed with very high latencies (it was memory AMD sent along for the review). I actually switched it out for DDR3-2133 — the lower latencies on the DDR3-2133 actually improved performance over the high-latency DDR3-2400.
YMMV of course — if your RAM is timed more tightly than mine, you may not see the same thing.
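Lower-clocked RAM can win for a simple reason. First-word latency in nanoseconds is the CAS latency (CL) in cycles divided by the I/O clock, and the I/O clock runs at half the transfer rate. Here's a minimal Python sketch. The comment above doesn't give the actual timings, so the CL values are illustrative assumptions.

```python
# First-word CAS latency in nanoseconds for DDR memory.
# ASSUMPTION: the CL values below are illustrative; the timings of the
# modules in the comment above aren't stated.

def cas_latency_ns(cl_cycles, mt_per_s):
    """CL cycles at the I/O clock, which runs at half the transfer rate."""
    io_clock_mhz = mt_per_s / 2
    return cl_cycles / io_clock_mhz * 1000

loose_2400 = cas_latency_ns(cl_cycles=12, mt_per_s=2400)
tight_2133 = cas_latency_ns(cl_cycles=9, mt_per_s=2133)
print(f"DDR3-2400 CL12: {loose_2400:.2f} ns")
print(f"DDR3-2133 CL9:  {tight_2133:.2f} ns")
```

Peak bandwidth still favors the DDR3-2400 kit. Which setting wins depends on whether a given game is more sensitive to latency or to bandwidth, which is why measuring with FRAPS beats reading spec sheets.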
john: I would read it like this:
AMD will improve the efficiency of its design 25x in SOME respects. This means that in CERTAIN workloads they will be 25x more energy efficient, which can be done quite easily even with today's tech: just create a dedicated, separately power-gated silicon block to do that. That block is powered down completely 99% of the time; when it needs to work, it lights up, does what it needs to do quickly, and goes back to sleep. Now compare that to a generic computing thingy like any CPU or GPU of today and you get that figure right now.
BUT.
What I think they want to do is exactly what we two discussed on an older post of yours: decoupling the decoders from the FPUs, and having some generic FPUs that accept instructions from any decoder on the die (be it GCN, x86, or ARM)… This has the potential to provide the granularity necessary for these claims, provided they can do dynamic frequency management on a per-FPU basis. Also, the FPUs shouldn't be too large (probably 128-bit would be enough, or even 64-bit), BUT they need to figure out how to combine multiple FPUs into one single operation. Mind you, I'm not even sure that's entirely possible to do efficiently: you would basically need to move all the FPUs needed for that operation onto the same clock to ensure coherence, and even then it would be atrociously complex and error prone. They should be small so they can be powered on and off depending on need. They might also want to start shifting workloads back and forth across the die to ensure it stays uniformly hot and doesn't develop hotspots.
I know that what I'm proposing is a chip so advanced and so complex that it's mind-boggling… but nothing short of a material shift to something else (graphene?!?) or a huge leap in nodes could account for this otherwise…. AMD has neither the research capacity nor the funds to do either of those, so I suspect they will push for about a factor improvement by means of architecture and a 2x improvement by means of die shrinks… Otherwise it's just complete bollocks.
ronch: Talk is cheap. Let's see you actually do it, AMD. No ifs, no buts, no excuses.
Korios (http://www.korioi.net/): Projecting outrageous claims on a 6-year scale means, in business language, "Buy our stock".
Bilal Mahmood: Turn those energy savings into higher clockspeeds!
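Joel Hruska: Sadly, it doesn't work like that. Dynamic power scales with Voltage^2 * clockspeed, and because higher clocks also require higher voltage, power rises roughly with the cube of frequency.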
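The standard first-order relation for dynamic CMOS power is P = C·V²·f. Voltage usually has to climb with frequency, so the net effect is roughly cubic in clock speed. Here's a minimal Python sketch with hypothetical capacitance, voltage, and clock values that ignores leakage entirely. It shows why efficiency savings rarely translate cleanly into higher clocks.

```python
# Dynamic CMOS power: P = C * V^2 * f (switched capacitance, voltage, frequency).
# ASSUMPTION: voltage has to rise in proportion to frequency, a common
# first-order simplification; leakage power is ignored, and real
# voltage/frequency curves get steeper near the top of the range.

def dynamic_power(c_farads, volts, hertz):
    """Switching power in watts."""
    return c_farads * volts**2 * hertz

base = dynamic_power(1e-9, 1.0, 3.0e9)    # hypothetical 3.0GHz at 1.0V
faster = dynamic_power(1e-9, 1.2, 3.6e9)  # +20% clock, +20% voltage

print(f"Clock gain: {3.6 / 3.0:.2f}x")
print(f"Power cost: {faster / base:.3f}x")
```

In this model, a 20% clock bump that needs a 20% voltage bump costs about 73% more power.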
Cool head: Joel Hruska, have there been any details on a 28nm performance node? It has been noted before that the Kaveri chip was clock-limited, mostly because GlobalFoundries' higher-performance nodes weren't producing the goods, thus losing the majority of the ~30% CPU improvement. Is there any chance at all that we will see a Kaveri or Carrizo chip where the problem with the node is eliminated?
Joel Hruska: This chip *is* built on the GF high-performance node. AMD has publicly stated that the lower frequencies are the result of that node's *intrinsic* characteristics, not anything to do with GF's poor execution.
There is no serious chance of fixing what ails Steamroller in silicon. The chip is only about 8% faster than Richland, clock-for-clock. Even if Steamroller could hit 5GHz compared to Richland’s 4.4GHz, that wouldn’t get you anything like a 30% performance improvement.
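The arithmetic behind that last point multiplies the clock-for-clock gain by the clock ratio. Here's a minimal Python sketch. It assumes performance scales perfectly with frequency, which is a best case, and it treats 5GHz as a hypothetical clock, not one Kaveri ever shipped at.

```python
# Best-case speedup = clock-for-clock gain x clock ratio.
# ASSUMPTION: performance scales perfectly with frequency; 5GHz is a
# hypothetical Steamroller clock used only for illustration.
ipc_gain = 1.08         # Steamroller vs. Richland, clock-for-clock
richland_ghz = 4.4
steamroller_ghz = 5.0   # hypothetical

total = ipc_gain * steamroller_ghz / richland_ghz
print(f"Best-case speedup: {(total - 1) * 100:.0f}%")
```

Even under perfect scaling, that works out to a low-20s percentage, short of the ~30% figure.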
jburt56: Maybe it's time to bring out DARPA's terahertz electronics and say goodbye to silicon.
James Byars: I'm not going to say they can't do it or that it can't be done, although it sounds like a faraway goal. Can't wait to find out!
Copyright 1996-2014 Ziff Davis, LLC. PCMag Digital Group. All Rights Reserved. ExtremeTech is a registered trademark of Ziff Davis, LLC. Reproduction in whole or in part in any form or medium without express written permission of Ziff Davis, LLC. is prohibited.