By Ann Steffora Mutschler
Semiconductor engineering teams always have focused on stepping up performance in new designs, but in the mobile, GPU and tablet markets they’re finding that maintaining the balance between higher performance and the same or lower power is increasingly onerous. The reason: Extreme gaming applications can create scenario files that cause dynamic power consumption to spike out of bounds.
“Most people today talk about performance per watt,” said William Ruby, senior director of RTL power product engineering at Apache Design. “This is really the metric. Keeping performance the same is probably not an option because even mobile applications, if you look at today’s phones, they are quad-core going to 8-core and even to 64-core, so performance is not going to be the same. Power is not going to be the same either. Power most likely will go up while performance/watt better be rising overall.”
So how do engineering teams deal with that?
“Technically, there is really only one answer to this, which is architecture,” Ruby said. “You’ve got to start thinking about what functions are enabled and when—basically doing the things that are absolutely necessary to do. In some cases that means making performance tradeoffs.”
There appears to be a growing consensus for that approach. “Everybody is trying to reduce the power—the software engineer, the hardware engineer—but the power reduction can be done at a much higher level,” said Solaiman Rahim, senior technology director for power products at Atrenta.
To help engineering teams make decisions about how to reduce power at the architectural level, RPC (remote procedure call) designs can be utilized, Rahim said. RPC is a protocol that allows software on one machine to invoke procedures on other machines across a network. “By looking at the activity data for the chip and doing some power exploration of the chip, we could identify the reason for the peak power in the design. The RPC design usually has a lot of processing elements, so there is a lot of processing going on, which usually causes a lot of power consumption. There are a few signals in the design that control those processing elements, so by using our exploration capability and analyzing the activity data, we could provide this information to the architect and the RTL designer. Then they can use this information to make hardware changes, such as gating the processing elements when they are not used, and also provide this information to the software engineer.”
Through this data, the software engineer knows exactly what in the design causes those peaks of power, and by using a cache mechanism in the software, the signals could be exercised less, resulting in power reduction, Rahim said.
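The software-side saving Rahim describes can be sketched with a toy model. This is purely illustrative (the `ProcessingElement` class, its per-operation energy, and the workload are all hypothetical): a software cache keeps repeated inputs from re-exercising a power-hungry hardware block, so the signals driving it toggle less.

```python
# Illustrative sketch, not a real power model: a software-level cache keeps
# repeated work from re-exercising a power-hungry processing element.

class ProcessingElement:
    """Hypothetical stand-in for a power-hungry hardware block."""
    ENERGY_PER_OP_NJ = 5.0  # assumed energy per invocation, in nanojoules

    def __init__(self):
        self.energy_nj = 0.0

    def transform(self, x):
        self.energy_nj += self.ENERGY_PER_OP_NJ  # every call toggles the block
        return x * x  # placeholder computation

def run(workload, pe, cache=None):
    """Drive the PE over a workload, optionally consulting a software cache."""
    results = []
    for x in workload:
        if cache is not None and x in cache:
            results.append(cache[x])  # cache hit: the PE stays idle
            continue
        y = pe.transform(x)           # cache miss: the PE burns energy
        if cache is not None:
            cache[x] = y
        results.append(y)
    return results

workload = [1, 2, 3, 2, 1, 3, 2, 1]  # repeated inputs, as in redrawn frames

pe_hot = ProcessingElement()
run(workload, pe_hot)             # no cache: 8 invocations

pe_cool = ProcessingElement()
run(workload, pe_cool, cache={})  # cached: only 3 unique invocations
print(pe_hot.energy_nj, pe_cool.energy_nj)
```

With only three unique inputs in the eight-element workload, the cached run invokes the processing element three times instead of eight, cutting the modeled energy accordingly.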
However, this methodology is not widespread today because power exploration requires simulation vectors. “When you are at the architectural level or software level you might not get to the stage where you have those vectors,” he noted. “But it also depends on the quality of those vectors and on having power exploration tools that analyze the RTL and provide some guidance at the architecture level—something that is not widespread.”
Intense challenges in extreme gaming GPUs
In the extreme gaming space, Vic Kulkarni, general manager and senior vice president of the RTL Business Unit at Apache Design, observed that Japanese and U.S. semiconductor companies have been grappling in the last few months with extreme gaming workloads, where features such as shader cores have become a very important part of the GPU.
“How do you manage the shadows and also sunlight, smoke and fire?” asked Kulkarni. “Those are the most difficult things to render in GPUs. In extreme gaming, a lot of things are changing in real time and the user is plugging away, moving all kinds of joystick controls. How do you move the shadows and the sunlight, and then have long shadows, short shadows, almost in real time? Tremendous GPU processing is happening during extreme gaming when scenes and frames are changing so rapidly. What this means is the FSDB size and the scenario files are getting huge, and that’s where the technical challenge comes in. For that split microsecond or even nanosecond of simulation time, FSDBs have to be processed, and without jitter. Smooth rendering is the technical challenge the GPU companies are facing.”
One way to handle this would be critical-signal selection on large FSDBs, followed by power profiling.
“Not only is it high-performance processing, but it definitely sucks the power,” said Pete Hardee, low-power design solution marketing director at Cadence. “Obviously the gaming experience that gamers want now is every bit as good from the graphical rendering point of view on a tablet as it was just a very short time ago on a high-end, gaming-enabled desktop PC. What’s special about the way GPUs work in gaming applications, and in fact in any HD video, is that the power consumption is dependent on the number of frames being processed as well as the content of those frames. The rendering and shadows and smoke and fire—all of those things that bring the realism to the picture—is pretty extreme because it’s so fast moving and there’s stuff happening all over the screen.”
All of this adds up to each frame being extremely busy in terms of what changed from the last frame, which is why the GPUs are structured to have multiple rendering engines. Those engines switch in and out, depending on the processing need, while the overall frame rate remains constant. But the amount of processing going on changes depending on those content issues, Hardee explained. As such, it’s extremely difficult to estimate power with any degree of accuracy.
The only way to adequately represent the design in a scenario is through emulation, which is why GPU companies such as Nvidia have set up massive in-house emulation labs.
“You need accurate characterization, but you also need to process a lot of realistic activity—actual pictures, actual video frames—and emulation is the only way to do that,” Hardee noted.
Software has come a long way in making all of this work efficiently, too.
“Everything is always getting more complex, and so we always adapt to change. As people want more and more functionality that just means the designs get bigger and we know that the tools do more, run faster and have better capacity than they did even 10 years ago,” pointed out Mary Ann White, director of Galaxy Implementation Platform Marketing at Synopsys.
The chart below shows how design teams have had to adapt to save power.
“From 90nm on it’s super leaky, so they have to adapt to using new low-power techniques that they didn’t before,” said White. “There’s no more ‘one button to push’ that says, can you give me the best performance and best power. It doesn’t work that way anymore. Multi-corner multi-mode (MCMM) might not have been something that was commonly in use 5 to 10 years ago, but now everything has different modes—you’ve got DVFS, where you are dynamically changing voltage and frequency to be able to save on power—and so you want performance when you can have it. The only way to get that is really to use MCMM. Now we are having people use hundreds and hundreds of scenarios with MCMM, so again the tools had to adapt to being able to do this. When MCMM first came around it was definitely a serial process. Now there’s no way.”
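The DVFS idea White mentions can be sketched numerically: because dynamic power scales roughly as V²f, dropping both voltage and frequency together saves far more than throttling frequency alone, so a governor picks the lowest-power operating point that still meets the performance requirement. The operating-point table and constants below are hypothetical, chosen only to make the arithmetic concrete.

```python
# Minimal DVFS sketch with illustrative numbers: pick the lowest-power
# voltage/frequency operating point that still meets the performance need.

OPERATING_POINTS = [  # hypothetical (name, volts, Hz) table
    ("turbo",   1.10, 2.0e9),
    ("nominal", 0.90, 1.2e9),
    ("eco",     0.70, 0.6e9),
]

C_EFF = 1e-9  # assumed switched capacitance, farads
ALPHA = 0.2   # assumed average activity factor

def power(v, f):
    """Dynamic power estimate: alpha * C * V^2 * f."""
    return ALPHA * C_EFF * v * v * f

def pick_point(required_hz):
    """Choose the lowest-power point whose frequency meets the requirement."""
    feasible = [(power(v, f), name) for name, v, f in OPERATING_POINTS
                if f >= required_hz]
    return min(feasible)[1]

print(pick_point(1.0e9))  # a light workload doesn't need turbo power
print(pick_point(1.5e9))  # a heavy workload forces the high-power point
```

Only when the workload actually demands the higher frequency does the model pay the quadratic voltage penalty, which is the whole point of scaling voltage and frequency together.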
Further, Sudhakar Jilla, director of marketing for place and route products at Mentor Graphics, explained that with variations in the process and variations in the design modes, each combination of a process corner and a design mode is called a mode-corner scenario.
“Typically what happens is that for each of these scenarios you have different design metrics sensitive to different things. For example, in Scenario One you could have ‘setup’; in Scenario Two you could have ‘hold’; and in Scenario Three you could have ‘leakage,’ and so on. Most of the time these are conflicting, so you optimize for one corner and the other corner will break. There are different ways to solve this problem,” Jilla said.
EDA tools need to look at all these mode-corner scenarios, across all design metrics, at the same time. Traditional tools typically solve one, then try to fix another. That’s normally what breaks the flow and what leads to iterations where you get a power number but the performance is off, or you get the performance but the power number is off, Jilla noted.
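The contrast Jilla draws can be sketched as a simultaneous check: instead of optimizing one corner and discovering later that another broke, every scenario is evaluated against its critical metric in one pass. The scenario names, metrics, and limits below are all invented for illustration.

```python
# Sketch of the MCMM idea: evaluate every mode-corner scenario against its
# critical metric at once, instead of fixing one corner and breaking another.
# Scenario names, values, and limits are illustrative.

SCENARIOS = {
    "fast_corner_func": {"metric": "hold",    "value": 0.02,  "limit": 0.0},  # slack, ns
    "slow_corner_func": {"metric": "setup",   "value": -0.05, "limit": 0.0},  # slack, ns
    "typ_corner_sleep": {"metric": "leakage", "value": 1.8,   "limit": 2.0},  # budget, mW
}

def check_all(scenarios):
    """Return the scenarios that violate their metric; empty dict == clean."""
    failures = {}
    for name, s in scenarios.items():
        if s["metric"] in ("setup", "hold"):
            ok = s["value"] >= s["limit"]  # timing slack must be non-negative
        else:
            ok = s["value"] <= s["limit"]  # leakage must stay under budget
        if not ok:
            failures[name] = s["metric"]
    return failures

print(check_all(SCENARIOS))
```

Here the slow corner fails setup while the other scenarios pass; a fix that only re-tightened setup timing would be re-checked against hold and leakage in the same pass, rather than in a later, flow-breaking iteration.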
Going forward, to continue to manage performance/watt, the entire ecosystem has to become involved so that everybody can understand what’s going on, Synopsys’ White asserted.
“As processes change there are so many different things to consider. At first there were only three Vts, and then there became five Vts at 28nm. With finFETs it’s looking like it probably will go back down to three again, but the channel length variance options will probably go up. So it’s all these different things that you’re looking at,” she said. “Even if you have a plug in the wall, the green initiatives are forcing people to watch for power. I was not very happy one day when my DirecTV unit decided to turn itself off after an hour to save power. There are different things driving different factors. If you think about all the clouds, Amazon probably has a billion machines sitting somewhere. Its cloud may be virtual to users, but there are actual machines sitting somewhere, and all of them have to be reliably on.”
The big concern in data centers is having enough cooling power to chill the servers, and having servers that are efficient enough that they require less cooling.
“The actual power usage is different, but saving power isn’t, because there are so many green initiatives now that whether you’re playing graphics on your PlayStation 3 at home or on a mobile device, they’re probably thinking about how to make sure that it’s as effective on both,” she noted. “The bottom line is that there are definitely power savings that can be done on all fronts. The techniques used might be different, but the fact that they have to do it means that power is everything.”