Posts Tagged ‘TSMC’

Five Important Changes That Will Affect Power

Thursday, November 3rd, 2011

By Ed Sperling
So far most of the energy savings in SoCs have been achieved using two main approaches—turning off most of the chip most of the time, and changing the materials used to insulate against current leakage.

Over the next few years, changes to designs will be more radical, encompass more pieces of a bigger system, and they will be orders of magnitude more effective. From a market standpoint, there is little choice. Computing increasingly is going mobile, and time between charges is a competitive edge. The caveat is that increased battery life has to come with a subsequent increase in functionality. Everything that could be done with a plug now will have to be done without one.

That means rethinking everything from the hardware design to the usage model to the software that runs on those platforms. And it means getting chips out the door at least as quickly, if not more quickly. Here are five trends and approaches that collectively, and sometimes individually, will have a big impact on energy efficiency, power consumption and leakage:

1. Rethinking the basics. Some of the biggest advances in efficiency will come from optimizing existing technology. There is more to turn off, more pieces to improve, and there are more ways of doing it better.

Consider something as basic as the clock, for example. The big focus has been maximizing frequency for nearly five decades. There are even concurrent clocks to make that happen. But having them always on and always running at the same frequency means they use a lot more energy than necessary.

“Design has always centered around the clock being the heartbeat of the system,” said Chi-Ping Su, senior vice president of R&D for Cadence’s Silicon Realization Group. “So people always assume the clock will be on. What we have found, working with ARM and the processor type of design, is that the clock consumes an extremely large percentage of the power. Timing and frequency are based on the clock. So you build a tree to be the ideal clock and you do everything based on that. When we started looking at it, we started asking why clocks need to be balanced at all.”

So how much energy can be saved? Su contends the amount is up to 30% of clock-tree power and up to 50% of dynamic power for the entire system.

He’s not alone in touting these kinds of numbers. Most SoC tools developers believe that dealing with energy/power/leakage at or before RTL can mean significant savings for the overall design.
“All the low-hanging fruit is still available to chip designers,” said Vic Kulkarni, senior vice president and general manager at Apache Design. “We find that even advanced designers are more concerned with meeting functionality and identifying power bugs. What they forget is the relationship between data, clock, reset and enable—the four signals in an SoC.”

2. Reducing distance and resistance. Over the next two years the SoC industry will undergo a radical shift that will continue for years to come. Rather than plotting Moore’s Law linearly, transistors will be placed in three dimensions.

Driven partly by re-use, partly by time-to-market pressures and partly by physical limitations, 2.5D and 3D stacking will have an enormous effect on energy consumption and power. By stacking memory and other components on top of logic, the distance a signal must travel can be shortened significantly, along with the energy necessary to drive that signal.

“Moore’s Law is not a law,” said Wally Rhines, chairman and CEO of Mentor Graphics. “But the easiest way to reduce the cost of a transistor for the last 40 years has been shrinking feature sizes and growing wafer sizes. We are coming into an era where it will be more cost effective to stack die than to shrink feature sizes. We will hit it with memory before logic, but as with all new technologies we will adopt it before it is cost effective because of unique capabilities.”

Whether it’s done with an interposer, package-on-package, or flip-chip bumped die, Rhines said there is a 70% decrease in power dissipation if the memory can be put on top of a processor.

And that’s just for starters. By adding more processors that are sized for a particular function and tying that to just the right amount of memory, rather than a whole memory chip or block, far less power is needed. Companies such as Tensilica and ARM have been making this case for some time. With stacked die, their arguments are likely to receive far more attention.

3. New materials and structures. Calling a material “new” is something of a misnomer in SoC design. Most of the techniques that we consider revolutionary have been around for decades, but they haven’t been developed enough to the point where they are cost effective, both from a yield and materials standpoint.

Through-silicon VIAs, for example, have been talked about since the late 1950s, and interposers in 2.5D packages are simply a collection of TSVs on a single die. But there are still issues to be worked out. Shang-Yi Chiang, senior vice president of R&D at TSMC, said questions remain about how to integrate a substrate with an interposer, and how to debug it at different phases of development so it can be tested.

“There are a lot of parasitics to deal with in 2.5D,” Chiang said. “And with 3D we need time to make sure we can calibrate it.”

The other kind of 3D—structures such as FinFETs, tunnel FETs and nanowires—have been on the drawing board since the 1990s. All of these structures can lower leakage by controlling the gate at multiple points. FinFETs are planned in volume for 14nm by both GlobalFoundries and TSMC, while Intel may begin using them as early as 22nm.

These structures hold the promise of radically reducing leakage of both static and dynamic power using all modes of operation—at least initially.

“The problem is these are a one-off thing,” said Mike Muller, chief technology officer at ARM. “FinFETs do reduce leakage, but once you’ve done that you’ve still got three impossible things to do before breakfast. Those kinds of steps are part of the solution.”

Muller said combining those with stacking techniques will go even further. “It opens the door to completely different die-to-die memory interfaces which allow you to build more efficient systems than when you go off the chip, down the serial interface to a separately packaged die. It changes the memory bandwidth, and this is just a computer at the end of the day so memory is one of the fundamentals for performance. Stacking allows you to change that.”

4. Lowering the voltage. One of the benefits of 3D structures such as FinFETs and stacking of die is that they make it easier to lower the voltage in certain parts of the chip. The reason is that the minimum voltage for DRAM may be higher just to maintain functionality than it is for logic or I/O. By separating those functions into different die, issues such as state retention and leakage can be confined and dealt with independently—the so-called divide-and-conquer approach.

So how low can the voltage go? Several years ago, researchers at IBM said the minimum voltage for an SoC would be at least 0.7 volts. It now appears it can be as low as 0.1 or 0.2 volts, and research is under way to lower it even further.

“You can get down to 0.3 or 0.2 volts without any problems,” Qi Wang, technical marketing group director at Cadence, said during a recent roundtable. “If you keep the aspect ratio of the depth and the height of a FinFET then you can guarantee the performance, but you do have other physical effects. Nothing is free. But the voltage can go much lower than what the textbooks say.”
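To get a feel for why the supply voltage matters so much, here is a minimal sketch based on the textbook CMOS switching-power approximation P ≈ a·C·V²·f. That relation is not quoted by anyone in the article, and the function name is hypothetical. The sketch assumes activity, capacitance and frequency stay fixed, which is optimistic because achievable frequency usually falls as voltage falls.

```python
def relative_dynamic_power(v_new: float, v_old: float) -> float:
    """Ratio of switching power at v_new versus v_old.

    Assumes P ~ a * C * V^2 * f with activity (a), capacitance (C) and
    frequency (f) held constant, so only the V^2 term changes.
    """
    if v_old <= 0 or v_new < 0:
        raise ValueError("v_old must be positive and v_new non-negative")
    return (v_new / v_old) ** 2


if __name__ == "__main__":
    # Voltages mentioned in the text: the earlier 0.7 V estimate versus 0.3 V and 0.2 V.
    for v in (0.3, 0.2):
        ratio = relative_dynamic_power(v, 0.7)
        print(f"{v:.1f} V vs 0.7 V: {ratio:.1%} of the switching power")
```

Under these assumptions, dropping from 0.7 V to 0.3 V leaves roughly 18% of the original switching power, and 0.2 V leaves roughly 8%. Leakage and the "other physical effects" Wang mentions are outside this simple model.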

5. Fixing software. Software is the last piece of the puzzle to fix, and it’s been one of the hardest for a number of reasons.

First of all, software takes longer to create and perfect than hardware. This is evident in all the bug fixes and updates. All three of the top EDA players are involved in this effort. Synopsys is working on software prototyping to allow software to be written even before the hardware is ready. Mentor has been involved in simplifying the creation of RTOSes and embedded software. And Cadence has shifted its design approach so that software and hardware can be done far more concurrently.

But getting software out on time is only a first step. The next step is to make software function more efficiently, an approach that dates back to the RISC vs. CISC wars of the 1990s. Reduced instruction set computing was more efficient than complex instruction set computing, which boosted performance. By taking that approach one step further, it also can reduce the amount of energy consumed by a particular task, and be used to manage the overall power in a system much more efficiently.

Work on symmetric multiprocessing continues, as well. How far that will go is anyone’s guess, but we now seem to be facing a limit on the number of cores that most applications can use effectively. Talk about an unlimited number of cores has given way to limited numbers of cores and unlimited numbers of processors spread throughout a system, most of which are off most of the time.
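One common way to reason about that limit is Amdahl's law, which the article does not name. It is offered here only as an illustrative sketch. The 0.90 parallel fraction below is a hypothetical value, not a measurement from any application discussed in the text.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when only `parallel_fraction`
    of the work can be spread across them (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the work parallelizes, 10% stays serial.
    for n in (1, 2, 4, 8, 16, 64):
        print(f"{n:>3} cores: {amdahl_speedup(0.90, n):.2f}x")
```

With a 10% serial portion the speedup can never reach 10x no matter how many cores are added, which is consistent with the shift toward fewer general-purpose cores plus many small, mostly idle processors.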

Taken together, all five of these trends will have a huge effect on efficiency, power and leakage. And now that battery life is a competitive issue, it also is likely to be used by vendors and seen as a value add instead of an unnecessary engineering cost—or worse, a nuisance.

Experts At The Table: Retrofitting Older Process Nodes

Thursday, September 8th, 2011

By Ed Sperling
Low-Power Engineering sat down with Walter Ng, vice president of the IP ecosystem at GlobalFoundries; Vishal Kapoor, vice president of marketing for SoC realization at Cadence; Naveed Sherwani, CEO of Open-Silicon; John Heinlein, vice president of marketing at ARM; and Jeff Lukanc, director of engineering at IDT. What follows are excerpts of that conversation, which was held in front of a live audience at the Global Technology Conference in Santa Clara, Calif.

LPE: What is the definition of a mainstream process node these days and why are older nodes so important?
Heinlein: We’re thinking of mainstream as 55nm and older. That’s where a lot of the high volume is. Even though it’s sexy to talk about the leading edge, last year about 75% of ARM’s royalties came from cores that were developed in 2006 and earlier. About 3 million of the 6 million cores we shipped were ARM 7.
Ng: From a manufacturing standpoint, the volumes are at 65nm. From that node it’s moving from 55nm and 40nm, but that’s still the bulk of the industry. A lot of companies are doing some very cool things that are very relevant today at those nodes. Even with some of the biggest companies, a lot of the volume is at 65nm. It’s what pays the bills. If you have 200mm capacity, those fabs are completely depreciated.

LPE: How about for the tools? Does the mainstream part of the market really pay the bills?
Kapoor: From an EDA perspective, 65nm pays the bills as much as 28nm and 20nm.

LPE: Is everything still following Moore’s Law? If a company is designing at 65nm, does it necessarily move to the next node?
Sherwani: We look at everything from networking to consumer applications. Some customers need the latest technology. But there are others who are at 0.18 (microns) and thinking about 0.13, and maybe they don’t need to go there. The velocity of that move is segment-specific.
Lukanc: The mainstream for production is 0.13, but a lot of the new designs are ramping to 65nm. We’re looking at older technology and combining new things through integration. There may be a call management IC with a 30-volt option at 0.13 or 0.18, which allows the unique combination of analog and digital management on one chip. We can re-use some of the older technologies.

LPE: There’s a lot of investment in older processes these days. Why?
Sherwani: I visited about 10 fabs in China and I was surprised that none of them had 65nm processes. Most didn’t even have 90nm processes.
Ng: If you look at what’s driving a lot of technology today, it’s the consumer market. And that’s very cost-conscious. If you can’t take advantage of the latest technology, then you look at where your given application makes sense. Cost is very much a factor that customers consider at each process node. And for us, we have to find ways to keep investments in fabs relevant to our customers. We have a big focus on high voltage and power management. We have to find ways to add value on top of baseline logic, which is a commodity at this point.
Heinlein: If you look at smart phones, everyone is always focused on the processor and the high-end chip. But alongside those are the power management controllers and display drivers and RF/mixed signal. Another area for derivative value-added processes that Walter (Ng) mentioned is low leakage. When you get to 65nm leakage is a problem. There are ultra-low leakage variants and high-voltage variants coming out at the high end and the low end, so people can put those into applications that can run on a coin-cell battery for 10 years. To complement that there are ultra-dense libraries that bring the cost and the leakage down and which are suited very well to these kinds of applications.

LPE: If you develop a chip at 180nm and the process changes to low leakage or low power, does it yield the same?
Ng: The strategy in developing these new processes or modules on top of derivatives is to preserve the investment that was made earlier. It takes advantage of the proven solutions that are already there. When we originally developed those processes, at that time they were leading-edge processes. As you get much more volume using those processes, the manufacturing window becomes quite tight. You could probably tighten up the bit cells. But it’s a business tradeoff whether you re-invest in that or not. The yields are just as good.

LPE: What happens to the tools and the IP that was developed?
Heinlein: For the most part it all works. If you think about 180nm, nobody cared about leakage because it wasn’t an issue. Now, when people look at 180nm, they do care about leakage and power management. So we’re putting that back into 180nm.
Kapoor: The innovation at the leading nodes is going to drive benefits at the older nodes. You drive it back in terms of products, but you also drive it back in terms of design techniques. We developed a 28nm PHY, and we were challenged to do it differently because it’s for a leading node. Today we’re applying what we’ve learned back to 40nm and 65nm.
Lukanc: The best tools are developed at the leading nodes, but you may want to characterize older libraries for low power and power management.

LPE: If you improve an existing technology at an older node, can you charge more for it?
Lukanc: Yes. In general, what we’re offering is value-added solutions. In some cases we offer value-added solutions that are low power.

LPE: Will it be essential for older processes to be updated when we get into stacked die as a way of decreasing the overall power budgets and physical effects?
Sherwani: The answer is different for each area. There is no single, simple answer to that.
Kapoor: For a long time our industry has looked at the technology piece rather than the economics. The answer is, it depends. Can you get more value out of an older node? Yes. The economics will drive the longevity of nodes and what you can get out of them. But we cannot talk about the value of older nodes unless we invest in the newer ones.
Lukanc: If you have an existing product, you can look at the option of integrating oscillators or an EEPROM or something else on top of it to reduce the system cost. There are a lot of things you can do in a package to reduce the overall cost, but you have to look at the total system cost. You may be offering a smaller footprint to the customer, but they may not be getting value out of that.
Heinlein: If you look at mixed signal and RF design at the leading-edge nodes, it’s really tough to get the transistor variation to be complementary to the analog. There’s a point at which it’s too hard, and in that case a heterogeneous 3D package makes sense.
Kapoor: With 3D ICs there’s a technical capability about whether you can marry different die. But you also have to look at it from a system capability. When you look at tablets, where the SoCs are talking in very high bandwidth to memory, that makes sense. The technology by itself won’t be an answer. You need to find out where it makes sense to use it.

LPE: Is investment in older process nodes an arms race that favors the big foundries?
Sherwani: The specialty foundries being built in places like China have nothing to do with companies like GlobalFoundries and TSMC. They will ship a lot of silicon. Over the next 10 years a lot of the analog silicon will be shipping out of China using all older nodes.
Kapoor: Those boutique fabs are certainly making investments in areas in which they specialize.
Ng: You have to continue to make fabs relevant and to drive a good margin. A big impetus for us in developing modules on top of our processes is that you do get the second- and third-tier foundries coming in and taking the floor out of the base logic price. That’s difficult for us to compete with. So we’re looking at where to add value and how to win a good percentage of market share. We have our investments in 200mm. We will continue to invest there.
Heinlein: We definitely see lots of specialty processes at the smaller players. We work with them and enable them. But once it gets to a certain point in the market we work with the big players.

LPE: Will it become a battle of who has the deepest pockets?
Sherwani: The good thing about older nodes is that the investment needed is minuscule compared with the tens of billions of dollars at advanced nodes. A lot more players can be relevant at older nodes. At 14nm I don’t think there will be more than three or four players.
Ng: The incremental investment to bring up these value-added modules is nothing compared to the investment at the leading edge. The other side is that the equipment manufacturers are a leading component of the cost at the leading edge. At the mature nodes, you’re not buying a lot of new, expensive tooling.
Lukanc: That happens on the product development side, as well. To do a 100 million-gate design requires a certain amount of tools and people and mask costs. At the older technologies mask costs are quite cheap. And if you’re re-using technology and adding to it, you can keep NRE low so return on investment is quite high. You need to take advantage of mainstream older nodes as well as more aggressive nodes.
Ng: And most times our relationships with most of the leading-edge companies span multiple nodes.
Kapoor: At 14nm there are 5 or 10 customers. As a foundry, you have to worry about how you’re going to get the rest of the industry in. The economics even for the companies that can afford it aren’t that great. So you’re going to see continued innovation even at the older nodes.
Ng: A major part of the foundries’ concern is up and down the supply chain. It’s not just the fabs. It’s the tools, the support for IP providers, and packaging solutions. That’s a challenge we have to address as an industry.

Power Bits: July 15

Friday, July 15th, 2011

By Ed Sperling

Portability Play
Synopsys is working with GlobalFoundries to deliver interoperable process design kits later this year at advanced nodes. iPDKs are particularly important for companies looking to use designs for multiple markets. A general-purpose process, for example, is critical for markets looking for higher performance, while low-power processes are important in applications where battery life is a differentiating factor.

The problem is that many of these designs are not always portable between processes, despite the fact that power and performance are considered tradeoffs in most designs.

The companies said the 65nm G and enhanced low power (LPe) kits are available now. Versions for other process nodes will be available later this year.

Stacked die demo
Imec, the Belgian research organization, demonstrated a stacked die with DRAM on logic at Semicon this week. The chip is a prototype of what is expected to become a mainstream approach as companies seek to re-use existing analog IP and subsystems from previous nodes, as well as to add flexibility and speed to complex designs.

What’s particularly interesting about the prototype is Imec’s description of how heat can be removed from the die. Logic generates a fair amount of heat, but the DRAM die acts as a conductor for some of that heat. Qualcomm observed similar effects in its own stacking research last year.

Imec’s work was done in conjunction with GlobalFoundries, Intel, Micron, Samsung, TSMC, Fujitsu, Sony, Amkor and Qualcomm.

5 Ways To Cut Power

Thursday, June 16th, 2011

By Ed Sperling
Low energy consumption with minimal leakage has emerged as the most competitive element in an IC design, regardless of whether it involves a plug, a battery, or whether it’s powered by a gasoline engine.

While components on an SoC aren’t always power-aware, they’ll have to be in the future as consumers focus first on energy efficiency. With rising fuel costs, a concern over global warming and a steady reminder that smart phones have to be plugged in every night, car companies are shifting their strategy from efficient hybrids to even more efficient plug-in hybrids and electric vehicles, and California has gone so far as to mandate that one-third of all electricity sold in the state by the end of 2020 must come from renewable sources.

This shift in public awareness hasn’t been lost on the chip industry, which has been rolling out some very complex advances well ahead of schedule. Here are some of the most important:

Clouds
The push toward a cloud-based infrastructure is a way of centralizing computing—basically a return to the time-sharing model once perfected by the mainframe and then re-distributed with the advent of the commodity PC server. The data processing world is re-aggregating, but this time with a difference. It’s not just that the computing is being centralized. It’s that the centralization is taking place in proximity of cheap power sources such as hydroelectric power, nuclear plants (for now) and wind farms.

“Cloud leads to big efficiency gains,” said Chris Rowen, chief technology officer at Tensilica. “Now you can put the computing farm where the energy is available. It’s an arbitrage opportunity. It’s not hard to ship bits when you compare that to the difficulty in transporting electricity.”

There’s a clear business case to be made on this front. An estimated 6.5% of electricity is lost in transmission, according to the U.S. Energy Information Administration. That may not seem like a lot until you consider those are high-voltage transmission lines. Bits are cheap, in comparison—even trillions of them—which is why there is talk now of centralizing portions of even base stations. Those parts that do intensive computation with a high degree of redundancy are prime candidates for being located in a data center.

“There’s a lot of computation needed to reduce noise and create a clean signal,” said Rowen. “But there’s also some computing that has to be done locally because there are tough latency requirements.”

Adaptive Body Biasing
Adaptive body biasing has been under serious discussion for the past five years as a way of reducing current leakage by controlling a device’s body voltage, which in turn increases the threshold voltage. The big advantage here is less switching to the off state. The downside is this has been difficult stuff to design and manufacture.

“This was not seen as a mainstream approach, but now it’s showing up almost everywhere,” said Aveek Sarkar, vice president of product engineering and support at Apache Design Solutions. “This was seen as a challenging technique to implement, but now TI and Samsung are using it. If you change the body bias voltage, you impact the threshold voltage. You can increase or decrease leakage, as needed, and boost performance.”
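The link Sarkar describes between threshold voltage and leakage can be sketched with the standard subthreshold approximation, in which leakage changes by one decade for every S millivolts of threshold shift. The article does not give this model. The 90 mV/decade swing below is an assumed, illustrative value (the ideal room-temperature limit is about 60 mV/decade), and the function name is hypothetical.

```python
def leakage_ratio(delta_vt_mv: float, swing_mv_per_decade: float = 90.0) -> float:
    """Subthreshold leakage after shifting the threshold voltage by
    delta_vt_mv, relative to leakage before the shift.

    Uses I_leak ~ 10 ** (-Vt / S); S (subthreshold swing) is an assumed value.
    Positive delta_vt_mv (e.g. from reverse body bias) lowers leakage;
    negative delta_vt_mv (forward body bias) raises it.
    """
    if swing_mv_per_decade <= 0:
        raise ValueError("swing_mv_per_decade must be positive")
    return 10.0 ** (-delta_vt_mv / swing_mv_per_decade)


if __name__ == "__main__":
    # Hypothetical threshold shifts in millivolts.
    for shift in (-45.0, 0.0, 45.0, 90.0):
        print(f"Vt shift {shift:+.0f} mV -> leakage x{leakage_ratio(shift):.2f}")
```

The exponential dependence is why a modest bias-induced shift can trade leakage against performance in either direction, as the quote suggests.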

Consultant Bhanu Kapoor, president of Mimasic, noted that for some high-performance applications the alternatives such as power gating may be impractical because it simply takes too long to turn on and off sections of a chip. In those cases, body biasing is the only choice.

Atomic-Level Changes
Another technique that has been particularly difficult to master is atomic-level control of channel doping on the manufacturing side. And while most experts don’t expect the process and manufacturing side to offer any huge gains, this one may be the exception.

Scott Thompson, chief technology officer at startup SuVolta, said that by improving the doping technique, both dynamic and static current leakage can be reduced with regular bulk CMOS.

“The problem is that the wall around the channel is leaky and it’s hard to control the shape,” said Thompson. “Strain engineering helps to control the atomic-level analysis. But there has been no other breakthrough other than changing the transistor, and we don’t see a need for that for all architectures.”

At its unveiling last week, SuVolta had lined up support from Fujitsu, Cypress, ARM and Broadcom. The company claims the technology is an alternative to FinFETs, which are more difficult to manufacture.

3D Transistors And Packaging
Nevertheless, the major foundries have committed to building FinFETs at advanced nodes. Intel’s announcement of a Tri-Gate three-dimensional transistor at 22nm has been a major topic in the semiconductor industry. The question is now that Intel has publicly committed to the technology, can it really be manufactured with sufficient yield? And can it be built effectively using the disaggregated foundry model in the near future?

These kinds of questions will remain unanswered at least for the next couple years. TSMC is planning to use FinFETs at 14nm, and GlobalFoundries has been working on the same technology. Nevertheless, the big advantage of FinFET technology is a sharp reduction in leakage while providing a significant performance boost.

Creating stacks of die also has a huge effect on power, in part because the distances between logic and memory can be shortened significantly. A system-in-package version of stacked die, using interposer technology, is expected to begin widespread production over the next 12 to 18 months, bolstered by the new Wide I/O standard that increases the size of the pipes between logic and memory.

New Materials
Fully depleted SOI, silicon on sapphire, as well as new ways of putting them all together in stacks connected by low-cost interposers that can be made of glass have turned into major research efforts as companies seek to knock costs out of the bill of materials for new chips.

While the FD SOI has been well tested for years by the Common Platform participants, the others have only been used on a very limited basis. One approach now being considered is actually designing chips to run hotter rather than trying to keep the power down. While there are limits to this approach—no one wants to pick up a hot phone—there are times when performance is more important than heat.

Taken as a whole, all of these changes can yield a significant reduction in power, particularly when coupled with efficient software code, more customized user controls, and end devices that actually use the power-saving technology that is being built into these chips.

High Performance And Low Power

Thursday, April 14th, 2011

By Pallab Chatterjee
As mobile platforms become a larger part of the component spectrum, their need for optimization beyond low power has moved to the forefront.

Traditionally, standard “line-cord” based products in both the consumer and commercial sectors have used the “G” label processes from semiconductor foundries. These processes had the highest-yielding combination of design rules, device performance and leakage as a tradeoff triad. The “G” processes were then further split into the “HP” and “LP” flows. The “HP” processes are high-performance optimized with the most aggressive design rules, lowest Vt, and support standard to higher operating voltages. The “LP” processes are optimized for low power and feature design rules targeted for the lowest leakage, support lower operating voltages, and tend to have the slowest transistors of the three options.

These process labels have been the industry norm from the 250nm era through the 40nm processes. At 28nm and below, a new process is emerging called the “HPL” or “HPM” process. UMC offers an HPL flow, which is a high performance and low power dual-corner optimized technology. At TSMC, the newly offered HPM flow is for high-performance mobile applications and is also optimized for high performance and low power.

The complexity of SoCs for mobile applications has driven them to use cutting-edge processes. The rise of computing visualization and content playback has forced these extended battery operation cycle products to embrace multicore architectures with embedded memory as the main design. To accommodate these activities, along with high-performance graphics handling, the designs have moved to single die SoCs, which minimize I/O as a method to reduce power.

These multicore designs also feature advanced power management based on switched power controls and a controlled state-based turn-on/turn-off of the power grid to different power blocks. Power-switch devices, with the ability to have very large devices to minimize the “on” resistance, are typically not optimized for high-performance processes. The new flows allow these devices to be built, along with high-performance processor and graphics cores, with significantly lower leakage than on HP flows.

TSMC announced the new flow earlier this month as a specialized optimization for battery operation, low-operating voltage, low leakage, and high-speed logic and memory access at the 28nm and 22nm nodes. The mobile platforms are driving enough of the wafer volumes to warrant a specialized flow rather than a “mix and match” from the other processes. The driver is not only smart phones, but also netbooks, tablets and other platforms that will consume graphic content. This content is split between gaming applications and video/TV material. The video/TV material has the additional power optimization point of RF for the streaming connection to receive the content. The gaming content tends to reside locally on the platform.

This new process optimization also is driving new IP. The I/Os typically are migrating over from standard LP processes, as there is no major change to the external world. However, high performance IP is not applicable to the new flow. The basis of the new IP is power control and operation in a power envelope. From this constraint, the performance optimization is then imposed.

Companies such as Imagination Technologies, which feature soft IP, will not have any major issues with optimization to the new process offering. However, hard processor cores, cache memories, DSPs, graphical user interfaces and display controllers will have to be redesigned. These blocks will need to incorporate the power-switching logic into their design, and support native multi-voltage blocks.

With UMC and TSMC offering these processes for foundry, and Intel and Samsung having them as new internal-use processes, it won’t be long before GlobalFoundries and the Common Platform bring this new optimization point to market.

Gene’s Law Meets EDA

Thursday, March 17th, 2011

By Pallab Chatterjee
What will be the next major improvement that will cut power levels by an order of magnitude?

That question was the basis of a roundtable discussion at the recent ISSCC. Current technology provides incremental improvements each year, but the next generation of electronic systems will require dramatic changes and innovation. This premise is based on Gene’s Law (named for Gene Frantz of Texas Instruments), which states that the power efficiency for DSPs doubles every 18 months.
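As a rough illustration of what the doubling rate implies for the order-of-magnitude question, the sketch below computes how long a steady doubling every 18 months would take to reach a given improvement factor. The function name is hypothetical, and the assumption that the trend continues unchanged is exactly what the panel was questioning.

```python
import math


def months_to_gain(factor: float, doubling_months: float = 18.0) -> float:
    """Months needed to improve efficiency by `factor`, assuming it
    doubles every `doubling_months` months with no slowdown."""
    if factor < 1.0:
        raise ValueError("factor must be at least 1")
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return doubling_months * math.log2(factor)


if __name__ == "__main__":
    months = months_to_gain(10.0)
    print(f"10x at one doubling per 18 months: {months:.1f} months "
          f"(about {months / 12:.1f} years)")
```

On that assumption a 10x gain takes a little under five years, so the panel's question amounts to asking what could deliver it faster than the historical trend.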

The roundtable consisted of six panelists from TSMC, Hitachi, STMicroelectronics, UCLA, Infineon, and an industry veteran consultant. The interactive challenges were posed by domain experts from Imec, the University of Tokyo and Stanford University. In addition, it was moderated by Jan Rabaey of the University of California at Berkeley.

Rabaey outlined some of the major challenges as the opening for the discussion. Among them:

  1. The impact of technology scaling is reduced for new processes;
  2. The impact of voltage scaling is reduced as a proportion of new power supply levels;
  3. Getting control of the wasted energy in the systems, and
  4. Identifying energy efficient design architectures.

At the core of the discussion were two main themes. First, new devices will be at the center. And second, there will need to be new tools and methods to implement these next-generation designs. The device migration is toward 3D devices and low leakage devices and substrates such as Fully Depleted SOI (FDSOI) and FinFETs. These structures have the ability to provide consistency in performance despite lithographic challenges. Lithography is just one of the many aspects of variability that are plaguing sub-20nm process technology.

The manufacturing challenges for these new devices have yet to be determined, but the solutions to these challenges will help shape the product designs and architectures. Moreover, yield, predictability and power performance will drive the operating power supplies and determine how much power is wasted in the design through heat and inefficiency in current transfer.

As these devices operate in different voltage and current modes with different sensitivities from standard planar (2D) devices, new models and device-level simulation tools are needed to capture their characteristics. Due to the small geometries and the resulting large density vs. device units scaling, new matrix-solving routines need to be created to solve the equations without causing the tools to slow down or fail to converge to a solution. This is already driving changes in EDA and CAD to support the devices. The capacity, capabilities and throughput of the current EDA tools are not sufficient to address the new 3D device requirements and their associated block and system designs.

One of the major challenges is to address the guard-banding and safety factors in the designs that are wasting power. Current worst-case design optimization wastes operating power by identifying corners of the design space that may not be reachable in practice, but that require power to stay away from. Verification software with the capacity and reach to address multiple levels of the design will have to be created. One of the major issues is creating architectural tools that can help do power optimization as a driving design function rather than just as an analysis tool.

A key issue involving these tools is who will solve the fundamental problems (industry or academia) and whether it will become a viable business. For academia to put resources on solving the guard-banding and multi-level design issues it will need funding from the government (NSF) or industry to pay for the students and facilities needed to complete the work. If it is done by industry, companies need to know that when answers are found customers will buy the products and that they will have some time advantage over competitors. With the scope and breadth of the new tools that are required, and the skyrocketing costs of building chips at these advanced nodes, the issue of, ‘Will there be enough people who need the results of the effort for the tool development to be a business,’ was left open as one of the keys for addressing the next power plateau.

Widening The Channels

Thursday, March 17th, 2011

By Ed Sperling
Wide I/O—both as a specific memory standard and as a generic approach for on-chip networking—has been looked at for the past couple of chip generations as a way of improving SoC performance. Increasingly, it also is being used as a key strategy for reducing energy consumption.

Wide I/O refers to a number of different approaches in on-chip networking, ranging from through-silicon vias in 3D stacks to interposers in 2.5D stacking. It also refers to a standard for memory communication being developed by JEDEC, as well as more dedicated channels for signals. In all cases, the added benefit is a reduction in power needed to drive a signal.

The tradeoff typically is between serial I/O and wide I/O. Serial I/O is simpler to design and works over longer distances, but it is far less power efficient. Wide I/O, in contrast, is higher bandwidth with big power savings—Samsung, for example, estimates its new 1Gbit mobile DRAM based on a 50nm process consumes 87% less power—but the technology is also more complicated to use. And in most cases, it’s also more costly.

Eliminating complexity while adding more
The concept of bigger pipes has always been a last resort for chip architects. It’s well known that shortening the distance a signal travels and reducing the resistance can drive down the amount of power needed for a signal. Reducing the overhead of serialization and deserialization can cut the power even further. But ironically, it has taken an explosion in SoC complexity for chip architects to seriously consider simplifying signal paths.

“We always go through this pendulum swing of what’s the optimal physical implementation vs. what’s the simplest way to do it even if it costs more silicon,” said Steve Roddy, vice president of marketing and business development at Tensilica. “So you can do things with 128 wires using serialized I/O, or you can do it with a lot fewer using wide I/O. The serialized I/O requires deserialization, which costs power. With wide I/O, which could simply be a lot of wires connected to the next block, you can lower the frequency and widen the channel.”
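The trade Roddy describes, a lower frequency on a wider channel, can be sketched with the textbook switching-power estimate (activity × C × V² × f per wire). The Python below is a rough illustration only: the capacitance, voltage and bandwidth figures are hypothetical placeholders, not measurements from any vendor, and the model ignores serializer/deserializer circuitry entirely.

```python
def link_frequency_hz(bandwidth_bps: float, wires: int) -> float:
    """Per-wire toggle rate needed to carry the bandwidth (one bit per wire per cycle)."""
    return bandwidth_bps / wires


def switching_power_w(wires: int, freq_hz: float, cap_f: float,
                      vdd: float, activity: float = 0.5) -> float:
    """Textbook dynamic power estimate summed over all wires."""
    return wires * activity * cap_f * vdd ** 2 * freq_hz


BANDWIDTH_BPS = 12.8e9  # hypothetical target bandwidth

# Hypothetical serial lane: one long, heavily loaded wire at full rate.
serial_f = link_frequency_hz(BANDWIDTH_BPS, wires=1)
serial_p = switching_power_w(1, serial_f, cap_f=2e-12, vdd=1.2)

# Hypothetical wide channel: 128 short wires at a much lower rate and voltage.
wide_f = link_frequency_hz(BANDWIDTH_BPS, wires=128)
wide_p = switching_power_w(128, wide_f, cap_f=0.2e-12, vdd=1.0)

if __name__ == "__main__":
    print(f"serial: {serial_f / 1e9:.1f} GHz per wire, {serial_p * 1e3:.2f} mW")
    print(f"wide:   {wide_f / 1e6:.0f} MHz per wire, {wide_p * 1e3:.2f} mW")
```

In this simple model, widening the channel does not save anything if capacitance and voltage stay the same, because wires × frequency is constant for a fixed bandwidth. The savings in the sketch come from the shorter, lighter wires and the lower supply voltage that a slower clock tolerates; dropping the serialization overhead would add to that.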

In a 2.5D stack, that extra silicon is easier to justify because it doesn’t add significantly to the overall footprint. In a system-in-package or package-on-package it may involve an interposer, which is another piece of silicon. It also can involve a through-silicon via in a 3D stack, which is wide enough to avoid any congestion.

“With a TSV you don’t need a standard I/O, which includes the I/O circuitry, patch and bond wire,” said Tom Quan, deputy director of design methodology and service marketing at TSMC. “So you get rid of all the I/O circuitry, and you have the same area, power and current. That results in a tremendous power savings. You also get a big boost in timing. And if you use an interposer, that’s silicon so it has the same resistance and capacitance of a standard IC. You can simulate them both together and get a predictable result.”

Eliminating bottlenecks
There are many good reasons for using wider pipes. One is that multicore and multiprocessor implementations generally are inefficient. The whole idea behind these implementations was that software would be able to run across multiple cores and multiple processors. That didn’t work out as planned, due to the inability to parallelize many applications, but cores were still designed to share the same memory.

That’s inefficient from a performance and a power perspective. Cores that are not in use should be turned off or powered way down. Moreover, when they need to connect to memory it should be along a clear path with as little congestion as possible and over the shortest distance possible.

“For some years to come we’re going to be seeing systems in package with interposers as the ideal solution,” said Joe Sawicki, vice president and general manager of Mentor Graphics’ Design-To-Silicon Division. “That will involve a lot faster interconnects, mostly to memory, and potentially to homogeneous logic. One of our customers was developing a digital chip and needed Bluetooth. They did it in a digital IC and they also did it in a SiP. The SiP destroyed the SoC in performance and power.”

But the question also is at what cost. While 2.5D approaches are relatively straightforward, the interposer does add some cost and the TSV can add even more.

“We are pursuing full 3D and so are most of the people in the phone business, primarily because of the form factor and cost,” said Riko Radojcic, director of engineering at Qualcomm. “If you think about an interposer, you’re adding another die to the cost. Conceptually an interposer is an elegant solution and it works fine for someone who sells a product for $100. If you throw in a $1 interposer it’s no big deal. But if you’re making a $5 die and you throw in an interposer, it is a big deal.”

The same is true of through-silicon vias, although the ultimate advantages of this approach are expected to become more significant over time.

“TSV is expensive but is a good way of meeting the form factor,” said Navraj Nandra, senior director of marketing for Synopsys’ DesignWare Analog and MSIP Solutions Group. “You need to optimize for both low power and low cost packages. It’s like buying a $50k hybrid car that gives you 32mpg compared to a $22k 1.2L, 3-cylinder petrol engine car that gives you 50mpg. Everyone is excited about the hybrid car.”

Optimizing the signals
Behind the hubbub about the I/O technology is another often overlooked piece of the equation. The move to multiple processors and multiple cores was done largely as a knee-jerk response to the end of classical scaling at 90nm. What has happened since then is a much more measured response to how to use these cores more effectively, which requires much more granularity in the design process. Not all cores need to be on an ARM or MIPS processor, for example, and not all of them need to be in one place on an SoC—or even on the same die of a SiP or 3D stack.

In addition, not all of those cores or processors need to be the same size or run the same software.

“In addition to wide I/O there are dedicated point-to-point connections to relieve the system congestion,” said Tensilica’s Roddy. “Those can include general purpose memory and processor. When the system architect knows beforehand what’s going to be in the system they can add those connections up front. So you may have a video decoder and buffer and an audio decoder using separate memories, and those may change depending on whether they end up in a cell phone or a set-top box. But there are some things you don’t know at design time and you need the ability to generate system-specific interconnects, which is what’s being sold by companies like Arteris and Sonics.”

And finally, there is a simple mathematical principle behind the push to reduce power.

“The longer a signal has to travel, the more power it takes,” said Qi Wang, technical marketing group marketing director for Cadence Solutions Marketing. “A lot of issues in design come down to power. If you put the memory outside the chip, that takes power. If you want to speed up performance, that takes power.”

Bigger pipes over shorter distances can help solve that problem, and it’s a solution that is beginning to garner much more attention these days.

The Deepening Design Gap

Friday, March 11th, 2011

By Ed Sperling
It’s no secret that designing SoCs is getting tougher, but what’s surprising is just how far behind the existing EDA approaches are lagging.

The result is a growing gap between what’s needed and what’s available to do the job. In a presentation at the Tech Design Forum in Santa Clara, Calif., yesterday, Shabtay Matalon, Mentor’s ESL market development manager, said there is a 55% growth in the number of transistors per year compared with a 21% annual growth in productivity.
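How quickly those two growth rates pull apart can be worked out directly. The short Python sketch below assumes both rates compound annually from a common starting point, which is an assumption made for illustration and not something stated in the presentation.

```python
TRANSISTOR_GROWTH = 0.55    # annual growth in transistor count (from the talk)
PRODUCTIVITY_GROWTH = 0.21  # annual growth in design productivity (from the talk)


def design_gap(years: int) -> float:
    """Ratio of transistors to design productivity, relative to year zero.

    Assumes both rates compound annually from the same baseline.
    """
    return ((1 + TRANSISTOR_GROWTH) / (1 + PRODUCTIVITY_GROWTH)) ** years


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"After {n} year(s): gap is {design_gap(n):.2f}x the baseline")
```

On those assumptions the gap widens by about 28% a year and more than doubles within three years.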

Much of this gap is in the consumer electronics space, where the demand for better performance and more functionality is coupled with longer battery life. Among design teams surveyed by Mentor, 61.8% are currently developing single-processor SoCs, while 20.8% are developing multiprocessing SoCs, and 5.2% are developing chips with multiple cores and multiprocessing.

Fast forward two years from now and the expectation is that only 30.1% of designs will use one processor; 20.6% will employ multiprocessing; 21.4% will use multicore technology and 19.4% will use multicore and multiprocessing technology.

“There is a gap between the power requirement and the power trend,” said Matalon. He said that if the power is allowed to trend upward at current rates it will far exceed the amount that’s permissible in designs.

“At the same time, you have to deal with software and verification and account for the power requirement,” he said. “There are two verification challenges that need to be solved. One involves design goal challenges where you meet functionality with speed, power and cost. Power is one of the biggest risks today because it’s evaluated at the end of the design. The second challenge is multicore and multiprocessing designs. If you wait until you get to the back end of the process it’s too late.”

All of the big three EDA vendors have been issuing similar warnings over the past year, saying that after 45nm it becomes increasingly difficult to build complex SoCs without electronic system level tools. While the exact node has been somewhat in flux (the numbers vary between 65nm and 45nm, sometimes even within the same chipmaker or EDA company), the message is essentially the same. And all agree that at 28nm and beyond, understanding transaction-level modeling and automating some of the analog design and verification is no longer optional.

TSMC and GlobalFoundries have been working with all the major EDA companies on incorporating ESL models into their flows. Tom Quan, deputy director of design methodology and service marketing at TSMC, said that Reference Flow 12 is being developed that will incorporate 28nm, 22nm and 3D stacking. The new flow heavily leverages ESL tools, which are a critical part of design for manufacturing.

Healthy Living Electronics Dominated By Power

Thursday, February 10th, 2011

By Pallab Chatterjee
The theme for this year’s ISSCC (International Solid-State Circuits Conference) is “Electronics for Healthy Living.” In addition to the new microprocessors, memory and data converter technologies, the focus and keynotes are directed toward health-care products.

The common theme between all the talks is that health care is being driven by mobility, information flow, and power. The key to high quality data transfer is having enough power to complete it, whether wired or wireless. The key to mobility is autonomous power that lasts long enough that it does not impact the activity the user is involved in.

The keynotes cover the range of silicon’s impact on health care. Medtronic is discussing the scope of implantable devices, the reliability, data transfer and the system architecture of the implanted and external portions of the system. IMEC then follows with a discussion of the spread of special-purpose sensors that are now possible, their inroads into health care and the creation and powering of body area networks. Samsung then speaks on a different twist for health care. Its angle is that the major cause of pollution is energy consumption and hence generation. The way to address this problem is through reducing energy use in the manufacturing process and in the design of devices that utilize less power and can take advantage of innovative packaging.

Following the keynotes is the inaugural Plenary RoundTable discussion on how to address the next 10X reduction in power. The discussion is hosted by Jan Rabaey of UC Berkeley and features TSMC, Hitachi, STMicro, Infineon, IMEC and other senior experts from the semiconductor and university community. This challenge, encompassing process innovation, CAD, design flows for digital, RF, analog, and memory, is one of the key drivers for the next generation of energy efficient electronics.

Energy efficiency has now earned its own session with Energy Efficient Digital, which will be detailing such projects as ultra-low-voltage standard cells that operate down to 62mV of supply. Other new technologies include a 28nm DSP from TI that can operate at 0.6V, and a wireless sensor processor that utilizes only 10pJ per clock cycle.

The technology development sessions once again mix between high performance and low power. On the high-performance side, architectural designs for terahertz (300GHz to 3THz) imagers and associated device blocks, such as amplifiers and antennas, are being shown. On the low-power side, a transceiver that can operate at 0.24nJ/b, and energy scavenging converters that are now up to 72% efficient and generating 95mV, will be presented.
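Energy-per-operation figures such as 10pJ per clock cycle or 0.24nJ per bit only become power numbers once a rate is attached. The Python sketch below shows the conversion; the 1MHz clock and 1Mb/s data rate are hypothetical operating points chosen for illustration and are not taken from the conference papers.

```python
def average_power_w(energy_per_op_j: float, ops_per_second: float) -> float:
    """Average power = energy per operation x operations per second."""
    return energy_per_op_j * ops_per_second


# Energy figures quoted above; the rates are illustrative assumptions.
sensor_processor_w = average_power_w(10e-12, 1e6)   # 10 pJ/cycle at an assumed 1 MHz
transceiver_w = average_power_w(0.24e-9, 1e6)       # 0.24 nJ/b at an assumed 1 Mb/s

if __name__ == "__main__":
    print(f"sensor processor: {sensor_processor_w * 1e6:.0f} uW")
    print(f"transceiver:      {transceiver_w * 1e6:.0f} uW")
```

At those assumed rates both blocks land in the microwatt range, which is the territory energy-scavenging sources are trying to serve.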

Filling out the program are tutorials on ultra-low power digital design and a forum on ultra-low voltage VLSI for energy efficient ICs. These sessions are expecting large attendance as they are the dominant directions for the next decade.

The shift for the conference and the industry is dramatic. Historically over the past 40 years the conference has been the vehicle where the biggest and fastest semiconductors were debuted. These devices now have to share the spotlight with the smallest, highest-density and lowest-power devices. The show is focusing a lot more on architecture, device technology and the systems aspects rather than just circuit blocks. This focus accompanies the idea that SoCs are true systems, and that they need to be addressed as such with focus on function, performance, power and application. The body area network discussions and technology, which balance data transfer and power as the main tradeoffs, are representative of where systems and IC discussions are headed.

Power Model Complexity Grows

Thursday, January 13th, 2011

By Ed Sperling
The number of factors required for an effective power model has far surpassed the capabilities of even the most detailed spreadsheet at 45nm and beyond. It has now entered the realm of complex databases and architectural tradeoffs, and those tradeoffs will become even more complex as 3D stacking takes root over the next 24 months.

The idea of modeling power is hardly new, but you wouldn’t know that comparing the current iterations side-by-side with the old methods. While there is still a need to understand worst-case scenarios to protect signal integrity, not to mention the other components on a chip, there is far more that needs to be considered in power modeling at advanced nodes than in the past.

“There are two issues that need to be solved,” said Ran Avinun, marketing group director for system design and verification at Cadence. “One is how to do this. The second is who owns the format. The methodology hasn’t been solved yet. When you tell the customer that we’ll compare our numbers with your back-end flow and libraries, that’s not good enough. It’s going to give me the data about the SoC or the ASIC, but that’s not enough. When customers look at power it’s what they measure in the lab. When they test the device it’s in a real environment with real software and the package. Today they don’t have a good way to test. It’s done with software and vectors, but it’s not really reflecting what the user will get.”

The second issue is understanding what parts of the system actually consume power. “Customers don’t know how to partition the power consumption of the ASIC vs. the overall system, so what they measure is the overall power. They don’t know how to partition those components and there is no good way to model that. We’re looking at the ASIC and die level, but they need to model the whole system,” Avinun said.

Moreover, for the system-level numbers to be used in a meaningful way at the architectural level they have to be relatively accurate. The shrinkage of components has made everything more susceptible to the effects of power, mechanical and thermal stress, electromagnetic interference, electromigration and noise (see fig. 1). Modeling power is now required. But even the simplest ideas such as power supplies are no longer simple.

Fig. 1. Source: Apache Design Solutions

“In the past we had two power supplies, one for the digital and one for the analog,” said Cornelia Golovanov, an EMI expert at LSI. “Now we have three or four analog power supplies in a small area, which makes the supplies very inductive. These are not well analyzed in the context of the whole system.”

Like anything else at advanced nodes, without adequate planning power supplies can be corrupted. Even maximum power, which used to be calculated in a worst-case scenario fashion once RTL was already synthesized, has become incredibly complex with multiple power islands, multiple modes, multiple cores and multiple voltages.
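A toy example shows why a single worst-case number no longer falls out of a spreadsheet once islands, modes and voltages multiply. The Python below is a minimal sketch, not any vendor's power model: the island names, capacitances, leakage currents and operating points are all hypothetical, and leakage is simplified to a fixed current per island.

```python
from dataclasses import dataclass


@dataclass
class Island:
    cap_f: float   # effective switched capacitance in farads (hypothetical)
    leak_a: float  # leakage current in amps, treated as constant (simplification)


def island_power_w(island: Island, vdd: float, freq_hz: float) -> float:
    """Dynamic (C * V^2 * f) plus static (I_leak * V) power for one island."""
    return island.cap_f * vdd ** 2 * freq_hz + island.leak_a * vdd


def mode_power_w(islands: dict, settings: dict) -> float:
    """Total power for one mode; islands missing from `settings` are gated off."""
    return sum(island_power_w(islands[name], vdd, freq)
               for name, (vdd, freq) in settings.items())


def worst_case_mode(islands: dict, modes: dict) -> str:
    """Name of the mode with the highest total power."""
    return max(modes, key=lambda m: mode_power_w(islands, modes[m]))


ISLANDS = {
    "cpu": Island(cap_f=1e-9, leak_a=0.010),
    "gpu": Island(cap_f=2e-9, leak_a=0.020),
    "modem": Island(cap_f=0.5e-9, leak_a=0.005),
}

# mode -> island -> (supply voltage, clock frequency in Hz)
MODES = {
    "idle": {"modem": (0.9, 1e8)},
    "video": {"cpu": (0.9, 4e8), "gpu": (1.0, 3e8), "modem": (0.9, 1e8)},
    "gaming": {"cpu": (1.1, 1e9), "gpu": (1.1, 6e8)},
}

if __name__ == "__main__":
    for name, settings in MODES.items():
        print(f"{name}: {mode_power_w(ISLANDS, settings):.3f} W")
    print("worst case:", worst_case_mode(ISLANDS, MODES))
```

Even this toy version has to enumerate every mode to find the maximum. Real designs add temperature-dependent leakage, transitions between modes, and the package and board, which is where the simulation times Golovanov describes below come from.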

“With a power model ideally you want to cover all scenarios and all vectors,” Golovanov said. “But some of these have really long simulation points. It can take weeks at the end of the design cycle, and then you have to factor in the chip in the package on the PCB. There is no time for that.”

What’s in the model?
That’s where models fit in. Much attention has been paid to the different library approaches for defining power intent with the Unified Power Format (UPF) and the Common Power Format (CPF). The power model is a level above that, defining the power delivery network, signal integrity analysis, electromagnetic interference and compatibility (EMI/EMC) and the thermal effects of power, primarily in the form of dynamic and static leakage.

“In the past power models were simplistic in nature and deemed sufficient for the needs at that time,” said Aveek Sarkar, vice president of product engineering and support at Apache Design Solutions. “You could provide a single current and single capacitance and the result was your best guesswork. That all began to change in 2006 when we moved to 65nm. The package design could no longer be off the shelf. It’s now a competitive difference for companies and it can determine the price and performance of a system. Hence, an accurate model that represents the actual activity and parasitic profile of the chip is important off which you can base package and PCB decisions.”

Packaging has other issues, though. While a chip consumes current, the package and the PCB can act as an antenna for chip-generated noise, which results in EMI. It’s becoming necessary to extract an S-parameter (scattering parameter) model for the package. Once that model is constructed, a full system-level AC, DC and time-domain analysis can proceed using the power models of the chip, said Sarkar.

“Right now 40nm is mainstream,” he said. “At 22nm and 28nm electromigration gets very complicated. Since electromigration and leakage current can change very drastically with a temperature increase, we have to model the thermal profile of the chip–especially for a stacked die configuration.”

But there’s also a point where models can become useless. Looking at everything from a very high abstraction level is excellent for layout and functionality, but it can insert some very large errors into power models—sometimes as high as 300%, according to Cadence’s Avinun.

And there needs to be more consistency among models to make them useful. Frank Schirrmeister, director of product marketing for system level solutions at Synopsys, said the standards don’t yet exist because this is all so new.

“In TSMC’s reference flow 11, they characterize their libraries for low power and then make this all accessible for TLM 2.0 modeling,” Schirrmeister said. “Then you should be able to add up meaningful power numbers, even at the system level. Today this is all in the early stages. The different vendors have different formats. At some point it needs standardization.”

Standards needed
All of the major foundries are working on these kinds of models. In addition, Apache is working with the GSA on models for power. Those will become particularly useful in stacked die configurations, where thermal issues are not always intuitive. (See fig. 2)

Fig. 2. Source: Apache Design Solutions

None of this will get solved quickly. For one thing, power models generated by memory makers may be different than those generated by foundries and IP vendors, which is where standards will become important. But the first step is creating a dialog and generating tools that can provide visibility inside an SoC, and so far at least that seems to be happening.
