Posts Tagged ‘Common Platform’
By Pallab Chatterjee
As mobile platforms become a larger part of the component spectrum, their need for optimization beyond low power has moved to the forefront.
Traditionally, standard “line-cord” (wall-powered) products in both the consumer and commercial sectors have used the “G” label processes from semiconductor foundries. These processes offered the highest-yielding tradeoff among design rules, device performance and leakage. The “G” processes were then further split into the “HP” and “LP” flows. The “HP” processes are optimized for high performance, with the most aggressive design rules, the lowest Vt, and support for standard to higher operating voltages. The “LP” processes are optimized for low power, with design rules targeted at the lowest leakage, support for lower operating voltages, and generally the slowest transistors of the three options.
These process labels have been the industry norm from the 250nm era through the 40nm processes. At 28nm and below, a new process is emerging called the “HPL” or “HPM” process. UMC offers an HPL flow, which is a high performance and low power dual-corner optimized technology. At TSMC, the newly offered HPM flow is for high-performance mobile applications and is also optimized for high performance and low power.
The complexity of SoCs for mobile applications has driven them to cutting-edge processes. The rise of visual computing and content playback has forced these extended-battery-life products to embrace multicore architectures with embedded memory as the dominant design element. To accommodate these workloads, along with high-performance graphics handling, the designs have moved to single-die SoCs, which minimize I/O as a way to reduce power.
These multicore designs also feature advanced power management based on switched power controls and state-based turn-on/turn-off of the power grid to different power blocks. Power-switch devices, which must be very large to minimize “on” resistance, are typically not well supported on high-performance processes. The new flows allow these devices to be built, alongside high-performance processor and graphics cores, with significantly lower leakage than on HP flows.
TSMC announced the new flow earlier this month as a specialized optimization for battery operation, low operating voltage, low leakage, and high-speed logic and memory access at the 28nm and 22nm nodes. Mobile platforms are driving enough wafer volume to warrant a specialized flow rather than a “mix and match” from the other processes. The driver is not only smart phones, but also netbooks, tablets and other platforms that will consume graphic content. This content is split between gaming applications and video/TV material. The video/TV material has the additional power optimization point of RF for the streaming connection used to receive the content, while gaming content tends to reside locally on the platform.
This new process optimization also is driving new IP. The I/Os typically migrate over from standard LP processes, as there is no major change to the external world. However, high-performance IP does not carry over to the new flow. The basis of the new IP is power control and operation within a power envelope; the performance optimization is then imposed on top of that constraint.
Companies such as Imagination Technologies, which offer soft IP, will not have any major issues optimizing for the new process offering. However, hard processor cores, cache memories, DSPs, graphics processors and display controllers will have to be redesigned. These blocks will need to incorporate power-switching logic into their design and support native multi-voltage operation.
With UMC and TSMC offering these processes for foundry work, and Intel and Samsung using them internally, it won’t be long before GlobalFoundries and the Common Platform bring this new optimization point to market.
By Pallab Chatterjee
Process scaling has normally been performed on a lithographic basis, but as processes dip below 32nm there are optimization options beyond lithography and area reduction.
The Common Platform Group and GlobalFoundries have added the tradeoffs of power and performance optimization in addition to area in their 28nm flows. TSMC uses a five-way optimization that also has area, power and performance as three of the points.
HKMG (high-k/metal gate) is the process enhancement that allows power optimization to take place. The 90nm, 65nm and 4Xnm nodes were plagued by device leakage issues, which stalled the operating-voltage scaling of circuits at 1.2-1.3V. The use of HKMG, in either a GF (gate-first) or GL (gate-last) process flow, allows a primary reduction of device leakage. The corresponding benefit of the reduced leakage is the ability to vary the threshold voltage (Vt) of both the P and N transistors (in the case of GlobalFoundries, over a range of 300mV at 28nm). Because the Vt can be adjusted without corrupting basic device operation, the operating voltage can be reduced while delivering the same device switching characteristics.
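To see why a 300mV Vt tuning range matters, recall that subthreshold leakage falls off roughly exponentially as Vt rises. The sketch below is a textbook first-order model, not anything from the foundries’ documentation; the 100mV/decade subthreshold swing is an assumed round number (real room-temperature values run roughly 80-100mV/decade):

```python
# First-order subthreshold-leakage model: I_off scales roughly as
# 10^(-Vt / S), where S is the subthreshold swing in mV/decade.
# S = 100 mV/decade is an illustrative assumption.

def leakage_ratio(delta_vt_mv, swing_mv_per_decade=100.0):
    """Approximate factor by which leakage drops when Vt rises by delta_vt_mv."""
    return 10 ** (delta_vt_mv / swing_mv_per_decade)

# A 300 mV Vt tuning range, as cited for GlobalFoundries at 28nm,
# spans roughly three decades of leakage under this model.
print(leakage_ratio(300))  # -> 1000.0
```

The exponential relationship is why even a modest Vt adjustment, now practical with HKMG, trades so strongly between leakage and switching speed.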
Both the GF and GL versions of the HKMG process support a standard “G”-style logic-optimized flow, an “LP” low-power-optimized flow and an “HP” high-performance-optimized flow. Because the shrunk process can be optimized at multiple points, it is possible to create a 28nm process that provides 2x the performance (frequency × density) at the same power as a 40nm node. Additionally, a low-power optimization supports a 28nm process at 1.1V, producing a 49% increase in operating frequency together with a 44% reduction in switching power.
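Those figures can be sanity-checked against the standard dynamic-power relation P = a·C·V²·f. The sketch below backs out the effective switched-capacitance scaling the quoted numbers imply; the 1.2V baseline for the 40nm node is an assumption for illustration, not a figure from the announcements:

```python
# Standard CMOS dynamic-power relation: P = a * C * V^2 * f.
# Given quoted power and frequency ratios, solve for the implied
# effective-capacitance ratio between nodes.

def implied_cap_scaling(power_ratio, freq_ratio, v_new, v_old):
    """Solve P_new/P_old = (C_new/C_old) * (V_new/V_old)^2 * (f_new/f_old)."""
    return power_ratio / (freq_ratio * (v_new / v_old) ** 2)

# Article figures: +49% frequency, -44% switching power at 1.1 V.
# The 1.2 V prior-node supply is an assumed baseline.
c_ratio = implied_cap_scaling(power_ratio=0.56, freq_ratio=1.49,
                              v_new=1.1, v_old=1.2)
print(round(c_ratio, 2))  # roughly 0.45: under half the switched capacitance
```

In other words, under these assumptions the quoted gains require the shrink to cut effective switched capacitance by more than half, on top of the voltage reduction.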
The ability to scale the Vt of the small devices and reduce the power supply gives new design flexibility at the 2Xnm nodes. The 28nm and 22nm flows can support multiple Vts in a single design and, correspondingly, different power supply levels in isolated islands. This level of control over the devices is reminiscent of capabilities last seen at the 250nm-and-above nodes. Designers now can perform IP- and cell-level optimization for power and area based on design-rule adjustments and device selection. For IP, differentiation in the rules can optimize SRAM, logic and analog/RF, each with different Vts and operating voltages. Block- and SoC-level designs are power- and performance-optimized, however, only if the design flow supports technology-based optimization.
Most design flows for low power rely on gate power switching and a single Vt selection per block. The 28nm flows have context sensitivity for all the physical design components. With the use of computational lithography solutions, and the symmetry requirements of double patterning, multiple sets of design rules are used to drive the function optimization. The support environment allows multiple SRAM types (operating voltage, Vt, density, speed, etc.) to be built and dropped into blocks that may share a common power-gating control. In 40nm to 90nm designs, these variations could not be combined.
The very small cell size and device pitch in 28nm and 22nm processes has created a need for new interconnect solutions. These new interconnect methods are needed to ensure low-power operation of the designs, as the interconnect is a major consumer of the device power. Most designs at these nodes utilize bump and in-die pads. At the 250nm through 45nm nodes, the bond pad size for bump bonds is still the traditional 110um on a 150um pitch. At the 28nm and 22nm nodes, the bump bonds can be reduced to 60um on a 100um pitch. This reduction in size creates a minimized interconnect path for both power and signals, and drives the area-based optimization aspects of the process.
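The density gain implied by those pitch numbers is easy to quantify. Assuming a simple square bump array (an idealization that ignores keep-out zones and edge effects), shrinking the pitch from 150um to 100um more than doubles the bump count per unit area:

```python
# Idealized bump density for a square array: one bump per pitch x pitch cell.

def bumps_per_mm2(pitch_um):
    """Bumps per square millimeter at a given pitch in microns."""
    return (1000.0 / pitch_um) ** 2

old = bumps_per_mm2(150)   # ~44 bumps/mm^2 at the traditional 150 um pitch
new = bumps_per_mm2(100)   # 100 bumps/mm^2 at the 28/22nm-era 100 um pitch
print(round(new / old, 2))  # -> 2.25
```

A 2.25x gain in power and signal connections per unit area is what shortens the interconnect paths the text describes.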
Low power design in the 2Xnm nodes is now a full design/technology/lithography co-optimization task. The design workflows have to address both device characteristics and lithography variability to ensure any power factor design goals.
By Pallab Chatterjee
Recently Samsung gave an update on the status and availability of its advanced 32/28nm process technology for use in foundry. The process is targeted for shipping designs to customers at the end of this year, with a road map that continues through the 22/20nm nodes and down to 15nm.
What was particularly interesting were several key innovations that have made this all possible, as well as the company’s statement that the real driver is reduced power.
The new processes, co-developed with IBM, follow Intel’s large commercial success with a hafnium-based “Hi-K” metal gate process. Although this terminology has been around for a few years and the technology is dominant in the microprocessor marketplace, there has been some “uncertainty” in the design community about what it actually buys the designer. Hi-K gate technology is a process development that directly addresses the leakage-current problem that arose in CMOS technology at the 90nm node and has persisted through the 45nm node. Process scaling under Moore’s Law is three-axis scaling: x and y for the length and width of the transistors used to make the basic devices, plus z, the vertical dimension. Z is the thickness of the gate dielectric, which controls the intrinsic speed and performance of the device by setting the difference between “on” and “off.”
Since the late 1960s the scaling of all three axes has taken place concurrently. That ended at the 90nm node, where the complexities of lithographic processing, planarization, interconnect materials, isolation between devices and reduction in application power supply moved up from third- and fourth-order issues to become the dominant drivers. That relegated the leakage-current and capacitance issues of z-direction scaling to a secondary challenge. The focus on these other processing issues caused gate scaling to stall rather than continue proportionately with the x and y scaling, resulting in the leakage, multi-power islands, high electric fields, and high-stress devices and designs that have dominated the past few years.
The lithography solution is staying optical with multiple patterning solutions through the 22/20nm node. The planarization, interconnect and device stacking for “multi-die” technologies are progressing to address the function vs. density vs. space requirements going forward, which allowed time to develop the new materials needed to make the gate dielectric (replacement of standard SiO2 with an Hf based material) and re-start the z-dimension scaling. At the 32/28nm node, the reduced leakage and increased device performance (difference between “on” and “off” states) brings a new level of design capability.
Results using the process in foundry-type circuits (embedded processors with memory, custom logic, and standard commercial interface connectivity) are showing as much as a 35% power reduction for the same operation specification as existing circuits. This power reduction comes from both the ability to drop the operating supply voltage for the same performance specification and from an overall reduction in leakage/standby state power for “idle” modes in a design.
The new process technology, now starting to become available from multiple suppliers, brings an opportunity to create a new generation of mobile appliances. It also poses a significant challenge to the design community to adopt these benefits as a mainstream technology solution. The cost of entry into the design game at these nodes is very high. A typical 32/28nm SoC is likely to contain more than 500 million devices, including embedded memory, and will likely have a very high pin count. This will require a big design team to architect, design, assemble and test, not counting the very aggressive 20-plus man-years of IC design (5M devices per man-year for the flow × 20 people = 100M designed devices, plus 400M in third-party embedded memory) and application software development.
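The device-count arithmetic in that estimate works out as follows, using only the figures quoted above:

```python
# The article's design-effort arithmetic, spelled out. All figures come
# from the text itself; the one-year schedule is implicit in "man-years".

devices_per_man_year = 5_000_000            # 5M devices per designer per year
team_size = 20                              # 20-person design team
designed_logic = devices_per_man_year * team_size   # 100M hand-designed devices
third_party_memory = 400_000_000            # embedded memory from third-party IP
total = designed_logic + third_party_memory
print(total)  # -> 500000000, matching the "more than 500 million device" SoC
```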
These design costs are on top of the fab costs, which are targeted at more than $4 million for masks, plus wafer fab, package and test. It is looking like the big players, at a $30 million minimum per design, are the only ones who will be left at the table in the real “low power” process game.
The race to 40nm is over. Some chipmakers are already there, taping out designs and implementing IP that has already been qualified at the 40nm process.
When exactly volume production begins and when yields improve is a matter of conjecture. TSMC so far is the only major foundry actively using the 40nm process, which is a half-node beyond 45nm. But the Common Platform already has briefed analysts and customers on its 40nm process, even though most of its work is at 45nm, and GlobalFoundries, the AMD spinoff, has 40nm ready to go if there is customer demand.
A side benefit to consumers—and a big headache for design engineers—is that the power envelope continues to shrink with the line-widths. Low power is now standard in every design, which puts pressure on all IP vendors to create low-power versions at least concurrently with their newly qualified IP, if not first—or to make all versions low power. In the past, low-power versions typically trailed initial rollouts by 6 to 18 months.
And while that doesn’t mean all pieces of an SoC design need to be manufactured using a 40nm process—non-volatile memory, for example, is still at least a node behind—it does mean that research is well underway and on track for 32/28nm and that 40nm appears to be a relatively stable manufacturing process.
AMD, with its ATI line, and Nvidia both have 40nm versions of their latest graphics processors, which typically run at the leading edge of Moore’s Law because there is far greater potential for using more cores with existing software than in many other chips. Video, in particular, is one of the easier applications to write for multiple cores because graphics rendering can be partitioned into discrete units.
Low power everywhere
The power envelope in a more densely-packed piece of silicon has to be significantly lower, however. Signal integrity is a growing problem, according to design engineers, in part because of the density and the amount of current moving through the wires. Higher density also opens up real estate on a single chip for more functions that previously were on multiple chips or even multiple devices.
All of that points to lowering power wherever possible. And it means that to be successful in the market, low power design is a must. Virage Logic, which makes a variety of memory and logic IP, saw the trend clearly at 65nm when it incorporated low-power options into all of its IP instead of offering a separate low-power version.
“At 40 nanometers, if you want to create a new chip it has to be low power,” said Brani Buric, Virage’s executive vice president of marketing and sales. “We used to have high-density, high-speed and low-power versions of our IP. At 40nm, there are no separate low power products. There is a full set of low power features in both our high-density and high-speed IP, whether that’s memories or logic.”
AMD’s graphics processor group rolled out its first product at 40nm this spring. Stan Ossias, director of product management in AMD’s global/discrete graphics unit, said the bulk of the company’s work is still at 55nm and the company got a huge performance gain by re-architecting its 55nm chips.
“A lot of what we do has to do with predicting the readiness of the process at any time,” said Ossias. “We capitalize on the IP that’s available and the designs we have to maximize our competitiveness. Last year, we had the choice of going to 40nm using the same architecture, but we thought we could do a better job of reaching our performance goals by redesigning the architecture. We didn’t feel the 40nm process was ready.”
That approach is one that is becoming more common among companies that typically hopped from one process node to the next in the past. The complexity of getting to the next node, along with the rising costs and uncertainties about manufacturability, yield and the IP needed in a design—not all IP available at 40nm has been proven in silicon yet—makes each new process node an increasing risk, and one that is no longer just an automatic decision.
At least part of the risk assessment also has to do with power consumption. Each new node also requires reducing the power consumption, which involves a litany of design tricks ranging from power gating for active power to utilizing power islands for static leakage, different gate structures and a variety of exotic insulation materials.
“Power is one of the fundamental areas we think about with technology evolution,” said Ossias. “Every time we shrink the process, we have to put more and more effort into decreasing power. That involves not just the individual device, but how that device interoperates with other devices. It’s a big consideration.”
40 vs. 45nm
Even moving from 45nm to 40nm is raising some questions. The foundry business is extremely competitive and having the next process used to be a competitive advantage, but so far only TSMC is actively pushing 40nm. The foundry told analysts that it opted for 40nm instead of 45nm because the process could be tuned better for device performance.
Joanne Itow, managing director of manufacturing at Semico Research, said the number of half nodes is exploding. She said that gives both foundries and companies a chance to firm up the processes and move more gradually to the next full node. The Common Platform, for example, is working on 28nm, which is the half node between 32nm and 22nm.
GlobalFoundries, which is the AMD spinoff, will work with customers on a specific implementation at 40nm or refine its bulk 45nm process, according to spokesman Jon Carvill. But he said the next step under development is a 32nm and 28nm bulk CMOS process.
Still, now that the foundries have reached the node and are working on the next one, the question remains of just how many chipmakers will move to the next half node and how quickly. There is a lot of conjecture now that the pieces are falling into place for 40nm production, but so far there are no definitive answers.