Posts Tagged ‘HP’

Power Bits: Driving A Harvester, Redesigning The Data Center

Thursday, November 3rd, 2011

By Ed Sperling

Highway harvesters
Hybrid cars may be taking on a new dimension. Rather than just running on batteries and gas, they could also generate their own energy.

This has happened already to a small degree with regenerative braking, which puts energy back into the car’s battery. But energy scavenging has come a long way since that concept was first developed.

The next challenge is to go well beyond just powering the infotainment in a car and actually use it to power the motors. IDTechEx contends that future vehicles may have energy harvesting spread throughout them, from flexible photovoltaic cells that wrap around the car to energy-harvesting shock absorbers.

This creates a new category of car, one that generates as well as uses energy. The key is that such a car can use much less energy and go significantly farther on a single charge or tankful of gas, probably at a very low additional cost.

Cooler data centers
There have been two dueling problems inside of data centers for the past decade. One is heat, the other is energy. Both are related. The more servers, the more heat, and the more money it costs to power those servers and to cool them.

HP’s announcement that it is building extreme low-energy servers based on ARM processors, an effort it calls Project Moonshot, takes an interesting approach to this problem. Rather than treating servers only as individual machines, it allows resources to be shared throughout a data center. HP estimates energy can be cut by up to 89% using 94% less space.

Those are interesting numbers, and they speak volumes about the interest in energy costs rather than just performance. Energy costs are the main reason why Facebook just signed a deal to build a data center in northern Sweden, where outside air can be used to cool racks of servers 10 months of the year without turning on the chillers. They’re also the reason why data centers are being built along the Columbia River Gorge in Oregon, and in Arizona, which has a surfeit of energy produced by nuclear reactors.

This doesn’t end the war between Intel and ARM in this space, however. HP is planning to include Atom-based processors from Intel, as well, in the future. But the real key is that after nearly 60 years of focusing on performance in each subsequent server release, the central theme is now all about power.

The Tao Of Software

Thursday, June 16th, 2011

By Ed Sperling and Pallab Chatterjee
As software teams continue to race past hardware teams in numbers of engineers, hours spent on designs and NRE budgets, companies are beginning to question whether there needs to be a fundamental shift in priorities and strategy.

The problem is that it takes far too long to write and debug the software and to get it working on the hardware, even with virtual prototyping capabilities.

“Bare metal software is the hard part of the problem,” said John Bruggeman, Cadence’s chief marketing officer. “It’s the bane of the embedded system company—80% of the time is spent getting bare metal software to run on hardware. It takes two to three months to get Linux to boot because there is no visibility into the software and the hardware simultaneously.”

That challenge becomes increasingly difficult at each new process node, as well, because complexity is increasing on both sides. Bruggeman said there are three reasons solutions haven’t worked so far. One is that every solution to date has been closed or proprietary, which limits the number of programmers working on a solution. The second is that solutions today are fragmented, both across tools from multiple vendors and within some of the flows from single vendors. And third, complex multi-geographic development coupled with enormous scale and size has not resulted in a coherent solution.

Cadence clearly isn’t alone in recognizing the growing problem in software, although it is the most vocal of the Big Three EDA vendors. All have major software efforts under way and have made significant investments in these areas. Mentor Graphics has a big push in Embedded Software and Synopsys has an equivalent focus on software prototyping. All have made acquisitions in their respective areas, as well.

But getting software to run more efficiently on the hardware is a different sort of problem. It’s understanding how the two interact at a very deep level.

Glenn Perry, general manager of Mentor’s Embedded Systems Division, recounts a story of one customer that was porting Linux to a chip and couldn’t figure out why the operating system was continually burning up energy. The culprit, as it turned out, was a blinking cursor.

“The goal is to put power in front of software,” said Perry. “When we do that with a regular optimization of Linux we see a 70% to 90% improvement in power. We need to fix the simple stuff first, and this isn’t so easy. What we’ve found is that embedded developers know very little about software.”

Power games
But if hardware engineers know little about software, the reverse is also true. One of the biggest demands for improving the efficiency of software comes from the gaming world, where software typically has been written in a high-level language with little or no attention to power consumption. In gaming, the user focus always has been on performance—both in speed and in resolution—rather than power. But as more games are being downloaded onto mobile devices, that perception has changed dramatically. No matter how good the game, if it drains the battery in 20 minutes no one will buy it.

The result is that power controls need to be specified in the code, which is difficult considering the growing demands on these systems. Most online gaming is done at 720p resolution due to bandwidth limitations, with a typical compression of 1 I frame for every 200 P frames as part of the H.264 codec.

Mobile platforms typically code in OpenGL while 3D games use OpenCL. These games use a shader, 3D render, and main graphics display engine for the iPhone, iPad, Samsung phones and tablets, LG phones, Motorola and Droid phones, Asus tablets and the Motorola Xoom. Several mobile gaming companies (in France, Italy, Finland and Sweden) are now developing products for Q4 release using OpenCL for the Imagination Technologies PowerVR core.

The challenges are growing from there, as well. Several major software companies, to provide a higher quality visual experience, also have written a new codec for use with the Xbox360 and PS3 platforms. These new codecs handle a different raster and render routine that supports both physics-based graphics generation (fire, rain, water, snow, wind, explosions, and striking reactions from swords/sticks/knives) and secondary scan for background details (flowers on trees, multi-color grass, flowers and moss on the ground, details on reeds, etc.) in addition to the normal patterns. The new codec was needed to be able to send and render the data in the standard data stream size.

Which comes first
So how much is all of this really going to affect design? Despite predictions that software engineering teams would displace hardware teams, the reality is that both will be forced to co-exist. They will never actually speak the same language or work on the same exact project, but the push is to improve communication back and forth between them. Software needs to become far more power-aware, and hardware needs to become more efficient at running software.

The last time the design world dealt with an issue like this was when the battle over RISC vs. CISC—reduced instruction set computing vs. complex instruction set computing—was being waged. That was in the 1990s, when Unix first posed a commercial challenge to operating systems from companies such as IBM, Hewlett-Packard, Digital Equipment Corp. and a handful of others that made their own OSes back then.

But power is forcing these issues back on the table once again, driven initially by the mobile sector and increasingly by devices with a plug. The likelihood is that it will never be a perfect marriage, but it is one that is likely to last this time because both teams need to at least have the same goal—even if they don’t talk the same language.

The Week In Review: Sept. 3

Friday, September 3rd, 2010

By Ed Sperling
Andes Technology, a Taiwanese maker of SoCs and processor IP, adopted Cadence’s digital front-end low-power design flow, which is based on the Common Power Format. Score one for CPF.

Toshiba Information Systems expanded its adoption and deployment of Mentor’s Catapult C for high-level synthesis. What’s interesting about this deal is Toshiba’s shift to untimed C++ and SystemC from an RTL-based design approach, which the company says is unproductive.

Synopsys completed its acquisition of Virage Logic, greatly expanding Synopsys’ IP portfolio from standard interfaces to everything from memory to processor cores and logic libraries.

GlobalFoundries held its first conference, which was largely a coming-out party for the combined company that includes pieces from AMD and Chartered Semiconductor. There were several themes of note. First, GlobalFoundries is gate-first high-k/metal gate, while TSMC and Intel use gate-last. For most customers, that means it becomes even harder to move from one foundry to another. Second, GlobalFoundries sees this as a strong ecosystem play. It has lined up ARM, which is playing across both foundries, as well as Freescale for its 90nm flash memory. And finally, the company introduced a 28nm analog/mixed-signal flow development kit.

In the memory space, memory resistors, aka memristors, are gaining attention. The technology is considered faster than flash while also drawing low power. Solid state memory already lowers the amount of power being used because there are no moving parts, but memristors reportedly use as little as one tenth the power of flash. HP is teaming up with Hynix to develop what it calls ReRAM, which should put a big dent into the DRAM market.

Greener Data Centers

Thursday, December 10th, 2009

By Ed Sperling

For decades the race inside the data center was all about performance. If you upgraded from an IBM System/370 mainframe to a System/390 your applications ran faster. And if you upgraded your PC server from a Pentium II to a Pentium 4 you got significantly better performance.

The race now is to reduce the number of servers altogether, to lower the cooling costs per server rack, and to utilize the servers that are running more effectively. Performance is a “nice to have,” but power reduction is a “must have.”

What’s changed in the thinking of data centers and why are server-class electronics now being subject to the same kinds of power-saving concerns as portable battery devices? There are a number of factors to consider, and all of them are converging at the same point.

A messy legacy

To understand the problem inside data centers requires some history, as much as six decades’ worth in many large companies. Data centers in many ways look like geological striations. While new technology runs many of the most advanced applications, there are old, assembly-code mainframes and even minicomputers still churning cycles each day. In many cases no one knows what’s even running on those computers. But given the risk that it could be important (or worse, that something known to be important might be affected), the fear of turning off these machines is palpable.

Figure 1: IBM’s S/360, circa 1964 (Source: IBM)

Large corporations have been systematically looking through the data on these machines and others over the past several years in an effort to get this old stuff out of the data center. It takes up expensive real estate, uses an enormous amount of power (no one even thought about power as an issue when these machines were installed) and requires expensive cooling because the average data center runs at about 70 to 72 degrees Fahrenheit. The only good news was that early mainframes used water for cooling instead of air, which was much more energy-efficient.

Minicomputers entered the mix in the 1980s as a less-expensive but air-cooled approach. Those computers are still in use in many companies alongside mainframes that pre-date them. Ken Olsen, the founder and CEO of the former Digital Equipment Corp. (bought by Compaq and later absorbed by HP), is famous for saying that in minicomputers there would be no plumbers. While that made it easy to move the machines around, it also paved the way for more expensive cooling since then.

Figure 2: DEC PDP-7 (Source: Wikipedia)

By the 1990s, commodity servers using primarily Intel processors began replacing mainframes. Even IBM and Hewlett-Packard began selling Intel-based machines, usually in the form of blades that could be placed more closely together in a rack. And they were so cheap that business units could afford to use dedicated servers for their individual applications, create their own customized processes and finally put decision-making closer to the customer.

That was the argument, at least, and it was considered the best practice at the time. After 20 years, however, some companies accumulated hundreds of thousands of these servers, often running only one application with utilization rates as low as 5%. And because they were air-cooled, often with raised floor construction that cooled from the bottom instead of the top—heat rises, of course—the cooled air had to be run almost constantly and often ineffectively.
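To put the 5% utilization figure in perspective, here is a rough back-of-the-envelope sketch. The server count and the 60% target utilization are illustrative assumptions, not numbers from any company cited here, and the calculation ignores memory, I/O and peak-load headroom.

```python
def servers_after_consolidation(server_count, avg_utilization_pct, target_utilization_pct):
    """Estimate how many hosts could carry the same total load.

    Utilization values are whole-number percentages. The model assumes
    load is perfectly divisible, which real workloads are not.
    """
    if not 0 < target_utilization_pct <= 100:
        raise ValueError("target utilization must be between 1 and 100")
    total_load = server_count * avg_utilization_pct  # in server-percent units
    # Ceiling division with integers avoids floating-point rounding surprises.
    return -(-total_load // target_utilization_pct)


# Hypothetical fleet: 1,000 single-application servers idling at 5%,
# repacked onto hosts that run at 60%.
print(servers_after_consolidation(1000, 5, 60))  # prints 84
```

Even with generous headroom, the same work fits on a small fraction of the original machines, which is the arithmetic behind the consolidation push.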

Virtualization and clouds

Virtualization has been touted by Intel over the past half-decade as the ultimate solution to server sprawl. Rather than run one application per machine, many applications could be run using virtual machines. While the concept was new for PC servers, the technology was invented by IBM back in the 1960s and employed in mainframes for decades.

Virtualization also works particularly well with multicore chips. And because it’s impossible to keep cranking up the clock frequency on processors without melting the chip, it’s now a requirement that all new chips have multiple cores. But only database, graphics, some scientific applications and some EDA tools have effectively been able to parse functions across multiple cores. The vast majority can use a maximum of two cores effectively, which creates a business issue for chipmakers. If they can’t figure out a way to use all those cores, there’s no reason to buy new chips.

Virtualization was resurrected as the ultimate solution for that problem. By adding hypervisors to manage the applications running on a single core, and by dynamically scheduling those applications to run on available cores instead of dedicating cores to applications, a system can conserve huge amounts of energy. Old mainframes used this approach primarily to utilize compute resources, but power consumption is the new competitive weapon.
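The difference between dedicating cores and scheduling onto whatever is available can be sketched in a few lines. This is a toy greedy placement, not how any real hypervisor works; the application names and load figures are made up for illustration.

```python
import heapq


def schedule_on_shared_cores(app_loads, core_count):
    """Place each application on the currently least-loaded core.

    app_loads maps a (hypothetical) application name to its load as a
    percentage of one core. Largest applications are placed first.
    """
    heap = [(0, core) for core in range(core_count)]  # (current load, core id)
    heapq.heapify(heap)
    placement = {core: [] for core in range(core_count)}
    for name, load in sorted(app_loads.items(), key=lambda item: item[1], reverse=True):
        current_load, core = heapq.heappop(heap)
        placement[core].append(name)
        heapq.heappush(heap, (current_load + load, core))
    return placement


apps = {"web": 30, "db": 40, "mail": 10, "batch": 20}
print(schedule_on_shared_cores(apps, 2))
# prints {0: ['db', 'mail'], 1: ['web', 'batch']}
```

Four applications that would have tied up four dedicated cores fit on two, each at 50% load, leaving the others free to sleep.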

Cloud computing—which is basically used to clean up data centers, often with a virtualized approach to running applications—is another term that has been overhyped in the data center. It generally means outsourcing, although in many companies at least part of the cloud is inside their data center and dedicated for their operation. That turns the IT department into a business unit that can create its own profit-and-loss center and keep track of the overall costs.

Intel’s latest research, which is expected to start showing up in servers made by other companies over the next several years, is to build a cloud on a single chip. (See Figure 3) By adding enough cores—48 is the current number tested by Intel—there is no reason to ever go off the chip. Intel believes the total server power consumption at that point could be measured in less than 125 watts when fully utilized.

Figure 3: Intel’s prototype for a cloud on a chip. (Source: Intel)

What this does, in effect, is bring the resources used in a computer down to the chip level instead of between machines. At that point, the challenge of getting computers to talk to each other and to shift resources will be significantly confined and power consumption will become a much more localized problem.

To some extent, this is no different than what has been happening in smart phones. When cores are not in use they go into various sleep modes. It doesn’t matter, for example, if a game takes a couple seconds to boot, while it is essential that the phone function be always on and ready to work.

The same type of control can be applied to data centers. A search of old data, for example, can stand a wait of several seconds, while a transaction from a customer must be instantaneous. Running a payroll application likewise can stand behind a more critical function in a data center, such as blocking a possible security breach.
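A minimal sketch of that kind of priority scheduling, using a simple priority queue. The job names and priority numbers are assumptions chosen to mirror the examples above; a production scheduler would also deal with preemption, deadlines and starvation.

```python
import heapq
import itertools


class PriorityScheduler:
    """Run lower-numbered priorities first; ties run in submission order."""

    def __init__(self):
        self._heap = []
        self._sequence = itertools.count()  # tie-breaker for equal priorities

    def submit(self, priority, job):
        heapq.heappush(self._heap, (priority, next(self._sequence), job))

    def run_order(self):
        """Drain the queue and return jobs in the order they would run."""
        order = []
        while self._heap:
            _, _, job = heapq.heappop(self._heap)
            order.append(job)
        return order


scheduler = PriorityScheduler()
scheduler.submit(2, "payroll run")
scheduler.submit(3, "archive search")
scheduler.submit(1, "customer transaction")
scheduler.submit(0, "block security breach")
print(scheduler.run_order())
# prints ['block security breach', 'customer transaction', 'payroll run', 'archive search']
```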

This type of scheduling on a single machine, let alone clusters of machines, is a new concept, however. In the mainframe and minicomputer days, all resources were managed locally. In the PC world, particularly those connected to the Internet, management can be centralized for a global corporation. But in the new model, it also can be centralized on a single machine once again with enough processing power and low enough power requirements to significantly cut costs while also maintaining at least the same performance—even if applications cannot utilize multiple cores.

At that point, it may be more a matter of scheduling priority—and in some cases, paying for that priority access even within a company—than how fast the machines are running. After decades of arguing for centralized control as the most efficient way of using resources, many data center managers are finding it’s also the most efficient way to use power.

Does that sound familiar?

Virage Logic Buys ARC

Tuesday, August 18th, 2009

By Ed Sperling

Aug. 18, 2009–In yet another sign that the big are getting bigger in the semiconductor IP world, Virage Logic today announced its intention to buy ARC International.

The acquisition, which will be an all-cash deal valued at roughly $41 million, puts Virage in an interesting position. While ARC makes processor IP, most of that is targeted at areas such as storage, audio and video, which is where the company’s recent wins have been. Rather than engaging in head-to-head competition with the biggest battle in the IP world today—ARM vs. Intel—the acquisition of ARC allows Virage to partner with both companies.

ARM and Intel have been on a collision course in the low-power space ever since Intel introduced its Atom chip. ARM is pushing up from the mobile handset market into netbooks, while Intel is pushing down from the desktop into the same market. The first ARM-powered netbooks have just begun hitting the market, while Intel is working hard to cut the power consumption on Atom chips to push further down into markets dominated by ARM.

“This allows us to co-exist with both companies,” said Alex Shubat, Virage’s president and CEO. “It also allows us to attack the market where there is huge growth and huge number of units are shipped.”

Also working in Virage’s favor are deep relationships with foundries such as TSMC and the Common Platform. To survive in the IP market requires a strong ecosystem, particularly where there is competition. ARM has built an enormous ecosystem for its processor IP, which has been its real strength in the mobile phone market. MIPS has been revamping its own to compete in the Android phone market. And Intel is drawing on its own relationships with applications developers.

While many of these relationships are not exclusive, market wins tend to coincide with the strength of ecosystems. IBM’s success over the past five years, for example, is largely the result of a growing ecosystem at all levels, ranging from early stage research to IP and joint development. ARC has customer wins with a slew of companies, including Intel, HP, Broadcom, Sandisk, Infineon and Sony.

Shubat said the deal will go through by the end of the year, and possibly sooner. He noted that there is “zero product overlap,” which should speed the integration of the two companies.

“This is definitely about ecosystems and consolidation,” he noted. “A few strong players will survive.”

New Low-Power Memory Technology Under Development

Wednesday, June 10th, 2009

By Pallab Chatterjee

Unity Semiconductor, which was formed in 2002 and has been in stealth mode until May of 2009, is progressing on the development of a very dense and low power non-volatile solid state memory technology.

Unlike traditional semiconductor memory, which uses an active device and electron transport as the primary storage element, the Unity Semiconductor CMOx technology uses a new ionic oxide element for the storage node. The technology appears to show hysteresis behavior similar to that of the recently identified HP memristor, but it uses a different energy profile to write and store a 1 vs. a 0, along with a non-linear I-V curve. Although the wave shape has been observed in semiconductors since the 1950s and theorized since the 1970s, the device had no intentional implementations or circuits until the sub-180nm era of processing.

Based on discussions with Christophe Chevallier, Unity’s vice president of design engineering, the solution at Unity is based on a traditional CMOS logic process to create the base logic and addressing/ECC control for the memory (currently in 130nm and moving to 90nm for production), and then a special memory material back-end of line (BEOL) process (currently on 130nm and moving to 45nm/35nm for production).

The base wafers are being built by TSMC and a major Japanese ASIC supplier. The BEOL flow is local to the United States. The process and device characterization was developed by Unity. The use of multiple process nodes allows for optimization of performance, cost and power (operating and leakage) for the design while minimizing the facility investment.

The core memory is a cross-point array architecture and is based on optimal use of the transistor-less vertical memory element. This is shown in the diagram below, which details the sections of the memory element. These memory cells, due to their programming method and device operation, are no longer limited to being planar pitched devices. They can be stacked vertically.

[Figure: diagram of the CMOx memory element and its sections]
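As a rough illustration of what a stacked cross-point array means for addressing, the sketch below maps a flat cell address to a layer, a word line (row) and a bit line (column). The mapping and the array dimensions are assumptions made for illustration only; Unity has not described its actual addressing scheme here.

```python
def decode_address(address, rows, cols, layers):
    """Map a flat cell address to (layer, row, col) in a stacked array.

    Hypothetical layout: cells are numbered column by column within a row,
    row by row within a layer, and layer by layer up the stack.
    """
    cells_per_layer = rows * cols
    if not 0 <= address < cells_per_layer * layers:
        raise IndexError("address outside the array")
    layer, offset = divmod(address, cells_per_layer)
    row, col = divmod(offset, cols)
    return layer, row, col


# A tiny 4 x 8 array stacked four layers high (128 cells in total).
print(decode_address(37, rows=4, cols=8, layers=4))  # prints (1, 0, 5)
```

Each cell sits at the intersection of one row line and one column line, so adding layers multiplies capacity without enlarging the footprint.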

At this time Unity’s characterization chip (a 64MB design) is using a single-level cell (see device cross section below, including the visible layer stratification for the memory elements), but the company has conducted other test designs showing the technology is stackable in the BEOL processing up to 8 layers of memory element. The initial product will use a four-layer stack. The core cell figures of merit and anticipated larger production die are based on this four-layer stack.

[Figure: device cross section showing the layer stratification of the memory elements]

The photo below shows the configuration for the four-layer stack and is the method that is targeted for Unity’s xTB-sized chips.

[Figure: photo of the four-layer stack of CMOx cells on CMOS]

Since the BEOL processing does not require any high temperature flows (anything over 400C), the native logic device operation is unaffected, which allows for optimization of the design to low leakage and very low standby currents. The current designs and their associated internal IP have all been created using standard Cadence analog and custom design tools and industry standard simulators.

The use of the stacked memory and vertical conduction/programming path for the very small data store element minimizes the interconnect RC and associated parasitics, and thus the size of the drivers needed. The memory element has a fairly large detectable signal (relative to other memory technologies in the same size form factor), which lets the sense circuitry operate at lower power than other similarly sized cores.

The optimization of density and power has targeted the CMOx memory products for SSD class storage. While it will be compatible with traditional NAND Flash applications, its higher density and lower power will direct the product into high-capacity applications. Both netbook/notebook and enterprise class SSD devices are being targeted.

The products will use traditional DDR memory interfaces. The new technology, by virtue of supporting the cross-point memory architecture, allows for different memory addressing options and error correction methods, including byte-wide and page-at-a-time write capabilities, and correction levels ranging from skipping a single cell to dropping an entire plane of memory elements.

Unity said its technology is protected by many patents (more than 50, so far, with more in process) and seems to be working with the split manufacturing facility method for the current time. If successful, along with HP’s entrance into the memristor memory market (similar density numbers), terabyte-level monolithic memory should be cheap and available to all who need it.