By Barry Pangrle
Power budgets and the characteristics of the underlying process technologies have capped the clock speeds of the processors found in large simulation compute farms for the past six years, but the designs under test have kept growing at the exponential rate described by Moore’s Law.
Processor designers have added more cores per chip to increase per-chip performance, but that only helps when there are multiple jobs to run or a job breaks down easily into parallel tasks that can be distributed across cores. That works for unit tests and shorter simulation runs, which can all run in parallel on multiple processors, but it has left a large gap for the bigger, longer runs that just don’t distribute well across multiple cores: booting an operating system on a large multi-core design, running multiple frames of complex image processing, or analyzing streams of network packets for a switching or routing design. With processors running at gigahertz frequencies, each second of real time requires a billion or more simulated cycles. To close this widening gap between simulation requirements and simulation capabilities, more teams are turning to emulation to bring runtimes back into timeframes that let products reach their markets in a timely manner.
A big factor in power efficiency is a system architecture suited to the expected application and workload. Emulators are far more tailored to high-speed verification of ASIC designs than general-purpose compute servers are, and this shows clearly in the results.
I’m going to walk through a fairly conservative back-of-the-envelope calculation of performance per watt. As a starting point, I’ll use some numbers published in Microsoft’s Datacenter Efficiency: Executive Strategy Brief. Based on these, we’ll assume that each fast quad-core CPU in our server farm draws 270W under load: 120W for the processor itself and another 150W for RAM, disks for swap and storage, power supply units, and ancillary chips for network communication. Two of these processors on a dual-socket blade then draw 540W. On the performance side of the equation, we’ll assume that one quad-core CPU is good for 200 cycles per second of simulation. That works out to about 0.74 cycles per second per watt (200 ÷ 270), where performance is measured in simulated cycles per second.
So, how does this compare with emulation? We’ll assume the emulator draws 10kW. Today’s emulators range from about 1.5kW up to over 30kW, but very high-performance emulators are available that fall comfortably under that 10kW figure, so that’s what we’ll use. To interface with the emulator, we’ll also throw in four quad-core host processors for another 1.08kW, bringing us to a total of 11.08kW; as an additional margin, we’ll round that up and call it an even 12kW. On the performance side of the equation, we’re going to conservatively use 500k cycles per second. Based on these values, we arrive at roughly 42 cycles per second per watt, about a 56x improvement over the server farm!
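The two performance-per-watt figures fall out of a few lines of arithmetic. A minimal sketch in Python, using only the numbers assumed above (the variable names are mine, chosen for illustration):

```python
# Back-of-the-envelope performance-per-watt comparison, using the
# assumptions above: 270 W and 200 simulated cycles/s per quad-core CPU
# for the farm; 12 kW and 500k cycles/s for the emulator setup.

cpu_watts = 270.0            # quad-core CPU plus its share of RAM, disks, PSU, NICs
cpu_cycles_per_s = 200.0     # simulation throughput of one quad-core CPU

farm_eff = cpu_cycles_per_s / cpu_watts            # cycles/s per watt
print(f"server farm: {farm_eff:.2f} cycles/s/W")   # ~0.74

emu_watts = 12_000.0         # 10 kW emulator + host CPUs, rounded up to 12 kW
emu_cycles_per_s = 500_000.0

emu_eff = emu_cycles_per_s / emu_watts
print(f"emulator:    {emu_eff:.1f} cycles/s/W")    # ~41.7

print(f"advantage:   {emu_eff / farm_eff:.0f}x")   # ~56x
```

Swapping in your own farm and emulator numbers makes it easy to see how sensitive the ratio is to each assumption.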
What does this mean in terms of power savings? To match the emulator’s 500k cycles per second (spread, of course, across multiple jobs on the server farm), we would need 2,500 of our quad-core CPUs, or 1,250 dual-socket blades. If we were to fully utilize the emulator and the server farm over a year, the farm would consume about 5.9 GWhrs while the emulator and its host processors would consume about 105 MWhrs. That’s a whopping difference of nearly 5.8 GWhrs per year. Assuming a cost of $0.12/kWhr for electricity, the difference in the power to run the machines leads to nearly $700,000 in savings per year for the emulator solution.
If that figure sounds high, consider that it actually understates the savings. Microsoft says its datacenters run with a PUE* in the range of 1.2 to 1.5, and most companies’ own machine rooms don’t operate at the efficiencies of these state-of-the-art facilities. Even if we conservatively use 1.5, the upper end of Microsoft’s range, the total power cost savings comes to over $1M per year.
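The annual energy and dollar figures can be checked the same way. A quick sketch under the stated assumptions (8,760 hours per year is mine; everything else comes from the numbers above):

```python
# Annual energy and cost at equal throughput (500k simulated cycles/s),
# using the assumptions above: 2,500 quad-core CPUs at 270 W for the farm,
# 12 kW for the emulator setup, and $0.12/kWh for electricity.

HOURS_PER_YEAR = 8_760
RATE_PER_KWH = 0.12          # $/kWh

farm_kw = 2_500 * 0.270      # 675 kW for the server farm
emu_kw = 12.0                # emulator plus its host processors

farm_mwh = farm_kw * HOURS_PER_YEAR / 1_000    # ~5,913 MWh (~5.9 GWh)
emu_mwh = emu_kw * HOURS_PER_YEAR / 1_000      # ~105 MWh

savings = (farm_mwh - emu_mwh) * 1_000 * RATE_PER_KWH
print(f"raw savings:  ${savings:,.0f}/yr")         # ~$697,000

# Gross up by a PUE of 1.5 to include cooling and power-delivery overhead:
print(f"with PUE 1.5: ${savings * 1.5:,.0f}/yr")   # over $1M
```

Note that multiplying by a higher PUE (the footnote below suggests 2.0 is typical) only widens the gap.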
Suddenly, emulation isn’t only looking powerful; it’s also looking pretty green.
*PUE (Power Usage Effectiveness) is the ratio of the total power used in the datacenter (servers plus cooling, power backup, and other overhead) to the power delivered to the servers alone. Microsoft claims that most datacenters operate with a PUE of about 2.0.
–Barry Pangrle is a solutions architect for low-power design and verification at Mentor Graphics.
Tags: Mentor Graphics