China has unveiled the world’s fastest supercomputer, the Tianhe-1A, at a high-performance computing conference in Beijing.
The Tianhe-1A has a Linpack benchmark performance of 2.507 petaflops, according to Nvidia, whose Tesla M2050 graphics processing units (GPUs) were used in the supercomputer.
Linpack is a software library for performing numerical linear algebra; a benchmark derived from it has been used to measure supercomputer speeds since the 1970s.
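The idea behind the benchmark is simple: solve a large dense system of linear equations and count how many floating-point operations per second the machine sustains. A minimal NumPy sketch of that idea (illustrative only; the real HPL benchmark is a heavily tuned distributed code, and the function name here is our own):

```python
# A toy Linpack-style measurement: solve a dense n x n system Ax = b
# and report sustained floating-point throughput plus a residual check.
import time
import numpy as np

def linpack_like(n):
    """Solve a random dense n x n system; return (gigaflops, max residual)."""
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal(n)

    start = time.perf_counter()
    x = np.linalg.solve(a, b)                # LU factorization + triangular solves
    elapsed = time.perf_counter() - start

    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2  # standard operation count for the solve
    residual = np.max(np.abs(a @ x - b))     # how accurate the answer is
    return flops / elapsed / 1e9, residual

gflops, resid = linpack_like(1000)
print(f"{gflops:.1f} GFLOPS, residual {resid:.1e}")
```

A petaflop machine sustains on the order of a million times the throughput a desktop reports from a loop like this, which is why the Top500 rankings discussed below all quote Linpack numbers.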
The Oak Ridge National Laboratory claimed the record in November 2009 with its Cray XT supercomputer, nicknamed “Jaguar,” which was rated at 2.66 petaflops. On paper, that outpaces the Tianhe-1A; however, 2.66 petaflops is a measurement of the Jaguar’s peak performance, while 2.507 petaflops is the Tianhe-1A’s sustained performance.
Mooning Over the Tianhe-1A’s Muscles
The Tianhe-1A, at China’s national supercomputer center in the city of Tianjin, uses 7,168 Nvidia Tesla M2050 GPUs and 14,336 CPUs, Nvidia said. These deliver the same power as 50,000 CPUs and require half as much floor space, according to Nvidia.
Further, the Tianhe-1A consumes about one third as much power as an all-CPU system would — 4.04 megawatts instead of more than 12 megawatts, Nvidia said.
One petaflop is a thousand trillion floating-point operations per second, so the Tianhe-1A’s Linpack figure works out to roughly 2.5 quadrillion calculations every second.
Designed by China’s National University of Defense Technology, the Tianhe-1A is fully operational now and will be run as an open-access system for large-scale scientific computation.
“China’s trying to become a global leader, not just in technology but also in science and discovery and research, to tackle all the biggest problems we face in the world today,” Nvidia spokesperson Andrew Humber remarked. “How do we find oil in a safer, easier way so we don’t end up killing the oceans; solving problems in materials science; pharmaceutical research,” he added.
“Supercomputing is where most of the cutting-edge work is done — aeronautical research, automotive research, there’s hardly a single industry where work isn’t conducted with supercomputers,” Rob Enderle, principal analyst at the Enderle Group, told TechNewsWorld. “Supercomputer performance defines the leading technology in the world.”
The Nvidia Tesla M2050’s Guts
The Tesla M2050 is a massively parallel GPU that has a double precision floating point performance rated at 515 gigaflops; single precision floating point performance rated at 1.03 teraflops; and 3 GB of GDDR5 dedicated memory.
Maximum power consumption is rated at 225 Watts.
The Tesla M2050 uses Nvidia’s next-generation CUDA architecture, code-named “Fermi,” which was announced in October of 2009.
Fermi was designed from the ground up for general-purpose computation, with features such as error-correcting code (ECC) memory, support for C++, a true cache hierarchy and concurrent kernel execution. It has more than 3 billion transistors and up to 512 CUDA cores. The CUDA cores implement the IEEE 754-2008 floating-point standard, which is derived from, and replaces, IEEE 754-1985 and also incorporates IEEE 854-1987, the IEEE Standard for Radix-Independent Floating-Point Arithmetic.
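The gap between the M2050’s double- and single-precision ratings matters because rounding error accumulates much faster in single precision: float32 carries roughly 7 decimal digits, float64 roughly 16. A small NumPy illustration (our own example, not Nvidia’s):

```python
# Sum 0.1 a hundred thousand times in single and double precision.
# 0.1 is not exactly representable in binary, and the float32 version
# accumulates a visibly larger error than the float64 version.
import numpy as np

s32 = np.float32(0.0)
s64 = np.float64(0.0)
for _ in range(100_000):
    s32 += np.float32(0.1)
    s64 += np.float64(0.1)

err32 = abs(float(s32) - 10000.0)
err64 = abs(float(s64) - 10000.0)
print(f"float32 error: {err32:.6f}, float64 error: {err64:.12f}")
```

This is why scientific codes typically run in double precision, and why a GPU’s double-precision throughput, not its larger single-precision number, is the figure that matters for supercomputing.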
CUDA, or Compute Unified Device Architecture, is a parallel computing architecture developed by Nvidia. Using CUDA, Nvidia GPUs can be used for computation like central processing units (CPUs) are. However, the parallel throughput architecture of GPUs executes many threads concurrently.
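The CUDA model can be sketched as follows: the programmer writes a small “kernel” function that operates on one element, and the hardware runs one copy of it per thread index, thousands at a time. A toy simulation of that launch model in plain Python (the function names here are illustrative; real CUDA kernels are written in CUDA C/C++ and the threads run concurrently on the GPU):

```python
# A toy sketch of the CUDA execution model: one kernel invocation
# per thread index, simulated serially on the CPU.
import numpy as np

def saxpy_kernel(i, alpha, x, y, out):
    """Body executed by one 'thread': computes one element of alpha*x + y."""
    out[i] = alpha * x[i] + y[i]

def launch(kernel, n_threads, *args):
    """Stand-in for a CUDA grid launch."""
    for i in range(n_threads):   # on a real GPU these run concurrently
        kernel(i, *args)

n = 8
x = np.arange(n, dtype=np.float32)   # [0, 1, ..., 7]
y = np.ones(n, dtype=np.float32)
out = np.empty_like(x)
launch(saxpy_kernel, n, 2.0, x, y, out)
print(out)   # each element was computed by its own "thread"
```

The throughput advantage comes from the fact that a GPU schedules tens of thousands of such threads at once, where a CPU core works through a handful at a time.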
If GPUs are so advanced, why did the Tianhe-1A need CPUs as well?
“GPUs are hard to program,” explained Carl Howe, director of anywhere consumer research at the Yankee Group. “For data sets that need more than vector processing, CPUs would come into play. I think they wanted to be able to run anything that comes their way,” he added.
What Happened to Oak Ridge?
The Oak Ridge National Laboratory, which is the U.S. Department of Energy’s largest science and energy laboratory, had what it claimed earlier this month was the world’s fastest supercomputer — a Cray XT nicknamed “Jaguar.”
This was rated at 2.66 petaflops, a higher figure on paper than the Tianhe-1A’s 2.507-petaflop Linpack benchmark performance.
How could Nvidia claim the Tianhe-1A is faster?
That’s because the rating for the Jaguar is its peak performance number, Nvidia’s Humber explained. “The Linpack number is 1.7 for the Jaguar and 2.5 for the Tianhe-1A,” he pointed out.
“The Linpack is the metric for how fast a machine can sustain performance on applications,” Jeff Nichols, an associate laboratory director at Oak Ridge National Labs, told TechNewsWorld.
“We have the number one position right now as the highest sustained performance on Linpack,” Nichols said. “The expectation is that China has built a machine based on Intel x86 processors and Nvidia GPU processors that has a higher sustained performance on Linpack and it’s expected that they’ll claim the top spot next month at SC 2010.”
SC 2010 is the premier supercomputing conference, to be held next month in New Orleans. An offshoot is held every June in Germany.
China has launched its own version of the conference, at which the Tianhe-1A was announced, Nvidia’s Humber said.
The Oak Ridge Lab is working on an Nvidia Fermi-based supercomputer right now. It expects to unveil that supercomputer, to be built by Cray, between 2011 and 2012, Nichols said.