The contract, which Cray pegs as worth US$97 million, will cover a multi-phase process scheduled for completion in 2013.
Cray will essentially rip out and replace Jaguar’s existing processors with the latest AMD Opteron CPUs and Tesla graphics processing units (GPUs) from Nvidia.
The resulting behemoth, which will be named “Titan,” will have almost 300,000 cores and 600 terabytes of memory. It’ll be used for pioneering simulation projects in various areas of science.
“For us, it’s about meeting the needs of the scientists who are demanding more computing power,” Oak Ridge spokesperson Dawn Levy told TechNewsWorld.
“There are certain applications such as nuclear fusion research and climate science where the scientists need really fast processing,” Levy continued.
“Supercomputing really is the next big frontier and the one area where no level of performance is enough,” Rob Enderle, principal analyst at the Enderle Group, told TechNewsWorld.
Reshaping the Jaguar
Oak Ridge’s existing Jaguar supercomputer is a Cray XT5 system rated at 2.66 petaflops, or 2.66 million billion floating-point operations per second.
The Titan that will result from the Jaguar’s overhaul will be a Cray XK6 system with a peak speed of 10 to 20 petaflops, Oak Ridge stated.
In the first phase of the revamp, the Jaguar’s existing processors will be replaced with the latest AMD Opteron processors, code-named “Interlagos.”
This will take the system from two six-core processors per node to one 16-core processor per node and make room for the addition of GPUs, Oak Ridge said. The upgrade will deliver one-third more cores in the same physical space, double the memory and replace the existing interconnect.
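The “one-third more cores” figure follows directly from the per-node counts Oak Ridge gave: two six-core chips make 12 cores, and a single 16-core chip adds 4 more, which is one-third of 12. The quick check below is only a back-of-the-envelope sketch; it assumes the number of nodes stays the same (as “in the same physical space” suggests), and the `core_gain` helper is a hypothetical name used for illustration.

```python
from fractions import Fraction


def core_gain(old_sockets: int, old_cores_per_socket: int, new_cores_per_node: int) -> Fraction:
    """Fractional increase in cores per node (hypothetical helper for illustration).

    Assumes the node count is unchanged, so the per-node gain equals the
    system-wide gain.
    """
    old_cores = old_sockets * old_cores_per_socket  # 2 x 6 = 12 cores per XT5 node
    return Fraction(new_cores_per_node - old_cores, old_cores)


# Per-node figures as stated by Oak Ridge: two six-core Opterons before,
# one 16-core "Interlagos" Opteron after.
gain = core_gain(2, 6, 16)
print(gain)  # 1/3
```

Using `Fraction` keeps the result exact, so the ratio prints as 1/3 rather than a rounded decimal.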
“It’s more than the cores and the memory,” Oak Ridge’s Levy said. “Before, the Jaguar used Cray’s SeaStar interconnect, which would crash the system every time information hit a node that’s too busy or hit a broken link, and the system would have to be rebooted.
“Now, with Gemini, the system will work around the busy node so it’s more robust,” Levy added.
Then, 960 Tesla M2090 GPUs based on Nvidia’s Fermi architecture will be added.
In the second phase, which is scheduled for 2012, Oak Ridge will add up to 18,000 more Tesla GPUs based on Fermi’s successor, code-named “Kepler.”
Snags, Snafus and Scuttlebutt
Work on upgrading Oak Ridge’s supercomputer might run into some problems.
Shipment of the AMD “Interlagos” processors to be used in the upgrade has been delayed, Cray stated. The effect of this delay is not yet clear.
Also, there have been reports that Nvidia’s “Kepler” architecture, which is the successor to “Fermi,” has been pushed back from this year to 2012.
“Kepler products for Tesla servers are on schedule as planned for 2012,” Sumit Gupta, manager of Tesla business at Nvidia, told TechNewsWorld. “We have not yet disclosed when in 2012 these products will ship, however.”
Fighting to Be First
Currently, the fastest supercomputer is Japan’s Fujitsu K computer, which is claimed to be more powerful than the next five systems in the world’s top 10 combined.
The Fujitsu K is claimed to be three times faster than China’s Tianhe 1A, which took the top spot in 2010, beating out the Jaguar. The Tianhe 1A also uses Nvidia GPUs.
Four of the five fastest supercomputers in the world are located in Asia, according to the Top500 list of the world’s supercomputers. Two are in Japan, the Fujitsu K and the Tsubame 2.0, and the other two are China’s Tianhe 1A and Nebulae-Dawning.
Oak Ridge’s Jaguar is third on the list, behind the Fujitsu K and the Tianhe 1A.
Are nations in a race to top each other in computer power? Is there any practical use to making ever-faster supercomputers?
“The real race is for better science, which leads to innovation and competitive advantage,” Nvidia’s Gupta remarked. “Supercomputers are a tool in this race for faster, more complex simulations that lead to more discoveries and better understanding of physical phenomena.”