CERN Battling Severe Case of Data Indigestion
As a miles-wide research facility recording 40 million sub-atomic events per second, CERN's Large Hadron Collider must deal with massive amounts of data, and it's still struggling with system failures. The need for many disparate systems to communicate with each other creates a high degree of complexity, "and because it's complicated, it fails," said Tony Cass, leader of CERN's database services group.
Tony Cass, the leader of the European Organization for Nuclear Research's (CERN's) database services group, outlined some of the challenges the organization's computer system faces during his keynote speech Wednesday at LISA, the 24th Large Installation System Administration Conference, being held in San Jose, Calif., through Friday.
Smashing beams of protons and ions together at high speeds in CERN's Large Hadron Collider generates staggering amounts of data that require a sophisticated computer system to handle.
The CERN computing system has to winnow out a few hundred good events from the 40 million generated every second by the particle collisions, store and analyze that data, manage and control the high-energy beams, and send and receive gigabytes of data every day.
Numbers, Numbers, Numbers
The accelerator generates 40 million particle collisions, or events, every second. CERN's computers pick out the "few hundred" of these per second that are good and then begin processing the data, Cass said.
These good events are recorded on disks and magnetic tapes at 100 to 150 Mbps (megabits per second). That adds up to 15 petabytes of data a year for all four CERN detectors -- Alice, Atlas, CMS and LHCb. The data is transferred at 2 Gbps (gigabits per second), and CERN fills three Oracle SL8500 tape robots a year.
CERN forecasts it will store 23 to 25 petabytes of data per year, which is 100 million to 120 million files. That requires 20,000 to 25,000 1-terabyte tapes a year. The archives will need to store 0.1 exabytes, or 1 billion files, in 2015.
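The forecast figures can be sanity-checked with simple division. The Python sketch below assumes decimal units (1 PB = 10^15 bytes, 1 TB = 10^12 bytes) and ignores compression and per-cartridge overhead; the helper names are illustrative only. On those assumptions, 23 to 25 PB spread over 100 million to 120 million files averages a little over 200 MB per file, the 2015 archive of 0.1 exabytes and 1 billion files averages about 100 MB per file, and 23 to 25 PB at 1 TB per cartridge fills roughly 23,000 to 25,000 tapes.

```python
import math

# Decimal (SI) units assumed throughout.
MB = 10**6
TB = 10**12
PB = 10**15
EB = 10**18

def average_file_size_mb(total_bytes: int, n_files: int) -> float:
    """Mean file size, in megabytes, for a given data volume and file count."""
    return total_bytes / n_files / MB

def tapes_needed(total_bytes: int, tape_capacity_bytes: int) -> int:
    """Whole cartridges required to hold total_bytes (no compression assumed)."""
    return math.ceil(total_bytes / tape_capacity_bytes)

print(f"{average_file_size_mb(23 * PB, 100_000_000):.0f} MB")    # 230 MB
print(f"{average_file_size_mb(25 * PB, 120_000_000):.0f} MB")    # 208 MB
print(f"{average_file_size_mb(EB // 10, 1_000_000_000):.0f} MB")  # 100 MB (0.1 EB)
print(tapes_needed(25 * PB, TB))                                  # 25000
```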
"IBM and StorageTek and Oracle have good roadmaps for their tape technology, but still managing the tapes and data is a problem," Cass said. "We have to reread all past data between runs. That's 60 petabytes in four months at 6 Gbps."
A "run" is a period during which the accelerator is in operation. StorageTek is now part of Oracle, whose databases CERN uses.
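The reread figure can be worked through to see what rate it implies. The sketch below assumes decimal units and a four-month window of 120 days; the function name is illustrative. On those assumptions, 60 PB in four months requires about 5.8 gigabytes per second, or roughly 46 gigabits per second.

```python
# Average transfer rate needed to reread an archive within a fixed window.
# Assumptions: decimal units and a four-month window of 120 days.
SECONDS_PER_DAY = 86_400
PB = 10**15

def required_rate_bytes_per_s(total_bytes: float, days: float) -> float:
    """Sustained rate, in bytes per second, needed to move total_bytes in `days`."""
    return total_bytes / (days * SECONDS_PER_DAY)

rate = required_rate_bytes_per_s(60 * PB, 120)
print(f"{rate / 1e9:.2f} GB/s")      # 5.79 GB/s (gigabytes per second)
print(f"{rate * 8 / 1e9:.1f} Gb/s")  # 46.3 Gb/s (same rate in gigabits)
```

A result close to 6 GB/s suggests the "6 Gbps" in the quote refers to gigabytes rather than gigabits per second; at 6 gigabits per second, the same 60 PB would take about two and a half years to reread.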
CERN has to run 75 tape drives flat out, each at a sustained 80 Mbps, just to handle controlled access, Cass said.
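The drive count and the reread target can be related with a little arithmetic. This sketch treats 80 as a per-drive sustained rate in megabytes per second (an assumption; the article writes Mbps) and uses decimal units; the helper names are hypothetical. Under that reading, 75 drives deliver about 6 GB/s in aggregate, and at that rate the 60 PB reread Cass describes takes roughly 116 days, just under four months.

```python
# Aggregate tape throughput and the time it implies for a full reread.
# Assumption: 80 is a per-drive sustained rate in MB/s, decimal units.
SECONDS_PER_DAY = 86_400
MB = 10**6
PB = 10**15

def aggregate_rate_mb_s(n_drives: int, per_drive_mb_s: float) -> float:
    """Combined throughput when every drive streams at the same sustained rate."""
    return n_drives * per_drive_mb_s

def days_to_move(total_bytes: float, rate_bytes_per_s: float) -> float:
    """Days needed to move total_bytes at a constant rate."""
    return total_bytes / rate_bytes_per_s / SECONDS_PER_DAY

total = aggregate_rate_mb_s(75, 80)       # 6000 MB/s, i.e. 6 GB/s
days = days_to_move(60 * PB, total * MB)  # about 116 days
print(total, round(days))                 # 6000 116
```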
Dealing With the Data
CERN uses three Oracle database applications for the accelerator.
The first is a short-term settings and control-configuration database that retains data for about a week. "As you ramp up the energy (for the beams) you need to know how it should behave and to have control systems to see how it's behaving and, if there's a problem, where does it come from," Cass explained.
The second is a real-time measurement log database that retains data for a week.
The third is a long-term archive of logs that retains data for about 20 years. There are 2 trillion records in the archive, which is growing by 4 billion records a day. Managing that is complicated. "They want to do searches across the full 2 trillion rows every now and then," Cass remarked.
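The growth figures translate into an insert rate and a projected table size. This sketch assumes growth stays linear at 4 billion records a day, which is an assumption rather than anything Cass stated; the function names are illustrative. At that pace the archive absorbs about 46,000 rows per second, and one more year would take it to roughly 3.5 trillion rows.

```python
# Insert rate and projected size of the long-term log archive.
# Assumption: constant (linear) growth of 4 billion rows per day.
SECONDS_PER_DAY = 86_400

def rows_per_second(rows_per_day: int) -> float:
    """Average insert rate implied by a daily growth figure."""
    return rows_per_day / SECONDS_PER_DAY

def projected_rows(current: int, rows_per_day: int, days: int) -> int:
    """Table size after `days` more days of linear growth."""
    return current + rows_per_day * days

print(round(rows_per_second(4_000_000_000)))                  # 46296
print(projected_rows(2_000_000_000_000, 4_000_000_000, 365))  # 3460000000000
```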
There are 98 PCs in all in CERN's control system, which consists of 150 federated Supervisory Control and Data Acquisition (SCADA) systems called "PVSS" from ETM, a company now owned by Siemens. The PCs monitor 934,000 parameters.
Overall, CERN has about 5,000 PCs, Cass stated.
CERN's processing power is distributed worldwide over a grid. "There are not many computing grids used on the scale of the LHC computing grid, which federates the EG, EGI and ARC science grids in Europe and the Open Science Grid in the United States," Cass said. "The Grid is enabling distributed computing resources to be brought together to run 1 million jobs a day. Grid usage is really good."
CERN has a Tier Zero center, 11 Tier One centers at different labs, and 150 Tier Two centers at various universities. Tier Zero performs data recording, initial data reconstruction and data redistribution, Cass said. Tier One is for permanent storage, reprocessing and analysis, while Tier Two is for simulation and end user analysis.
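The tier structure can be written down as a small data model. The sketch below is a hypothetical illustration, not CERN software: the `Tier` class and `tiers_handling` helper are invented for this example, while the site counts and roles come from Cass's description. It shows 162 sites in total and lets you look up which tier handles a given job.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Tier:
    level: int               # 0, 1 or 2
    sites: int               # number of centers at this tier
    roles: Tuple[str, ...]   # responsibilities, as described by Cass

# Hypothetical model; counts and roles are taken from the article.
TIERS: Tuple[Tier, ...] = (
    Tier(0, 1, ("data recording", "initial data reconstruction", "data redistribution")),
    Tier(1, 11, ("permanent storage", "reprocessing", "analysis")),
    Tier(2, 150, ("simulation", "end-user analysis")),
)

def tiers_handling(role: str) -> List[int]:
    """Return the tier levels whose responsibilities include `role`."""
    return [t.level for t in TIERS if role in t.roles]

print(sum(t.sites for t in TIERS))   # 162
print(tiers_handling("simulation"))  # [2]
```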
CERN has also developed a Google Earth-based monitoring system that tracks about 11,400 jobs running worldwide and a data transfer rate of 6.55 Gbps.
Problems, Problems, Problems
CERN's still struggling with system failures because of the complexity of its setup.
"There's a lot of complex technology, a lot of systems that need to interoperate to transfer data within CERN, the system to talk between different storage systems that have slightly different mindsets, all this is complicated, and because it's complicated, it fails," Cass pointed out.
For example, there are conflicts between file sizes, file placement policies and user access patterns.
"When people want to read data back they want all the data recorded at the same moment at the same time and have to mount several tapes to do that, so there's a conflict between the right access patterns and the right storage patterns," Cass pointed out.
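The conflict Cass describes comes down to how many tapes one read request touches. In this sketch the file names and tape IDs are hypothetical, and `tapes_to_mount` is an illustrative helper, not part of any CERN tool. When files written in the same time window are scattered across cartridges, reading them back needs one mount per cartridge; when they are colocated, a single mount suffices.

```python
from typing import Dict, Iterable, Set

def tapes_to_mount(placement: Dict[str, str], wanted: Iterable[str]) -> Set[str]:
    """Distinct tapes that must be mounted to read every file in `wanted`."""
    return {placement[name] for name in wanted}

# Hypothetical placements of three files recorded at the same moment.
scattered = {"evt_0900_a": "TAPE01", "evt_0900_b": "TAPE07", "evt_0900_c": "TAPE12"}
colocated = {"evt_0900_a": "TAPE01", "evt_0900_b": "TAPE01", "evt_0900_c": "TAPE01"}

request = ["evt_0900_a", "evt_0900_b", "evt_0900_c"]
print(len(tapes_to_mount(scattered, request)))  # 3 mounts
print(len(tapes_to_mount(colocated, request)))  # 1 mount
```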
Hardware failures are frequent and can cause problems for storage and database systems, Cass stated. "We have something like 200,000 disks across the grid and are getting disk failures every hour around the grid, and when they cause storage system problems, there are even more failures," he added.
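Cass's disk figures imply a failure rate that can be estimated directly. This sketch assumes exactly one failure per hour across 200,000 disks, a simplification of "every hour"; the function names are illustrative. That works out to about 8,760 failed disks a year, an annualized failure rate of roughly 4.4 percent.

```python
# Failure-rate arithmetic for a large disk population.
# Assumption: exactly one disk failure per hour across the whole grid.
HOURS_PER_YEAR = 24 * 365

def annualized_failure_rate(n_disks: int, failures_per_hour: float) -> float:
    """Fraction of the disk population expected to fail in one year."""
    return failures_per_hour * HOURS_PER_YEAR / n_disks

def per_disk_mtbf_hours(n_disks: int, failures_per_hour: float) -> float:
    """Implied mean time between failures for a single disk, in hours."""
    return n_disks / failures_per_hour

print(f"{annualized_failure_rate(200_000, 1):.2%}")  # 4.38%
print(per_disk_mtbf_hours(200_000, 1))               # 200000.0
```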
Infrastructure failures are "a fact of life," Cass said. These usually take the form of power losses and cooling-system outages.
The overall computing structure is also an issue. "We're trying to move from computer center empires to a federation with consensus rather than control," Cass remarked. "We do have consensus, but communication is a problem."
For example, the Quattor and Lemon tools CERN's IT department developed for fabric management and monitoring haven't been adopted by all its Tier One and Tier Two sites.
"A major financial institution with tens of thousands of boxes -- far more than we have -- has adopted them," Cass stated. "The idea that we'd be able to have a common configuration management across the system still hasn't worked out."
There are also problems with the shared file system. CERN's software has to be distributed to 150 sites around the world. CERN uses AFS, while "a few hundred" nodes use NFS, which creates a bottleneck.
AFS, or the Andrew File System, is a distributed networked file system that uses trusted servers to present a homogeneous file name space to all connected workstations, regardless of their location. NFS, the Network File System, is a protocol developed by Sun Microsystems that lets client computers access files over a network as if those files were in local storage.
Solutions, Solutions, Solutions
CERN is developing a solution for the shared file system bottleneck: CERN VM, a virtual software installation whose HTTP file system, based on Growfs, uses HTTP caches to cache data on the nodes.
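The caching idea behind this approach is a read-through cache: a node serves a file locally if it already has it and fetches it over HTTP only on a miss. The sketch below is a generic illustration, not CERN VM or Growfs code; the `ReadThroughCache` class and `fake_origin` stand-in are hypothetical. In a real deployment the fetch function might wrap the standard library's `urllib.request.urlopen(url).read()`, but here an in-memory stand-in is used so the example runs offline.

```python
from typing import Callable, Dict, List

class ReadThroughCache:
    """Serve file contents from a node-local cache, fetching from origin on a miss."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                 # e.g. an HTTP GET against a server or proxy
        self._store: Dict[str, bytes] = {}  # node-local copies, keyed by path
        self.misses = 0                     # reads that had to go to the origin

    def read(self, path: str) -> bytes:
        if path not in self._store:
            self.misses += 1
            self._store[path] = self._fetch(path)
        return self._store[path]

# Stand-in origin that records every request it receives (hypothetical).
requests_seen: List[str] = []

def fake_origin(path: str) -> bytes:
    requests_seen.append(path)
    return f"contents of {path}".encode()

cache = ReadThroughCache(fake_origin)
cache.read("/sw/release/libanalysis.so")
cache.read("/sw/release/libanalysis.so")  # served from the local cache
print(cache.misses, requests_seen)        # 1 ['/sw/release/libanalysis.so']
```

Repeated reads of the same software release never reach the origin after the first miss, which is what relieves the central file servers.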
"Over the next couple of years, CERN VM will be used more and more for software distribution and to resolve file system bottlenecks," Cass said.
To resolve the problems with storage placement and access, CERN is changing the way it manages data transfer.
Other storage improvements include putting data accessed for analysis into a separate system from the data recorded by the experiments. The analysis data storage will have lower latency because "people don't like latency, they want immediate access to the file at the risk of a penalty later," Cass pointed out.