The development of Canada’s largest grid computing network is not only contributing to a worldwide particle physics project – it’s teaching high-performance computing users how to share their resources.
In production since the fall of last year, GridX1 consists of eight clusters operating out of a series of universities and research institutions, including the University of Victoria, the Centre for Subatomic Research at the University of Alberta, the WestGrid cluster at the University of British Columbia, and the Research Computing Support Group at the National Research Council in Ottawa. These facilities are pooling their computer cycles to share data and applications through a technique known as grid computing. The project has been underway for several years.
Running open source software on Intel-based servers, GridX1 is linked to an experiment running out of the Conseil Européen pour la Recherche Nucléaire (CERN) in Geneva. The experiment, called ATLAS, will simulate the way protons collide in order to learn more about matter and prove the existence of a particle, the Higgs boson, that scientists hope will explain how the universe was formed.
Randall Sobie, a University of Victoria professor and GridX1 research scientist, said that when ATLAS begins to record data in 2007 or 2008 it is expected to generate a petabyte or two of data per year.
“The challenge of analyzing all that data made us realize we couldn’t do it in one site. The idea was to exploit all these facilities that are distributed around the world,” he said. “You have the resources, you have the manpower, why not build a facility to do this? The grid seemed to make sense.”
There is no dedicated funding for GridX1, nor is there an overarching body that controls the way the grid operates. Although the grid contains about 3,000 CPUs, Sobie said only about 10 to 15 per cent of the resources tend to be used.
Sobie compared GridX1 to the SETI@home project, which allows home users to pool their PC resources during idle periods as part of an experiment to search for extraterrestrial life.
“In a way it’s like that kind of concept but in a bigger scale. Instead of a desktop resource it’s a network resource,” he said.
“It’s a very Canadian kind of system, in that we’re all sort of trying to get along,” he said. “Only certain applications really lend themselves to this kind of environment, because of the way the thing is structured. You have to live with this sort of situation where you don’t really know how many resources you’re going to get at any given point. You certainly can’t turn around and demand more.”
The spirit of cooperation is helping others understand how to run distributed high-performance computing environments more effectively, according to other GridX1 participants.
“We’re looking at it as an example of an active grid and learning from it,” said Rob Simmonds, research director at the Grid Research Centre at the University of Calgary.
“In Canada, certainly it’s the biggest, most wide-spread production grid environment.”
Sobie said keeping the infrastructure side simple has helped make GridX1 more feasible.
“It’s not like a large HPC or parallel machines that you buy. They’re built up from these PC-like components,” he said. “It’s well-suited for the grid applications. It’s easy to duplicate.”
Sobie and his colleagues are working with researchers from the WestGrid supercomputing facility to connect its cluster at the University of British Columbia to GridX1 and ATLAS.