Toronto-based hosting firm Q9 Networks Inc. may have to financially compensate some of its customers after a router error caused downtime last week.
A software error on one of the company’s routers occurred Wednesday morning around 9:00 a.m., causing a cascading effect across the network. CEO Osama Arafat said that all of Q9’s data centres – the company operates centres in Toronto, Calgary and Brampton, Ont. – were affected by the problem, which led to about an hour of downtime.
“There was a lot of details and it was a complex issue. There was basically various stages of the event that affected different customers,” said Arafat. “Routers, by definition, talk to each other, so that type of a problem is insidious.”
Arafat wouldn’t identify the company that made the faulty router software, nor would he say how many customers were affected, but some of them may be due for compensation.
Q9 provides service level agreements that guarantee 100 per cent uptime. “If we don’t achieve 100 per cent . . . that has financial penalties associated with it,” he said.
A Q9 customer, who asked not to be identified for the purposes of this story, said he realized there was a problem at about 9:20 a.m. “We noticed that we couldn’t connect to any of Q9’s internal sites (or) their control panel application. Their (main) site was down. There are a couple of other customers that we know that are hosted by Q9 and their sites seemed to be down as well,” he said.
The customer, who operates an e-business Web site, added that it’s unusual for a downtime event to be triggered during regular business hours. “I expect something like this to happen at 2:30 in the morning.”
Another customer, Toronto law firm Stikeman Elliott LLP, avoided downtime altogether by switching to a back-up provider. “We run a BGP router (Border Gateway Protocol) here for a situation like this,” said Venky Srinivasan, director of technology. A secondary backbone was able to take over, allowing the firm’s network to stay active.
Despite Q9’s 100 per cent uptime guarantee, Srinivasan felt it necessary to provide an extra fail-safe.
“When we sign a contract like this, we look at: have they checked all types of failures which could happen? The answer is yes. But there could always be a permutation or combination that you haven’t looked at. You can’t plan for all disasters, you do the best you can,” he said.
“It’s a piece of electronic equipment. Anything can happen. When we go to sleep, we don’t know if we’re going to wake up tomorrow.”
In 2003, Srinivasan told ITBusiness.ca that he was considering using a third party in addition to Q9. At the time, he was concerned about events beyond his control, such as the massive blackout that struck parts of Canada and the U.S. that August.
Arafat acknowledged that even the best-laid plans are susceptible to flaws.
“While you design for a certain thing, sometimes there are things beyond your control and this is one issue where it was beyond our control. I think customers are very understanding of that,” he said.
He compared the situation to an airline administering safety measures to its aircraft: “Crashes can still occur — it’s a fact of life,” he said. “You can learn something from this crash, but it doesn’t mean you can prevent all crashes in the future.”
Arafat said the company was in touch with affected customers within the hour. “I think customers mostly focus on how you deal with the issue and how quickly you deal with the issue. We’ve done well on both counts.”
The unidentified e-commerce customer said that his confidence in Q9 is “a bit” shaken, “but they are still the big boys on the block when it comes to hosting in Canada. Compared to the other guys in town, I would still choose Q9. But if you start including (hosters) in the States, Q9 isn’t the big fish in the pond anymore.”
Arafat said that Q9 currently has no plans to abandon its 100 per cent uptime policy.