RIM services ‘fully restored’ though cause for failure unknown, CEO says

Update: BlackBerry service is fully restored for 70 million customers worldwide after three days of outages, Research in Motion announced today.

But officials still don’t have a definitive cause for the failure of a core switch that led to the disruption, nor did they mention plans to compensate customers.

“We’ve now restored full service,” said co-CEO Mike Lazaridis in a Webcast held shortly after 10 a.m. ET.

Lazaridis and co-CEO Jim Balsillie didn’t say how reparations for customers will be handled. RIM has service level agreements hammered out with thousands of business customers using BlackBerry Enterprise Server that govern outages, but consumers with the BlackBerry Internet Service are not as well protected.

Both Lazaridis and Balsillie repeated that RIM had experienced 99.97 per cent uptime on its global network for the 18 months prior to Monday’s outage, which began primarily in Europe and Asia. “We’re taking aggressive steps to minimize this happening again,” Balsillie said.
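To put the 99.97 per cent figure in perspective, the short calculation below converts an uptime percentage into the downtime it implies over a given period. It is a rough sketch only: the 30.44-day average month and the function name `implied_downtime` are assumptions made for illustration, and RIM did not say how it measures uptime.

```python
from datetime import timedelta

# Assumption: an "average" month of 30.44 days (365.25 / 12).
AVERAGE_DAYS_PER_MONTH = 30.44


def implied_downtime(uptime_percent: float, period: timedelta) -> timedelta:
    """Return the total downtime implied by an uptime percentage over a period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period * (1 - uptime_percent / 100)


# The 18 months cited by the co-CEOs, converted to days under the assumption above.
eighteen_months = timedelta(days=18 * AVERAGE_DAYS_PER_MONTH)
downtime = implied_downtime(99.97, eighteen_months)
downtime_hours = downtime.total_seconds() / 3600

print(f"Implied downtime over 18 months: about {downtime_hours:.1f} hours")
```

Under those assumptions, 99.97 per cent uptime over 18 months leaves room for roughly four hours of accumulated downtime, far less than the three days of disruption reported this week.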

At the opening of the news conference, Lazaridis went slightly further than what RIM CIO Robin Bienfait said late Wednesday in a statement about the level of improvements in the outages. Bienfait had said email and BlackBerry Messenger traffic and Web browsing were “operating,” but Lazaridis said services were “restored full[y]” even though some customers might need to re-sync their devices by pulling out the battery and re-starting them. He said some email backlogs might continue.

Lazaridis also appeared in a video posted to the company’s Web site and YouTube, apologizing for the outage and promising that the firm was working “tirelessly” and “around the clock” to resolve the issue, though he gave no estimate of when service would be fully restored.

RIM founder and co-CEO Mike Lazaridis apologizes for BlackBerry service outage.

Browsing, however, was temporarily unavailable in EMEIA (Europe, the Middle East, India and Africa) while the company’s support teams monitored service stability and assessed when the service could be safely brought back online. Browsing was available in the U.S., Canada and Latin America, except for customers served by three carrier networks in Latin America that use infrastructure in EMEIA, Bienfait said.

On Twitter and other social networks, customers started reporting late Wednesday that service had been restored on their BlackBerry phones. “The service is still a bit slow, but I think it would be ok once the backlog is sorted out,” a user in Gauteng, South Africa, who requested anonymity, said via email.

In a later update on its website on Wednesday, RIM said it was seeing a significant increase in service levels in EMEIA. “Service levels are also progressing well in the U.S., Canada and Latin America and we are seeing increased traffic throughput on most services, although there are still some delays and services levels may still vary amongst customers,” it added.

The service interruptions began on Monday, and initially affected customers in the EMEIA region, followed by Latin America. By Wednesday, users in North America also started complaining of a disruption in service.

RIM said on Tuesday that messaging and browsing delays experienced by BlackBerry users in Europe, the Middle East, Africa, India, Brazil, Chile and Argentina were caused by a core switch failure within RIM’s infrastructure.

“Although the system is designed to failover to a back-up switch, the failover did not function as previously tested. As a result, a large backlog of data was generated and we are now working to clear that backlog and restore normal service as quickly as possible,” RIM said in a statement.
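RIM did not publish traffic figures, but a simple queueing estimate shows why a backlog can take hours to clear even after a switch is back in service. The sketch below is illustrative only: the message rates, the outage length and both function names are hypothetical, and it assumes constant arrival and processing rates.

```python
def backlog_after_outage(arrival_rate: float, outage_seconds: float) -> float:
    """Messages queued while nothing is delivered (assumes a constant arrival rate)."""
    return arrival_rate * outage_seconds


def seconds_to_clear(backlog: float, arrival_rate: float, service_rate: float) -> float:
    """Time to drain a backlog while new messages keep arriving.

    Only the spare capacity (service_rate - arrival_rate) reduces the queue.
    """
    if service_rate <= arrival_rate:
        raise ValueError("backlog never clears unless service_rate exceeds arrival_rate")
    return backlog / (service_rate - arrival_rate)


# Hypothetical numbers, not RIM's: 1,000 messages per second arriving,
# a one-hour stall, and capacity for 1,200 messages per second afterwards.
queued = backlog_after_outage(arrival_rate=1000, outage_seconds=3600)
drain_seconds = seconds_to_clear(queued, arrival_rate=1000, service_rate=1200)

print(f"Queued messages: {queued:,.0f}")
print(f"Hours to clear: {drain_seconds / 3600:.1f}")
```

With these made-up figures, a one-hour stall takes five hours to work off, because only the 200 messages per second of spare capacity goes toward the queue.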

The new crisis comes as RIM contends with agitated investors who are asking the company to explore strategic options and new leadership. The Canadian company is also fending off competition from Apple’s iPhone and phones running the Android operating system.

If RIM wants to regain the trust of operators and users, the company now needs to provide more details on why the back-up switch didn’t work as expected; why the problems spread to North America; and most importantly what it is doing to ensure that this never happens again, said Francisco Jeronimo, research manager at IDC.

“You’ve depended on us for reliable, real-time communications, and right now we’re letting you down,” Bienfait said in a letter. “We are taking this very seriously and have people around the world working around the clock to address this situation.”

In a conference call earlier on Wednesday, David Yach, RIM’s chief technology officer for software, said that there was no evidence of a breach or a hack. He did not provide information on how many customers were affected, saying that different customers were affected differently, with some not affected at all.

The backlog and queuing of messages starting in Europe caused some impact in nodes in other geographies, Yach said.

John Ribeiro covers outsourcing and general technology breaking news from India for The IDG News Service.
