Software Review

Rethinking BlackBerry Dependency Post-Outage

<em>Arik Johnson</em>

Following last week's outage of RIM's BlackBerry service, many in business and government alike are starting to wonder whether depending on a device with such a soft underbelly (according to competitors) is such a good idea.

"The NOC-centric nature of RIM's solution has always been a target of its competitors in the past, but usually the competitive message has been FUD [fear, uncertainty, and doubt] around security ('Do you REALLY want your data flowing through someone else's system?')," explained Avi Greengart, principal analyst for mobile devices at Current Analysis. "NOC service outages are a huge gift to competitors' marketing departments, because service outages are real -- end users feel those."

Russell Shaw on the BlackBerry Beat gave a failing grade to carriers as well:

There were some bright points, but all in all I was not pleased with the information flow and presentation.

First, let us discuss how each major U.S.-based BlackBerry carrier conveyed this data when I called them.

T-Mobile and Sprint Nextel did best. As soon as I called the T-Mobile trouble line, I was greeted with a newly updated recording. Same when I called "611" from my Nextel BlackBerry.

Verizon Wireless' performance was less than stellar. That's because I was routed around in their voice mail jail. I might have obtained outage info sooner if I was a Verizon subscriber, but that is not my point. The direct tech number from the outside is not indicated on the Verizon site. What if it was the voice service that was down, and I needed to call Verizon tech support from a landline or another cell?

Cingular-AT&T's was even worse. I called tech support and the lady who answered hadn't even heard of the outage. The outage was already more than 10 hours old. I asked her to search for trouble tickets, and lo and behold, there were buckets of them.

I'm stubborn enough to think that when you have a service emergency that affects millions, you arm just about all your customer service reps with this info. A desktop advisory to each workstation would have helped.

But you know what frustrated me just as much? Lack of outage info on any of the aforementioned carriers' websites. It's funny: when you call carrier customer support, or even tech support, and get placed on hold, the message invites you to check out the carrier site. But when you get to the carrier site, fast-breaking info is either not available or is buried several clicks deep. And try to find a relevant tech support phone number. If you're lucky it will be on the Contact Us page. But why not put up that contact info within an outage report bulletin on your home page?

And don't even get me started about BlackBerry tech support. Yes, there was a recording and even a few email updates, but nothing on the site about the status of the update and when things were projected to be back to normal.

Competitors pounced - "...lacking an explanation of what went wrong, customers mull smartphone diversification..."

Research In Motion bruised its reputation when a system glitch last week knocked out e-mail service to thousands of customers.
The service interruption, which began on April 17 and carried over to the next morning, lasted about 10 hours, but its long-term effect could be troubling for the BlackBerry manufacturer.

"This was probably the straw that broke the camel's back in regards to diversification for us," says Charles De Sanno, executive director of enterprise infrastructure engineering for the U.S. Department of Veterans Affairs, which has approximately 5,800 BlackBerry users.

Customers expressed concern not just about the crash of RIM's network but also about the vendor's slow response in explaining what happened. For nearly two days following the mishap, RIM's public response consisted of a 56-word statement saying merely that the cause was "under review."

"I'm personally disappointed in the way RIM handled this situation," says De Sanno, who's testing Windows Mobile devices from several manufacturers and plans to switch over a significant number of BlackBerry users by year's end.

RIM needs to show that it's making appropriate changes to avoid a repeat performance, says Carmi Levy, an analyst with Info-Tech Research Group. "This wasn't just a slowdown," Levy adds. "This affected the majority of RIM's global user base."

Lacking information on what went wrong, customers were left to assess whether their own servers were part of the problem. "BlackBerry is essentially an outsourced environment," says Paul Hinsberg, senior server engineer for Alameda County, Calif. "Whenever there's an outage, this determination has to go on whether it's us or them."

RIM maintains three network operations centers for its wireless e-mail system: two in Waterloo, Ontario (one for North America, another supporting customers in Asia and the Pacific), and one in the United Kingdom for Europe, the Middle East, and Africa. Last week's outage, the company's first significant shutdown since June 2005, almost certainly originated in the data center serving North America.

On April 19, RIM issued a statement saying the outage was caused by "the introduction of a new, non-critical system routine" designed to optimize the system's cache. The company's failover process didn't perform up to expectations, RIM acknowledged.

TOO MUCH OF A GOOD THING

RIM has enjoyed explosive growth, with its subscriber base topping 8 million customers in the first quarter of this year. Customers last week wondered if the company has made the necessary infrastructure investments to keep up. In its statement, RIM said it has "definitively ruled out" security or capacity issues as a root cause of the outage.

Yet one competitor points to a potential weak spot in RIM's network architecture. Because all of its North American e-mail and data traffic is routed through one network operations center, RIM has what amounts to a single point of failure, says Fabrizio Capobianco, CEO of open source mobile e-mail provider Funambol.

Until now, the BlackBerry's cachet, reliability, and strong user experience have helped RIM fend off less expensive mobile e-mail alternatives based on more open operating systems. Some of that magic, however, evaporated last Tuesday night.

De Sanno calculates the VA can save more than $1 million a year by shifting to Windows Mobile, while "diversifying and securing the environment" in the process. That's a compelling proposition, and one RIM co-CEO Jim Balsillie must counter in the weeks ahead.

So while competitors capitalize on RIM's soft underbelly, apparently a software upgrade was to blame for the outage:

According to a statement from the Waterloo, Ontario-based company, the shutdown on April 17 was related to a software upgrade that went awry, followed by a failover process that also didn't work properly.

The BlackBerry blackout happened when the company introduced a new, noncritical system routine into its database, officials said. The routine, according to RIM, was designed to improve cache optimization but instead caused a series of interaction errors between the databases and the cache.

"After isolating the resulting database problem and unsuccessfully attempting to correct it, RIM began its failover process to a backup system," company officials said in a statement. Officials said that the company had repeatedly tested the failover process successfully, but this time something went wrong.

"The failover process did not fully perform to RIM's expectations in this situation and therefore caused further delay in restoring service and processing the resulting message queue," officials said in the statement.

The company's statement goes on to say that its analysis continues and that it has identified certain aspects of its testing, monitoring and recovery process that need to be fixed to prevent this from happening again. "RIM apologizes to customers for inconvenience resulting from the service interruption," company officials said in the statement.

Hopefully, you aren't quite as addicted as some:

An emptiness shook John Kleinschmidt's world at 8:15 p.m. EDT Tuesday. His BlackBerry, usually buzzing with dozens of email messages an hour, was silent.

"It felt like a tremor," said Mr. Kleinschmidt, an engineer at a software-development company in Troy, Mich., who gets an average of 250 emails on his BlackBerry each day. He is so devoted to responding immediately that he recently tapped away on the gadget's keyboard during his wife's stepfather's funeral; during showers, he keeps it within view but dry.

Panicked, he pulled out the battery several times, trying to reset the device. No dice. He stayed up the entire night, calling the BlackBerry help line every hour. No luck. Then, he logged on to his PC and began reaching out to other users to see what was going on.

He found a world of bereft people suffering the same kind of separation anxiety because of a North American outage of Research In Motion Ltd.'s BlackBerry.

The blackout hit millions of users of the popular wireless email device for at least nine hours Tuesday night and Wednesday morning. Even White House spokesman Tony Fratto expressed frustration, joking with reporters that the White House had started a "twelve-step group" to cope with the withdrawal.

While dealing with the unprecedented failure of its service, RIM said yesterday morning that service for most customers was restored overnight, adding the "root cause is currently under review." The source of the problem remains largely unknown and RIM hasn't provided a detailed explanation, even to clients. A spokeswoman for the company said it's unclear when one will come.

RIM, based in Waterloo, Ontario, has experienced outages before, typically lasting a few hours or less. But this massive failure hit the company as its user base added one million new accounts in its most recent quarter. The number of BlackBerry users is still relatively small, with eight million customers world-wide, compared with more than one billion cellphone users.

But those who use the device -- dubbed the CrackBerry -- are often devotees who use it around the clock and include senators, investment bankers, Hollywood and media types, technology workers and lately soccer moms and other consumers -- many of whom were just as upset as Mr. Kleinschmidt to see that emails weren't trickling in.

An online poll of 70 large companies by expense-management service ProfitLine found that 81% of respondents had some disruption to operations, with 44.5% reporting "moderate or substantial" impact on productivity.

"It's symptomatic of the increasing vulnerability of our economy," adds Edwin L. McClendon, executive vice president of the investment-banking division at Terra Nova Financial LLC in Chicago. He received more than 30 emails hours after they were sent and had to reschedule two conference calls. "We now have network-centric work habits."

Analysts say that the epicenter of the problem was almost certainly related to the network operations center, the "post office" that receives emails from email servers and pushes them out to subscribers' BlackBerrys. Unlike with competitors' services, all emails sent through the BlackBerry service go through the center, which switches the data over to the carriers' network and has multiple geographic locations. Such closed architecture, long touted as the secret behind the speed and security of BlackBerry's email service, makes it vulnerable to systemwide outages, analysts and competitors say.

Tuesday night, the West Coast was wrapping up its workday when the outage began. Lori Sale, a senior agent at the Los Angeles-based talent agency International Creative Management, was at her 14-year-old son's baseball game when her BlackBerry stopped working. She first realized something was wrong at about 5:15 p.m. PDT, when she noticed she had received no emails on her BlackBerry since 5:03 p.m. Ms. Sale, who estimates she receives more than 500 emails a day, became alarmed when her boss then called and asked why she hadn't responded to his email sent four minutes earlier about a sudden problem.

She had to leave the game and make the 20-minute drive to the office. Ms. Sale ended up working on her office computer until 9 p.m. Yesterday morning when she checked her BlackBerry, she was hit by a deluge of 14 hours of emails. "The only good thing was that everyone was going through the hassle together," says Ms. Sale. "If it's going to happen, it should happen to everyone -- and not just mine."

Wireless carriers that offer BlackBerry service started getting complaints from subscribers soon after problems occurred. They quickly realized the problem was RIM's and not theirs -- cellphone and text-messaging services on BlackBerrys were still working, after all -- but customers often blame carriers for any problems with their wireless handsets, so they swung into action.

The Sprint Nextel Corp. network team worked throughout the night to monitor the situation, exchanging a flurry of calls with RIM, a spokeswoman said. At Verizon Wireless, a joint venture between Verizon Communications Inc. and Vodafone Group PLC, the network operations team blasted an alert at 8:49 p.m. EDT via text message, email and voice-mail to top executives.

Email service began working for some earlier than for others. After Lucas Evans, an associate with Apollo Real Estate Advisors in Los Angeles, noticed he hadn't received responses to several urgent emails about an impending deal, he started some self-troubleshooting: sending emails to himself from a personal account, turning his AT&T Inc. BlackBerry on and off and, the time-honored solution to all technical problems, trying to shake it back to health. But the 26-year-old Mr. Evans, who had gone to bed with business pending, was rudely awakened at 1 a.m. when service was restored and his BlackBerry, which he had forgotten to turn off, started vibrating loudly on his night table. "I was up for hours answering all those emails," he said.

AT&T's wireless unit, formerly Cingular Wireless and the nation's largest carrier with 61 million subscribers, said users were back online as of 6 a.m., but there was a large backlog of message traffic from overnight. AT&T users who were traveling overseas and roaming on other networks were also affected.

Yet problems continued throughout the morning. At 8:10 a.m. EDT, the U.S. Senate's sergeant at arms sent an email to members notifying them of the RIM network issue. Senate aides preparing for an early vote on Medicare legislation -- emailing each other to gauge support for the measure -- were frustrated that their messages weren't going through. Only at 10:43 a.m. did the sergeant at arms say service had been restored.

To some, the BlackBerry outage was actually "a welcomed respite from the constant infernal buzzing of my blue plastic sidearm," Michael Brawer, a Los Angeles talent agent, wrote in an email. The only downside, Mr. Brawer said, was that he couldn't email his mom to brag about having dinner next to actress Lindsay Lohan. "Cheers to missed opportunities," he said.
