How BlackBerry’s outage could have been prevented
This past Monday, millions of BlackBerry users lost email service for three hours during the company’s second major outage in less than a year. Customers immediately turned to online discussions and blogs to find the root of the problem. The outage is believed to have been caused by the failure of one of two Internet addresses that relay e-mail from corporate servers. Jack Gold, a technology analyst at J.Gold Associates, stated something we are very familiar with:
“Any time you got a system that’s got a NOC, a Network Operations Center, you have the potential for a single point of failure.”
Gold’s point raises a question that sits squarely within our expertise in high availability: if a company cannot build enough redundancy into its NOC, why doesn’t it have technology in place to eliminate the single point of failure?
There’s no way of knowing how much business BlackBerry or its cell phone carriers lost during the downtime. One thing is certain, however: if RIM had implemented a fault-tolerant solution, “routine upgrades” would not have created such a fiasco among customers.
Hopefully they recognized this mishap as a lesson learned.




