Wednesday, January 7, 2009

No, Chicken Little, the (SaaS) sky is not falling



Yesterday there was a Salesforce.com outage that lasted somewhere between 30 and 40 minutes. In the 24 hours since, the blogosphere has filled with pundits spreading doom and gloom, saying this outage proves Cloud Computing is not a prudent business decision.

Sorry, Chicken Little, the SaaS sky is not falling. Cloud computing is here to stay. If anything, yesterday's outage is proof to me how much better off we are running our business in the clouds.

When the Salesforce.com outage started, dozens of IT engineers jumped into action, working first to isolate the problem and then to correct it. In the aftermath, today and for the rest of the week, dozens of technical teams will be working to understand what went wrong, how it went wrong, and what they can do to prevent it from happening again.

The best part? None of those people are on my company's payroll. It's a good thing -- I couldn't afford that many IT and support staff. Yet I am very thankful that they're on my "cloud", helping me run my business. I wouldn't be able to administer such an incredible CRM tool without their help.

When the outage started, I didn't call Salesforce.com; I saw that several folks on Twitter had already done so. I sent out an internal email telling colleagues "we are aware of the problem with Salesforce.com, and have people working on it right now." Yep, I had people working on it!

I switched our Support team over to our Salesforce Offline Viewer. It's a custom PHP application, designed several months ago, that reads from a SQL server database. Through the Salesforce API, we update the SQL database every 24 hours with a complete copy of all transactions we've entered into Salesforce.com. The GUI isn't as pretty as Salesforce, but it contains all the data the Support team needs: Cases, Solutions, Contacts, Accounts, and various custom objects.
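
A minimal sketch of that kind of daily full-copy job, under loud assumptions: it is written in Python rather than the PHP the viewer actually uses, SQLite stands in for the SQL database, and `fetch_records` is a hypothetical callable standing in for whatever Salesforce API client does the export. The table name and layout are also invented for illustration.

```python
import json
import sqlite3
from typing import Callable, Dict, Iterable

# Object types named in the post; custom objects would be appended here.
OBJECT_TYPES = ["Case", "Solution", "Contact", "Account"]


def refresh_offline_copy(
    conn: sqlite3.Connection,
    fetch_records: Callable[[str], Iterable[dict]],
    object_types: Iterable[str] = OBJECT_TYPES,
) -> Dict[str, int]:
    """Replace the local copy of each object type with a fresh full export.

    fetch_records is a hypothetical stand-in for the real API client: it
    takes an object type name and yields dicts that each carry an "Id" key.
    Returns the number of records stored per object type.
    """
    counts = {}
    # One transaction for the whole refresh: if any export fails, the
    # previous copy is rolled back into place and stays readable.
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS offline_records ("
            "object_type TEXT NOT NULL, record_id TEXT NOT NULL, "
            "payload TEXT NOT NULL, PRIMARY KEY (object_type, record_id))"
        )
        for object_type in object_types:
            # Fetch everything first so a failed export never empties the table.
            rows = [
                (object_type, rec["Id"], json.dumps(rec))
                for rec in fetch_records(object_type)
            ]
            conn.execute(
                "DELETE FROM offline_records WHERE object_type = ?",
                (object_type,),
            )
            conn.executemany(
                "INSERT INTO offline_records VALUES (?, ?, ?)", rows
            )
            counts[object_type] = len(rows)
    return counts
```

Scheduling is left out on purpose; something like a daily cron entry calling this function would give the 24-hour refresh described above, at the cost of the offline copy being up to a day stale.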

I then set up a "Salesforce" Twitter search on my TweetDeck, sat back, and watched for news that people were finally able to log in. I've never had to spend fewer resources and less manpower managing a crisis.

This 40-minute service outage was a blip. Let me tell you about another recent outage at my company, this one not related to the services we have running in the Cloud.

A few weeks ago, we had a little ice storm here on the East Coast. It knocked out phone lines and electrical power for hundreds of thousands of New England residents. Our company lost power for four days. The backup generator lasted for about 10 hours, and then it went offline, too. Everything was down: telephone service, electrical power, heat.

Employees couldn't communicate with each other over BlackBerry or email, because the Exchange server was down. Customers couldn't access our corporate website. All of our critical IT systems are redundant, with a co-lo on a completely separate power grid. Even that didn't help us, as this storm had knocked out both power grids (yes, we're moving the co-lo farther away -- and let me tell you, that is a huge project in itself!).

Every member of our IT and support staff began round-the-clock shifts, working to get our backup generators back online, contacting utility vendors and facility managers, shutting down servers and equipment to protect them from power spikes, etc. It was exhausting!

The one aspect of our business that was unaffected? Salesforce.com. We were able to communicate outwardly to customers via Salesforce. That allowed us to send a Mass Email to all Customers, providing them with alternate contact numbers for accessing our Customer Support staff. We modified the home page layout, to include status updates for all employees logging in to Salesforce from home.

The sales team, 90% of whom are remote, was able to conduct business as usual. After we altered the call routing for our main support phone number, the entire customer support team was able to work from home, too. They had full access to Cases, Solutions, and other information they use in Salesforce -- as if they were in the office.

The four-day service outage was seamless to our Customers, thanks to the Cloud. Thanks to Salesforce.com.

I compare these two outages -- especially the manpower and resources that went into fully restoring service -- and I know that Cloud Computing is the right model for our CRM service.

Perhaps all these blogging pundits have crackerjack IT teams, allowing all their in-house systems to operate with 99.999% uptime. If so, I hope they give special thanks to each and every member of their IT staff, and maybe a nice fat bonus at the end of the year.

Me? I work in a place called reality. And even though my company has the very best of IT folks, they are resource-constrained. They do capacity planning, integration testing, and constant tuning of our many complex business systems. But Salesforce's IT team is bigger, and comparing the availability numbers between our in-house and cloud-based systems just wouldn't be fair.

So cheer up, Chicken Little, you've got people.

11 comments:

  1. JP, great commentary on the situation. It helps put things in perspective. I think one of the greatest advantages of multi-tenancy is that if I'm experiencing an issue, it's quite likely that I'm not the only one. Do you think that Citibank, Cisco, Dell, and Japan Post have a few riders in their contract about responsiveness? Multi-tenancy is exactly why Salesforce has incentive to get it resolved quicker. I agree with you that "we" had people working on it, and it was one of the least stressful "system outages" I've been through.

  2. JP -
    You raise a super point: Would a company that uses an on-site server and loses power for four days in an ice storm start complaining that they'll never trust the electric company again? I doubt it.
    Salesforce was down 40 minutes. Under their 99.99% guarantee, they can be down 40 minutes every 277 days... and I believe that they achieve that.
    If we compare the gained productivity from using Salesforce to the lost productivity in the small downtime, one clearly outweighs the other.
    Super post!

  3. JP,

    Interesting take. Certainly stands out among the doom and gloom that many have been reporting.

    On the other hand, consider that 170 million transactions were unable to be processed. In the hypothetical situation that the entire world switched over to Salesforce.com, the global economy would likely come to a screeching halt if another such outage occurred and billions or trillions of transactions were stalled or voided.

    Especially in the case of the financial services industry, where security and reliability are crucial (particularly for the larger firms that process massive numbers of transactions daily that equate to billions of dollars) this type of outage may be enough to scare them away from cloud CRM.

  4. Taking the 170 million transactions logic to its next step, let's assume that everyone in the world is on the Force.com platform. All productivity would skyrocket. Integrations and data exchange would be far easier (and would break far less frequently).
    The bottom line is this: The outage was much shorter than any other system would have experienced. Salesforce is proven to increase productivity and profits, and Salesforce's "security and reliability" are stricter than most on-site servers anyway.
    I'm actually surprised to see the financial services industry mentioned, considering how many of the world's top financial services companies use the Force.com platform specifically because its security and reliability are superior to all other options they considered when choosing a database platform.

  5. JP your post and your attitude rocks. As a SaaS evangelist I agree with your assessments and conclusions. Thanks for sharing your thoughts.

    Your foresight to build the Salesforce offline viewer is a lesson worth adopting.

  6. Somebody help me out here. Where are people getting this 170 million transactions that couldn't be processed story? On the day of the outage (Jan 6), Salesforce successfully processed 177,431,333 transactions. (See http://trust.salesforce.com/trust/status/ for details)

    Because of the outage, did someone not enter a new Account into Salesforce this week? Did they just say, "The system wasn't up when I tried, so I decided I'd just never put in that Opportunity." Of course not. They came back 38 or 60 or 120 minutes later, and they did their work. Salesforce did not prevent 170 million transactions from being done. If they did, wouldn't we see a typical day on the Trust site showing 340+ million transactions?

    This outage delayed people's ability to enter transactions when they planned to. It did not stop them from ever entering them.

    If I'm missing something here, somebody please enlighten me.

  7. The 170 million transactions is an interesting statement. Upon further digging it seems the poster is a biz dev guy for a client/server CRM that focuses on the financial services space... totally unbiased comment?

    Plain fact is that client/server products crash all the time, and the bugs surrounding the need to manage all the thousands of possible deployment environments lead the queues in support calls. Forget what happens when I get a new disk and have to check and see what of my necessary customizations need to be rebuilt.
    At least on Tuesday I wasn't on the phone with a customer service rep telling me "Well, works on my end...must be your environment..."

    I like having people to figure that out for me....

  8. Agreed on the commentary..40 minutes is not a lifetime...unless you are completely tied to that system as your corporate lifeline. I can remember VISA going down for a full day...talk about financial impacts cascading downhill....where were you August 14th, 2003?

    The biggest issue is always one of "control". Well, to quote a line from "Days of Thunder", where the race cars speed away at a rate 1/1000th that of the salesforce.com universe... "control is an illusion, you infantile egomaniac"... the best IT companies in the world with the best colocation, redundant backups and most dedicated individuals can't think of everything... and certainly not for the monthly price of fixing this SFDC outage (no increase in my monthly fees). I also rather like the end result from a management perspective, budget management that is.

  9. This comment has been removed by the author.

  10. JP, some people have not realised that having 51,000+ companies asking to solve an issue is more efficient than calling the in-house IT guys to set up an e-mail account.

  11. Heya... my very first comment on your site. I have been reading your blog for a while and thought I would pop in and drop a friendly note. It is great stuff indeed. I also wanted to ask: is there a way to subscribe to your site via email?
    SAAS Service
