#What got us on Azure

Teonos is a bootstrapped company from Finland, just a couple of months old. Being a young company that works with Microsoft tech, we applied to and got into Microsoft’s BizSpark program. BizSpark is great for startups as it offers free licenses for development tools and other Microsoft products.

As part of the BizSpark deal we get credits to use on Azure. Naturally, we thought we should put those to use and spun up a couple of VMs (running Linux, which might not have been entirely according to Microsoft’s plan, but the option was there so…).

#How it went

##The good

The first server we set up was for running a bleeding-edge version of our upcoming product for quick testing outside of our office network. After checking the basics of the credits offered by BizSpark, I set up the account and created the first VM. The first impression was great: a cool UI with nice features like automatic performance alerts based on configurable rules. I filled in that I wanted a standard-size machine with Ubuntu, waited a while for the fancy spinner to stop spinning on my screen and BOOM… Nothing? Where’s that server of mine?

Apparently something went wrong, as that was the last we ever heard of that server. So I repeated the process and finally got a server to show up on my management screen. Cool, I had a server running and a shell open to do my thing. I deployed the first microservice quickly; all in all, I hadn’t spent much more than an hour since starting to write that service (yes, with node.js and microservices being all the rage now, we had to try them out as well). Pretty good so far.

##The bad

Later on I was wondering whether it would make sense to move our production servers from our current hosting provider to Azure as well, so I dug a bit deeper into the SLA documentation. It was an interesting read.

First of all, it was pretty hard to find a service level number for a single server (which in the end I found to be 99.5%). Secondly, if you look into it a bit, you see that separate pieces of infrastructure (CPU, IO, network etc.) have separate SLAs, so once you stack those together, you actually get less than said 99.5%.
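To make the stacking effect concrete, here is a minimal sketch of the arithmetic. The per-component figures are made up for illustration (we are not quoting Azure's actual numbers), and the calculation assumes the components fail independently and that the server is only up when all of them are.

```python
from math import prod


def composite_availability(component_slas):
    """Availability of a setup that needs every component up at the same time.

    Assumes independent failures, so the individual availabilities multiply.
    """
    return prod(component_slas)


# Hypothetical figures: three components, each promised at 99.5%.
components = {"compute": 0.995, "storage": 0.995, "network": 0.995}

combined = composite_availability(components.values())
print(f"combined availability: {combined:.4%}")
```

With these assumed inputs the combined figure lands around 98.5%, which is noticeably below any single component's 99.5%.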

I pretty quickly found some articles about how to set up availability zones and put machines in them. When you use availability zones, Azure promises you a better SLA (99.9%), great! Or not. You have to

  • Pay for more servers
  • Pay for setting up failovers and clustering for the setup
  • Pay for maintenance of those servers and cluster setups

Small wonder that Microsoft is ready to give you a small boost in SLA (for your cluster) when you’re paying good money for it and maintaining it yourself. Of course, you have to worry about failovers, clustering and maintaining your servers anyway if you have a serious production setup going, but if you just want the best you can get with one server, you don’t want to hear about that stuff.

##The ugly

Never mind the SLA and availability; the server we had running on Azure was not critical in any way, so we didn’t really care about that. We kept our server running there until one morning we noticed our monitoring system reporting an outage on that server during the night: the service had been down and the monitoring agent hadn’t responded for a while.

It took only a quick look at the server uptime to notice that our Linux server had been rebooted. What?! Why would you go and reboot our server? Why wouldn’t you at least give us a heads-up beforehand? Not cool.
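For the curious, the uptime check can also be scripted. This is only a sketch of the idea, not what our monitoring actually runs: on Linux the first field of `/proc/uptime` is the number of seconds since boot, and the helper names and the sample values below are made up for illustration.

```python
from datetime import datetime, timedelta


def boot_time(uptime_text, now):
    """Return the boot time implied by the contents of /proc/uptime."""
    # The first field is the number of seconds since the machine booted.
    seconds_up = float(uptime_text.split()[0])
    return now - timedelta(seconds=seconds_up)


def rebooted_since(uptime_text, now, last_check):
    """True if the machine booted after the previous check."""
    return boot_time(uptime_text, now) > last_check


# Sample values; on a real server, read the text from /proc/uptime instead.
sample = "3600.00 7100.12"
now = datetime(2015, 1, 1, 12, 0, 0)
last_check = datetime(2015, 1, 1, 6, 0, 0)

print("booted at:", boot_time(sample, now))
print("rebooted since last check:", rebooted_since(sample, now, last_check))
```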

The last event, and the one that triggered this post, was that on Friday we actually got a notification e-mail about some upcoming maintenance. Great! They were giving us a heads-up this time! But the smile faded while reading the mail. It said that the whole cycle of updates would take 12 hours and each server would be down for 30 to 45 minutes.
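To put those minutes in perspective, here is a rough conversion from downtime to availability over a 30-day month. It is plain arithmetic under our own assumptions: the 30-day month is an arbitrary choice, and we are not claiming anything about whether planned maintenance counts against the SLA.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def availability_after_downtime(downtime_minutes,
                                period_minutes=MINUTES_PER_30_DAY_MONTH):
    """Fraction of the period the server was up, given its total downtime."""
    return 1 - downtime_minutes / period_minutes


# The two ends of the announced maintenance window for a single server.
for minutes in (30, 45):
    print(f"{minutes} min down -> {availability_after_downtime(minutes):.3%}")
```

Under these assumptions, a single 45-minute break in a month already puts a server slightly below 99.9% for that month.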

##Conclusion

We are confident that we will not be running anything critical on Azure, nor can we recommend it to anyone else. The mindset cannot be that you have to pay extra and set everything up so that any of your machines can be rebooted at any time, without any notice to you. Even when you are using cloud hosting for your services, you should be in control of the maintenance breaks.