Oct 26 2009

Chef Shawn by day, Puppet master by night

After my first post about GoGrid and its lack of server images at the time, I was kindly advised that real server admins use tools like Puppet, cfengine and Chef to handle this. Now, if I were a real server admin this would have been quite the blow to my ego; luckily, however, I’m not, and just play one on the internet. That said, this advice did prompt me to look into Puppet, Chef and cfengine, and I liked what I saw.

I read through the project pages quickly and decided to start with Puppet for some reason. So off I went setting everything up. My first plan was to test with one “class” of server that I need to scale up and down often. Creating my scripts was, well, somewhat of a challenge; a lot of a challenge, actually. I ended up buying a book on Puppet, which helped quite a bit, and after a few days (yes, days!) I finally had everything working to load and update a server using Puppet. This wasn’t encouraging, since I still didn’t feel I knew Puppet well enough to go ahead and fly through the other server types. In fact, as sad as it sounds, I was still pretty confused most of the time.

Puppet did run well, though, and interfaced nicely with our Nagios monitoring, so all was good, but I was convinced I could do better. Well, not me exactly, but rather the people who spend their time building this great software; and not better exactly, but something more suited to my skills and my mastery of point and click.

So along comes Chef. Chef was EASY. Really easy. I felt like I flew through setting up my cookbooks, and the interface made me very happy, as I could show off to my colleagues not only my pointing and clicking skills but also my dragging and dropping capabilities. Chef soon replaced Puppet in our network, and things were smooth sailing until the day I call Slow Thursday.

Slow Thursday started out normally enough. Then, at about 3pm, the alerts suddenly started coming in: “Server 1 slow response”, “Server 10 slow response”, etc. All my servers under Chef were responding dreadfully slowly or not at all. Now, not being a real admin, I didn’t exactly put two and two together and get four. No, instead I spent until about 4pm trying to figure it all out. Only when I killed Chef on one of the servers (intending to restart it) and got an immediate speedup did I realize the culprit: Chef was killing my servers. I restarted the client on that server and it went slow again shortly after. So I started looking at the server. After much Googling and hmm-ing and thinking and restarting of all the Chef-related servers, I finally just restarted the whole server and started Chef on it again. Everything on the clients went back to normal.

That was weird, I thought, but no big deal, until it happened again, and again. Now, Chef is still new software, so I should (and did) expect some glitches, and that was fine. It just meant I couldn’t use it to manage my servers for the time being. I still use it for the most important task: the initial configuration.

Scorecard time:

Puppet

[CON] Hard for a newbie like me to configure and manage (I’ve heard good things from more experienced users, though)

[PRO] Seemed solid while we ran it

[PRO] Easy to monitor with Nagios thanks to existing scripts

[PRO] Book exists!

Chef

[PRO] Cool interface

[PRO] Lots of easily found cookbooks

[PRO] Easy for a newbie to create cookbooks

[CON] Still has some stability issues

End result: Tie!

Seriously, though, they are both great pieces of fairly young software. Many thanks to all the developers on both projects for taking the time to make my life a heck of a lot easier. I look forward to seeing them both mature further over the years and to using them time and time again.


Oct 22 2009

Balancing with Zeus ZXTM

Recently I started to run into issues with my load balancing solution. I was running HAProxy on a GoGrid instance and it was working pretty well. Eventually, though, as our traffic went up, problems started to appear. After making as many networking optimizations as I could, it was clear that I needed to find another solution. I was debating between HAProxy on a dedicated server at ServePath, a self-managed hardware load balancer, or a managed load balancer from ServePath when another contender entered the ring.

The nice people at Zeus set me up with a trial of their ZXTM load balancing solution, so I figured it was worth a shot. Setup on a fresh GoGrid instance was pretty easy (OK, very easy); I made some networking configuration optimizations, then went about setting up my pools. To make sure it wouldn’t just die under our traffic, I set it up as a server under our existing HAProxy setup, then gradually ramped up the percentage of requests going to it. Once it was at almost 100%, I made preparations and swapped it in for our HAProxy system.
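HAProxy lets you give each `server` line in a backend a `weight`, so one way to do a ramp like this is to raise the new balancer's weight in steps. The Python sketch below only simulates that idea using smooth weighted round-robin, a common scheme swapped in here for illustration; it is not HAProxy's own implementation. The backend names `haproxy_pool` and `zxtm` and the ramp stages are hypothetical.

```python
from collections import Counter


def smooth_weighted_rr(weights, n):
    """Return n backend picks using smooth weighted round-robin.

    weights: dict mapping backend name -> positive integer weight.
    """
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        # Every backend earns its weight on each round...
        for name, w in weights.items():
            current[name] += w
        # ...and the one with the highest running score serves the request,
        # then pays back the total so the others catch up.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


# Hypothetical ramp: percentage of traffic sent to the new ZXTM box per stage.
for zxtm_share in (10, 25, 50, 90):
    weights = {"haproxy_pool": 100 - zxtm_share, "zxtm": zxtm_share}
    counts = Counter(smooth_weighted_rr(weights, 1000))
    print(f"{zxtm_share}% stage: zxtm served {counts['zxtm']} of 1000 requests")
```

Over every full cycle of `total` picks each backend serves exactly its weight, and the picks are interleaved rather than bunched, so the new box sees a steady trickle of traffic at each stage instead of bursts.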

There was immediately a noticeable improvement in latency, which, if you’ve read my blog before, you’ll know is very important to me. The interface was a pleasure to work with as well, letting me easily monitor traffic levels and issues on our operations screen. After setup it was pretty much left to its own devices while I continued investigating the other solutions.

Performance-wise it went very well. During the testing period we peaked at, I believe, around 4000 requests per second and commonly ran at over 2000 per second for hours at a time. While we had some slowdowns during this time, they weren’t dramatic and probably had more to do with running it on a virtual server than with a problem in Zeus itself.

ZXTM also offers a pretty cool ability to move some application logic forward into the load balancer. Though it wasn’t suitable for my needs, I can certainly think of a lot of uses for it in other situations.

In the end, my month with ZXTM was a good experience and I can strongly recommend their software. As for my setup, I ended up going back to HAProxy, only this time running it on dedicated hardware, and it is still going strong.


Sep 7 2009

Long promised, finally delivered. The Cloud server update

So, quite a while ago I released a review of a couple of different cloud providers. It’s been a few months, so I thought I’d give an updated picture. Like last time, I still use Amazon, GoGrid and Rackspace Cloud. Also like last time, Rackspace Cloud is still my least used, though unlike last time Amazon is a very close second.

In my last post I believe I mentioned how important latency is to me, specifically latency to a number of services on the West Coast in the Bay Area. GoGrid is still the best of the providers in this regard, and because of that I’ve now moved almost all of my services over to GoGrid. There are some backups on Rackspace Cloud, and some applications that are just too painful to move are still on Amazon, but in general 95% of my servers are now with GoGrid.
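Comparing providers on latency doesn't need anything fancy. A minimal sketch, assuming you simply want to know how long a TCP handshake to a West Coast service takes from each provider: run the function below on an instance at each provider and compare the medians. The function name and the hostname in the usage comment are hypothetical, and the timing includes the DNS lookup unless you pass an IP address.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port, samples=5, timeout=2.0):
    """Median time, in milliseconds, to open a TCP connection to host:port."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection resolves the name (if needed) and completes the
        # TCP handshake; the connection is closed again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        timings.append((time.perf_counter() - start) * 1000)
    # The median is less sensitive than the mean to the odd slow sample.
    return statistics.median(timings)


# Usage (hypothetical endpoint), run once from each provider's instance:
# print(tcp_connect_ms("api.example-westcoast-service.com", 443))
```

A handshake time is only a rough proxy for what an application sees, but it is enough to show which provider sits closest to the services you talk to most.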

I’ll start with Amazon. They really are still the ones setting the bar in this space, and they haven’t slowed down; I get constant new feature updates from them. I still use Scalr instead of their own console, though I have gone in there to play around a few times and it seems pretty nice. My only real complaint with Amazon is their location and the latency to the West Coast services I need to talk to. The prices on their reserved instances are also nice, but I would prefer a monthly prepay option like GoGrid’s. Almost forgot: support. A few months back something went wrong at Amazon, causing a number of EBS volumes to fail, just as I was about to go on vacation, no less. It happened to take down our primary DB, which isn’t fun when you’re only stopping into the office to kill time before a flight. Trying to get any information out of their support was like pulling teeth, and their outage page was rarely updated, with no set pattern or frequency. It was also wrong for half of the outage. They didn’t even admit there was a problem at first; I had to go to Twitter and see a lot of other posts about it before being convinced it wasn’t just my systems breaking. That’s the only time I’ve had to use support with Amazon, which is a good sign, but the support I received deserves a D-.

We use Rackspace Cloud as an offsite backup, so it doesn’t get much use, but from what little I’ve used it the interface is awesome, with all the important information easily found. There’s nothing much to complain about here, but then I haven’t used them enough to give them much specific praise either. I’ll leave this one to someone else.

That leaves only GoGrid, who took a lot of harsh criticism in my last update. I’m now using them almost exclusively, and a lot. They still have some problems, but I’ve found a lot of good since then, so in the interest of saving the best for last I’ll start with the bad.

The first problem for me is still their lack of different server images: still only CentOS and RHEL on the Linux side. Debian- or Ubuntu-based distros have been promised for a long time, but no apparent update has been given.

This leads directly to my second biggest problem with GoGrid: their schedules for releasing features consistently slide back. I work in development, so I understand schedules slide, but you can compensate for this by padding what the developers say. If they always say next month and it always ends up being three months, just say three months and save the hassle of having to explain to me why it’s two months late (instead of early, if they actually do get it done in a month). This happened with MyGSI, which was pushed back repeatedly and then released missing a number of features, and it’s happened with the release of new server images.

My third gripe is the interface. It looks cool but has a number of bugs in Firefox that make it difficult to work with. The lack of scrollbars on the main page has meant that, multiple times while working on my laptop away from my desk, I’ve been unable to see all my servers in the list. This also affects my IP lists: I can’t see all my IP blocks if the billing side box is open. They are meant to resize, apparently, but this has never worked for me. Maybe a minimalist interface option?

Last are the forms. I understand forms are sometimes needed. But every time I want another block of IP addresses, I end up filling out the IP request form two or three times for different support people. Same with the SMTP unblocking form, though that was resolved when I was able to explain better in an email to support what I needed mail for.

OK, that’s it for complaints; now on to the good things I’ve found with GoGrid. First is support. They have awesome support. I work in Australia, so I mostly hit their night crew, and they are all very helpful and quick too. Their upstream support is also pretty quick to fix problems at these times.

Servers launch fine and fast, and once you get past the annoyance of requesting new IP blocks all the time, they do actually provision them almost immediately.

MyGSI still has its bugs, but I tried it this week and, despite some confusion in the documentation about needing to configure cloud storage first, it seemed to work well.

The API comes in handy for a lot of what we do, now that we have so many servers going.

Lastly, there’s the ability to link up to ServePath dedicated servers using a private VLAN. We haven’t fully utilized this yet but will be doing so in the near future for their load balancing solutions.

My verdict on GoGrid is that it still has a ways to go but has come a long way, and the terrific support makes up for a lot of the shortcomings I’ve found.

As for recommendations: if you need a West Coast setup or want to use their hybrid hosting, go with GoGrid. Otherwise, AWS is still the way to go.

Let me know if you have any questions about AWS or GoGrid, as I’ve certainly had my fair share of experience with both now. I’ll try to update this again in Q1 2010.


May 3 2009

Finally got JIRA installed

So I launched a Slicehost slice today and finally got around to installing the copy of JIRA I bought during their $5 sale a short time ago. Installation went smoothly and I have JIRA up and running with a MySQL backend.

It looks pretty good, and I started setting up my first project and getting used to the environment when I ran into my first real problem.

I couldn’t get it to work with GitHub, nor could I find a way to do it on Google that I could actually understand. Unfortunately, I just don’t have time right now to struggle with it, and I’m really not keen on switching to SVN, so for now I’m going to have to shelve JIRA a bit longer, at least until someone can give me a solid method of getting it to work with Git.

In the meantime, though, I went ahead and started up my Lighthouse account again. I’ll play around with that for a bit now.


Apr 29 2009

Amazon AWS US-West coming soon?

I’ve spoken to Amazon a few times about a US-West region, and so far the reply has always been that they are considering it but nothing is in motion yet. Well, the other day, while requesting a higher limit on my Elastic IPs, I went to select the region I wanted them for and to my surprise saw US-West in the list. Needless to say, I was quite excited!

Unfortunately, I couldn’t find an option to create new West Coast instances, and today the option is gone from the Elastic IP page. Hopefully, though, this is a sign of things to come very soon, as I would love to be able to host all our servers with Amazon.


Apr 26 2009

Watching my back and front with Zabbix

Zabbix is a network monitoring system that uses a central server and remote agents that report back to it. Today I set up my central server and, after much document reading and Google searching, got it all configured and working with one agent. Now that I’m familiar with the system, the rest of the agents should be a lot easier to set up. The next task is to configure the screens.

During the server setup I ran into a few problems that you might hit as well, so read on and hopefully you’ll save yourself a lot of head scratching.

1. Zabbix doesn’t support SMTP-Auth. This is certainly a pain, but thanks to this Forum Post it’s not that hard to work around. I started developing a patch to add SMTP-Auth support and I might finish it off this week, but I needed the alerts up and running right away, so that’s on hold.

2. Hosts, Items, Triggers, Actions oh my!
- Hosts are the systems you want to monitor; these need zabbix_agent/zabbix_agentd installed and configured to report to the server. When adding hosts, select a template to get a bunch of default items/triggers out of the box; these will save you a lot of time.
- Items are things to watch, e.g. the amount of free space on a drive.
- Triggers are conditions on items that cause an alert, for example the amount of free space on partition /usr dropping below 100MB.
When you select a host template, a lot of these will be set up for you, so you’ll probably want to start off by just cloning/editing existing ones.
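To make the relationship between hosts, items and triggers concrete, here is a toy Python model of the example above. It only illustrates the concepts and is not Zabbix’s API or internals; the `Host`, `Item`, `Trigger` and `from_template` names are made up. The key string follows Zabbix’s `vfs.fs.size[/usr,free]` item key style but is just a label here.

```python
from dataclasses import dataclass, field
from typing import Callable

MB = 1024 * 1024


@dataclass
class Item:
    """Something to watch on a host, e.g. free space on a partition."""
    key: str            # Zabbix-style item key, used here only as a label
    value: float = 0.0  # latest value reported by the agent


@dataclass
class Trigger:
    """A condition on an item that should raise an alert."""
    name: str
    item_key: str
    condition: Callable[[float], bool]


@dataclass
class Host:
    """A monitored system running an agent."""
    name: str
    items: dict = field(default_factory=dict)
    triggers: list = field(default_factory=list)

    def report(self, key, value):
        # Simulates the agent sending a fresh value for an item.
        self.items.setdefault(key, Item(key)).value = value

    def problems(self):
        # Names of triggers whose condition holds for the item's latest value.
        return [t.name for t in self.triggers
                if t.item_key in self.items
                and t.condition(self.items[t.item_key].value)]


def from_template(name):
    # Hypothetical "template": pre-built triggers applied to a new host.
    host = Host(name)
    host.triggers.append(Trigger(
        "Low free space on /usr",
        "vfs.fs.size[/usr,free]",
        lambda free: free < 100 * MB,
    ))
    return host


web1 = from_template("web1")
web1.report("vfs.fs.size[/usr,free]", 512 * MB)
print(web1.problems())  # -> []
web1.report("vfs.fs.size[/usr,free]", 64 * MB)
print(web1.problems())  # -> ['Low free space on /usr']
```

The template is what saves the time: every host built from it gets the same triggers, and you only tweak the ones that don’t fit.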

That’s really about it to get started. I have personally just scratched the surface, so I can’t go much further, but this should be enough to get you up and running fast and give you some extra time to study the docs.

Grab Zabbix Today.



Apr 23 2009

Weekly book time: High Performance MySQL, Second Edition

I’m a programmer by choice but a DBA/sysadmin by need. That’s not to say I don’t really enjoy the other two roles, just that I don’t always focus on improving my skills in either as much as I do my development skills. Given that a larger and larger portion of my work these days revolves around scaling databases and systems, I’ve started to focus more on improving this side of my skillset. This week I started reading High Performance MySQL, Second Edition. I’ve only read the first few sections but already have a long list of notes about things I didn’t know, or worse yet, thought I knew but was wrong about.

I’ll post a full follow-up once I’ve finished it, but it’s starting out looking really good.


Apr 22 2009

StopNGoGrid

I’ve been working a lot with GoGrid lately, as we needed some West Coast servers and Amazon doesn’t offer US-West yet. Maybe I’ve just been spoiled by Amazon’s offering, but I’ve found GoGrid to be almost unusable for day-to-day work. Here are a few of the issues so far:

1. OS images – They only offer CentOS/RHEL on the Linux side, which I am not comfortable with, plus I hate yum, so I end up spending large amounts of time fighting the system to configure new servers.

2. Load balancer – You get a free load balancer, which is awesome! What they don’t tell you is that it isn’t configurable. Add a new server to your cluster? You need to delete the load balancer and recreate it, which of course means downtime: 10-15 minutes of downtime every time you want to add a new server. That’s just not acceptable. Even support can’t change this for you when you call up.

3. No stored images – If you customize your servers heavily, you will be heartbroken to find out that you can’t store that image to use on new servers. Instead, on your next server you’ll have to do it all again, so write it down.

4. No upgradeability – If you start with, say, 1 or 2GB servers and realize after a while that you need to upgrade them, you can’t. I spoke to support about this as well and was told to create a new server and delete the old one. That’s fine if you can use the default image, but I can’t, so that means 1-2 hours wasted for each server I have to change, plus (see issue 2) 10-15 minutes of downtime when I replace the load balancer.

I’ve been working with a lot of cloud environments lately, including 20 boxes on Amazon EC2 and 4 on GoGrid. I’m looking at Mosso this week. Once done, I’ll post a roundup of them all.


Apr 20 2009

Atlassian trial

www.atlassian.com has a special offer up today, offering JIRA and Confluence for $5 for 5 users. I grabbed both and I’m looking forward to giving them a try. I’m always looking for the perfect issue tracker. Here’s hoping JIRA is it.


Apr 7 2009

Erlang…again

So tonight I started reading Programming Erlang again. Well, I started it last week too. I’ll probably be starting again next week as well. I’ve been so busy with work lately that I’ve found it difficult to study anything after hours. Assuming I don’t completely fail at concentrating, I’ll be sure to post more about my experience.