
Building Good Tools

I don’t know how I originally found it, but Caterina Fake recently wrote a post entitled Make things. In it is the following quote from Freeman Dyson:

There is a great satisfaction in building good tools for other people to use.

A little searching through Google Books helped me find the source of the quote: Disturbing the Universe, by Freeman Dyson, pp 9-10. Here is the full paragraph:

In the magic city there are not only deliverers and destroyers but also a great multitude of honest craftsmen, artisans and scribes. Much of the joy of science is the joy of solid work done by skilled workmen. Many of us are happy to spend our lives in collaborative efforts where to be reliable is more important than to be original. There is a great satisfaction in building good tools for other people to use. We do not all have the talent or the ambition to become prima donnas. The essential factor which keeps the scientific enterprise healthy is a shared respect for quality. Everybody can take pride in the quality of his own work, and we expect rough treatment from our colleagues whenever we produce something shoddy. The knowledge that quality counts makes even routine tasks rewarding.

I don’t know what “the magic city” is, but it is an inspiring paragraph.

Fix for Spork Not Reloading Models with Rails 3

So spork is pretty cool and makes running a single test a lot less painful. It loads up the Rails environment once and then just forks each time you want to run a test, instead of the standard way, which loads the entire Rails environment every time you run even one test.

But I had a problem that it was never reloading any models that changed. After much investigating, it turned out that the line

fixtures :all

in test_helper.rb was causing all the models to be loaded before the fork, so they never got reloaded.

Remove that line and add the fixtures call to your individual tests. Kind of a pain, but much faster in the long run. And since you are now loading fixtures in each test, you can load only the ones that test needs, which speeds things up a little.
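
To see why anything loaded before the fork stays stale, here is a minimal sketch in plain Ruby. It is not Spork itself, and it assumes a Unix-like system where fork is available. DemoModel, write_model, in_fork and prefork_demo are made-up names for illustration only. The parent loads a file, the file then changes on disk, and a forked child only sees the change if it loads the file itself after the fork.

```ruby
require "tmpdir"

# Write a tiny stand-in "model" file whose greeting method returns the given text.
# DemoModel is a hypothetical name, not anything from Rails or Spork.
def write_model(path, text)
  File.write(path, "module DemoModel\n  def self.greeting; #{text.inspect}; end\nend\n")
end

# Run the block in a forked child and return the string it produced, via a pipe.
def in_fork
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(yield)
    writer.close
    exit!(0) # skip at_exit hooks inherited from the parent
  end
  writer.close
  result = reader.read
  reader.close
  Process.wait(pid)
  result
end

# Returns [what a child sees without reloading, what a child sees after reloading].
def prefork_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, "demo_model.rb")
    write_model(path, "old")
    load path                  # loaded before the fork, like the models were
    write_model(path, "new")   # the file is edited afterwards
    stale = in_fork { DemoModel.greeting }             # child inherits the old code
    fresh = in_fork { load path; DemoModel.greeting }  # child loads after forking
    [stale, fresh]
  end
end
```

In a Rails test, the per-test equivalent is a call such as fixtures :users inside the test class itself, where :users stands in for whichever fixtures that test actually needs.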

golang Makefiles

The first post in an ongoing series on how to do easy, stupid things. Here’s how to write a super simple makefile for a command line program in Go:

include $(GOROOT)/src/Make.inc

TARG=ggoog
GOFILES=download.go

include $(GOROOT)/src/Make.cmd

TARG is what your executable will be called. Put any files that need to be compiled into GOFILES.

The trick is including Make.inc at the top and Make.cmd at the bottom. (If you want to build a package, you would include Make.pkg at the bottom.)

If your environment variables aren’t set correctly, just run gomake instead of make.

Forget Everything I Wrote -- Use AWS

Forget everything I wrote about “High Availability on a Shoestring”…well, the need to switch to mysql instead of postgres for replication still applies, but don’t bother using MMM.

Anyway, I spent (wasted?) weeks trying to get something set up with no single point of failure without spending a crapload of money. I shied away from Amazon’s AWS options because I thought they would be too expensive, but then I started looking into it a bit more. I decided to give it a try and move part of my system to AWS servers. I put it on two identical micro servers behind an “elastic” load balancer. It was way too easy.

I waited a while just to make sure everything was ok, then decided to try to move everything. It takes a little getting used to, but it’s shockingly easy. Unfortunately the cheapo micro instances can’t handle much (as someone wrote in a forum, they have the computing power of a smartphone) so you have to at least spring for a small instance in most cases, although I’m still using two micro instances.

Here’s what I’m using so far:

I’m still a little wary of the bill, as there are certain things they charge for that are hard to estimate, like the number of I/O requests to an EBS volume, but my system is way more robust than it could have been with my VPS servers. I wish I had used AWS from the beginning.

MMM 2.x Email Notifications

MMM used to have an email notification system in the 1.x version. It was removed in 2.x, which instead relies on Log4perl to send emails. The default configuration in the latest version does not send anything, so if you want to get notified when things change in your MMM cluster, you need a logging config file.

Here is what I’m using to log INFO and above to a standard log file and email me for any FATAL messages (filename is /etc/mysql-mmm/mmm_mon_log.conf):

log4perl.logger = INFO, LogFile, MailFatal

log4perl.appender.LogFile                           = Log::Log4perl::Appender::File
log4perl.appender.LogFile.Threshold                 = INFO 
log4perl.appender.LogFile.filename                  = /var/log/mysql-mmm/mmm_mond.log
log4perl.appender.LogFile.recreate                  = 1
log4perl.appender.LogFile.layout                    = PatternLayout
log4perl.appender.LogFile.layout.ConversionPattern  = %d %5p %m%n

log4perl.appender.MailFatal = Log::Dispatch::Email::MailSend
log4perl.appender.MailFatal.Threshold = FATAL 
log4perl.appender.MailFatal.to = your@emailaddress.com
log4perl.appender.MailFatal.buffered = 0
log4perl.appender.MailFatal.subject = FATAL error in mmm_mond
log4perl.appender.MailFatal.layout = PatternLayout
log4perl.appender.MailFatal.layout.ConversionPattern = %d %m%n

Octopus -- Master-Slave Database Connections in Ruby on Rails Configuration

octopus is a ruby gem that allows an ActiveRecord application (including ActiveRecord 3/Rails 3 applications) to connect to one master and multiple slave databases. Most of the documentation centers on its sharding features, but I am using it to send writes to the master and reads to the slaves. This is called “replication” in octopus parlance. Since there was no full example of a shards.yml file for full replication, here’s what I’m using:

octopus:
  replicated: true
  fully_replicated: true
  production:
    slave1:
      adapter: mysql
      encoding: utf8
      username: user
      password: password
      host: "db-reader-0"
      database: somedb
    slave2:
      adapter: mysql
      encoding: utf8
      username: user
      password: password
      host: "db-reader-1"
      database: somedb

What was confusing to me is that it needs both replicated and fully_replicated. The master database is not listed in this file at all: it is the one in your standard config/database.yml.
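
Since forgetting one of the two flags is an easy mistake, here is a small sanity check, sketched in plain Ruby with only the standard YAML library. It is not part of octopus. replication_slaves is a hypothetical helper, and SHARDS_YML is a trimmed copy of the file above with the credentials left out.

```ruby
require "yaml"

# Trimmed copy of the shards.yml above (credentials omitted for brevity).
SHARDS_YML = <<~YAML
  octopus:
    replicated: true
    fully_replicated: true
    production:
      slave1:
        adapter: mysql
        host: "db-reader-0"
        database: somedb
      slave2:
        adapter: mysql
        host: "db-reader-1"
        database: somedb
YAML

# Hypothetical helper: return the slave names for an environment,
# or raise if either replication flag is missing or not true.
def replication_slaves(yaml_text, env)
  config = YAML.safe_load(yaml_text).fetch("octopus")
  %w[replicated fully_replicated].each do |flag|
    raise ArgumentError, "#{flag} must be true" unless config[flag] == true
  end
  config.fetch(env).keys
end
```

Calling replication_slaves(SHARDS_YML, "production") lists the slaves defined for that environment. The master never shows up here because it lives in config/database.yml.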

High Availability on a Shoestring -- MMM Mysql Multi-master Replication Manager

This is the fourth post in a series of posts on setting up a high availability web application system on a shoestring budget. The goal is to have no single point of failure and to not spend a lot of money. Read the first post, the second one, and the third one.

A brief recap: I gave up on postgres and pgpool, and after evaluating some alternatives I am now going to try good old mysql and MMM.

MMM is a system to manage a multi-master mysql installation. You put an agent script on each database server and a monitoring script runs on a separate server. The interesting thing about it is that it uses a bunch of virtual floating ip addresses to manage failover. In a typical scenario, you need one floating ip that gets assigned to whichever master is going to handle the writes. You also need a floating ip for each reader. In my two server setup (both masters), I need three virtual ip addresses, one for the writer and two for the readers.

When you start it up, db0 will have the writer ip address and one reader ip address. db1 will have a reader address. The client application has these three addresses and they always maintain the same role. If db1 goes down, the MMM monitor tells db0 to take over the reader ip that was assigned to db1. If db0 goes down, the MMM monitor tells db1 to use the writer and reader ip addresses that were assigned to db0. The client application never knows that anything is going on (but it will need to reestablish any existing connections to the ip addresses that moved, which is unavoidable).
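
The ip shuffling described above can be sketched as a toy simulation in Ruby. This is not MMM's actual logic, which also handles things like replication health and role preferences. The role names (writer, reader0, reader1) and the reassign helper are made up for illustration.

```ruby
# Starting layout from the description above: db0 holds the writer ip and one
# reader ip, db1 holds the other reader ip. Role names are hypothetical.
ROLES = { "writer" => "db0", "reader0" => "db0", "reader1" => "db1" }.freeze

# Return a new role => host map in which every role held by a failed host
# has been moved to a surviving host. Roles on healthy hosts stay put.
def reassign(roles, alive_hosts)
  raise ArgumentError, "no database hosts left" if alive_hosts.empty?

  roles.each_with_object({}) do |(role, host), result|
    result[role] = alive_hosts.include?(host) ? host : alive_hosts.first
  end
end
```

With db1 down, reassign(ROLES, ["db0"]) puts all three roles on db0. With db0 down, reassign(ROLES, ["db1"]) moves the writer and both readers to db1. The keys never change, which is the point: clients keep using the same three addresses.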

The simplicity of the system appealed to me. All MMM is doing is shuttling ip addresses around. I also like that I can add more database servers to the system easily. If I add a slave server, I just add another floating reader ip address.

And, as an added bonus, the installation guide is very well written and it all works. I’ve tested taking servers out of the pool using the mmm_control script, shutting down mysql, and shutting down the server. MMM worked flawlessly each time.

Using mysql and MMM has satisfied all my needs:

  • There is no single point of failure. Either database server can go down and the application will still work.
  • Both database servers are using their cycles productively.
  • Failover and recovery are automatic.
  • More servers can easily be added if necessary.
  • All the technologies in use have been around for a while.
  • It’s cheap: it only requires two database servers and another server to run the monitor (more on where I put it coming up).

Now that the database is (finally) highly available, the next topic will be load balancers in front of the web servers.

High Availability on a Shoestring -- Database Replication and Failover Alternatives

This is the third post in a series of posts on setting up a high availability web application system on a shoestring budget. The goal is to have no single point of failure and to not spend a lot of money. Read the first post and the second one.

After giving up on pgpool, I went through many alternatives.

  1. Writing a postgres proxy from scratch
  2. Using DRBD and Heartbeat
  3. Using a database designed for replication
  4. Using mysql with mysql-proxy
  5. Using mysql with MMM

pgpool seemed too complicated to me. I started to write my own proxy. I wrote one in Go and another using ruby and eventmachine. They both worked fine for proxying requests to multiple backends, but failover and recovery were going to be big hurdles and I wanted to use something that has been through the wringer with other people, so unfortunately I shelved those projects (em-proxy is very cool, though!).

I found a document on Linode about setting up a highly available postgresql server cluster. It uses DRBD to mirror a partition between two servers. Heartbeat is used to figure out when a server goes down. It uses a virtual IP address that the two servers share. The active server gets the virtual IP and all clients connect through it. I followed the instructions and had this all set up, but learned it has some major drawbacks.

  • The ‘standby’ server is not usable for anything. The DRBD partition isn’t even mounted, so postgresql isn’t running.
  • DRBD makes writes about 30% slower.
  • When a server goes down or you manually transfer resources, there is a lag. Heartbeat transfers the IP address, mounts the DRBD partition, starts postgresql. It’s a little slow.
  • The system isn’t scalable beyond two servers.

If I’m paying for a database server, I want it to be doing stuff. Given that and the speed/scalability issues, I decided to keep looking.

I briefly toyed with using a database designed to be redundant, something like Cassandra or MongoDB. At this point, it would be too much work to rewrite the app to use one of these storage solutions, so I didn’t pursue them, but next time I might consider using one. Although it sounds like Cassandra uses a ton of RAM (not good for shoestring budgets)…

I didn’t really want to switch from postgres to mysql, but it started to look like I was going to have to. Mysql has had replication since version 3.23 (released in January 2001). Tons of people use it, I have used it extensively in several production environments, and it supports multi-master replication, which seems ideal for failover.

I found two promising projects to handle load balancing and failover: mysql proxy and MMM. Mysql proxy uses lua scripts to control the proxy. This seemed a bit odd and none of the official examples handled load balancing or failover. There are a few scripts in the “cookbook”, but limited evidence of their use in production environments.

MMM looked better…it handled everything I was looking for. Stay tuned for the next installment…

High Availability on a Shoestring -- PgPool-II Installation Woes

This is the second post in a series of posts on setting up a high availability web application system on a shoestring budget. The goal is to have no single point of failure and to not spend a lot of money. Read the first post.

Now that we had Postgresql up and replicating on two servers, our plan was to put pgpool-II in front of it.

pgpool claims it can do a lot, but we were particularly interested in automatic failover, online recovery, and query dispatching/load balancing. The idea is that the client applications connect to pgpool and think they are connected to a postgresql server. pgpool proxies all the queries and sends them to real postgresql servers and returns the results. It can monitor servers and not send anything to ones that fail and promote a slave to a master if necessary. It can also detect if the query is a read-only SELECT query and dispatch that to a pool of slaves.
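
The query dispatching idea can be sketched in a few lines of Ruby. This is only an illustration of the concept, not how pgpool is implemented. QueryRouter is a made-up class, and its read-only check is deliberately naive (a SELECT that calls a function with side effects would fool it).

```ruby
# Hypothetical router: read-only SELECTs go to the slaves in round-robin
# order, everything else goes to the master.
class QueryRouter
  def initialize(master, slaves)
    @master = master
    @slaves = slaves
    @next = 0
  end

  # Naive check: the statement starts with SELECT and does not take row
  # locks with FOR UPDATE or FOR SHARE.
  def read_only?(sql)
    sql.strip.match?(/\Aselect\b/i) && !sql.match?(/\bfor\s+(update|share)\b/i)
  end

  # Pick the backend that should run this statement.
  def backend_for(sql)
    return @master if @slaves.empty? || !read_only?(sql)

    slave = @slaves[@next % @slaves.size]
    @next += 1
    slave
  end
end
```

For example, with QueryRouter.new("db0", ["db1", "db2"]), plain SELECT statements alternate between db1 and db2, while an UPDATE or a SELECT ... FOR UPDATE is sent to db0.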

It sounded ideal to us and would handle all of our issues automatically. Googling it made it seem like there were people using it.

But we couldn’t get it to work. The main instructions for using Postgresql 9.0 streaming replication with pgpool are here. One big fault with them is that they take you through setting it up with postgresql running on two different ports on the same machine. We were very careful and spent a lot of time figuring out how to adapt those instructions to our setup with two servers.

Installation is not a breeze. The instructions tell you to use an alpha version (no idea why). It builds and installs fine. But then you need to install three sets of custom pgsql functions. You need to compile the functions and install them on each server, but only install the sql on the master. Then there are two shell scripts, failover.sh and basebackup.sh. The failover script is used to promote a slave to a master and the basebackup script is used for online recovery. The ones in the instructions are designed for the two databases to be on the same physical server. There’s another version of basebackup.sh in the official manual that I had more luck with.

After two days, I was able to get load balancing and automatic failover to work. I could not get online recovery to work. I’m pretty sure that you need a better basebackup.sh script, but I gave up. The whole experience didn’t give me the greatest confidence in the system, and I want something robust that will help me sleep at night.

Next up: some alternatives…

High Availability on a Shoestring -- Postgresql 9.0 Streaming Replication

This is the first in a series of posts on setting up a high availability web application system on a shoestring budget. The goal is to have no single point of failure and to not spend a lot of money.

Our initial server setup while developing a new web application had everything on one server. We soon added another server and put postgresql on it. For a normal web app, you might be content to risk it and use just one or two servers, but when you need to try to make the site as resilient as possible, you’re going to have to splurge on some more servers for redundancy.

The first step in making our app more resilient was to add another database server and set up replication between it and the existing server. We like postgresql and have been using it on a lot of projects, but it hasn’t had good built-in replication. Until now.

Postgresql 9.0 (finally) has streaming master-slave replication built in. It’s about as easy to set up as mysql replication. Our servers all run Debian Lenny, and there are no postgresql 9.0 packages for it, not even in the backports, but building it from source is very easy.

So now we’ll assume you have two db servers, db0 and db1. Postgresql 9.0 is installed on both (see this post for instructions on installing Postgresql 9.0 on Debian Lenny) and you have initialized the databases. db0 will be the master, db1 will be the hot standby slave.

To set up replication, edit postgresql.conf on both servers and set the following variables:

hot_standby = on
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 32

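Restart postgresql on the master and stop it on the slave. (The master’s pg_hba.conf also has to allow the slave to connect to the special replication database, otherwise the slave will not be able to stream from it.)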

Now you need to make a backup of the master, copy it to the slave, then start the slave in recovery mode.

To make a backup, on the master do the following:

psql -c "SELECT pg_start_backup('replication backup', true)" postgres
rsync -C -a --delete -e ssh --exclude postgresql.conf --exclude pg_hba.conf --exclude postmaster.pid \
    --exclude postmaster.opts --exclude pg_log --exclude pg_xlog \
    --exclude recovery.conf --exclude recovery.done \
    /var/lib/postgresql/9.0/main/ db1:/var/lib/postgresql/9.0/main/
psql -c "SELECT pg_stop_backup()" postgres

The backup should be on db1 now. To tell db1 who the master is, create a file named recovery.conf in the postgres data directory on db1. Make it look like:

standby_mode='on'
primary_conninfo='host=db0'
trigger_file = '/tmp/trigger_file0'

Start postgresql on db1 and you should be all set.

So now any changes to the database on db0 will be replicated to db1. db1 is in permanent recovery mode, but since it is in “hot standby” recovery mode, it can handle read-only queries. If db1 goes down, you should be able to restart it and it will recover anything that happened while it was down as long as the master didn’t use up all 32 write-ahead log segments (see wal_keep_segments in the postgresql.conf file above). If it has been down for a long time, or just to be safe, you can repeat the backup steps to make sure you get everything.
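
To get a feel for how long the slave can be down, here is a rough back-of-the-envelope sketch in Ruby. It relies on PostgreSQL's default WAL segment size of 16 MB. The write rate passed in is purely an assumption you would have to measure on your own master, and both helper names are made up.

```ruby
WAL_SEGMENT_MB = 16 # PostgreSQL's default WAL segment size

# Amount of WAL the master keeps around for standbys, in megabytes.
def retained_wal_mb(wal_keep_segments)
  wal_keep_segments * WAL_SEGMENT_MB
end

# Rough number of minutes the slave can be down before the master recycles
# segments it still needs, given an assumed WAL write rate in MB per minute.
def outage_budget_minutes(wal_keep_segments, wal_mb_per_minute)
  retained_wal_mb(wal_keep_segments) / wal_mb_per_minute.to_f
end
```

With wal_keep_segments = 32, retained_wal_mb(32) is 512, so the master keeps about 512 MB of WAL. If the master were writing 4 MB of WAL per minute (an invented figure), that would leave a window of roughly 128 minutes. A busier master shrinks the window proportionally.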

What about when db0 goes down? You can manually tell db1 to exit recovery mode and become the master by creating the trigger file specified in recovery.conf. Then you need to tell your clients to connect to db1 (more on this in a future post) and when db0 comes back up, you need to make it a slave to db1. Once they are back in sync, you can switch it back and make db0 the master.

This was pretty easy to set up, but we wanted more out of the setup. We wanted the failover to happen automatically and we wanted to be able to use the hot standby slave to process some queries and not just have it sit there. More on that in a future post…

You should follow me on Twitter: @patrickxb