This is the third post in a series of posts on setting up a high availability web application system on a shoestring budget. The goal is to have no single point of failure and to not spend a lot of money. Read the first post and the second one.
After giving up on pgpool, I went through many alternatives.
- Writing a postgres proxy from scratch
- Using DRBD and Heartbeat
- Using a database designed for replication
- Using mysql with mysql-proxy
- Using mysql with MMM
pgpool seemed too complicated to me. I started to write my own proxy. I wrote one in Go and another using ruby and eventmachine. They both worked fine for proxying requests to multiple backends, but failover and recovery were going to be big hurdles and I wanted to use something that has been through the wringer with other people, so unfortunately I shelved those projects (em-proxy is very cool, though!).
I found a document on Linode about setting up a highly available postgresql server cluster. It uses DRBD to mirror a partition between two servers. Heartbeat is used to figure out when a server goes down. It uses a virtual IP address that the two servers share. The active server gets the virtual IP and all clients connect through it. I followed the instructions and had this all set up, but learned it has some major drawbacks.
- The ‘standby’ server is not usable for anything. The DRBD partition isn’t even mounted, so postgresql isn’t running.
- DRBD makes writes about 30% slower.
- When a server goes down or you manually transfer resources, there is a lag. Heartbeat transfers the IP address, mounts the DRBD partition, starts postgresql. It’s a little slow.
- The system isn’t scalable beyond two servers.
If I’m paying for a database server, I want it to be doing stuff. Given that and the speed/scalability issues, I decided to keep looking.
I briefly toyed with using a database designed to be redundant, something like Cassandra or MongoDB. At this point, it would be too much work to rewrite the app to use one of these storage solutions, so I didn’t pursue them, but next time I might consider using one. Although it sounds like Cassandra uses a ton of RAM (not good for shoestring budgets)…
I didn’t really want to switch from postgres to mysql, but it started to look like I was going to have to. Mysql has had replication since version 3.23 (released in January 2001). Tons of people use it, and I used it extensively in several production environments, plus it supports multi-master replication which seems ideal for failover.
I found two promising projects to handle load balancing and failover: mysql proxy and MMM. Mysql proxy uses lua scripts to control the proxy. This seemed a bit odd and none of the official examples handled load balancing or failover. There are a few scripts in the “cookbook”, but limited evidence of their use in production environments.
MMM looked better…it handled everything I was looking for. Stay tuned for the next installment…