I’ve been working on a Hudson-based build farm for Sonatype and Maven open source builds since sometime in September of 2008. I’ve learned a lot as a result, so I thought I’d share some experiences from the trenches. In this third - and probably final - installment I’ll discuss some issues we tackled with our VMWare environment itself, and look ahead to some issues with which we still grapple on a day-to-day basis.
VMWare, Efficiency, and the Space-Time Continuum
Compared to what we went through trying to get Windows builds running reliably out on the build farm, this discussion is going to seem somewhat…nitpicky. However, there are some important things to understand when you’re running a build farm on VMWare ESXi, so let’s dive in and take a look.
The first thing to understand is that the hardware specs of your ESXi machine represent a sort of theoretical maximum. Just looking at those numbers (we have 8 cores at 3.16 GHz and 32 gigs of RAM), you’ll be tempted to salivate and wring your hands as you dream about all the simultaneous builds you can run. Resist! Remember that you’ll have multiple virtual machines sharing that hardware, each of which has a certain sunk cost in terms of memory (and, minimally, CPU) overhead. This overhead comes from the RAM and CPU necessary to run a full-blown operating system, on which your Hudson instance executes. In some cases like Ubuntu JeOS (Just enough OS), which are designed for use in virtual machines, the overhead is pretty minimal though still noticeable; in other cases like Solaris or Windows, you’re stuck with the same operating system your desktop machine might run…complete with GUI. OK, I’m sure you can turn off the GUI on Solaris - it runs webservers, right? - but I’m not a Solaris expert, and more to the point I’m not interested in tainting that environment too much with customizations. Too much customization can render your build platform unique, which is a bad thing. Additionally, there can be a bit of inefficiency related to allocation of RAM and CPU resources if you structure your VMs to grab and reserve those resources no matter what. This means that even if those VMs are completely idle, they may hold onto a certain amount of RAM (usually not CPU really, in my experience) and choke out other competing VMs. On the other hand, if you don’t reserve resources for your VMs, you may face sudden lock-ups if you have too many VMs competing for what is fundamentally a finite resource.
In theory, this should simply slow down all VMs on the system; sort of a reverse rising-tide-lifts-all-ships effect. In practice, we’ve found that this sort of competition can lead to full-out system crashes. Funny thing: it turns out some operating systems don’t respond favorably to having less RAM than they thought. If it’s just a CPU-competition issue, then your VMs may simply leak time…but we’ll talk about this in a minute. After groping around in the dark for several days, we gradually determined that the best policy was to try to limit the total pseudo-hardware configurations for all running VMs to something on the order of 90% of what the ESXi machine actually has. Note that you must always tell each VM how many CPUs and how much RAM it “owns”, even if you don’t reserve those resources by messing with the Resources tab in the VM settings. (Reserving them via the Resources tab should force more of a hard allocation, limiting VMWare’s ability to shuffle resources to where they’re most needed, as I alluded to above.) What I’m talking about is really trying to keep the total resources “owned” by all running VMs just below the actual hardware resources available on the machine…it just seems to function more smoothly that way.
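To make that 90% rule of thumb concrete, here is a rough Python sketch of the arithmetic. The host figures (8 cores, 32 GB of RAM) are ours; the VM names and per-VM allocations are invented for illustration, and the script doesn’t talk to ESXi at all. It just adds up what you’ve told each VM it “owns” and compares the total against the budget.

```python
from dataclasses import dataclass

# Host specs from our ESXi box; the 0.90 factor is the rule of thumb above.
HOST_CPUS = 8
HOST_RAM_GB = 32
BUDGET_FRACTION = 0.90


@dataclass
class VM:
    name: str
    cpus: int
    ram_gb: float


def check_budget(vms, host_cpus=HOST_CPUS, host_ram_gb=HOST_RAM_GB,
                 fraction=BUDGET_FRACTION):
    """Sum the CPU/RAM 'owned' by running VMs and compare to the budget."""
    cpu_total = sum(vm.cpus for vm in vms)
    ram_total = sum(vm.ram_gb for vm in vms)
    return {
        "cpu_total": cpu_total,
        "ram_total": ram_total,
        "cpu_ok": cpu_total <= host_cpus * fraction,
        "ram_ok": ram_total <= host_ram_gb * fraction,
    }


# Hypothetical allocations, NOT our real configuration.
EXAMPLE_VMS = [
    VM("ubuntu-jeos", 2, 4),
    VM("windows", 2, 8),
    VM("solaris", 2, 8),
    VM("freebsd", 1, 4),
]

if __name__ == "__main__":
    print(check_budget(EXAMPLE_VMS))
```

With these made-up numbers the four VMs own 7 CPUs and 24 GB, which fits under the budget of 7.2 CPUs and 28.8 GB. Add one more 2-CPU, 8 GB VM and both checks fail.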
Managing Resources: Understanding Your Builds’ Needs
I need to stop things here and provide a bit of a disclaimer. Some of our builds are quite large, and can take a very long time to complete. In the past, each time we’ve run into resource problems in our build farm, it’s been as a result of these huge builds running on all available VMs at once. So, the load put on our particular build farm varies tremendously from moment to moment. This may seem like a strange niche case, but there’s a critical lesson here.
You have to plan for the maximum momentary load you’re likely to see on the whole build farm.
It only takes one instant of maxing out the RAM on your ESXi hardware to cause one or more of your VMs to grind to a halt. If you have more than one build that can run for a long time or runs on all VMs at the same time, you need to be prepared for saturating your server’s hardware. You can limit the effects of this a little bit by using the Locks and Latches Hudson plugin and keeping long-running jobs on the same lock. This will cause your build times for any particular distributed job to balloon, so be prepared; but failing to do this can completely lobotomize a VM, leaving it with a corrupted disk or something similar. You’ll have to ask someone else for a technical explanation of why this is, but believe me: I’ve had to rebuild VMs on multiple occasions because of this problem.
On the other hand, if you have a lot of small builds that are unlikely to jam up the works for long by themselves, you can probably get away with tuning the number of Hudson executors on each VM and leaving the CPU/RAM allocations to each VM as suggestions. That way VMWare doesn’t have to set aside that segment of its resources for an idle VM. Even if you have this sort of setup, but still have that one huge build, you can avoid Hudson gridlock by making sure you have at least two executors on each VM where the long-running job will build. This way, the more agile builds have a passing lane to get around that trundling, grindingly slow 18-wheeler of a build.
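If the passing-lane idea sounds hand-wavy, a toy simulation shows the effect. This is only a sketch: it assumes a simple first-come, first-served queue on a single VM, which is a simplification of how Hudson actually schedules work, and the job names and durations (in minutes) are invented.

```python
import heapq


def finish_times(jobs, executors):
    """Simulate a FIFO build queue on one VM.

    jobs: list of (name, duration) in the order they were queued.
    executors: number of executors configured on the VM.
    Returns a dict mapping job name to the time it finishes.
    """
    # Min-heap of the times at which each executor next becomes free.
    free_at = [0] * executors
    heapq.heapify(free_at)
    done = {}
    for name, duration in jobs:
        start = heapq.heappop(free_at)  # earliest available executor
        end = start + duration
        done[name] = end
        heapq.heappush(free_at, end)
    return done


# Hypothetical queue: one 18-wheeler followed by two quick builds.
QUEUE = [("huge", 120), ("small-a", 5), ("small-b", 5)]

if __name__ == "__main__":
    print("1 executor: ", finish_times(QUEUE, 1))
    print("2 executors:", finish_times(QUEUE, 2))
```

With one executor, the small builds finish at minutes 125 and 130 because they wait behind the huge one. With two executors they finish at minutes 5 and 10, while the huge build still takes its 120.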
We’ve actually been able to cheat the resource allocation rule I mentioned above to a certain extent. Our private build farm tends to have much faster, less frequent builds, so we’ve been able to almost double the number of running VMs on the ESXi server since the VMs allocated to the private build farm are idle much of the time. As we add new jobs to each build farm, I’m sure this will cease to be true, but for now the two farms look like they’re running on twice as much hardware as we physically have…and they seem happy as two peas in a pod.
Virtual machines running on ESXi tend to have some trouble keeping time. It’s a little embarrassing, and we try not to talk about it in public, but there it is. Left to their own devices, VM operating systems may move backward or forward in time relative to any outside fixed point. To the outsider, some VMs will appear slightly blue, while others will appear slightly reddish…Einstein would be impressed.
Okay, bad physics jokes aside, they’re not really moving in time; they just sort of lose track of it. The problem is pretty well documented out on the internet, and there are some pretty good instructions for compensating, like this one (PDF) from VMWare. It seems that the timekeeping problem arises from CPU allocation and kernels that count CPU ‘ticks’ to keep time. The best practice seems to be taking a two-pronged approach to keep everything synchronized. First install VMWare Tools, and second configure NTP time synchronization on each VM operating system.
VMWare Tools is meant to keep VMs in sync by catching them up when they fall behind (probably due to not getting the CPU access they expect). However, the tools are apparently useless for reining in VMs that run out ahead of the bunch. Personally, I have no idea why a VM operating system would skip ahead, but the internet assures me it’s possible, and I’ve actually seen it happen in our build farm. To handle this problem, we must enable NTP clock synchronization for our VMs. Installing VMWare Tools is a breeze on most operating systems, except for FreeBSD. It seems there is no version available for BSD, so you’re left with NTP to keep things up-to-date. That’s okay; it does pretty well. As far as enabling NTP, this is also a breeze on most operating systems; most already have NTP installed, or can have it installed through a simple command like:
<code>sudo aptitude install ntp </code>
…Except, of course, on Windows. On Windows, you’ll need to dig into the Policies section of the Control Panel, as described here, to enable network time protocol. This is far less intuitive or simple than on just about any other OS (except possibly Solaris, and on Solaris the cure for all problems is a good manual).
One other interesting point about NTP: if you have an NTP server on your VMWare machine you’re thinking about using, STOP! Use an NTP server external to VMWare; remember how VMWare has some problems with timekeeping? I may have touched on this point somewhere above. In our build farm, we’re using the following NTP configuration (or approximations of this, on the Windows systems):
<code>$ cat /etc/ntp.conf
server 0.north-america.pool.ntp.org
server 1.north-america.pool.ntp.org
server 2.north-america.pool.ntp.org
server 3.north-america.pool.ntp.org
</code>
Pretty simple, really. Using multiple time sources gives your network the ability to compensate for any clock skew that may appear in any one of the sources. It should also make your configuration more resilient to partial network outages, such as when the entire east coast of the US disappears from the internet (it’s happened before).
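If you want a quick sanity check of how far a given VM has drifted, a few lines of Python can ask one of those pool servers for the time and compare it with the local clock. Treat this as a rough sketch, not a replacement for a real NTP daemon: it sends a bare-bones SNTP request, reads only the server’s transmit timestamp, and ignores network asymmetry. The function names are my own invention.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_request():
    """Minimal 48-byte SNTP client request (LI=0, version 3, mode 3)."""
    return b"\x1b" + 47 * b"\0"


def parse_transmit_time(packet):
    """Extract the server's transmit timestamp as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("NTP response too short")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query_offset(server, timeout=5.0):
    """Rough estimate of (server time - local time), in seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent = time.time()
        sock.sendto(build_request(), (server, 123))
        packet, _ = sock.recvfrom(512)
        received = time.time()
    # Compare against the midpoint of the round trip.
    return parse_transmit_time(packet) - (sent + received) / 2
```

Calling `query_offset("0.north-america.pool.ntp.org")` on a VM returns a positive number if the VM’s clock is behind the server and a negative one if it has run ahead. Anything more than a second or so is worth investigating.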
But why go to all this trouble? Why does it matter that all OS clocks tick in perfect harmony? Apparently, Hudson can lose build results if the timestamps are off by too much. It’s not just an urban legend; we’ve had this problem (which is why I know so much about VMWare’s timekeeping). Again, I’m not completely sure why Hudson loses build results, or why it relies on timestamps from slave instances at all for that matter; these are questions best asked of Hudson developers. What I can tell you is that keeping time in sync throughout your build farm is in your best interest.